CN110287788A

CN110287788A - Video classification method and device

Info

Publication number: CN110287788A
Application number: CN201910432237.XA
Authority: CN
Inventors: Chen Xun (陈迅)
Original assignee: Xiamen Wangsu Co Ltd
Current assignee: Xiamen Wangsu Co Ltd
Priority date: 2019-05-23
Filing date: 2019-05-23
Publication date: 2019-09-27

Abstract

The present invention provides a video classification method and device. The method comprises: obtaining image features of a video to be classified according to features of the video frames contained in the video; obtaining speech features of the video to be classified using features of the audio data contained in the video; obtaining text features of the video to be classified according to features of text information describing the video; and determining the category to which the video to be classified belongs using the image features, speech features and text features. With the solution provided by the embodiments of the present invention, a video can be classified using its image, speech and text features. Compared with the related art, in which videos are classified manually, this saves human resources and reduces the cost of video classification.

Description

Video classification method and device

Technical field

The present invention relates to the technical field of computer applications, and in particular to a video classification method and device.

Background

When providing video services to users, a video service provider often needs to classify videos according to their content first. For example, a football match video can be assigned to the "sports" category, while a video of natural scenery can be assigned to the "travel" category. Classifying videos allows the provider to serve its customers better: for example, displaying videos by category on a web page or in a mobile application (APP) helps users find videos of interest more quickly, and classifying videos in advance also facilitates video search and recommendation.

In the related art, videos are classified by having people watch and review the video content. When videos are classified manually and the number of videos to be classified is large, the video service provider has to bear a large labor cost.

Summary of the invention

To solve the problems in the prior art, embodiments of the present invention provide a video classification method and device. The technical solutions are as follows:

In a first aspect, a video classification method is provided, the method comprising:

Obtaining image features of a video to be classified according to features of the video frames contained in the video to be classified;

Obtaining speech features of the video to be classified using features of the audio data contained in the video to be classified;

Obtaining text features of the video to be classified according to features of text information describing the video to be classified;

Determining the category to which the video to be classified belongs using the image features, speech features and text features.

Optionally, the step of obtaining the image features of the video to be classified according to the features of the video frames contained in the video to be classified comprises:

Obtaining an image high-dimensional feature of each video frame contained in the video to be classified;

Determining first-type image features of the video to be classified using the image high-dimensional features of the video frames, according to the playing order of the video frames;

Clustering the image high-dimensional features of the video frames to obtain second-type image features of the video to be classified;

Taking the first-type image features and the second-type image features as the image features.

Optionally, the step of determining the first-type image features of the video to be classified using the image high-dimensional features of the video frames, according to the playing order of the video frames, comprises:

Inputting the image high-dimensional features of the video frames, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network;

Obtaining the output of the first LSTM neural network as the first-type image features of the video to be classified.

Optionally, the step of obtaining the speech features of the video to be classified using the features of the audio data contained in the video to be classified comprises:

Dividing the audio data contained in the video to be classified into a plurality of audio frames according to a predetermined frame length and frame shift;

Extracting a feature of each audio frame using a predetermined speech signal processing algorithm;

Obtaining the audio features of the video to be classified using the extracted features of the audio frames, according to the playing order of the audio frames.

Optionally, the step of obtaining the audio features of the video to be classified using the extracted features of the audio frames, according to the playing order of the audio frames, comprises:

Inputting the extracted features of the audio frames, in the playing order of the audio frames, into a pre-trained second LSTM neural network;

Obtaining the output of the second LSTM neural network as the audio features of the video to be classified.

Optionally, the step of obtaining the text features of the video to be classified according to the features of the text information describing the video to be classified comprises:

Segmenting the text information into words to obtain a word segmentation result;

For each word in the word segmentation result, obtaining a word vector corresponding to the word using a neural-network-based word embedding technique;

Determining first-type text features of the text information using the word vectors of the words, according to the order in which the words in the word segmentation result appear in the text information;

Performing dimension reduction on the word vectors of the words in the word segmentation result to obtain second-type text features of the text information;

Taking the first-type text features and the second-type text features as the text features.

Optionally, the step of determining the first-type text features of the text information using the word vectors of the words, according to the order in which the words in the word segmentation result appear in the text information, comprises:

Inputting the word vectors of the words, in the order in which the words appear in the text information, into a pre-trained third LSTM neural network;

Obtaining the output of the third LSTM neural network as the first-type text features of the text information.

Optionally, the step of determining the category to which the video to be classified belongs using the image features, speech features and text features comprises:

Inputting the image features, speech features and text features into a pre-trained neural network classification model;

Obtaining the output of the neural network classification model as the category to which the video to be classified belongs.

In a second aspect, a video classification device is provided, comprising:

An obtaining module, configured to obtain image features of a video to be classified according to features of the video frames contained in the video to be classified;

An obtaining module, configured to obtain speech features of the video to be classified using features of the audio data contained in the video to be classified;

An obtaining module, configured to obtain text features of the video to be classified according to features of text information describing the video to be classified;

A determining module, configured to determine the category to which the video to be classified belongs using the image features, speech features and text features.

In a third aspect, a computer device is provided, comprising:

At least one processor; and

A memory communicatively connected to the at least one processor; wherein

The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform any of the video classification methods described above.

In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements any of the video classification methods described above.

With the video classification method and device provided by the embodiments of the present invention, image features of a video to be classified can be obtained according to features of the video frames it contains; speech features can be obtained using features of the audio data it contains; text features can be obtained according to features of the text information describing it; and the category of the video can then be determined using the image, speech and text features. Because videos are classified using their image, speech and text features, the solution saves human resources and reduces the cost of video classification compared with manual classification in the related art.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a video classification method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method for obtaining video image features provided by an embodiment of the present invention;

Fig. 3 is a schematic flowchart of a method for obtaining video speech features provided by an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a method for obtaining video text features provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a video classification device provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.

Detailed description of embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.

The flow of the video classification method shown in Fig. 1 is described in detail below with reference to specific embodiments, and may be as follows:

Step 100: obtain image features of the video to be classified according to features of the video frames contained in the video to be classified.

In implementation, a video is composed of video frames, so the features of the video frames also characterize the video. On this basis, the image features of the video to be classified can be obtained from the features of the video frames it contains.

In one implementation, referring to Fig. 2, which is a schematic flowchart of a method for obtaining video image features provided by an embodiment of the present invention, the method comprises:

Step 20: obtain an image high-dimensional feature of each video frame contained in the video to be classified.

Image high-dimensional features are usually extracted from a specific image, for example from a video frame. An image high-dimensional feature is one representation of image information; such features may be floating-point features or high-dimensional binary features.

In implementation, each video frame can be input into a pre-trained convolutional neural network to obtain the image high-dimensional feature of that frame. Specifically, the convolutional neural network may be ResNet (Residual Neural Network), VGG (Visual Geometry Group network), or the like.

Step 21: determine first-type image features of the video to be classified using the image high-dimensional features of the video frames, according to the playing order of the video frames.

In implementation, the image features of the video to be classified are obtained from the image high-dimensional features of the video frames according to the playing order of the frames. Because the playing order of the video frames is fully taken into account, the resulting image features carry temporal information, which improves the accuracy of video classification.

In one implementation, the image high-dimensional features of the video frames can be input, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network, and the output of the first LSTM neural network is obtained as the first-type image features of the video to be classified.

Because the LSTM neural network processes the image high-dimensional features of the video frames in sequence, the resulting first-type image features carry temporal information; that is, the temporal information in the image features can be fully taken into account when the video is subsequently classified, which improves the accuracy of that classification.
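The sequential encoding in step 21 can be illustrated with a minimal pure-Python sketch of a single LSTM layer. It assumes each frame's image high-dimensional feature is already a plain list of floats, and it uses random, untrained weights as a stand-in for the pre-trained first LSTM network. The class name `TinyLSTM`, the hidden size and the choice of the final hidden state as the video-level feature are illustrative assumptions, not details from the patent. The same encoder shape applies to the audio-frame features (second LSTM) and to word vectors (third LSTM).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


class TinyLSTM:
    """Single-layer LSTM whose final hidden state summarises a sequence.

    Assumption: weights are random placeholders; the patent uses a
    pre-trained network whose parameters are not disclosed.
    """

    GATES = "ifgo"  # input, forget, candidate, output

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W = {gate: mat(hidden_dim, input_dim) for gate in self.GATES}
        self.U = {gate: mat(hidden_dim, hidden_dim) for gate in self.GATES}
        self.b = {gate: [0.0] * hidden_dim for gate in self.GATES}

    def step(self, x, h, c):
        """One time step: update cell state c and hidden state h."""
        pre = {}
        for gate in self.GATES:
            wx, uh = matvec(self.W[gate], x), matvec(self.U[gate], h)
            pre[gate] = [a + b + bias for a, b, bias in zip(wx, uh, self.b[gate])]
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def encode(self, sequence):
        """Feed vectors in playing order and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h
```

Reversing the frame order generally changes the returned vector. That is the property the description relies on: unlike a plain average of frame features, the result depends on playing order.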

Step 22: cluster the image high-dimensional features of the video frames to obtain second-type image features of the video to be classified.

By clustering the image high-dimensional features of the video frames contained in the video to be classified, features with high similarity are grouped into one class, yielding second-type image features that characterize the video to be classified.

Specifically, a predetermined number of image high-dimensional features can be randomly selected as cluster centers, each cluster center representing one cluster;

Then, for each image high-dimensional feature, the similarity between that feature and each cluster center is calculated, and the feature is added, according to the calculated similarities, to the cluster whose center is most similar to it, which yields the predetermined number of clusters;

Then, for each cluster, the average feature of the cluster is calculated from the image high-dimensional features it contains; specifically, the average feature may be the mean of those features. If the calculated average feature differs from the cluster center, the cluster center is updated to the calculated average feature;

The process then returns to the step of calculating, for each image high-dimensional feature, its similarity to each cluster center and adding it to the cluster with the most similar center, and repeats until the average feature of every cluster equals its cluster center. The clusters obtained at that point are taken as the clustering result, and this clustering result constitutes the second-type image features of the video to be classified.
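The clustering loop in step 22 is essentially k-means. The sketch below follows the described steps (random initial centers, assignment to the most similar center, mean update, stop when every mean equals its center). It assumes similarity is measured by squared Euclidean distance (a smaller distance means higher similarity) and adds a `max_iter` safeguard. The patent names neither the similarity measure nor how the resulting clusters are packed into a fixed-length feature, so returning the final centers is an assumption, and the function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumed similarity: smaller = more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_frame_features(features, k, seed=0, max_iter=100):
    """Cluster per-frame feature vectors; return (centers, clusters)."""
    rng = random.Random(seed)
    # Randomly pick k of the features as the initial cluster centers.
    centers = [list(f) for f in rng.sample(features, k)]
    clusters = []
    for _ in range(max_iter):
        # Assign every feature to the cluster whose center is most similar.
        clusters = [[] for _ in range(k)]
        for f in features:
            best = min(range(k), key=lambda j: sq_dist(f, centers[j]))
            clusters[best].append(f)
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = [mean_vector(c) if c else centers[j] for j, c in enumerate(clusters)]
        # Stop once every cluster's mean equals its current center.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```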

Step 23: take the first-type image features and the second-type image features as the image features.

Step 110: obtain speech features of the video to be classified using features of the audio data contained in the video to be classified.

In implementation, different videos contain different audio data, so features of the audio data can be used to obtain features of the video.

In one implementation, referring to Fig. 3, which is a schematic flowchart of a method for obtaining video speech features provided by an embodiment of the present invention, the method comprises:

Step 30: divide the audio data contained in the video to be classified into a plurality of audio frames according to a predetermined frame length and frame shift.

The frame length is the playing duration of each audio frame produced by the division.

The frame shift is the difference between the starting play times of two adjacent audio frames. For example, if two adjacent audio frames play from 9 min 30 s to 10 min 0 s and from 9 min 40 s to 10 min 10 s respectively, the frame shift is 9 min 40 s minus 9 min 30 s, i.e. 10 s.
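A sketch of step 30 under assumed conventions: the frame length and frame shift are given in samples, and a trailing partial frame is discarded. The helper names are hypothetical. Treating one sample as one second, a frame length of 30 and a shift of 10 reproduce the example above (30 s frames starting 10 s apart).

```python
def split_into_frames(samples, frame_len, frame_shift):
    """Split a 1-D sequence of audio samples into overlapping frames.

    frame_len and frame_shift are counted in samples. Assumption: a trailing
    partial frame is dropped (zero-padding it would be an equally valid choice).
    """
    if frame_len <= 0 or frame_shift <= 0:
        raise ValueError("frame_len and frame_shift must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift  # consecutive frames start frame_shift samples apart
    return frames


def seconds_to_samples(seconds, sample_rate):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))
```

For speech, 25 ms frames with a 10 ms shift are a common choice (not specified in the patent); at 16 kHz that is `frame_len=400` and `frame_shift=160`.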

Step 31: extract the feature of each audio frame using a predetermined speech signal processing algorithm.

Specifically, the mel-frequency cepstral coefficient (MFCC) algorithm can be used to extract the feature of each audio frame.
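The usual MFCC pipeline for one frame is: window, power spectrum, triangular mel filterbank, logarithm, then a discrete cosine transform. The sketch below implements it in pure Python with a naive DFT for readability. The Hamming window, 26 filters, 13 coefficients and `n_fft=512` are common defaults assumed here, not values from the patent; production code would normally call a library routine such as `librosa.feature.mfcc` instead.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """Hamming-window the frame and return |DFT|^2 / n_fft for bins 0..n_fft/2."""
    n = len(frame)
    if n < 2 or n > n_fft:
        raise ValueError("need 2 <= len(frame) <= n_fft")
    win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    win += [0.0] * (n_fft - n)  # zero-pad to n_fft
    spec = []
    for k in range(n_fft // 2 + 1):  # naive O(n_fft^2) DFT, for clarity only
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(win))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13, n_fft=512):
    """Return n_coeffs MFCCs for one audio frame (assumed default parameters)."""
    spec = power_spectrum(frame, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_e = [math.log(e + 1e-10) for e in energies]  # floor avoids log(0)
    # DCT-II of the log mel energies; keep the first n_coeffs coefficients.
    return [
        sum(v * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, v in enumerate(log_e))
        for c in range(n_coeffs)
    ]
```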

Step 32: obtain the audio features of the video to be classified using the extracted features of the audio frames, according to the playing order of the audio frames.

In implementation, the audio features of the video to be classified are obtained from the features of the audio frames according to their playing order. Because the playing order of the audio frames is fully taken into account, the resulting audio features carry temporal information, which improves the accuracy of video classification.

The playing order of the audio frames is the order of their starting play times.

In one implementation, the extracted features of the audio frames can be input, in the playing order of the audio frames, into a pre-trained second LSTM neural network, and the output of the second LSTM neural network is then obtained as the audio features of the video to be classified.

Because the LSTM neural network processes the features of the audio frames in sequence, the resulting audio features carry temporal information; that is, the temporal information in the audio features can be fully taken into account when the video is subsequently classified, which improves the accuracy of that classification.

Step 120: obtain text features of the video to be classified according to features of the text information describing the video to be classified.

The text information is information that introduces the video to be classified, and usually includes its title and content summary.

In one implementation, referring to Fig. 4, which is a schematic flowchart of a method for obtaining video text features provided by an embodiment of the present invention, the method comprises:

Step 40: segment the text information into words to obtain a word segmentation result.

In implementation, the text information can be segmented using the forward maximum matching algorithm, the backward (reverse) maximum matching algorithm, or the bidirectional maximum matching algorithm.
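A sketch of the three dictionary-based segmenters named above. It assumes a word set `vocab` and a maximum word length `max_len`; characters not covered by the dictionary are emitted as one-character words. The bidirectional tie-break (prefer fewer words, then fewer single-character words, otherwise take the backward result) is a common heuristic assumed here, since the patent does not specify one. The example string 研究生命起源 ("research the origin of life") shows why the two directions can disagree.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text, vocab, max_len):
    """Run both directions and pick one result (assumed tie-break heuristic)."""
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(ws):
        return sum(1 for w in ws if len(w) == 1)

    return fwd if singles(fwd) < singles(bwd) else bwd
```

With `vocab = {"研究", "研究生", "生命", "起源"}` and `max_len=3`, forward matching yields 研究生 / 命 / 起源, backward matching yields 研究 / 生命 / 起源, and the bidirectional rule keeps the backward result because it has fewer single-character words.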

Step 41: for each word in the word segmentation result, obtain the word vector corresponding to the word using a neural-network-based word embedding technique.

Specifically, the word vector of each word can be constructed using the number of occurrences of the word in the text information and a pre-established neural-network-based probabilistic model.

Step 42: determine first-type text features of the text information using the word vectors of the words, according to the order in which the words in the word segmentation result appear in the text information.

In implementation, the text features of the video to be classified are obtained from the word vectors according to the order in which the words appear in the text information. Because the order of the word vectors is fully taken into account, the resulting text features carry sequential information, which improves the accuracy of video classification.

In one implementation, the word vectors of the words in the word segmentation result can be input, in the order in which the words appear in the text information, into a pre-trained third LSTM neural network, and the output of the third LSTM neural network is then obtained as the first-type text features of the text information.

Because the LSTM neural network processes the word vectors in sequence, the resulting first-type text features carry sequential information; that is, this information in the text features can be fully taken into account when the video is subsequently classified, which improves the accuracy of that classification.

Step 43: perform dimension reduction on the word vectors of the words in the word segmentation result to obtain second-type text features of the text information.

Dimension reduction of the word vectors, i.e. reducing their dimensionality, improves the efficiency of the subsequent video classification.

Specifically, the word vectors can be processed using the vector of locally aggregated descriptors (VLAD) algorithm to reduce their dimensionality.
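A minimal VLAD sketch, assuming a codebook of K centers has already been learned (typically by k-means on training word vectors, which the patent does not describe). Each word vector is assigned to its nearest center, the residuals are summed per center, and the K residual sums are concatenated and normalized. The output length is K × D regardless of how many words the text contains. The signed square-root (power) normalization followed by L2 normalization is a common VLAD post-processing step assumed here.

```python
import math


def vlad(descriptors, codebook, power=0.5):
    """Aggregate word vectors into one fixed-length VLAD vector.

    Assumption: codebook is a pre-trained list of K centers, each of dimension D.
    """
    k, d = len(codebook), len(codebook[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Nearest center by squared Euclidean distance.
        nearest = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))
        for t in range(d):
            acc[nearest][t] += x[t] - codebook[nearest][t]  # accumulate residuals
    flat = [v for row in acc for v in row]
    # Signed power normalization, then L2 normalization.
    flat = [math.copysign(abs(v) ** power, v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```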

Step 44: take the first-type text features and the second-type text features as the text features.

Step 130: determine the category to which the video to be classified belongs using the image features, speech features and text features.

The image features, speech features and text features of a video together characterize that video. In one implementation, a video library can be established in advance, containing videos of each category together with the image, speech and text features of each video. Correspondingly, when a video to be classified needs to be classified, its image, speech and text features can be matched against those of the videos in the library to find the video whose features are most similar; the category of that video is taken as the category of the video to be classified.
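A sketch of the library-matching variant. The patent does not say how the three per-modality similarities are combined, so this assumes cosine similarity per modality, averaged with equal weights. The dictionary layout, the `MODALITIES` keys and the function names are hypothetical.

```python
import math

MODALITIES = ("image", "speech", "text")  # hypothetical feature keys


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def classify_by_library(query, library):
    """query: {modality: vector}; library: [(category, {modality: vector}), ...]."""

    def score(feats):
        # Assumption: equal-weight average of per-modality cosine similarities.
        return sum(cosine(query[m], feats[m]) for m in MODALITIES) / len(MODALITIES)

    category, _ = max(library, key=lambda entry: score(entry[1]))
    return category
```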

In another implementation, a neural network can be trained using the image, speech and text features of videos as samples and the video categories as supervision information, to obtain a neural network classification model. Correspondingly, when a video to be classified needs to be classified, its image, speech and text features can be input into the pre-trained neural network classification model, and the output of the model is obtained as the category to which the video to be classified belongs.
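A sketch of the trained-classifier variant, assuming the three feature vectors are simply concatenated. For brevity the "neural network" here is a single softmax layer trained with per-sample gradient descent on cross-entropy, with the category label as the supervision signal; a real model would typically add hidden layers. The class and method names are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class FusionClassifier:
    """Single softmax layer over concatenated image/speech/text features (assumed)."""

    def __init__(self, input_dim, categories, seed=0):
        rng = random.Random(seed)
        self.categories = list(categories)
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(input_dim)] for _ in self.categories]
        self.b = [0.0] * len(self.categories)

    @staticmethod
    def fuse(image, speech, text):
        """Assumed fusion: plain concatenation of the three feature vectors."""
        return list(image) + list(speech) + list(text)

    def probs(self, x):
        logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def train(self, samples, epochs=200, lr=0.5):
        """samples: list of (fused feature vector, category label)."""
        for _ in range(epochs):
            for x, label in samples:
                p = self.probs(x)
                y = self.categories.index(label)
                for c in range(len(self.categories)):
                    g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                    self.b[c] -= lr * g
                    for t in range(len(x)):
                        self.W[c][t] -= lr * g * x[t]

    def predict(self, image, speech, text):
        p = self.probs(self.fuse(image, speech, text))
        return self.categories[max(range(len(p)), key=p.__getitem__)]
```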

With the video classification method provided by the embodiments of the present invention, image features of a video to be classified can be obtained according to features of the video frames it contains; speech features can be obtained using features of the audio data it contains; text features can be obtained according to features of the text information describing it; and the category of the video can then be determined using the image, speech and text features. Because videos are classified using their image, speech and text features, the solution saves human resources and reduces the cost of video classification compared with manual classification in the related art. Moreover, the temporal information of each of the video's features can be fully taken into account during classification, which improves the accuracy of video classification.

Based on the same technical concept, an embodiment of the present invention further provides a video classification device. As shown in Fig. 5, the device comprises:

An obtaining module 500, configured to obtain image features of the video to be classified according to features of the video frames contained in the video to be classified;

An obtaining module 510, configured to obtain speech features of the video to be classified using features of the audio data contained in the video to be classified;

An obtaining module 520, configured to obtain text features of the video to be classified according to features of the text information describing the video to be classified;

A determining module 530, configured to determine the category to which the video to be classified belongs using the image features, speech features and text features.

Optionally, the obtaining module 500 comprises:

An obtaining submodule, configured to obtain an image high-dimensional feature of each video frame contained in the video to be classified;

A first determining submodule, configured to determine first-type image features of the video to be classified using the image high-dimensional features of the video frames, according to the playing order of the video frames;

A first obtaining submodule, configured to cluster the image high-dimensional features of the video frames to obtain second-type image features of the video to be classified;

A first taking submodule, configured to take the first-type image features and the second-type image features as the image features.

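Optionally, the first determining submodule is specifically configured to input the image high-dimensional features of the video frames, in the playing order of the video frames, into the pre-trained first LSTM neural network, and to obtain the output of the first LSTM neural network as the first-type image features of the video to be classified.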

Optionally, the obtaining module 510 comprises:

A dividing submodule, configured to divide the audio data contained in the video to be classified into a plurality of audio frames according to a predetermined frame length and frame shift;

An extracting submodule, configured to extract the feature of each audio frame using a predetermined speech signal processing algorithm;

A second obtaining submodule, configured to obtain the audio features of the video to be classified using the extracted features of the audio frames, according to the playing order of the audio frames.

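Optionally, the second obtaining submodule is specifically configured to input the extracted features of the audio frames, in the playing order of the audio frames, into the pre-trained second LSTM neural network, and to obtain the output of the second LSTM neural network as the audio features of the video to be classified.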

Optionally, the obtaining module 520 comprises:

A word segmentation submodule, configured to segment the text information into words to obtain a word segmentation result;

A constructing submodule, configured to obtain, for each word in the word segmentation result, the word vector corresponding to the word using a neural-network-based word embedding technique;

A second determining submodule, configured to determine first-type text features of the text information using the word vectors of the words, according to the order in which the words in the word segmentation result appear in the text information;

A third obtaining submodule, configured to perform dimension reduction on the word vectors of the words in the word segmentation result to obtain second-type text features of the text information;

A second taking submodule, configured to take the first-type text features and the second-type text features as the text features.

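Optionally, the second determining submodule is specifically configured to input the word vectors of the words, in the order in which the words appear in the text information, into the pre-trained third LSTM neural network, and to obtain the output of the third LSTM neural network as the first-type text features of the text information.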

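Optionally, the determining module 530 is specifically configured to input the image features, speech features and text features into the pre-trained neural network classification model, and to obtain the output of the neural network classification model as the category to which the video to be classified belongs.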

With the video classification device provided by the embodiments of the present invention, image features of a video to be classified can be obtained according to features of the video frames it contains; speech features can be obtained using features of the audio data it contains; text features can be obtained according to features of the text information describing it; and the category of the video can then be determined using the image, speech and text features. Because videos are classified using their image, speech and text features, the solution saves human resources and reduces the cost of video classification compared with manual classification in the related art. Moreover, the temporal information of each of the video's features can be fully taken into account during classification, which improves the accuracy of video classification.

Fig. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. The computer device 600 may vary considerably depending on its configuration or performance, and may include one or more central processing units 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. A program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the computer device 600. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the computer device 600, the series of instruction operations in the storage medium 630.

The computer device 600 may further include one or more power supplies 624, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, one or more keyboards 654, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The computer device 600 may include a memory and one or more computer programs, where the one or more computer programs are stored in the memory and configured to be executed by one or more processors to implement the video classification method described above.

A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

It should be understood that when the video classification device, computer device, and computer-readable storage medium provided in the above embodiments perform video classification, the division into the functional modules described above is merely an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the embodiments of the video classification device, computer device, and computer-readable storage medium provided above share the same concept as the embodiments of the video classification method; for their specific implementation, refer to the method embodiments, which are not repeated here.

It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

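1. A video classification method, characterized in that the method comprises:

obtaining an image feature of a video to be classified according to features of video frames included in the video to be classified;

obtaining a speech feature of the video to be classified using features of audio data included in the video to be classified;

obtaining a text feature of the video to be classified according to features of text information describing the video to be classified; and

determining the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature.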

2. The method according to claim 1, characterized in that the step of obtaining the image feature of the video to be classified according to the features of the video frames included in the video to be classified comprises:

determining a first-class image feature of the video to be classified using the high-dimensional image feature of each video frame, according to the playback order of the video frames;

clustering the high-dimensional image features of the video frames to obtain a second-class image feature of the video to be classified;

3. The method according to claim 2, characterized in that the step of determining the first-class image feature of the video to be classified using the high-dimensional image feature of each video frame according to the playback order of the video frames comprises:

inputting the high-dimensional image feature of each video frame, in the playback order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network; and

obtaining the output result of the first LSTM neural network as the first-class image feature of the video to be classified.

4. The method according to claim 1, characterized in that the step of obtaining the speech feature of the video to be classified using the features of the audio data included in the video to be classified comprises:

obtaining an audio feature of the video to be classified using the extracted feature of each audio frame, according to the playback order of the audio frames.

5. The method according to claim 4, characterized in that the step of obtaining the audio feature of the video to be classified using the extracted feature of each audio frame according to the playback order of the audio frames comprises:

inputting the extracted feature of each audio frame, in the playback order of the audio frames, into a pre-trained second LSTM neural network; and

obtaining the output result of the second LSTM neural network as the audio feature of the video to be classified.

6. The method according to claim 1, characterized in that the step of obtaining the text feature of the video to be classified according to the features of the text information describing the video to be classified comprises:

performing word segmentation on the text information to obtain a word segmentation result;

for each word included in the word segmentation result, obtaining a word vector corresponding to the word using a neural-network-based word embedding technique;

determining a first-class text feature of the text information using the word vector of each word, according to the order in which the words in the word segmentation result appear in the text information;

performing dimensionality reduction on the word vectors of the words included in the word segmentation result to obtain a second-class text feature of the text information;

7. The method according to claim 6, characterized in that the step of determining the first-class text feature of the text information using the word vector of each word according to the order in which the words in the word segmentation result appear in the text information comprises:

inputting the word vector of each word, in the order in which the words in the word segmentation result appear in the text information, into a pre-trained third LSTM neural network; and

obtaining the output result of the third LSTM neural network as the first-class text feature of the text information.

8. The method according to any one of claims 1 to 7, characterized in that the step of determining the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature comprises:

9. A video classification device, characterized in that the device comprises:

a first obtaining module, configured to obtain an image feature of a video to be classified according to features of video frames included in the video to be classified;

a second obtaining module, configured to obtain a speech feature of the video to be classified using features of audio data included in the video to be classified;

a third obtaining module, configured to obtain a text feature of the video to be classified according to features of text information describing the video to be classified; and

a determining module, configured to determine the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature.

10. A computer device, characterized by comprising:

at least one processor; and

a memory, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the video classification method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the video classification method according to any one of claims 1 to 8.
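The claims leave the network sizes, the clustering algorithm, the dimensionality-reduction method, and the fusion classifier unspecified. Purely as an illustration, the following pure-Python sketch wires the claimed steps together under stated assumptions. Per-modality features are assumed to be already extracted (high-dimensional frame features, audio-frame features, and word vectors after segmentation and embedding). Each "pre-trained" LSTM is replaced by a tiny bias-free LSTM with random weights. Clustering is assumed to be k-means with the centroids concatenated (at least k frames are required). The text dimensionality reduction is assumed to be mean pooling followed by a fixed random projection. Fusion is assumed to be concatenation followed by a softmax linear layer. All names (`TinyLSTM`, `kmeans_feature`, `project`, `classify`) are hypothetical and not taken from the patent.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Hypothetical single-layer LSTM encoder (no bias terms, for brevity).

    The random weights are placeholders for a pre-trained network.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate) acting on [x; h].
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                         for _ in range(hidden_dim)]
                  for gate in "ifoc"}

    def encode(self, sequence):
        """Feed vectors in the given (playback or reading) order; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            xh = list(x) + h
            i = [sigmoid(z) for z in matvec(self.W["i"], xh)]
            f = [sigmoid(z) for z in matvec(self.W["f"], xh)]
            o = [sigmoid(z) for z in matvec(self.W["o"], xh)]
            g = [math.tanh(z) for z in matvec(self.W["c"], xh)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def kmeans_feature(vectors, k, iters=10):
    """Cluster per-frame vectors with k-means and concatenate the centroids (assumed pooling)."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda idx: sum((a - b) ** 2 for a, b in zip(v, centroids[idx])))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [[sum(col) / len(grp) for col in zip(*grp)] if grp else centroids[j]
                     for j, grp in enumerate(groups)]
    return [x for c in centroids for x in c]


def project(vec, out_dim, seed=42):
    """Fixed random linear projection: a stand-in for the unspecified dimensionality reduction."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 1.0) for _ in vec] for _ in range(out_dim)]
    return matvec(P, vec)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(frame_feats, audio_feats, word_vecs, labels, hidden=4, seed=0):
    """Fuse image, speech and text features and pick the most probable label."""
    # Image: LSTM over frames in playback order + clustering of the frame features.
    image = TinyLSTM(len(frame_feats[0]), hidden, seed).encode(frame_feats)
    image += kmeans_feature(frame_feats, k=2)
    # Speech: LSTM over audio-frame features in playback order.
    speech = TinyLSTM(len(audio_feats[0]), hidden, seed + 1).encode(audio_feats)
    # Text: LSTM over word vectors in reading order + reduced mean word vector.
    text = TinyLSTM(len(word_vecs[0]), hidden, seed + 2).encode(word_vecs)
    mean_vec = [sum(col) / len(word_vecs) for col in zip(*word_vecs)]
    text += project(mean_vec, out_dim=2)
    fused = image + speech + text  # simple concatenation fusion
    rng = random.Random(seed + 3)  # untrained placeholder classifier weights
    W = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in labels]
    probs = softmax(matvec(W, fused))
    return labels[probs.index(max(probs))], probs
```

Because the weights are untrained, the predicted label is meaningless; the sketch only shows how the order-sensitive features (first-class image and text features, audio feature) and the order-insensitive features (second-class image and text features) could be combined into one vector before classification. In a real system each LSTM and the final classifier would be trained on labeled videos.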