Summary of the invention
The inventors of the present disclosure have found that the related art described above has the following problem: the images used as processing objects and the neural networks used as processing means are each of a single type, so the behavior of individuals in a video cannot be characterized comprehensively, which results in low video classification accuracy. In view of this technical problem, the present disclosure proposes a high-accuracy video classification technical solution.
According to some embodiments of the present disclosure, a video classification method is provided, comprising: extracting multiple frames of RGB images from a video to be classified, and obtaining multiple frames of optical flow images from the RGB images of adjacent frames; for each frame of RGB image of the video, obtaining a first classification probability vector of the video according to the current-frame RGB image and its previous-frame RGB image by passing sequentially through a convolutional neural network and a recurrent neural network, each element of the first classification probability vector representing the probability, based on the RGB images, that the video belongs to the respective category; for each frame of optical flow image of the video, obtaining a second classification probability vector of the video according to the current-frame optical flow image and its previous-frame optical flow image by passing sequentially through a convolutional neural network and a recurrent neural network, each element of the second classification probability vector representing the probability, based on the optical flow images, that the video belongs to the respective category; and determining the classification of the video according to the first classification probability vector and the second classification probability vector.
Optionally, according to the current-frame RGB image of the video and its previous-frame RGB image, a recurrent feature vector of the current-frame RGB image is obtained by passing sequentially through the convolutional neural network and the recurrent neural network; the first classification probability vector of the current-frame RGB image is obtained according to the recurrent feature vector of the current-frame RGB image; and the first classification probability vector of the video is obtained according to the first classification probability vector of the current-frame RGB image.
Optionally, a convolution feature vector of the current-frame RGB image is obtained through the convolutional neural network; the convolution feature vector of the current-frame RGB image and the recurrent feature vector of its previous-frame RGB image are input into the recurrent neural network to obtain the recurrent feature vector of the current-frame RGB image.
Optionally, the recurrent feature vector of the current-frame RGB image is input into a preset fully connected layer to obtain the first classification probability vector of the current-frame RGB image; the average of the first classification probability vectors of all RGB images is calculated to obtain the first classification probability vector of the video.
Optionally, according to the current-frame optical flow image of the video and its previous-frame optical flow image, a recurrent feature vector of the current-frame optical flow image is obtained by passing sequentially through the convolutional neural network and the recurrent neural network; the second classification probability vector of the current-frame optical flow image is obtained according to the recurrent feature vector of the current-frame optical flow image; and the second classification probability vector of the video is obtained according to the second classification probability vector of the current-frame optical flow image.
Optionally, a convolution feature vector of the current-frame optical flow image is obtained through the convolutional neural network; the convolution feature vector of the current-frame optical flow image and the recurrent feature vector of its previous-frame optical flow image are input into the recurrent neural network to obtain the recurrent feature vector of the current-frame optical flow image.
Optionally, the recurrent feature vector of the current-frame optical flow image is input into a preset fully connected layer to obtain the second classification probability vector of the current-frame optical flow image; the average of the second classification probability vectors of all optical flow images is calculated to obtain the second classification probability vector of the video.
Optionally, the average of the first classification probability vector and the second classification probability vector is calculated to obtain a third classification probability vector of the video; the category corresponding to the element with the largest value in the third classification probability vector is determined as the classification of the video.
Optionally, the convolutional neural network is ResNet-101, and the recurrent neural network is an LSTM (Long Short-Term Memory) network.
According to other embodiments of the present disclosure, a video classification device is provided, comprising: an image extraction module configured to extract multiple frames of RGB images from a video to be classified and to obtain multiple frames of optical flow images from the RGB images of adjacent frames; a first classification probability vector obtaining module configured to, for each frame of RGB image of the video, obtain a first classification probability vector of the video according to the current-frame RGB image and its previous-frame RGB image by passing sequentially through a convolutional neural network and a recurrent neural network, each element of the first classification probability vector representing the probability, based on the RGB images, that the video belongs to the respective category; a second classification probability vector obtaining module configured to, for each frame of optical flow image of the video, obtain a second classification probability vector of the video according to the current-frame optical flow image and its previous-frame optical flow image by passing sequentially through a convolutional neural network and a recurrent neural network, each element of the second classification probability vector representing the probability, based on the optical flow images, that the video belongs to the respective category; and a classification determining module configured to determine the classification of the video according to the first classification probability vector and the second classification probability vector.
Optionally, the first classification probability vector obtaining module obtains a recurrent feature vector of the current-frame RGB image according to the current-frame RGB image of the video and its previous-frame RGB image by passing sequentially through the convolutional neural network and the recurrent neural network, obtains the first classification probability vector of the current-frame RGB image according to the recurrent feature vector of the current-frame RGB image, and obtains the first classification probability vector of the video according to the first classification probability vector of the current-frame RGB image.
Optionally, the first classification probability vector obtaining module obtains a convolution feature vector of the current-frame RGB image through the convolutional neural network, and inputs the convolution feature vector of the current-frame RGB image and the recurrent feature vector of its previous-frame RGB image into the recurrent neural network to obtain the recurrent feature vector of the current-frame RGB image.
Optionally, the first classification probability vector obtaining module inputs the recurrent feature vector of the current-frame RGB image into a preset fully connected layer to obtain the first classification probability vector of the current-frame RGB image, and calculates the average of the first classification probability vectors of all RGB images to obtain the first classification probability vector of the video.
Optionally, the second classification probability vector obtaining module obtains a recurrent feature vector of the current-frame optical flow image according to the current-frame optical flow image of the video and its previous-frame optical flow image by passing sequentially through the convolutional neural network and the recurrent neural network, obtains the second classification probability vector of the current-frame optical flow image according to the recurrent feature vector of the current-frame optical flow image, and obtains the second classification probability vector of the video according to the second classification probability vector of the current-frame optical flow image.
Optionally, the second classification probability vector obtaining module obtains a convolution feature vector of the current-frame optical flow image through the convolutional neural network, and inputs the convolution feature vector of the current-frame optical flow image and the recurrent feature vector of its previous-frame optical flow image into the recurrent neural network to obtain the recurrent feature vector of the current-frame optical flow image.
Optionally, the second classification probability vector obtaining module inputs the recurrent feature vector of the current-frame optical flow image into a preset fully connected layer to obtain the second classification probability vector of the current-frame optical flow image, and calculates the average of the second classification probability vectors of all optical flow images to obtain the second classification probability vector of the video.
Optionally, the classification determining module calculates the average of the first classification probability vector and the second classification probability vector to obtain a third classification probability vector of the video, and determines the category corresponding to the element with the largest value in the third classification probability vector as the classification of the video.
Optionally, the convolutional neural network is ResNet-101, and the recurrent neural network is an LSTM.
According to still other embodiments of the present disclosure, a video classification device is provided, comprising: a memory and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the video classification method of any one of the above embodiments.
According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, the program implementing the video classification method of any one of the above embodiments when executed by a processor.
In the above embodiments, the RGB images and the optical flow images of the video are each processed by passing sequentially through a convolutional neural network and a recurrent neural network, and the resulting classification results are fused to determine the classification of the video. In this way, different processing means can be combined and different image information fused to analyze the time-varying images, and the temporal dependence between adjacent frames can be used to classify the video, thereby improving the accuracy of video classification.
Detailed description of the embodiments
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure, its application, or its uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be considered part of the granted specification.
In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 1 shows a flowchart of some embodiments of the video classification method of the present disclosure.
As shown in Fig. 1, the video classification method includes: step 110, obtaining the RGB images and optical flow images of the video; step 120, obtaining the first classification probability vector based on the RGB images; step 130, obtaining the second classification probability vector based on the optical flow images; and step 140, determining the classification of the video.
In step 110, multiple frames of RGB images are extracted from the video to be classified, and multiple frames of optical flow images are obtained from the RGB images of adjacent frames. For example, N consecutive frames (i.e. N time instants) can be extracted from the video. One optical flow image is calculated from every two adjacent images among the N consecutive frames, yielding N-1 consecutive optical flow images.
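The frame pairing in step 110 can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function names are hypothetical, and the per-pixel frame difference is only a stand-in for a real optical flow estimator, which the text does not specify.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Flow = TypeVar("Flow")


def flow_images_from_frames(
    frames: Sequence[Frame],
    compute_flow: Callable[[Frame, Frame], Flow],
) -> List[Flow]:
    """Return one flow image per pair of adjacent frames (N frames -> N-1 flows)."""
    return [compute_flow(prev, curr) for prev, curr in zip(frames, frames[1:])]


def frame_difference(prev: List[float], curr: List[float]) -> List[float]:
    """Hypothetical stand-in for an optical flow estimator: per-pixel difference."""
    return [c - p for p, c in zip(prev, curr)]


frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0], [3.0, 2.0]]  # N = 4 toy "frames"
flows = flow_images_from_frames(frames, frame_difference)
print(len(frames), len(flows))  # 4 3
```

Passing the estimator in as a parameter keeps the pairing logic independent of whichever optical flow algorithm is actually used.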
In step 120, for each frame of RGB image of the video, the first classification probability vector of the video is obtained according to the current-frame RGB image and its previous-frame RGB image by passing sequentially through the convolutional neural network and the recurrent neural network. Each element of the first classification probability vector represents the probability, based on the RGB images, that the video belongs to the respective category.
In one embodiment, the convolutional neural network can be ResNet-101 and the recurrent neural network can be an LSTM. Compared with other convolutional neural networks, ResNet-101 has a stronger feature learning ability, so a higher video classification accuracy can be obtained. ResNet-101 can therefore be used to extract image features as the input of the LSTM, without using the output of the last fully connected layer of ResNet-101. For example, the last fully connected layer of ResNet-101 can be removed, and ResNet-101 used only to extract the convolution feature vector of the current-frame image; the convolution feature vector is then input into the LSTM to obtain the recurrent feature vector of the current-frame image, and finally the classification probability vector of the video is obtained through a preset fully connected layer. In this way, the image feature extraction strength of the convolutional neural network can be combined with the strength of the recurrent neural network in processing time-correlated data to perform video classification, thereby improving video classification accuracy.
In one embodiment, the first classification probability vector in step 120 can be obtained by the embodiment shown in Fig. 2.
Fig. 2 shows a flowchart of some embodiments of obtaining the first classification probability vector.
As shown in Fig. 2, the first classification probability vector can be obtained as follows: step 1201, obtaining the convolution feature vector of an RGB image; step 1202, obtaining the recurrent feature vector of the RGB image; step 1203, obtaining the first classification probability vector of the RGB image; and step 1204, obtaining the first classification probability vector of the video.
In step 1201, the convolution feature vector of the current-frame RGB image can be obtained through the convolutional neural network.
In one embodiment, the last fully connected layer of ResNet-101 can be removed, and only the down-sampled output preceding that fully connected layer is extracted as the convolution feature vector of the current-frame image. For example, the convolution feature vector can be a 2048-dimensional vector corresponding to the current-frame RGB image.
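In the standard ResNet-101 architecture, the down-sampling immediately before the last fully connected layer is a global average pooling that reduces a 2048-channel feature map to one value per channel. The sketch below illustrates that reduction on a toy feature map; treating the "down-sampled output" of the text as global average pooling is an assumption, and the names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse each channel's H x W grid to its mean, giving one value per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy map with 2 channels of size 2 x 2 (ResNet-101 would give 2048 channels).
toy_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [4.0, 4.0]],
]
conv_feature = global_average_pool(toy_map)
print(conv_feature)  # [2.5, 2.0]
```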
In step 1202, the convolution feature vector of the current-frame RGB image and the recurrent feature vector of its previous-frame RGB image are input into the recurrent neural network to obtain the recurrent feature vector of the current-frame RGB image. The recurrent feature vector of the current-frame RGB image can in turn serve as one of the inputs of the recurrent neural network for the next-frame RGB image. For an image that has no previous frame, such as the first frame, the recurrent feature vector can be obtained by setting an initial value.
In one embodiment, the 2048-dimensional vector of the current-frame RGB image from the above embodiment and the recurrent feature vector of the previous-frame RGB image can be input into the LSTM to obtain an M-dimensional vector as the recurrent feature vector of the current-frame RGB image. The recurrent feature vector of the current-frame RGB image can then serve as one of the LSTM inputs for the next-frame RGB image. By iterating in this way, the recurrent feature vectors of all RGB images can be obtained.
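The frame-by-frame iteration of step 1202 can be sketched as a simple loop. This is an illustrative sketch only: `toy_cell` is a hypothetical stand-in for an LSTM step (a real LSTM also carries a cell state and learned gates), and the all-zero initial value is an assumption, since the text only says that an initial value is set.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def run_recurrent(
    conv_features: Sequence[Vector],
    cell: Callable[[Vector, Vector], Vector],
    initial_state: Vector,
) -> List[Vector]:
    """Feed each frame's convolution feature and the previous frame's recurrent
    feature into the cell; return the recurrent feature of every frame."""
    states = []
    prev = initial_state  # used for the first frame, which has no previous frame
    for x in conv_features:
        prev = cell(x, prev)
        states.append(prev)
    return states


def toy_cell(x: Vector, h_prev: Vector) -> Vector:
    """Hypothetical stand-in for an LSTM step: h = tanh(mean(x) + 0.5 * h_prev)."""
    mean_x = sum(x) / len(x)
    return [math.tanh(mean_x + 0.5 * h) for h in h_prev]


features = [[0.2, 0.4], [0.1, 0.1], [0.0, 0.6]]  # 3 frames, 2-dim features
recurrent = run_recurrent(features, toy_cell, initial_state=[0.0, 0.0, 0.0])  # M = 3
print(len(recurrent), len(recurrent[0]))  # 3 3
```

Each frame's output depends on all earlier frames through `prev`, which is how the temporal dependence between adjacent frames enters the classification.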
In step 1203, the recurrent feature vector of the current-frame RGB image is input into a preset fully connected layer to obtain the first classification probability vector of the current-frame RGB image.
In one embodiment, a fully connected layer can be connected after the LSTM of the above embodiment. For example, the input of the fully connected layer is the recurrent feature vector of the current-frame RGB image, with M input nodes in total, and the output is the first classification probability vector of the current-frame RGB image, with C output nodes in total. The C output nodes correspond to the probabilities that the video belongs to each of C categories. Each input node is connected to every output node, giving M × C connections in total. Each connection has a corresponding weight, giving M × C weights in total. The C outputs form the first classification probability vector of the current-frame RGB image, and the element with the largest value in that vector can represent the category of the current-frame RGB image. The fully connected layer can be preset in this way, and the recurrent feature vector is then input into the fully connected layer to obtain the classification of the video.
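The M × C fully connected layer described above can be sketched as follows. The text does not say how the C outputs are normalized into probabilities or whether a bias term is used; the softmax and the bias below are therefore assumptions, and the weight values are arbitrary illustrative numbers.

```python
import math
from typing import List

Vector = List[float]


def fully_connected(h: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Map M inputs to C outputs; weights[i][j] links input i to output j (M x C)."""
    return [
        sum(h[i] * weights[i][j] for i in range(len(h))) + bias[j]
        for j in range(len(bias))
    ]


def softmax(scores: Vector) -> Vector:
    """Normalize scores into probabilities (an assumption; not stated in the text)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


h = [1.0, -1.0]                           # recurrent feature vector, M = 2
W = [[0.5, 0.0, -0.5], [0.0, 0.5, 0.5]]   # M x C weights, C = 3
b = [0.0, 0.0, 0.0]                       # bias is also an assumption
probs = softmax(fully_connected(h, W, b))
print(max(range(len(probs)), key=probs.__getitem__))  # 0
```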
In step 1204, the first classification probability vector of the video can be obtained by calculating the average of the first classification probability vectors of all RGB images. For example, the averages of the corresponding elements of the first classification probability vectors of all RGB images can be calculated separately to obtain the first classification probability vector of the video.
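The element-wise averaging of step 1204 can be sketched as below. The helper name and the toy per-frame vectors are illustrative assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


per_frame = [  # first classification probability vectors of 4 RGB frames, C = 3
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.75, 0.0, 0.25],
    [0.5, 0.25, 0.25],
]
video_probs = mean_vector(per_frame)
print(video_probs)  # [0.5, 0.25, 0.25]
```

Although the second frame on its own favours category 1, the video-level vector favours category 0, which shows how averaging smooths out single-frame outliers.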
Through steps 1201-1204 above, the probability that the video belongs to each category based on the RGB images is obtained; steps 130-140 of the Fig. 1 embodiment can then be executed to determine the video classification. Step 120 and step 130 can be executed in parallel or serially, and when they are executed serially their order can be interchanged.
In step 130, for each frame of optical flow image of the video, the second classification probability vector of the video is obtained according to the current-frame optical flow image and its previous-frame optical flow image by passing sequentially through the convolutional neural network and the recurrent neural network. Each element of the second classification probability vector represents the probability, based on the optical flow images, that the video belongs to the respective category.
In one embodiment, the method of the above embodiments can be applied to the optical flow images to obtain the second classification probability vector of the video based on the optical flow images; the details are not repeated here.
In step 140, the classification of the video is determined according to the first classification probability vector and the second classification probability vector. For example, the average of the first classification probability vector and the second classification probability vector can be calculated to obtain a third classification probability vector of the video, and the category corresponding to the element with the largest value in the third classification probability vector is determined as the classification of the video.
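The fusion in step 140 can be sketched as follows. The function name and the toy probability values are illustrative assumptions; the averaging and the selection of the largest element follow the description above.

```python
from typing import List, Tuple

Vector = List[float]


def fuse_and_classify(p_rgb: Vector, p_flow: Vector) -> Tuple[Vector, int]:
    """Average the two stream vectors and return (third vector, winning index)."""
    if len(p_rgb) != len(p_flow):
        raise ValueError("both vectors must cover the same categories")
    fused = [(a + b) / 2 for a, b in zip(p_rgb, p_flow)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best


p_rgb = [0.5, 0.25, 0.25]      # first classification probability vector
p_flow = [0.25, 0.625, 0.125]  # second classification probability vector
fused, category = fuse_and_classify(p_rgb, p_flow)
print(fused, category)  # [0.375, 0.4375, 0.1875] 1
```

In this toy case the RGB stream alone would choose category 0, while the fused vector chooses category 1, illustrating how the optical flow stream can change the final decision.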
To describe the flow of the video classification method more clearly, the above embodiments can be summarized as the process shown in Fig. 3.
Fig. 3 shows a schematic diagram of some embodiments of the video classification method of the present disclosure.
As shown in Fig. 3, the RGB images and the optical flow images extracted from the video are each passed sequentially through ResNet-101 and an LSTM. The recurrent feature vectors extracted by the LSTMs serve as inputs to the fully connected layers, which output the first classification probability vector and the second classification probability vector respectively. The first classification probability vector and the second classification probability vector are fused to determine the classification of the video, for example by taking a weighted sum of the two and then averaging, or by averaging them directly. The network parameters of ResNet-101 can be initialized with the parameters of a ResNet-101 model trained on the ImageNet dataset, and the other network parameters can be obtained by random initialization.
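The data flow of Fig. 3 can also be written compactly. The notation below is introduced here for illustration only and does not appear in the original text: s = 1 denotes the RGB stream and s = 2 the optical flow stream, I_t^{(s)} is the t-th image of stream s, T_1 = N and T_2 = N-1 are the numbers of images per stream, and h_0^{(s)} is the preset initial value. The LSTM cell state is omitted for brevity, and the direct-average fusion variant is shown.

```latex
\begin{align*}
x_t^{(s)} &= \mathrm{CNN}\bigl(I_t^{(s)}\bigr), &
h_t^{(s)} &= \mathrm{LSTM}\bigl(x_t^{(s)},\, h_{t-1}^{(s)}\bigr), &
p_t^{(s)} &= \mathrm{FC}\bigl(h_t^{(s)}\bigr), \\
P^{(s)} &= \frac{1}{T_s} \sum_{t=1}^{T_s} p_t^{(s)}, &
P^{(3)} &= \tfrac{1}{2}\bigl(P^{(1)} + P^{(2)}\bigr), &
\hat{c} &= \arg\max_{c}\, P^{(3)}_c .
\end{align*}
```

Here P^{(1)}, P^{(2)}, and P^{(3)} are the first, second, and third classification probability vectors of the video, and \hat{c} is the category assigned to the video.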
In the above embodiments, the RGB images and the optical flow images of the video are each processed by passing sequentially through a convolutional neural network and a recurrent neural network, and the resulting classification results are fused to determine the classification of the video. In this way, different processing means can be combined and different image information fused to analyze the time-varying images, and the temporal dependence between adjacent frames can be used to classify the video, thereby improving the accuracy of video classification.
Fig. 4 shows a structural diagram of some embodiments of the video classification device of the present disclosure.
As shown in Fig. 4, the video classification device 4 includes an image extraction module 41, a first classification probability vector obtaining module 42, a second classification probability vector obtaining module 43, and a classification determining module 44.
The image extraction module 41 extracts multiple frames of RGB images from the video to be classified and obtains multiple frames of optical flow images from the RGB images of adjacent frames.
For each frame of RGB image of the video, the first classification probability vector obtaining module 42 obtains the first classification probability vector of the video according to the current-frame RGB image and its previous-frame RGB image by passing sequentially through the convolutional neural network and the recurrent neural network. Each element of the first classification probability vector represents the probability, based on the RGB images, that the video belongs to the respective category. For example, the convolutional neural network can be ResNet-101 and the recurrent neural network can be an LSTM.
In one embodiment, the first classification probability vector obtaining module 42 first obtains the recurrent feature vector of the current-frame RGB image according to the current-frame RGB image of the video and its previous-frame RGB image by passing sequentially through the convolutional neural network and the recurrent neural network. For example, the first classification probability vector obtaining module 42 obtains the convolution feature vector of the current-frame RGB image through the convolutional neural network, and inputs the convolution feature vector of the current-frame RGB image and the recurrent feature vector of its previous-frame RGB image into the recurrent neural network to obtain the recurrent feature vector of the current-frame RGB image.
Then, the first classification probability vector obtaining module 42 obtains the first classification probability vector of the current-frame RGB image according to the recurrent feature vector of the current-frame RGB image, and obtains the first classification probability vector of the video according to the first classification probability vector of the current-frame RGB image. For example, the first classification probability vector obtaining module 42 inputs the recurrent feature vector of the current-frame RGB image into a preset fully connected layer to obtain the first classification probability vector of the current-frame RGB image, and calculates the average of the first classification probability vectors of all RGB images to obtain the first classification probability vector of the video.
For each frame of optical flow image of the video, the second classification probability vector obtaining module 43 obtains the second classification probability vector of the video according to the current-frame optical flow image and its previous-frame optical flow image by passing sequentially through the convolutional neural network and the recurrent neural network. Each element of the second classification probability vector represents the probability, based on the optical flow images, that the video belongs to the respective category.
In one embodiment, the second classification probability vector obtaining module 43 first obtains the recurrent feature vector of the current-frame optical flow image according to the current-frame optical flow image of the video and its previous-frame optical flow image by passing sequentially through the convolutional neural network and the recurrent neural network. For example, the second classification probability vector obtaining module 43 obtains the convolution feature vector of the current-frame optical flow image through the convolutional neural network, and inputs the convolution feature vector of the current-frame optical flow image and the recurrent feature vector of its previous-frame optical flow image into the recurrent neural network to obtain the recurrent feature vector of the current-frame optical flow image.
Then, the second classification probability vector obtaining module 43 obtains the second classification probability vector of the current-frame optical flow image according to the recurrent feature vector of the current-frame optical flow image, and obtains the second classification probability vector of the video according to the second classification probability vector of the current-frame optical flow image. For example, the second classification probability vector obtaining module 43 inputs the recurrent feature vector of the current-frame optical flow image into a preset fully connected layer to obtain the second classification probability vector of the current-frame optical flow image, and calculates the average of the second classification probability vectors of all optical flow images to obtain the second classification probability vector of the video.
The classification determining module 44 determines the classification of the video according to the first classification probability vector and the second classification probability vector. For example, the classification determining module 44 calculates the average of the first classification probability vector and the second classification probability vector to obtain the third classification probability vector of the video, and determines the category corresponding to the element with the largest value in the third classification probability vector as the classification of the video.
In the above embodiments, the RGB images and the optical flow images of the video are each processed by passing sequentially through a convolutional neural network and a recurrent neural network, and the resulting classification results are fused to determine the classification of the video. In this way, different processing means can be combined and different image information fused to analyze the time-varying images, and the temporal dependence between adjacent frames can be used to classify the video, thereby improving the accuracy of video classification.
Fig. 5 shows a structural diagram of other embodiments of the video classification device of the present disclosure.
As shown in Fig. 5, the device 5 of this embodiment includes a memory 51 and a processor 52 coupled to the memory 51. The processor 52 is configured to execute the video classification method of any of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The video classification method, device, and computer-readable storage medium according to the present disclosure have thus been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concept of the present disclosure. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded on a recording medium, these programs including machine-readable instructions for implementing the method according to the present disclosure. The present disclosure thus also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should also understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.