CN109753984A - Video classification method, device and computer-readable storage medium - Google Patents

Video classification method, device and computer-readable storage medium

Info

Publication number
CN109753984A
CN109753984A (application CN201711084116.8A)
Authority
CN
China
Prior art keywords
video
vector
classification probability
current frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711084116.8A
Other languages
Chinese (zh)
Inventor
张立成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201711084116.8A
Publication of CN109753984A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

This disclosure relates to a video classification method, a video classification device, and a computer-readable storage medium, and belongs to the technical field of data processing. The method comprises: extracting multiple frames of RGB images from a video to be classified, and obtaining multiple frames of optical flow images from the RGB images of adjacent frames; for each RGB frame, processing the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain a first classification probability vector of the video, each element of which represents the probability that the video belongs to a corresponding category based on the RGB images; for each optical flow frame, processing the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain a second classification probability vector of the video, each element of which represents the probability that the video belongs to a corresponding category based on the optical flow images; and determining the category of the video according to the first classification probability vector and the second classification probability vector. The method and device can improve the accuracy of video classification.

Description

Video classification method, device and computer-readable storage medium
Technical field
This disclosure relates to the technical field of data processing, and in particular to a video classification method, a video classification device, and a computer-readable storage medium.
Background technique
With the development of digital storage and video database technologies, multimedia information has become increasingly abundant. To make effective use of this multimedia information, it needs to be organized and indexed automatically so that multimedia data can be retrieved conveniently. Classifying video content is an important step in summarizing, understanding, and retrieving video data. Video classification technology analyzes a video using image processing and video processing methods in order to recognize the behavior of individuals in the video, for example various ball games, daily activities, and so on.
Related technologies mainly use convolutional neural networks, such as AlexNet or GoogleNet, to process individual frame images of a video in order to classify the video.
Summary of the invention
The inventor of the present disclosure found the following problem in the related technologies: both the images used as the processing object and the type of neural network used as the processing means are single, so the behavior of individuals in the video cannot be characterized comprehensively, and the accuracy of video classification is therefore not high. In view of this technical problem, the present disclosure proposes a video classification solution with high accuracy.
According to some embodiments of the present disclosure, a video classification method is provided, comprising: extracting multiple frames of RGB images from a video to be classified, and obtaining multiple frames of optical flow images from the RGB images of adjacent frames; for each RGB frame of the video, processing the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain a first classification probability vector of the video, where each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images; for each optical flow frame of the video, processing the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain a second classification probability vector of the video, where each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images; and determining the category of the video according to the first classification probability vector and the second classification probability vector.
Optionally, the current RGB frame of the video and its previous RGB frame are processed sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current RGB frame; a first classification probability vector of the current RGB frame is obtained from the recurrent feature vector of the current RGB frame; and the first classification probability vector of the video is obtained from the first classification probability vectors of the RGB frames.
Optionally, a convolution feature vector of the current RGB frame is obtained through the convolutional neural network; the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame are input into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame.
Optionally, the recurrent feature vector of the current RGB frame is input into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame; the average of the first classification probability vectors of all RGB frames is calculated to obtain the first classification probability vector of the video.
Optionally, the current optical flow frame of the video and its previous optical flow frame are processed sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current optical flow frame; a second classification probability vector of the current optical flow frame is obtained from the recurrent feature vector of the current optical flow frame, and the second classification probability vector of the video is obtained from the second classification probability vectors of the optical flow frames.
Optionally, a convolution feature vector of the current optical flow frame is obtained through the convolutional neural network; the convolution feature vector of the current optical flow frame and the recurrent feature vector of the previous optical flow frame are input into the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame.
Optionally, the recurrent feature vector of the current optical flow frame is input into a preset fully connected layer to obtain the second classification probability vector of the current optical flow frame; the average of the second classification probability vectors of all optical flow frames is calculated to obtain the second classification probability vector of the video.
Optionally, the average of the first classification probability vector and the second classification probability vector is calculated to obtain a third classification probability vector of the video; the category corresponding to the largest element of the third classification probability vector is determined as the category of the video.
Optionally, the convolutional neural network is ResNet-101, and the recurrent neural network is an LSTM (Long Short-Term Memory) network.
According to other embodiments of the present disclosure, a video classification device is provided, comprising: an image extraction module, configured to extract multiple frames of RGB images from a video to be classified and obtain multiple frames of optical flow images from the RGB images of adjacent frames; a first classification probability vector acquisition module, configured to, for each RGB frame of the video, process the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain a first classification probability vector of the video, where each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images; a second classification probability vector acquisition module, configured to, for each optical flow frame of the video, process the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain a second classification probability vector of the video, where each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images; and a classification determination module, configured to determine the category of the video according to the first classification probability vector and the second classification probability vector.
Optionally, the first classification probability vector acquisition module processes the current RGB frame of the video and its previous RGB frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current RGB frame, obtains a first classification probability vector of the current RGB frame from the recurrent feature vector of the current RGB frame, and obtains the first classification probability vector of the video from the first classification probability vectors of the RGB frames.
Optionally, the first classification probability vector acquisition module obtains a convolution feature vector of the current RGB frame through the convolutional neural network, and inputs the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame.
Optionally, the first classification probability vector acquisition module inputs the recurrent feature vector of the current RGB frame into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame, and calculates the average of the first classification probability vectors of all RGB frames to obtain the first classification probability vector of the video.
Optionally, the second classification probability vector acquisition module processes the current optical flow frame of the video and its previous optical flow frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current optical flow frame, obtains a second classification probability vector of the current optical flow frame from the recurrent feature vector of the current optical flow frame, and obtains the second classification probability vector of the video from the second classification probability vectors of the optical flow frames.
Optionally, the second classification probability vector acquisition module obtains a convolution feature vector of the current optical flow frame through the convolutional neural network, and inputs the convolution feature vector of the current optical flow frame and the recurrent feature vector of the previous optical flow frame into the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame.
Optionally, the second classification probability vector acquisition module inputs the recurrent feature vector of the current optical flow frame into a preset fully connected layer to obtain the second classification probability vector of the current optical flow frame, and calculates the average of the second classification probability vectors of all optical flow frames to obtain the second classification probability vector of the video.
Optionally, the classification determination module calculates the average of the first classification probability vector and the second classification probability vector to obtain a third classification probability vector of the video, and determines the category corresponding to the largest element of the third classification probability vector as the category of the video.
Optionally, the convolutional neural network is ResNet-101, and the recurrent neural network is an LSTM network.
According to still other embodiments of the present disclosure, a video classification device is provided, comprising: a memory, and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the video classification method described in any one of the above embodiments.
According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, implements the video classification method described in any one of the above embodiments.
In the above embodiments, both the RGB images and the optical flow images of a video are processed sequentially through a convolutional neural network and a recurrent neural network, and the obtained classification results are fused to determine the category of the video. In this way, different processing means can be combined, different image information can be fused to analyze time-varying images, and the temporal dependency between adjacent frames can be exploited to classify the video, thereby improving the accuracy of video classification.
Brief description of the drawings
The accompanying drawings, which constitute part of the specification, describe embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 shows a flow chart of some embodiments of the video classification method of the present disclosure.
Fig. 2 shows a flow chart of some embodiments of obtaining the first classification probability vector.
Fig. 3 shows a schematic diagram of some embodiments of the video classification method of the present disclosure.
Fig. 4 shows a structure diagram of some embodiments of the video classification device of the present disclosure.
Fig. 5 shows a structure diagram of other embodiments of the video classification device of the present disclosure.
Detailed description of embodiments
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.
Techniques, methods and apparatus known to a person of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, such techniques, methods and apparatus should be considered part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely illustrative rather than restrictive. Therefore, other examples of the exemplary embodiments may have different values.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Fig. 1 shows a flow chart of some embodiments of the video classification method of the present disclosure.
As shown in Fig. 1, the video classification method includes: step 110, obtaining the RGB images and optical flow images of a video; step 120, obtaining a first classification probability vector based on the RGB images; step 130, obtaining a second classification probability vector based on the optical flow images; and step 140, determining the category of the video.
In step 110, multiple frames of RGB images are extracted from the video to be classified, and multiple frames of optical flow images are obtained from the RGB images of adjacent frames. For example, N consecutive frames (i.e., N time instants) of images can be extracted from the video. One optical flow image is calculated from every two adjacent frames of the N consecutive frames, yielding N-1 consecutive optical flow images.
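As an illustration of this step only (not code from the disclosure), the sketch below reads N RGB frames with OpenCV and computes one dense optical flow image for every pair of adjacent frames using the Farneback method; the frame count and the Farneback parameters are assumptions chosen for the example.

```python
import cv2

def extract_rgb_and_flow(video_path, n_frames=16):
    """Extract n_frames RGB images and n_frames - 1 dense optical flow images."""
    cap = cv2.VideoCapture(video_path)
    rgb_frames = []
    while len(rgb_frames) < n_frames:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        rgb_frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    cap.release()

    flow_frames = []
    for prev_rgb, curr_rgb in zip(rgb_frames[:-1], rgb_frames[1:]):
        prev_gray = cv2.cvtColor(prev_rgb, cv2.COLOR_RGB2GRAY)
        curr_gray = cv2.cvtColor(curr_rgb, cv2.COLOR_RGB2GRAY)
        # One 2-channel (dx, dy) optical flow image per pair of adjacent frames;
        # arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flow_frames.append(flow)
    return rgb_frames, flow_frames
```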
In step 120, for each RGB frame of the video, the current RGB frame and its previous RGB frame are processed sequentially through a convolutional neural network and a recurrent neural network to obtain the first classification probability vector of the video; each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images.
In one embodiment, the convolutional neural network may be ResNet-101 and the recurrent neural network may be an LSTM. Compared with other convolutional neural networks, ResNet-101 has stronger feature learning ability and can therefore achieve higher video classification accuracy. Accordingly, ResNet-101 can be used to extract image features as the input of the LSTM, without using the output of the last fully connected layer of ResNet-101. For example, the last fully connected layer of ResNet-101 can be removed, ResNet-101 is used only to extract the convolution feature vector of the current frame, the convolution feature vector is then input into the LSTM to obtain the recurrent feature vector of the current frame, and finally a preset fully connected layer outputs the classification probability vector of the video. In this way, the image feature extraction advantage of the convolutional neural network and the advantage of the recurrent neural network in processing temporally correlated data can be combined to classify videos, thereby improving classification accuracy.
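The sketch below is one possible realization of this idea using torchvision, given here as an assumption rather than the disclosed implementation: the final fully connected layer of ResNet-101 is replaced with an identity mapping so that the network outputs a 2048-dimensional convolution feature vector per frame.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-101 with its last fully connected layer removed (replaced by an
# identity), so the forward pass returns the 2048-d pooled convolution feature.
backbone = models.resnet101(pretrained=True)
backbone.fc = nn.Identity()
backbone.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 224, 224)   # one preprocessed RGB frame
    conv_feature = backbone(frame)        # shape: (1, 2048)
```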
In one embodiment, the first classification probability vector in step 120 can be obtained through the embodiment shown in Fig. 2.
Fig. 2 shows a flow chart of some embodiments of obtaining the first classification probability vector.
As shown in Fig. 2, the first classification probability vector can be obtained as follows: step 1201, obtaining the convolution feature vector of an RGB frame; step 1202, obtaining the recurrent feature vector of the RGB frame; step 1203, obtaining the first classification probability vector of the RGB frame; and step 1204, obtaining the first classification probability vector of the video.
In step 1201, the convolution feature vector of the current RGB frame can be obtained through the convolutional neural network.
In one embodiment, the last fully connected layer of ResNet-101 can be removed, and only the down-sampled output before the fully connected layer is extracted as the convolution feature vector of the current frame. For example, the convolution feature vector can be a 2048-dimensional vector corresponding to the current RGB frame.
In step 1202, the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame are input into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame. The recurrent feature vector of the current RGB frame can serve as one of the inputs of the recurrent neural network for the next RGB frame. For a frame without a previous frame, such as the first frame, the recurrent feature vector can be obtained by setting an initial value.
In one embodiment, the 2048-dimensional vector of the current RGB frame in the above embodiment and the recurrent feature vector of the previous RGB frame can be input into the LSTM to obtain an M-dimensional vector as the recurrent feature vector of the current RGB frame. The recurrent feature vector of the current RGB frame can then be used as one of the LSTM inputs for the next RGB frame. Iterating in this way yields the recurrent feature vectors of all RGB frames.
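A minimal sketch of this recurrence is given below, assuming the 2048-dimensional convolution features have already been extracted; the hidden size M, the use of torch.nn.LSTMCell, and the zero initial state for the first frame are illustrative assumptions.

```python
import torch
import torch.nn as nn

M = 512                                       # assumed size of the recurrent feature vector
lstm_cell = nn.LSTMCell(input_size=2048, hidden_size=M)

def recurrent_features(conv_features):
    """conv_features: tensor of shape (num_frames, 2048), one row per frame."""
    h = torch.zeros(1, M)                     # initial recurrent feature for the first frame
    c = torch.zeros(1, M)                     # initial cell state
    outputs = []
    for feat in conv_features:                # frames processed in temporal order
        # Inputs: current frame's convolution feature and previous frame's recurrent state.
        h, c = lstm_cell(feat.unsqueeze(0), (h, c))
        outputs.append(h)
    return torch.cat(outputs, dim=0)          # shape: (num_frames, M)
```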
In step 1203, the recurrent feature vector of the current RGB frame is input into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame.
In one embodiment, a fully connected layer can be connected after the LSTM in the above embodiments. For example, the input of the fully connected layer is the recurrent feature vector of the current RGB frame, with M input nodes in total, and the output is the first classification probability vector of the current RGB frame, with C output nodes in total. The C output nodes correspond to the probabilities that the video belongs to C categories. Each input node is connected to each output node, giving M × C connections in total, and each connection has a corresponding weight, giving M × C weights in total. The C outputs form the first classification probability vector of the current RGB frame, and the largest element of the first classification probability vector can represent the category of the current RGB frame. In this way a fully connected layer can be preset, and the recurrent feature vector is then input into the fully connected layer to obtain the category of the video.
In step 1204, the average of the first classification probability vectors of all RGB frames can be calculated to obtain the first classification probability vector of the video. For example, the average of the corresponding elements of the first classification probability vectors of all RGB frames can be calculated to obtain the first classification probability vector of the video.
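Continuing the sketch above (still an assumption rather than the disclosed code), a linear layer with M inputs and C outputs followed by a softmax turns each frame's recurrent feature vector into a per-frame classification probability vector, and the element-wise average over frames gives the video-level first classification probability vector:

```python
import torch
import torch.nn as nn

M, C = 512, 101                                 # assumed feature size and number of categories
classifier = nn.Linear(M, C)                    # M x C weights (plus biases)

def video_probability_vector(recurrent_feats):
    """recurrent_feats: (num_frames, M) recurrent feature vectors."""
    logits = classifier(recurrent_feats)        # (num_frames, C)
    frame_probs = torch.softmax(logits, dim=1)  # one probability vector per frame
    return frame_probs.mean(dim=0)              # element-wise average over all frames
```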
Through the above steps 1201-1204, the probability that the video belongs to each category is obtained based on the RGB images. Steps 130-140 of the embodiment in Fig. 1 can then be executed to determine the video category. Step 120 and step 130 can be executed in parallel or serially, and the order of serial execution can be interchanged.
In step 130, for each optical flow frame of the video, the current optical flow frame and its previous optical flow frame are processed sequentially through a convolutional neural network and a recurrent neural network to obtain the second classification probability vector of the video; each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images.
In one embodiment, the method in the above embodiments can be applied to the optical flow images to obtain the second classification probability vector of the video based on the optical flow images, which is not repeated here.
In step 140, the category of the video is determined according to the first classification probability vector and the second classification probability vector. For example, the average of the first classification probability vector and the second classification probability vector can be calculated to obtain a third classification probability vector of the video, and the category corresponding to the largest element of the third classification probability vector is determined as the category of the video.
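A short sketch of this fusion step, under the same illustrative assumptions as the snippets above:

```python
import torch

def classify_video(first_prob, second_prob):
    """first_prob, second_prob: (C,) probability vectors from the RGB and optical flow streams."""
    third_prob = (first_prob + second_prob) / 2   # element-wise average of the two streams
    return int(torch.argmax(third_prob))          # index of the predicted category
```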
In order to describe the process of the video classification method more clearly, the above embodiments can be summarized as the process shown in Fig. 3.
Fig. 3 shows a schematic diagram of some embodiments of the video classification method of the present disclosure.
As shown in Fig. 3, the RGB images and the optical flow images extracted from the video are each processed sequentially through ResNet-101 and an LSTM. The recurrent feature vectors extracted by the LSTMs serve as the inputs of fully connected layers, which output the first classification probability vector and the second classification probability vector, respectively. The first classification probability vector and the second classification probability vector are fused to determine the category of the video, for example by directly averaging them, or by averaging after weighting and summing the two. The network parameters of ResNet-101 can be initialized with a ResNet-101 model pretrained on the ImageNet data set, and the other network parameters can be obtained by random initialization.
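The initialization strategy can be sketched as follows (an assumption expressed with torchvision conventions, not the disclosed training code): each stream's ResNet-101 backbone loads ImageNet-pretrained weights, while the LSTM and the fully connected classifier keep their default random initialization.

```python
import torch.nn as nn
from torchvision import models

def build_stream(num_categories, hidden_size=512):
    """One stream (RGB or optical flow): pretrained backbone, randomly initialized LSTM and classifier."""
    backbone = models.resnet101(pretrained=True)         # parameters pretrained on ImageNet
    backbone.fc = nn.Identity()                          # drop the last fully connected layer
    lstm = nn.LSTMCell(2048, hidden_size)                # random initialization by default
    classifier = nn.Linear(hidden_size, num_categories)  # random initialization by default
    return backbone, lstm, classifier

# Two streams sharing the same architecture; in practice the optical flow stream
# may need its first convolution adapted to the channel count of the flow images.
rgb_stream = build_stream(num_categories=101)
flow_stream = build_stream(num_categories=101)
```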
In the above embodiments, the RGB images and the optical flow images of a video are processed sequentially through a convolutional neural network and a recurrent neural network, and the obtained classification results are fused to determine the category of the video. In this way, different processing means can be combined, different image information can be fused to analyze time-varying images, and the temporal dependency between adjacent frames can be exploited to classify the video, thereby improving the accuracy of video classification.
Fig. 4 shows a structure diagram of some embodiments of the video classification device of the present disclosure.
As shown in Fig. 4, the video classification device 4 includes an image extraction module 41, a first classification probability vector acquisition module 42, a second classification probability vector acquisition module 43, and a classification determination module 44.
The image extraction module 41 extracts multiple frames of RGB images from the video to be classified and obtains multiple frames of optical flow images from the RGB images of adjacent frames.
For each RGB frame of the video, the first classification probability vector acquisition module 42 processes the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain the first classification probability vector of the video; each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images. For example, the convolutional neural network can be ResNet-101 and the recurrent neural network can be an LSTM.
In one embodiment, the first classification probability vector acquisition module 42 first processes the current RGB frame of the video and its previous RGB frame sequentially through the convolutional neural network and the recurrent neural network to obtain the recurrent feature vector of the current RGB frame. For example, the first classification probability vector acquisition module 42 obtains the convolution feature vector of the current RGB frame through the convolutional neural network, and inputs the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame.
Then, the first classification probability vector acquisition module 42 obtains the first classification probability vector of the current RGB frame from the recurrent feature vector of the current RGB frame, and obtains the first classification probability vector of the video from the first classification probability vectors of the RGB frames. For example, the first classification probability vector acquisition module 42 inputs the recurrent feature vector of the current RGB frame into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame, and calculates the average of the first classification probability vectors of all RGB frames to obtain the first classification probability vector of the video.
For each optical flow frame of the video, the second classification probability vector acquisition module 43 processes the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain the second classification probability vector of the video; each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images.
In one embodiment, the second classification probability vector acquisition module 43 first processes the current optical flow frame of the video and its previous optical flow frame sequentially through the convolutional neural network and the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame. For example, the second classification probability vector acquisition module 43 obtains the convolution feature vector of the current optical flow frame through the convolutional neural network, and inputs the convolution feature vector of the current optical flow frame and the recurrent feature vector of the previous optical flow frame into the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame.
Then, the second classification probability vector acquisition module 43 obtains the second classification probability vector of the current optical flow frame from the recurrent feature vector of the current optical flow frame, and obtains the second classification probability vector of the video from the second classification probability vectors of the optical flow frames. For example, the second classification probability vector acquisition module 43 inputs the recurrent feature vector of the current optical flow frame into a preset fully connected layer to obtain the second classification probability vector of the current optical flow frame, and calculates the average of the second classification probability vectors of all optical flow frames to obtain the second classification probability vector of the video.
The classification determination module 44 determines the category of the video according to the first classification probability vector and the second classification probability vector. For example, the classification determination module 44 calculates the average of the first classification probability vector and the second classification probability vector to obtain a third classification probability vector of the video, and determines the category corresponding to the largest element of the third classification probability vector as the category of the video.
In the above embodiments, the RGB images and the optical flow images of a video are processed sequentially through a convolutional neural network and a recurrent neural network, and the obtained classification results are fused to determine the category of the video. In this way, different processing means can be combined, different image information can be fused to analyze time-varying images, and the temporal dependency between adjacent frames can be exploited to classify the video, thereby improving the accuracy of video classification.
Fig. 5 shows a structure diagram of other embodiments of the video classification device of the present disclosure.
As shown in Fig. 5, the device 5 of this embodiment includes a memory 51 and a processor 52 coupled to the memory 51, and the processor 52 is configured to execute, based on instructions stored in the memory 51, the video classification method of any of the embodiments of the present disclosure.
The memory 51 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, a database and other programs.
Those skilled in the art should understand that the embodiments of the present disclosure can be provided as a method, a system or a computer program product. Therefore, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure can take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
Thus far, the video classification method, device and computer-readable storage medium according to the present disclosure have been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concept of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in many ways. For example, the method and system of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is merely illustrative, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are merely illustrative and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (20)

1. A video classification method, comprising:
extracting multiple frames of RGB images from a video to be classified, and obtaining multiple frames of optical flow images from the RGB images of adjacent frames;
for each RGB frame of the video, processing the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain a first classification probability vector of the video, wherein each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images;
for each optical flow frame of the video, processing the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain a second classification probability vector of the video, wherein each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images;
determining the category of the video according to the first classification probability vector and the second classification probability vector.
2. The video classification method according to claim 1, wherein obtaining the first classification probability vector of the video comprises:
processing the current RGB frame of the video and its previous RGB frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current RGB frame;
obtaining a first classification probability vector of the current RGB frame from the recurrent feature vector of the current RGB frame;
obtaining the first classification probability vector of the video from the first classification probability vector of the current RGB frame.
3. The video classification method according to claim 2, wherein obtaining the recurrent feature vector of the current RGB frame comprises:
obtaining a convolution feature vector of the current RGB frame through the convolutional neural network;
inputting the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame.
4. The video classification method according to claim 2, wherein obtaining the first classification probability vector of the video comprises:
inputting the recurrent feature vector of the current RGB frame into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame;
calculating the average of the first classification probability vectors of all RGB frames to obtain the first classification probability vector of the video.
5. The video classification method according to claim 1, wherein obtaining the second classification probability vector of the video comprises:
processing the current optical flow frame of the video and its previous optical flow frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current optical flow frame;
obtaining a second classification probability vector of the current optical flow frame from the recurrent feature vector of the current optical flow frame, and obtaining the second classification probability vector of the video from the second classification probability vector of the current optical flow frame.
6. The video classification method according to claim 5, wherein obtaining the recurrent feature vector of the current optical flow frame comprises:
obtaining a convolution feature vector of the current optical flow frame through the convolutional neural network;
inputting the convolution feature vector of the current optical flow frame and the recurrent feature vector of the previous optical flow frame into the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame.
7. The video classification method according to claim 5, wherein obtaining the second classification probability vector of the video comprises:
inputting the recurrent feature vector of the current optical flow frame into a preset fully connected layer to obtain the second classification probability vector of the current optical flow frame;
calculating the average of the second classification probability vectors of all optical flow frames to obtain the second classification probability vector of the video.
8. The video classification method according to claim 1, wherein determining the category of the video comprises:
calculating the average of the first classification probability vector and the second classification probability vector to obtain a third classification probability vector of the video;
determining the category corresponding to the largest element of the third classification probability vector as the category of the video.
9. The video classification method according to any one of claims 1-8, wherein
the convolutional neural network is ResNet-101, and the recurrent neural network is a long short-term memory (LSTM) network.
10. A video classification device, comprising:
an image extraction module, configured to extract multiple frames of RGB images from a video to be classified and obtain multiple frames of optical flow images from the RGB images of adjacent frames;
a first classification probability vector acquisition module, configured to, for each RGB frame of the video, process the current RGB frame and its previous RGB frame sequentially through a convolutional neural network and a recurrent neural network to obtain a first classification probability vector of the video, wherein each element of the first classification probability vector represents the probability that the video belongs to a corresponding category based on the RGB images;
a second classification probability vector acquisition module, configured to, for each optical flow frame of the video, process the current optical flow frame and its previous optical flow frame sequentially through a convolutional neural network and a recurrent neural network to obtain a second classification probability vector of the video, wherein each element of the second classification probability vector represents the probability that the video belongs to a corresponding category based on the optical flow images;
a classification determination module, configured to determine the category of the video according to the first classification probability vector and the second classification probability vector.
11. The video classification device according to claim 10, wherein
the first classification probability vector acquisition module processes the current RGB frame of the video and its previous RGB frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current RGB frame, obtains a first classification probability vector of the current RGB frame from the recurrent feature vector of the current RGB frame, and obtains the first classification probability vector of the video from the first classification probability vector of the current RGB frame.
12. The video classification device according to claim 11, wherein
the first classification probability vector acquisition module obtains a convolution feature vector of the current RGB frame through the convolutional neural network, and inputs the convolution feature vector of the current RGB frame and the recurrent feature vector of the previous RGB frame into the recurrent neural network to obtain the recurrent feature vector of the current RGB frame.
13. The video classification device according to claim 11, wherein
the first classification probability vector acquisition module inputs the recurrent feature vector of the current RGB frame into a preset fully connected layer to obtain the first classification probability vector of the current RGB frame, and calculates the average of the first classification probability vectors of all RGB frames to obtain the first classification probability vector of the video.
14. The video classification device according to claim 10, wherein
the second classification probability vector acquisition module processes the current optical flow frame of the video and its previous optical flow frame sequentially through the convolutional neural network and the recurrent neural network to obtain a recurrent feature vector of the current optical flow frame, obtains a second classification probability vector of the current optical flow frame from the recurrent feature vector of the current optical flow frame, and obtains the second classification probability vector of the video from the second classification probability vector of the current optical flow frame.
15. The video classification device according to claim 14, wherein
the second classification probability vector acquisition module obtains a convolution feature vector of the current optical flow frame through the convolutional neural network, and inputs the convolution feature vector of the current optical flow frame and the recurrent feature vector of the previous optical flow frame into the recurrent neural network to obtain the recurrent feature vector of the current optical flow frame.
16. The video classification device according to claim 14, wherein
the second classification probability vector acquisition module inputs the recurrent feature vector of the current optical flow frame into a preset fully connected layer to obtain the second classification probability vector of the current optical flow frame, and calculates the average of the second classification probability vectors of all optical flow frames to obtain the second classification probability vector of the video.
17. The video classification device according to claim 10, wherein
the classification determination module calculates the average of the first classification probability vector and the second classification probability vector to obtain a third classification probability vector of the video, and determines the category corresponding to the largest element of the third classification probability vector as the category of the video.
18. The video classification device according to any one of claims 10-17, wherein
the convolutional neural network is ResNet-101, and the recurrent neural network is a long short-term memory (LSTM) network.
19. A video classification device, comprising:
a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the video classification method according to any one of claims 1-9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the video classification method according to any one of claims 1-9.
CN201711084116.8A 2017-11-07 2017-11-07 Video classification method, device and computer-readable storage medium Pending CN109753984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711084116.8A CN109753984A (en) 2017-11-07 2017-11-07 Video classification method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711084116.8A CN109753984A (en) 2017-11-07 2017-11-07 Video classification method, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN109753984A true CN109753984A (en) 2019-05-14

Family

ID=66401142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711084116.8A Pending CN109753984A (en) 2017-11-07 2017-11-07 Video classification method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109753984A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN106407889A (en) * 2016-08-26 2017-02-15 上海交通大学 Video human body interaction motion identification method based on optical flow graph depth learning model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHRISTOPH FEICHTENHOFER等: "Spatiotemporal Multiplier Networks for Video Action Recognition", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JEFF DONAHUE等: "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
KAREN SIMONYAN 等: "Two-Stream Convolutional Networks for Action Recognition in Videos", 《ARXIV》 *
LIN SUN等: "Lattice Long Short-Term Memory for Human Action Recognition", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826475A (en) * 2019-11-01 2020-02-21 北京齐尔布莱特科技有限公司 Method and device for detecting near-duplicate video and computing equipment
CN110826475B (en) * 2019-11-01 2022-10-04 北京齐尔布莱特科技有限公司 Method and device for detecting near-duplicate video and computing equipment
CN111695627A (en) * 2020-06-11 2020-09-22 腾讯科技(深圳)有限公司 Road condition detection method and device, electronic equipment and readable storage medium
CN111797912A (en) * 2020-06-23 2020-10-20 山东云缦智能科技有限公司 System and method for identifying film generation type and construction method of identification model
CN111797912B (en) * 2020-06-23 2023-09-22 山东浪潮超高清视频产业有限公司 System and method for identifying film age type and construction method of identification model
CN112579824A (en) * 2020-12-16 2021-03-30 北京中科闻歌科技股份有限公司 Video data classification method and device, electronic equipment and storage medium
CN113837576A (en) * 2021-09-14 2021-12-24 上海任意门科技有限公司 Method, computing device, and computer-readable storage medium for content recommendation
CN113837457A (en) * 2021-09-14 2021-12-24 上海任意门科技有限公司 Method, computing device and storage medium for predicting interactive behavior state of posts

Similar Documents

Publication Publication Date Title
CN109753984A (en) Video classification method, device and computer-readable storage medium
Sindagi et al. Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting
Fu et al. Fast crowd density estimation with convolutional neural networks
CN110516536B (en) Weak supervision video behavior detection method based on time sequence class activation graph complementation
CN109993269B (en) Single image crowd counting method based on attention mechanism
CN109410239A (en) A kind of text image super resolution ratio reconstruction method generating confrontation network based on condition
CN108961675A (en) Fall detection method based on convolutional neural networks
CN112668522B (en) Human body key point and human body mask joint detection network and method
Li et al. Sign language recognition based on computer vision
CN110738160A (en) human face quality evaluation method combining with human face detection
Wei et al. P3D-CTN: Pseudo-3D convolutional tube network for spatio-temporal action detection in videos
CN107818307A (en) A kind of multi-tag Video Events detection method based on LSTM networks
CN109753985A (en) Video classification methods and device
Wang et al. Basketball shooting angle calculation and analysis by deeply-learned vision model
Dong et al. Holistic and Deep Feature Pyramids for Saliency Detection.
Lin et al. Joint learning of local and global context for temporal action proposal generation
Zhao et al. Multifeature fusion action recognition based on key frames
Kondo et al. Siamese-structure deep neural network recognizing changes in facial expression according to the degree of smiling
Liao et al. Residual attention unit for action recognition
Wang et al. SLMS-SSD: Improving the balance of semantic and spatial information in object detection
Qiao et al. Two-Stream Convolutional Neural Network for Video Action Recognition.
Zhao et al. Object detector based on enhanced multi-scale feature fusion pyramid network
Li et al. Trajectory-pooled spatial-temporal architecture of deep convolutional neural networks for video event detection
CN115311518A (en) Method, device, medium and electronic equipment for acquiring visual attribute information
Luo et al. An modified video stream classification method which fuses three-dimensional convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210305

Address after: 101, 1st floor, building 2, yard 20, Suzhou street, Haidian District, Beijing 100080

Applicant after: Beijing Jingbangda Trading Co.,Ltd.

Address before: 100195 Beijing Haidian Xingshikou Road 65 West Cedar Creative Garden 4 District 11 Building East 1-4 Floor West 1-4 Floor

Applicant before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd.

Effective date of registration: 20210305

Address after: Room a1905, 19 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Beijing Jingdong Qianshi Technology Co.,Ltd.

Address before: 101, 1st floor, building 2, yard 20, Suzhou street, Haidian District, Beijing 100080

Applicant before: Beijing Jingbangda Trading Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190514
