CN109902634A - Neural-network-based video classification method and system - Google Patents
- Publication number
- CN109902634A CN109902634A CN201910158877.6A CN201910158877A CN109902634A CN 109902634 A CN109902634 A CN 109902634A CN 201910158877 A CN201910158877 A CN 201910158877A CN 109902634 A CN109902634 A CN 109902634A
- Authority
- CN
- China
- Prior art keywords
- feature vector
- target
- convolutional layer
- frame
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
An embodiment of the invention provides a neural-network-based video classification method and system. The method comprises: obtaining a target video to be processed; extracting target frames from the target video according to a preset parameter; inputting the target frames into a target neural network, the target neural network comprising at least one convolutional layer used to convert the feature vector corresponding to the target frames into a high-order feature vector; obtaining an output result of the target neural network; and determining the type of the target video according to the output result. By fitting the data with a high-order term, the embodiment of the invention facilitates parameter propagation and extracts more features from the image frames, easing their identification and thereby improving video classification performance.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a neural-network-based video classification method and system.
Background art
As technology develops, the volume of video content keeps growing, and videos often need to be classified before use. Traditional manual classification is time-consuming, wastes manpower, and is inefficient.
Existing neural networks can classify videos, typically by combining a 2D convolutional neural network with a temporal model, or by using a two-stream convolutional neural network or a 3D convolutional neural network. These approaches tend to lose spatio-temporal information, which degrades video classification performance.
Summary of the invention
An embodiment of the present invention provides a neural-network-based video classification method that fits the data with a high-order term and thereby improves video classification performance.
A first aspect of the embodiments of the present invention provides a neural-network-based video classification method, comprising:
obtaining a target video to be processed;
extracting target frames from the target video according to a preset parameter;
inputting the target frames into a target neural network, the target neural network comprising at least one convolutional layer, the at least one convolutional layer being used to convert the feature vector corresponding to the target frames into a high-order feature vector;
obtaining an output result of the target neural network; and
determining the type of the target video according to the output result.
Optionally, extracting target frames from the target video according to a preset parameter comprises:
parsing the preset parameter to obtain an extraction frequency and an initial extraction frame; and
extracting target frames from the target video according to the initial extraction frame and the extraction frequency.
Optionally, before obtaining the target video to be processed, the method further comprises:
creating the target neural network;
training the target neural network with training data; and
obtaining the trained target neural network.
Optionally, converting the feature vector corresponding to the target frames into a high-order feature vector comprises:
processing the target frames into a first feature vector;
inputting the first feature vector into a first convolutional layer to obtain a second feature vector;
inputting the second feature vector sequentially into a second convolutional layer and a third convolutional layer to obtain a third feature vector;
multiplying the second feature vector by the third feature vector to obtain a fourth feature vector, the fourth feature vector being a high-order representation of the first feature vector; and
inputting the fourth feature vector into a fourth convolutional layer to obtain a fifth feature vector.
Optionally, inputting the target frames into the target neural network further comprises:
summing the first feature vector and the fifth feature vector to obtain a sixth feature vector; and
inputting the sixth feature vector into the classifier of the target neural network to obtain the output result.
Optionally, the dimension of the first feature vector is T*W*H*1024.
Optionally, the convolution kernel of the first convolutional layer is 1*1*1, with a channel dimension of 512.
Optionally, the convolution kernel of the second convolutional layer is 1*3*3, the convolution kernel of the third convolutional layer is 3*3*3, and the dimension of the third feature vector is T*H*W*27.
Optionally, the convolution kernel of the fourth convolutional layer is 1*1*1, with a channel dimension of 1024.
A second aspect of the embodiments of the present invention provides a video classification system comprising a processor and a memory,
the memory being used to store an executable program, and
the processor being used to execute the executable program to implement the video classification method described above.
Implementing the embodiments of the present invention yields the following beneficial effects: the neural-network-based video classification method and system of the embodiments extract the target frames from a video and input them into a target neural network comprising convolutional layers, realizing a high-order conversion of the vector corresponding to the target frames. Fitting the data with a high-order term facilitates parameter propagation and extracts more features from the image frames, easing their identification and thereby improving video classification performance.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, make required in being described below to embodiment
Attached drawing is briefly described, it should be apparent that, drawings in the following description are some embodiments of the invention, for ability
For the those of ordinary skill of domain, without creative efforts, it can also be obtained according to these attached drawings other attached
Figure.
Fig. 1 is a kind of method of video classification methods first embodiment neural network based provided in an embodiment of the present invention
Flow chart.
Fig. 2 is a kind of method of video classification methods second embodiment neural network based provided in an embodiment of the present invention
Flow chart.
Fig. 3 is the structural schematic diagram of target nerve network in the present embodiment.
Fig. 4 is method flow diagram the step of target frame is inputted target nerve network in the embodiment of the present invention.
Fig. 5 is a kind of structural schematic diagram of video classification system neural network based provided in an embodiment of the present invention.
Detailed description of the embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the disclosure, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the disclosure rather than the entire structure.
Before the exemplary embodiments are discussed in detail, it should be mentioned that some of them are described as processes or methods depicted as flow charts. Although a flow chart describes the steps as a sequential process, many of the steps can be performed in parallel, concurrently, or simultaneously. Moreover, the order of the steps can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawing. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
Fig. 1 is a flow chart of a first embodiment of a neural-network-based video classification method provided by an embodiment of the present invention. In the present embodiment the method can be applied to devices such as mobile terminals, computers, and servers. The video classification method comprises the following steps S101-S105.
In step S101, a target video to be processed is obtained.
Specifically, the target video to be classified is obtained first. The target video may come from the local side or from the network side; for example, the method may classify video information on the network, or video information held locally.
In step S102, target frames are extracted from the target video according to a preset parameter.
Specifically, after the target parameter is obtained, the target video is processed according to the preset parameter and the corresponding target frames are extracted. In the present embodiment, classifying the target video requires identifying the video, i.e. identifying the frames that compose it. By presetting a corresponding extraction parameter, the obtained target video can be processed with that parameter to obtain the corresponding frames, of which there are several. It will be understood that, since a video can be regarded as a sequence of continuously varying frame pictures and the change between adjacent frames is very small, subsequent identification needs to proceed from a temporal perspective; the time separating the video frames therefore also has to be considered when extracting them.
In the present embodiment, step S102 may further comprise:
parsing the preset parameter to obtain an extraction frequency and an initial extraction frame; and
extracting target frames from the target video according to the initial extraction frame and the extraction frequency.
Specifically, the target video is processed according to the initial extraction frame and the extraction frequency to obtain the corresponding plurality of target frames.
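The sampling rule above — an initial extraction frame plus a fixed extraction frequency — can be sketched as follows. The function name and the index-based formulation are illustrative assumptions, not part of the patent:

```python
def extract_target_frames(num_frames, start_frame, frequency):
    """Return the indices of the target frames: start at `start_frame`
    and take every `frequency`-th frame until the video ends."""
    if frequency <= 0 or not (0 <= start_frame < num_frames):
        raise ValueError("invalid extraction parameters")
    return list(range(start_frame, num_frames, frequency))

# A 100-frame video sampled from frame 10 at a stride of 30:
print(extract_target_frames(100, 10, 30))  # -> [10, 40, 70]
```

A larger frequency spaces the sampled frames further apart in time, which matches the temporal-interval consideration discussed above.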
In step S103, the target frames are input into the target neural network, the target neural network comprising at least one convolutional layer used to convert the feature vector corresponding to the target frames into a high-order feature vector.
Specifically, the extracted target frames are input into the target neural network, which performs image recognition and classification. In the present embodiment the target neural network comprises at least one convolutional layer used to convert the feature vector corresponding to the target frames into a high-order feature vector, making it easier to subsequently extract the important features and suppress the unimportant ones, and thereby achieving a better training effect.
In step S104, the output result of the target neural network is obtained.
Specifically, the corresponding output result is obtained.
In step S105, the type of the target video is determined according to the output result.
Specifically, the type of the target video is determined from the output result of the target neural network. In the present embodiment the specific classification depends on how the target neural network was trained: before recognition, the network needs to be trained on a suitable training set to improve recognition accuracy.
The neural-network-based video classification method of the embodiment extracts the target frames from the video and inputs them into a target neural network comprising convolutional layers, realizing a high-order conversion of the vector corresponding to the target frames. Fitting the data with a high-order term facilitates parameter propagation and extracts more features from the image frames, easing their identification and thereby improving video classification performance.
Fig. 2 is a flow chart of a second embodiment of a neural-network-based video classification method provided by an embodiment of the present invention. In the present embodiment the video classification method comprises the following steps S201-S208.
In step S201, the target neural network is created.
In step S202, the target neural network is trained with training data, adjusting its parameters.
In step S203, the trained target neural network is obtained.
In step S204, a target video to be processed is obtained.
In step S205, target frames are extracted from the target video according to a preset parameter.
In step S206, the target frames are input into the target neural network, the target neural network comprising at least one convolutional layer used to convert the feature vector corresponding to the target frames into a high-order feature vector.
In step S207, the output result of the target neural network is obtained.
In step S208, the type of the target video is determined according to the output result.
Fig. 3 is a schematic structural diagram of the target neural network in the present embodiment; as shown, in the present embodiment the target neural network comprises four convolutional layers. The target neural network described in Fig. 1 or Fig. 2 is detailed below with reference to Figs. 3 and 4.
Fig. 4 is a flow chart of the step of inputting the target frames into the convolutional layers in an embodiment of the present invention. The method comprises the following steps S401-S405. The present embodiment is illustrated with four convolutional layers; those skilled in the art will appreciate that the number of convolutional layers and the parameter types of each layer can take other forms, and the embodiments of the present invention are not limited in this respect.
In step S401, the target frames are processed into a first feature vector.
Specifically, as shown in Fig. 3, after the target frames are obtained they are processed to generate the first feature vector. In the present embodiment, processing the target frames may involve an RGB conversion, a YUV conversion, or another method; the present embodiment is not limited in this respect. In the present embodiment the dimension of the first feature vector is T*W*H*1024.
In step S402, the first feature vector is input into the first convolutional layer to obtain the second feature vector.
Specifically, the first feature vector is taken as input to the first convolutional layer. In the present embodiment the convolution kernel of the first convolutional layer is 1*1*1 and its channel dimension is 512. Since the channel dimension of the first feature vector is 1024, after the first convolutional layer the channel dimension of the second feature vector is reduced to 512, achieving a dimensionality reduction of the feature vector that eases subsequent processing.
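A 1*1*1 convolution acts independently at every (t, h, w) position and is therefore just a linear map over the channel axis, which is how the first convolutional layer reduces 1024 channels to 512. A minimal numpy sketch — the function name and the small T, H, W values are illustrative assumptions:

```python
import numpy as np

def pointwise_conv3d(x, weight):
    """1*1*1 convolution: a per-position linear map over channels.
    x: (T, H, W, C_in), weight: (C_in, C_out) -> (T, H, W, C_out)."""
    return x @ weight

T, H, W = 2, 4, 4
first = np.random.rand(T, H, W, 1024)   # first feature vector
w1 = np.random.rand(1024, 512)          # first convolutional layer, 512 channels
second = pointwise_conv3d(first, w1)    # second feature vector
print(second.shape)                     # -> (2, 4, 4, 512)
```

The same construction with a (512, 1024) weight would describe the fourth convolutional layer that later restores the channel dimension.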
In step S403, the second feature vector is input sequentially into the second convolutional layer and the third convolutional layer to obtain the third feature vector.
Specifically, the second feature vector obtained above is input into the second convolutional layer and then into the third convolutional layer. In the present embodiment the convolution kernel of the second convolutional layer is 1*3*3 and that of the third convolutional layer is 3*3*3. After processing by these two convolutional layers the third feature vector is obtained, with dimension T*H*W*27. In other implementations, choosing different convolution kernels for the second and third convolutional layers yields third feature vectors of different dimensions; for example, the dimension of the third feature vector may also be T*H*W*125.
In the present embodiment, the third feature vector represents a recalibration of the importance of the features of the first feature vector within a 3*3*3 neighborhood.
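The 27 channels of the third feature vector equal the number of positions in a 3*3*3 neighborhood, and 125 would likewise correspond to 5*5*5 — consistent with the recalibration reading above. This arithmetic interpretation is an inference, not stated explicitly in the patent:

```python
def neighborhood_size(t, h, w):
    """Number of positions covered by a t*h*w convolution kernel."""
    return t * h * w

print(neighborhood_size(3, 3, 3))  # -> 27, the channel count of the third feature vector
print(neighborhood_size(5, 5, 5))  # -> 125, the alternative dimension mentioned above
```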
In step S404, the second feature vector is multiplied by the third feature vector to obtain the fourth feature vector.
Specifically, since the third feature vector can be regarded as a weight vector for the first feature vector, multiplying the second feature vector by the third feature vector amounts to a high-order expression of the first feature vector; that is, the fourth feature vector can be regarded as a high-order representation of the first feature vector.
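The multiplication in step S404 can be read as an attention-style reweighting: the third feature vector supplies per-position weights that rescale the second feature vector. The channel counts of this embodiment (512 versus 27) do not align for a direct elementwise product, so the sketch below assumes, purely for illustration, a single weight channel broadcast across the feature channels:

```python
import numpy as np

T, H, W = 2, 4, 4
second = np.random.rand(T, H, W, 512)   # second feature vector
weights = np.random.rand(T, H, W, 1)    # per-position weights (illustrative assumption)
fourth = second * weights               # broadcast elementwise product
print(fourth.shape)                     # -> (2, 4, 4, 512)
```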
In step S405, the fourth feature vector is input into the fourth convolutional layer to obtain the fifth feature vector.
Specifically, the convolution kernel of the fourth convolutional layer is 1*1*1 and its channel dimension is 1024. This step is needed because the first convolutional layer reduced the dimensionality of the first feature vector; the dimensionality must now be raised again so that the dimension of the feature vector remains unchanged.
Optionally, in other embodiments, the above steps further include learning a residual. Further, the method further comprises summing the first feature vector and the fifth feature vector to obtain the sixth feature vector. Adding the first feature vector to the high-order fifth feature vector in the form of a sum facilitates parameter propagation and eases the training of the neural network.
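The residual connection described here is an elementwise sum, which is why the fourth convolutional layer must first restore the channel dimension to 1024. A minimal sketch with illustrative shapes:

```python
import numpy as np

T, H, W = 2, 4, 4
first = np.random.rand(T, H, W, 1024)   # first feature vector
fifth = np.random.rand(T, H, W, 1024)   # fifth feature vector (channels restored)
sixth = first + fifth                   # residual sum; shape is unchanged
print(sixth.shape)                      # -> (2, 4, 4, 1024)
```

Because the sum is the identity plus the learned high-order branch, gradients can flow directly through the first feature vector, which is the parameter-propagation benefit claimed above.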
Optionally, the method further comprises: after the sixth feature vector is obtained, inputting it into the classifier of the target neural network to obtain the output result, and classifying the video according to that output result.
It should be noted that in the embodiments of the present invention the dimension of the weight vector, i.e. of the third feature vector, is variable: changing the parameters of the second and third convolutional layers changes the dimension of the weight vector. By setting up the weight vector, the present embodiment realizes a high-order expression of the feature vector and can therefore better fit the nonlinear characteristics of the data. At the same time, because it uses a high-order representation, the target neural network of the embodiments can learn the attention features in the video frames by itself, extracting the features useful for classification while suppressing less important ones, and thereby achieve a better training effect.
Fig. 5 shows a schematic structural diagram of a neural-network-based video classification system provided by an embodiment of the present invention. As shown in Fig. 5, the system comprises a processor 501 (there may be one or more processors 501; Fig. 5 takes one processor as an example) and a memory 502. In an embodiment of the present invention the processor 501 and the memory 502 may be connected by a bus or in another way; Fig. 5 takes a bus connection as an example. It will be understood that the system of the present embodiment can also be applied in the embodiments shown in Fig. 1 or Fig. 2.
The memory 502 stores an executable program, and the processor 501 executes the executable program to carry out the following steps:
obtaining a target video to be processed;
extracting target frames from the target video according to a preset parameter;
inputting the target frames into a target neural network, the target neural network comprising at least one convolutional layer used to convert the feature vector corresponding to the target frames into a high-order feature vector;
obtaining an output result of the target neural network; and
determining the type of the target video according to the output result.
Optionally, the processor 501 extracting target frames from the target video according to a preset parameter comprises:
parsing the preset parameter to obtain an extraction frequency and an initial extraction frame; and
extracting target frames from the target video according to the initial extraction frame and the extraction frequency.
Optionally, before the processor 501 obtains the target video to be processed, the method further comprises:
creating the target neural network;
training the target neural network with training data and adjusting its parameters; and
obtaining the trained target neural network.
Optionally, the processor 501 is further configured to:
process the target frames into a first feature vector;
input the first feature vector into the first convolutional layer to obtain a second feature vector;
input the second feature vector sequentially into the second convolutional layer and the third convolutional layer to obtain a third feature vector;
multiply the second feature vector by the third feature vector to obtain a fourth feature vector, the fourth feature vector being a high-order representation of the first feature vector; and
input the fourth feature vector into the fourth convolutional layer to obtain a fifth feature vector.
Optionally, the processor 501 is further configured to:
sum the first feature vector and the fifth feature vector to obtain a sixth feature vector; and
input the sixth feature vector into the classifier of the target neural network to obtain the output result.
Optionally, the dimension of the first feature vector is T*W*H*1024.
Optionally, the convolution kernel of the first convolutional layer is 1*1*1, with a channel dimension of 512.
Optionally, the convolution kernel of the second convolutional layer is 1*3*3, the convolution kernel of the third convolutional layer is 3*3*3, and the dimension of the third feature vector is T*H*W*27.
Optionally, the convolution kernel of the fourth convolutional layer is 1*1*1, with a channel dimension of 1024.
The neural-network-based video classification system of the embodiment extracts the target frames from the video and inputs them into a target neural network comprising convolutional layers, realizing a high-order conversion of the vector corresponding to the target frames. Fitting the data with a high-order term facilitates parameter propagation and extracts more features from the image frames, easing their identification and thereby improving video classification performance.
Any two of the modules provided in the above embodiments can communicate with each other, and each module can communicate with the platform's central control device. The system provided in the above embodiments can perform the video classification method provided in any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in the above embodiments, reference may be made to the video classification method provided in any embodiment of the present disclosure.
It will be appreciated that the disclosure also extends to computer programs suitable for putting the disclosure into practice, especially computer programs on or in a carrier. The program may take the form of source code, object code, a code intermediate between source and object code such as a partially compiled form, or any other form suitable for use in implementing the method according to the disclosure. It will also be noted that such a program may have many different architectural designs. For example, program code implementing the functionality of the method or system according to the disclosure may be subdivided into one or more subroutines.
Many different ways of distributing the functionality among these subroutines will be apparent to the skilled person. The subroutines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer-executable instructions, for example processor instructions and/or interpreter instructions (e.g. Java interpreter instructions). Alternatively, one or more or all of the subroutines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run time. The main program contains at least one call to at least one of the subroutines. The subroutines may also comprise function calls to each other. An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each processing step of at least one of the methods set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically.
Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each means of at least one of the systems and/or products set forth. These instructions may be subdivided into subroutines and/or stored in one or more files that may be linked statically or dynamically.
The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM (e.g. a CD-ROM or a semiconductor ROM) or a magnetic recording medium (e.g. a floppy disk or a hard disk). Further, the carrier may be a transmissible carrier such as an electric or optical signal, which may be conveyed via electric or optical cable, or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such a cable or device. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted to perform, or be used in the performance of, the relevant method.
It should be noted that the above-mentioned embodiments illustrate rather than limit the disclosure, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The disclosure may be implemented by means of hardware comprising several distinct components, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined.
If desired, the steps are not limited to the order in which they are performed in each embodiment; the different steps discussed above may be performed in a different order and/or concurrently with each other. Furthermore, in other embodiments, one or more of the above-described steps may be optional or may be combined.
Although various aspects of the disclosure are set out in the independent claims, other aspects of the disclosure comprise combinations of features from the described embodiments and/or of the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted here that while the foregoing describes example embodiments of the disclosure, these descriptions should not be viewed in a limiting sense. Rather, several variations and modifications may be made without departing from the scope of the present disclosure as defined in the appended claims.
Those skilled in the art will appreciate that the modules in the devices of the embodiments of the disclosure can be implemented with general-purpose computing devices, and that the modules can be concentrated on a single computing device or distributed over a network group of computing devices. The devices in the embodiments of the disclosure correspond to the methods in the preceding embodiments and can be realized by executable program code or by a combination of integrated circuits; the disclosure is therefore not limited to any specific combination of hardware and software.
Those skilled in the art will appreciate that the modules in the devices of the embodiments of the disclosure can be implemented with general-purpose mobile terminals, and that the modules can be concentrated on a single mobile terminal or distributed over a combination of mobile terminals. The devices in the embodiments of the disclosure correspond to the methods in the preceding embodiments and can be realized by editing executable program code or by a combination of integrated circuits; the disclosure is therefore not limited to any specific combination of hardware and software.
Claims (10)
1. a kind of video classification methods neural network based characterized by comprising
Obtain target video to be processed;
Target frame is extracted from the target video according to preset parameter;
The target frame is inputted into target nerve network, the target nerve network includes at least one layer of convolutional layer, it is described at least
One layer of convolutional layer is used to the corresponding feature vector of the target frame being converted to high-order feature vector;
Obtain the output result of the target nerve network;
The type of the target video is determined according to the output result.
2. the method as described in claim 1, which is characterized in that described to be extracted from the target video according to preset parameter
Target frame, comprising:
Preset parameter is parsed, obtains and extracts frequency and initial extraction frame;
Target frame is extracted from the target video according to the initial extraction frame and extraction frequency.
3. the method as described in claim 1, which is characterized in that before acquisition target video to be processed, further includes:
Create target nerve network;
The target nerve network is trained using training data, adjusts the parameter of the target nerve network;
Target nerve network after being trained.
4. The method according to claim 1, wherein converting, by the at least one convolutional layer, the feature vector corresponding to the target frames into a high-order feature vector comprises:
processing the target frames into a first feature vector;
inputting the first feature vector into a first convolutional layer to obtain a second feature vector;
inputting the second feature vector sequentially into a second convolutional layer and a third convolutional layer to obtain a third feature vector;
multiplying the second feature vector by the third feature vector element-wise to obtain a fourth feature vector, wherein the fourth feature vector is a high-order representation of the first feature vector; and
inputting the fourth feature vector into a fourth convolutional layer to obtain a fifth feature vector.
5. The method according to claim 4, further comprising:
summing the first feature vector and the fifth feature vector to obtain a sixth feature vector; and
inputting the sixth feature vector into a classifier of the target neural network to obtain the output result.
6. The method according to claim 4, wherein the dimensions of the first feature vector are T*W*H*1024.
7. The method according to claim 6, wherein the convolution kernel of the first convolutional layer is 1*1*1 and its channel dimension is 512.
8. the method for claim 7, which is characterized in that the convolution kernel of second convolutional layer is 1*3*3, the third
The convolution kernel of convolutional layer is 3*3*3, and the dimension of the third feature vector is T*H*W*27.
9. The method according to claim 8, wherein the convolution kernel of the fourth convolutional layer is 1*1*1 and its channel dimension is 1024.
10. A video classification system, comprising a processor and a memory, wherein
the memory is configured to store an executable program; and
the processor is configured to execute the executable program to implement the video classification method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910158877.6A CN109902634A (en) | 2019-03-04 | 2019-03-04 | A kind of video classification methods neural network based and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902634A true CN109902634A (en) | 2019-06-18 |
Family
ID=66946050
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910158877.6A Pending CN109902634A (en) | 2019-03-04 | 2019-03-04 | A kind of video classification methods neural network based and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902634A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229455A (en) * | 2017-02-23 | 2018-06-29 | 北京市商汤科技开发有限公司 | Object detecting method, the training method of neural network, device and electronic equipment |
CN108090203A (en) * | 2017-12-25 | 2018-05-29 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108133020A (en) * | 2017-12-25 | 2018-06-08 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108520247A (en) * | 2018-04-16 | 2018-09-11 | 腾讯科技(深圳)有限公司 | To the recognition methods of the Object node in image, device, terminal and readable medium |
CN108830322A (en) * | 2018-06-15 | 2018-11-16 | 联想(北京)有限公司 | A kind of image processing method and device, equipment, storage medium |
CN109271878A (en) * | 2018-08-24 | 2019-01-25 | 北京地平线机器人技术研发有限公司 | Image-recognizing method, pattern recognition device and electronic equipment |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443119A (en) * | 2019-06-25 | 2019-11-12 | 中车工业研究院有限公司 | Cargo state recognition methods and device in compartment |
CN110443119B (en) * | 2019-06-25 | 2021-11-30 | 中车工业研究院有限公司 | Method and device for identifying state of goods in carriage |
CN110704679A (en) * | 2019-09-27 | 2020-01-17 | 北京字节跳动网络技术有限公司 | Video classification method and device and electronic equipment |
CN110704679B (en) * | 2019-09-27 | 2023-10-13 | 北京字节跳动网络技术有限公司 | Video classification method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200193228A1 (en) | Image question answering method, apparatus and system, and storage medium | |
CN108229478A (en) | Image, semantic segmentation and training method and device, electronic equipment, storage medium and program | |
CN108280451B (en) | Semantic segmentation and network training method and device, equipment and medium | |
CN109508681A (en) | The method and apparatus for generating human body critical point detection model | |
CN109360028B (en) | Method and device for pushing information | |
CN111275784B (en) | Method and device for generating image | |
CN108416744B (en) | Image processing method, device, equipment and computer readable storage medium | |
US20150215590A1 (en) | Image demosaicing | |
CN108830235A (en) | Method and apparatus for generating information | |
CN109523538A (en) | A kind of people counting method and system based on generation confrontation neural network | |
CN108197618A (en) | For generating the method and apparatus of Face datection model | |
CN109858384A (en) | Method for catching, computer readable storage medium and the terminal device of facial image | |
CN109685068A (en) | A kind of image processing method and system based on generation confrontation neural network | |
CN109829432A (en) | Method and apparatus for generating information | |
CN109086742A (en) | scene recognition method, scene recognition device and mobile terminal | |
CN108986049A (en) | Method and apparatus for handling image | |
JP2015518594A (en) | Integrated interactive segmentation method using spatial constraints for digital image analysis | |
CN110047122A (en) | Render method, apparatus, electronic equipment and the computer readable storage medium of image | |
CN111985281A (en) | Image generation model generation method and device and image generation method and device | |
CN109102484B (en) | Method and apparatus for processing image | |
CN109902634A (en) | A kind of video classification methods neural network based and system | |
CN113505848A (en) | Model training method and device | |
CN110110666A (en) | Object detection method and device | |
CN113449851A (en) | Data processing method and device | |
CN110349161A (en) | Image partition method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2019-06-18 |