CN105955708A - Sports video shot classification method based on deep convolutional neural networks

Info

Publication number: CN105955708A
Application number: CN201610302292.3A
Authority: CN
Inventors: 王进军; 张顺; 刘桢琦
Original assignee: Xi'an Brision Information Technology Co Ltd
Current assignee: Beijing Hippo energy Sports Technology Co., Ltd.
Priority date: 2016-05-09
Filing date: 2016-05-09
Publication date: 2016-09-21

Abstract

The invention discloses a sports video shot classification method based on deep convolutional neural networks. The method comprises the following steps: 1) performing shot segmentation on existing football video, where each shot is a continuous image sequence captured by a single camera, selecting 3 to 10 key frame images from each shot segment, and attaching a shot class label to each image to construct a training sample set; 2) constructing a seven-layer deep convolutional neural network comprising five convolutional layers and three fully connected layers; 3) training the deep convolutional neural network of step 2) with the training samples of step 1), where the training uses softmax regression as the classification algorithm and the error back-propagation algorithm to adjust the network parameters of the CNN; 4) testing a test sample set with the convolutional neural network model obtained from the training in step 3), and outputting the final shot classification result of the image.

Description

A sports video shot classification method based on deep convolutional neural networks

Technical field:

The invention belongs to the fields of video processing and machine learning, and specifically relates to a sports video shot classification method based on deep convolutional neural networks.

Background art:

Shot classification is a basic technique in sports video analysis. It is important for detecting specific events in sports video, for retrieving sports video, and for extracting high-level semantics. In football video analysis, for example, both the detection of specific events (red and yellow cards, shots on goal, interruptions of play, and so on) and the detection of specific players rely on shot classification results. A fast and accurate shot classification method therefore greatly helps improve the performance of subsequent analysis.

In broadcast video of sports matches, shots are generally divided into three classes: long shots, medium shots, and close-up shots. A long shot covers most of the playing field; a medium shot captures some players and the scene in a local region of the field; a close-up shot captures an athlete's upper body or action details. Besides footage of the field, medium shots and close-up shots also include footage of spectators off the field.

Current methods for distinguishing these shot classes rely mainly on computing the area ratio of the dominant color region. Such methods define the color of the playing field as the dominant color (green for a football pitch, for example) and then judge the class of a shot from the proportion of the frame that the dominant color occupies: a shot with a large dominant color area ratio is taken to be a long shot, and a shot with a small ratio is taken to be a close-up shot. The dominant color area ratio feature is strongly affected by background colors in medium shots and close-up shots, which limits the final shot classification accuracy.
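The dominant color approach described above can be illustrated with a minimal Python sketch. The green test in `is_green` and the thresholds in `classify_by_ratio` are illustrative assumptions and are not taken from the prior art cited here; the sketch only shows the general idea of thresholding the area ratio.

```python
def dominant_color_ratio(pixels, is_field_color):
    """Fraction of pixels whose color matches the field (dominant) color."""
    total = 0
    matched = 0
    for row in pixels:
        for rgb in row:
            total += 1
            if is_field_color(rgb):
                matched += 1
    return matched / total if total else 0.0


def is_green(rgb):
    # Hypothetical grass test: the green channel clearly dominates.
    r, g, b = rgb
    return g > 1.2 * r and g > 1.2 * b


def classify_by_ratio(ratio, long_thresh=0.6, close_thresh=0.2):
    # Threshold values are assumptions chosen only for illustration.
    if ratio >= long_thresh:
        return "long"
    if ratio <= close_thresh:
        return "close-up"
    return "medium"


# A tiny 1 x 4 "frame": three grass pixels and one red pixel.
frame = [[(30, 200, 40), (30, 200, 40), (30, 200, 40), (200, 30, 30)]]
ratio = dominant_color_ratio(frame, is_green)
shot_class = classify_by_ratio(ratio)
```

A close-up of a player in a green kit, or a medium shot of a green stand, would shift the ratio in the same way as grass, which is the background interference the paragraph above refers to.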

Summary of the invention:

To overcome the deficiencies of the prior art, the present invention provides a sports video shot classification method based on deep convolutional neural networks. A deep convolutional neural network learns the image features of each shot class in the training database; at test time, the class corresponding to the maximum regression value of the network's softmax layer is taken directly as the shot classification result, so that a given key frame is automatically assigned to its shot class. The invention can improve shot classification accuracy and has good feasibility and robustness.

To achieve the above purpose, the present invention adopts the following technical solution:

A sports video shot classification method based on deep convolutional neural networks, comprising the following steps:

1) Perform shot segmentation on existing football video, where each shot is a continuous image sequence captured by a single camera; select 3 to 10 key frame images from each shot segment and attach a shot class label to each image to construct a training sample set;

2) Construct a seven-layer deep convolutional neural network comprising five convolutional layers and three fully connected layers;

3) Train the deep convolutional neural network model of step 2) with the training samples of step 1); the training uses softmax regression as the classification algorithm and the error back-propagation algorithm to adjust the network parameters of the CNN;

4) Test the test sample set with the convolutional neural network model obtained from the training in step 3), and output the final shot classification result of the image.

In a further improvement of the invention, in step 1) the shot class labels are divided into 6 kinds: long shot, in-field medium shot, out-of-field medium shot, in-field close-up shot, out-of-field close-up shot, and other shots that belong to none of these 5 kinds.

In a further improvement of the invention, in step 2) each input image is scaled to 256 × 256, and a 224 × 224 square block is then cropped from it at random and input with its three RGB color channels; after the activation outputs of the first, second, and fifth convolutional layers, a max-pooling down-sampling operation is applied before the output is passed to the next layer; the deep convolutional neural network finally outputs a 6-dimensional neuron response corresponding to the 6 shot classes of the image to be classified.

In a further improvement of the invention, in step 3) the parameters of the neural network are initialized with different small random numbers during training.

Compared with the prior art, the present invention has the following advantages:

The sports video shot classification method based on deep convolutional neural networks of the present invention designs a deep convolutional neural network that takes key frame images as input, implicitly learns the image features of each shot class, and then uses these features to classify shots more effectively.

Brief description of the drawings:

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a structural diagram of the convolutional neural network in an embodiment of the present invention.

Detailed description of the invention:

The present invention is described in further detail below with reference to the accompanying drawings:

Referring to Fig. 1, the sports video shot classification method based on deep convolutional neural networks of the present invention comprises the following steps:

1) Perform shot segmentation on existing football video, where each shot is a continuous image sequence captured by a single camera. Select 5 key frame images from each shot segment and attach a label to each image to construct a training sample set. The shot class labels are divided into 6 kinds: long shot, in-field medium shot, out-of-field medium shot, in-field close-up shot, out-of-field close-up shot, and other shots that belong to none of these 5 kinds.
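Step 1) can be sketched in Python as follows. The text does not say how the 5 key frames are chosen within a shot, so evenly spaced sampling is used here purely as an assumption; the names `SHOT_LABELS`, `key_frame_indices`, and `build_samples` are hypothetical, and shots are assumed to be given as already segmented frame ranges.

```python
# The six shot classes named in step 1), in a fixed order.
SHOT_LABELS = [
    "long shot",
    "in-field medium shot",
    "out-of-field medium shot",
    "in-field close-up shot",
    "out-of-field close-up shot",
    "other",
]


def key_frame_indices(start, end, count=5):
    """Pick `count` evenly spaced frame indices from the shot [start, end).

    Even spacing is an assumption; the source only fixes the number of frames.
    """
    length = end - start
    if length <= 0:
        return []
    count = min(count, length)
    # Take the centre of each of `count` equal-length segments.
    return [start + (2 * i + 1) * length // (2 * count) for i in range(count)]


def build_samples(shots, count=5):
    """shots: list of (start_frame, end_frame, label_index) tuples.

    Returns a list of (frame_index, label_index) training samples.
    """
    samples = []
    for start, end, label in shots:
        for idx in key_frame_indices(start, end, count):
            samples.append((idx, label))
    return samples


# Two hypothetical shots: a 100-frame long shot and a 3-frame close-up.
samples = build_samples([(0, 100, 0), (100, 103, 3)])
```

Every key frame inherits the label of the shot it was taken from, so a shot contributes several labeled images to the training set.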

2) Construct a seven-layer deep convolutional neural network (Convolutional Neural Network, CNN) comprising five convolutional layers and three fully connected layers.

Each input image is scaled to 256 × 256, and a 224 × 224 square block is then cropped from it at random and input with its three RGB color channels. After the activation outputs of the first, second, and fifth convolutional layers, a max-pooling down-sampling operation is applied before the output is passed to the next layer. The deep convolutional neural network finally outputs a 6-dimensional neuron response corresponding to the 6 shot classes of the image to be classified. As shown in Fig. 2, the detailed processing of an input image through each layer is as follows:
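The scaling and random cropping of the input can be sketched in plain Python, with an image represented as a nested list of RGB tuples. Nearest-neighbour interpolation is an assumption, since the text does not name a resampling method, and the function names are hypothetical.

```python
import random


def resize_nearest(image, out_h=256, out_w=256):
    """Scale an H x W image (list of rows of RGB tuples) to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def random_crop(image, size=224, rng=random):
    """Cut a random size x size square block out of the image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in image[top:top + size]]


# A synthetic 180 x 320 frame stands in for a real key frame.
frame = [[(y % 256, x % 256, 0) for x in range(320)] for y in range(180)]
scaled = resize_nearest(frame)
patch = random_crop(scaled, rng=random.Random(0))
```

Cropping at a different random offset each time a key frame is used gives the network slightly shifted views of the same image during training.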

The first convolutional layer consists of 96 feature maps of size 55 × 55. After a max-pooling operation, 96 feature maps of size 27 × 27 are output.

The second convolutional layer consists of 256 feature maps of size 27 × 27. After a max-pooling operation, 256 feature maps of size 13 × 13 are output.

The third convolutional layer consists of 384 feature maps of size 13 × 13.

The fourth convolutional layer consists of 384 feature maps of size 13 × 13.

The fifth convolutional layer consists of 256 feature maps of size 13 × 13. After a max-pooling operation, 256 feature maps of size 6 × 6 are output.

The sixth and seventh layers are fully connected layers, each outputting a 4096-dimensional feature vector.

The eighth layer is a fully connected layer that outputs a 6-dimensional feature vector, which the softmax layer classifies to produce the classification result.
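The feature map sizes listed above can be checked with the usual output size formula, floor((n + 2p - k) / s) + 1. The text gives only the map counts and sizes, so the kernel sizes, strides, and paddings in `LAYERS` below are assumptions (they follow the commonly used AlexNet-style settings) chosen because they reproduce the stated sizes from a 224 × 224 input.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad, output channels); None marks a pooling layer,
# which leaves the number of feature maps unchanged.
# Kernel, stride, and pad values are assumptions, not given in the text.
LAYERS = [
    ("conv1", 11, 4, 2, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=224, input_channels=3):
    """Return {layer name: (channels, height, width)} for the whole stack."""
    size, channels = input_size, input_channels
    shapes = {}
    for name, kernel, stride, pad, out_channels in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes[name] = (channels, size, size)
    return shapes


shapes = trace_shapes()
c, h, w = shapes["pool5"]
flattened = c * h * w  # length of the vector fed to the first fully connected layer
```

Under these assumed settings the last pooling layer yields 256 maps of 6 × 6, that is, a 9216-element vector entering the sixth layer.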

A convolutional layer of the network can be expressed as follows: the $j$-th feature map matrix $x_j^l$ of layer $l$ may be obtained as a weighted combination of the convolutions of several feature maps of the previous layer,

$$x_j^l = f\Big(\sum_{i \in N_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big) \qquad (1)$$

where $f$ is the neuron activation function, $N_j$ denotes the set of input feature maps combined for output map $j$, $*$ denotes the convolution operation, $k_{ij}^l$ is the convolution kernel matrix, and $b_j^l$ is the bias matrix.
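Equation (1) can be sketched for a single output feature map in plain Python. The text does not name the activation function, so ReLU is used as an assumption, and the bias matrix is simplified to one scalar added to every position; both choices are illustrative only.

```python
def conv2d_valid(x, k):
    """'Valid' 2-D convolution of matrix x with kernel k (kernel is flipped)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += x[r + a][c + b] * k[kh - 1 - a][kw - 1 - b]
            out[r][c] = s
    return out


def relu(v):
    # Assumed activation f; the source leaves f unspecified.
    return v if v > 0 else 0.0


def conv_feature_map(inputs, kernels, bias, f=relu):
    """x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l) for one output map j.

    inputs:  the feature maps x_i^{l-1} with i in N_j
    kernels: the matching kernels k_ij^l
    bias:    scalar stand-in for the bias matrix b_j^l
    """
    total = None
    for x_i, k_ij in zip(inputs, kernels):
        y = conv2d_valid(x_i, k_ij)
        if total is None:
            total = y
        else:
            total = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(total, y)]
    return [[f(v + bias) for v in row] for row in total]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, -1]]
single = conv2d_valid(x, k)
combined = conv_feature_map([x, x], [k, k], bias=1.0)
```

Many deep learning libraries implement the same layer as cross-correlation, that is, without flipping the kernel; since the kernels are learned, the two conventions are interchangeable in practice.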

The sampling process can be expressed as:

$$x_j^l = f\big(\mathrm{down}(x_j^{l-1})\big) \qquad (2)$$

where $\mathrm{down}(\cdot)$ denotes the sampling function; a commonly used one is the maximum sampling function (max pooling). The sampling process is similar to convolution: a sampling function without weight parameters slides from the upper-left corner of the input feature map to the right (or downward) by a fixed stride, samples the pixels in the block covered by the window, and outputs the result.
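The sliding-window sampling of equation (2) can be sketched as a max-pooling function. The 3 × 3 window and stride of 2 are assumptions; the text does not state them, but they are consistent with a 55 × 55 map shrinking to 27 × 27.

```python
def max_pool(x, window=3, stride=2):
    """Slide a window over feature map x and keep the maximum of each block."""
    h, w = len(x), len(x[0])
    oh = (h - window) // stride + 1
    ow = (w - window) // stride + 1
    return [
        [
            max(
                x[r * stride + a][c * stride + b]
                for a in range(window)
                for b in range(window)
            )
            for c in range(ow)
        ]
        for r in range(oh)
    ]


# A 5 x 5 map whose entries increase from the upper-left corner.
feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool(feature_map)
```

The function has no learnable weights, which matches the description of the sampling function above.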

Each neuron of a fully connected layer of the convolutional neural network can be connected to every neuron of the next layer. The feature vector $x^l$ of the fully connected layer $l$ can be expressed as follows:

$$x^l = f\big(w^l x^{l-1} + b^l\big) \qquad (3)$$

where $w^l$ is the weight matrix and $b^l$ is the bias vector.
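Equation (3) amounts to a matrix-vector product followed by an activation. A minimal sketch, again assuming ReLU for $f$ because the text does not specify it:

```python
def fully_connected(w, x, b, f=lambda v: max(v, 0.0)):
    """x^l = f(w^l x^{l-1} + b^l), with w as a list of rows.

    The default f is ReLU, which is an assumption.
    """
    return [
        f(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(w, b)
    ]


w = [[1.0, 2.0], [-1.0, 0.0]]
x_prev = [3.0, 4.0]
b = [0.5, -1.0]
x_next = fully_connected(w, x_prev, b)
```

In the network described here, the sixth and seventh layers would use weight matrices with 4096 rows and the eighth a matrix with 6 rows, one per shot class.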

3) Train the deep convolutional neural network model of step 2) with the training samples of step 1). The training uses softmax regression as the classification algorithm and the error back-propagation algorithm to adjust the network parameters of the CNN.

The convolutional neural network initializes its parameters with different small random numbers. Training the CNN model requires continuous iterative optimization, in which the parameters for the next iteration are adjusted according to the classification results of the current iteration. An image is fed into the network and passes through two training stages, forward propagation and back propagation. Forward propagation feeds a sample into the network and computes the corresponding actual output; back propagation computes the difference between the actual output and the ideal output and, according to this error, continually optimizes the network parameters to train the model.
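The forward and backward stages can be illustrated on the softmax output layer alone. The sketch below uses cross-entropy loss and plain gradient descent, which are assumptions (the text names only softmax regression and error back-propagation); the learning rate, the scale of the random initialization, and the 4-dimensional toy feature vector are likewise illustrative. In the full network the same error signal would be propagated further back through the earlier layers.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_weights(n_out, n_in, scale=0.01, seed=0):
    """Initialize the parameters with different small random numbers."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def train_step(w, b, x, label, lr=0.1):
    """One forward and one backward pass for a softmax layer; returns the loss."""
    # Forward propagation: compute the actual output for sample x.
    z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    p = softmax(z)
    loss = -math.log(p[label])
    # Back propagation: actual output minus ideal (one-hot) output.
    for j in range(len(w)):
        grad = p[j] - (1.0 if j == label else 0.0)
        for i in range(len(x)):
            w[j][i] -= lr * grad * x[i]
        b[j] -= lr * grad
    return loss


w = init_weights(6, 4)          # 6 shot classes, toy 4-dimensional feature
b = [0.0] * 6
sample = [1.0, 0.5, -0.5, 2.0]  # hypothetical feature vector
losses = [train_step(w, b, sample, label=2) for _ in range(20)]
```

Repeating the step on the same sample drives the loss down, which is the iterative optimization described above in miniature.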

4) Test the test sample set with the convolutional neural network model obtained from the training in step 3), and output the final shot classification result of the image.
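At test time the class with the largest softmax value is taken as the result. A minimal sketch, assuming the network's 6-dimensional output for each key frame is already available as a list of scores; `predict` and `accuracy` are hypothetical helper names, and the score values are made up for illustration.

```python
import math

SHOT_LABELS = [
    "long shot",
    "in-field medium shot",
    "out-of-field medium shot",
    "in-field close-up shot",
    "out-of-field close-up shot",
    "other",
]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(scores, labels=SHOT_LABELS):
    """Return the label with the largest softmax value and that value."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def accuracy(score_rows, true_indices):
    """Fraction of test images whose predicted label matches the true one."""
    correct = sum(
        1
        for scores, t in zip(score_rows, true_indices)
        if predict(scores)[0] == SHOT_LABELS[t]
    )
    return correct / len(true_indices)


# Hypothetical network outputs for two test key frames.
outputs = [
    [0.1, 2.0, -1.0, 0.3, 0.0, 0.5],
    [3.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
label, confidence = predict(outputs[0])
acc = accuracy(outputs, [1, 2])
```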

Claims

1. A sports video shot classification method based on deep convolutional neural networks, characterized in that it comprises the following steps:

2. The sports video shot classification method based on deep convolutional neural networks according to claim 1, characterized in that, in step 1), the shot class labels are divided into 6 kinds: long shot, in-field medium shot, out-of-field medium shot, in-field close-up shot, out-of-field close-up shot, and other shots that belong to none of these 5 kinds.

3. The sports video shot classification method based on deep convolutional neural networks according to claim 2, characterized in that, in step 2), each input image is scaled to 256 × 256, and a 224 × 224 square block is then cropped from it at random and input with its three RGB color channels; after the activation outputs of the first, second, and fifth convolutional layers, a max-pooling down-sampling operation is applied before the output is passed to the next layer; the deep convolutional neural network finally outputs a 6-dimensional neuron response corresponding to the 6 shot classes of the image to be classified.

4. The sports video shot classification method based on deep convolutional neural networks according to claim 1, characterized in that, in step 3), the parameters of the neural network are initialized with different small random numbers during training.