CN108334910A - Event detection model training method and event detection method - Google Patents

Event detection model training method and event detection method Download PDF

Info

Publication number
CN108334910A
CN108334910A
Authority
CN
China
Prior art keywords
batch
training
image frame
feature vector
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810297702.9A
Other languages
Chinese (zh)
Other versions
CN108334910B (en)
Inventor
孙源良
夏虎
李长升
樊雨茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd filed Critical Guoxin Youe Data Co Ltd
Priority to CN201810297702.9A priority Critical patent/CN108334910B/en
Publication of CN108334910A publication Critical patent/CN108334910A/en
Application granted granted Critical
Publication of CN108334910B publication Critical patent/CN108334910B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an event detection model training method and an event detection method. The event detection model training method includes: obtaining labeled training image frames from multiple training videos, and dividing the training image frames into multiple batches; extracting a feature vector for each training image frame in all batches using a target neural network; performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using an attention-mechanism processing network; inputting the weight-assigned feature vectors of the training image frames in each batch into a target classifier, to obtain the classification results of the training videos; and training the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos. The embodiments of the present application can reduce the amount of computation required during training without affecting model accuracy, thereby reducing the consumption of computing resources and training time.

Description

Event detection model training method and event detection method
Technical field
This application relates to the technical field of deep learning, and in particular to an event detection model training method and an event detection method.
Background art

With the rapid development of neural networks in fields such as image, video, speech, and text, a series of intelligent products have been brought to market, and users' requirements on the accuracy of the various neural-network-based models are growing ever higher. When building an event detection model based on a neural network, in order to allow the neural network to fully learn the features of the images in a video and thereby improve the classification performance of the event detection model, a large number of training videos need to be input into the neural network to train it.

However, a training video usually contains a very large number of images, so the data volume is enormous. When these training videos are used to train a neural network, the accuracy of the resulting model can indeed be improved, but precisely because the data volume is so large, the amount of computation required during model training is huge, consuming excessive computing resources and training time.
Summary

In view of this, embodiments of the present application aim to provide an event detection model training method and an event detection method that can reduce the amount of computation required during training without affecting model accuracy, thereby reducing the consumption of computing resources and training time.

In a first aspect, an embodiment of the present application provides an event detection model training method, including:

obtaining labeled training image frames from multiple training videos, and dividing the training image frames into multiple batches, each batch containing a preset number of training image frames;

extracting a feature vector for each training image frame in all batches using a target neural network;

performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using an attention-mechanism processing network;

inputting the weight-assigned feature vectors of the training image frames in each batch into a target classifier, to obtain the classification results of the training videos;

training the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, wherein obtaining labeled training image frames from multiple training videos specifically includes:

obtaining multiple labeled training videos;

sampling the training videos according to a preset sampling frequency;

taking the images sampled from each training video as the training image frames of that training video.
With reference to the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, wherein performing weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network specifically includes:

performing weight assignment separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity, and performing weight assignment separately on each batch using the attention-mechanism processing network with batches as the granularity.
With reference to the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, wherein, when weight assignment is performed separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity, the weight assignment result a(i) of the i-th batch satisfies formula (1):

a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)

where n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st through n-th training image frames in each batch; F1 to Fn denote the feature vectors corresponding to the 1st through n-th training image frames in each batch; c denotes the bias term used by the attention-mechanism processing network when performing weight assignment on the feature vectors with feature vectors as the granularity; and tanh denotes the activation function;

when weight assignment is performed separately on each batch using the attention-mechanism processing network with batches as the granularity, the weight assignment result b(j) of the j-th batch satisfies formula (2):

b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)

where M1 to Mm denote the weights corresponding to the 1st through m-th batches, and d denotes the bias term used by the attention-mechanism processing network when performing weight assignment on each batch with batches as the granularity;

after weight assignment is performed separately on each batch using the attention-mechanism processing network with batches as the granularity, the method further includes: normalizing the weight assignment result of each batch.
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, wherein:

inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training videos specifically includes:

inputting the weight-assigned feature vectors corresponding to each batch into the target classifier separately, to obtain the classification result corresponding to each batch;

taking the classification result corresponding to the largest number of batches as the classification result of the training video.
With reference to the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, wherein inputting the weight-assigned feature vectors corresponding to each batch into the target classifier separately to obtain the classification result corresponding to each batch specifically includes:

inputting the weight-assigned feature vectors corresponding to each batch into the target classifier one by one, to obtain the classification result of the training image frame characterized by each weight-assigned feature vector;

taking the classification result corresponding to the largest number of training image frames as the classification result of the batch.
With reference to the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, wherein the method further includes:

concatenating the feature vectors of the training image frames in each batch respectively, to form concatenated feature vectors;

performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network specifically includes:

performing at least two rounds of weight assignment on the concatenated vector corresponding to each batch using the attention-mechanism processing network;

inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training videos specifically includes:

inputting the weight-assigned concatenated feature vector corresponding to each batch into the target classifier, to obtain the classification results of the training videos.
With reference to the first aspect, an embodiment of the present application provides a seventh possible implementation of the first aspect, wherein:

inputting the weight-assigned concatenated feature vector corresponding to each batch into the target classifier to obtain the classification results of the training videos specifically includes:

inputting the weight-assigned concatenated feature vectors corresponding to each batch into the target classifier separately, to obtain the classification result corresponding to each batch;

taking the classification result corresponding to the largest number of batches as the classification result of the training video.
With reference to the first aspect, an embodiment of the present application provides an eighth possible implementation of the first aspect, wherein training the target neural network and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos specifically includes:

performing the following comparison operation until the classification result of the training video is consistent with the label of the training video;

the comparison operation includes:

comparing the classification result of the training video with the label of the training video;

if the classification result of the training video is inconsistent with the label of the training video, adjusting the parameters of the target neural network, the attention-mechanism processing network, and the target classifier;

based on the adjusted parameters, extracting new feature vectors for the training image frames in all batches using the target neural network, and performing at least two rounds of weight assignment anew on the new feature vectors of the training image frames in each batch using the attention-mechanism processing network;

inputting the re-weight-assigned new feature vectors of the training image frames in each batch into the classifier, to obtain new classification results for the training video;

and performing the comparison operation again.
In a second aspect, an embodiment of the present application further provides an event detection method, including:

obtaining a video to be detected;

inputting the video to be detected into an event detection model obtained by the event detection model training method of any one of the implementations of the first aspect above, to obtain a classification result of the video to be detected;

wherein the event detection model includes: the target neural network, the attention-mechanism processing network, and the target classifier.
In the embodiments of the present application, when the event detection model is trained using the training image frames in the training videos, the training image frames are first divided into multiple batches, and feature vectors are then extracted for the training image frames in all batches using the target neural network. The attention-mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, so as to increase the weights of the feature vectors corresponding to the training image frames that belong to the main event in a training video and reduce the weights of the feature vectors corresponding to the training image frames that do not belong to the main event. When the event detection model is trained on the weight-assigned feature vectors, it can effectively learn the features of the training image frames belonging to the main event, ensuring the accuracy of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to the training image frames that do not belong to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly. As a result, when the event detection model is trained on the feature vectors corresponding to the training image frames that do not belong to the main event, a large amount of computation is saved, which reduces the amount of computation required during event detection model training and the consumption of computing resources and training time.
To make the above objectives, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Fig. 1 shows a flowchart of the event detection model training method provided by Embodiment One of the present application;

Fig. 2 shows a flowchart of the method, provided by Embodiment Two of the present application, for performing two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network;

Fig. 3 shows a flowchart of the specific method, provided by Embodiment Three of the present application, for inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training videos;

Fig. 4 shows a flowchart of a comparison operation method provided by Embodiment Four of the present application;

Fig. 5 shows a flowchart of an event detection model training method provided by Embodiment Five of the present application;

Fig. 6 shows a schematic structural diagram of the event detection model training apparatus provided by Embodiment Six of the present application;

Fig. 7 shows a flowchart of the event detection method provided by Embodiment Seven of the present application;

Fig. 8 shows a schematic structural diagram of a computer device provided by Embodiment Nine of the present application.
Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings here, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present application.

At present, when an event detection model is trained using training videos, the training videos are input directly into the neural network and classifier, and the neural network and classifier are trained. During the actual training of the neural network and classifier, the neural network and classifier need to perform computations on every image in the training videos. However, a training video generally contains multiple events, and the images of some events make no positive contribution to the classification of the video; on the contrary, they can interfere with the normal training of the event detection model. Having the neural network and classifier perform feature learning on these images that contribute nothing positive to video classification therefore spends a great deal of computation in unnecessary places, making the amount of computation required during model training huge and consuming excessive computing resources and training time. Based on this, the present application provides an event detection model training method and an event detection method that can reduce the amount of computation required during training without affecting model accuracy, thereby reducing the consumption of computing resources and training time.

To facilitate understanding of the present embodiment, an event detection model training method disclosed in an embodiment of the present application is first described in detail. The event detection model obtained by the event detection model training method provided by the embodiments of the present application can efficiently classify the events occurring in unedited videos, can effectively realize automated classification of Internet videos, and can also provide reasonable label support for video recommendation systems, facilitating effective video recommendation.
As shown in Fig. 1, the event detection model training method provided by Embodiment One of the present application includes:

S101: Obtain labeled training image frames from multiple training videos, and divide the training image frames into multiple batches; each batch contains a preset number of training image frames.

In a specific implementation, a training video is typically a long video and generally contains at least one event. When a training video contains multiple events, one event is generally taken as the main event and the other events as secondary events, and the training video is labeled based on the main event.

For example, a video of a swimming competition may involve, in addition to the swimming competition itself, spectator-stand events and athlete close-up events; but the swimming competition occupies a larger proportion of the entire video, so the swimming competition is taken as the main event and the video is labeled 'swimming competition'.

Training the event detection model with an entire training video usually suffers from problems such as a reduced model convergence speed caused by the large input data volume and a training process that consumes a long time and many resources. Therefore, to accelerate model convergence and reduce the time and resources that model training requires, training image frames need to be obtained from the entire training video; the training image frames are a subset of all the images contained in the entire training video. Usually, each of the multiple training videos can be sampled according to a preset sampling frequency, the images sampled from each training video are taken as the training image frames of that training video, and the event detection model is then trained based on the training image frames of each training video.

Meanwhile, precisely because each training video usually contains at least one event, and especially when a training video contains multiple events, different events usually appear interleaved in the training video, with transitions between them. Therefore, in order to better locate the main event in the training video, to strengthen the weight the main event occupies among all events, and to weaken the weight the secondary events occupy among all events, the training video frames can be divided into multiple batches, each containing a preset number of training image frames. In this way, the training image frames belonging to different events are separated as much as possible, and different events are divided into different batches.

Here, the number of training image frames corresponding to each batch can be chosen according to actual needs. For example, if the events in the training video switch quickly, the number of training image frames in each batch can be set smaller; if the events in the training video switch slowly, the number of training image frames in each batch can be set larger.

In addition, it should be noted that, when a training video is cut into multiple batches, the number of training image frames contained in the obtained training video is in most cases not an integer multiple of the number of training image frames per batch, so the last batch obtained by cutting the training video usually cannot reach the preset number of training image frames. A batch that cannot meet the required number of training image frames can therefore be padded: transparent frames, all-black frames, or all-white frames are appended after the image sequence formed by its training image frames, so that the number of training image frames contained in the batch reaches the preset number.
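As an illustration only, the sampling, batching, and padding described above might be sketched as follows; the sampling stride of 5, the batch size of 64, and the use of all-black padding frames are assumptions chosen for the example.

    import numpy as np

    def sample_frames(video_frames, sample_stride=5):
        # Sample one frame every `sample_stride` frames (assumed sampling frequency).
        return video_frames[::sample_stride]

    def split_into_batches(frames, batch_size=64):
        # Split the sampled frames into batches of the preset size, padding the
        # final short batch with all-black frames so it reaches the preset number.
        batches = []
        for start in range(0, len(frames), batch_size):
            batch = list(frames[start:start + batch_size])
            while len(batch) < batch_size:
                batch.append(np.zeros_like(batch[0]))   # all-black padding frame
            batches.append(np.stack(batch))
        return batches

    # Usage: a toy "video" of 150 RGB frames
    video = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8) for _ in range(150)]
    training_frames = sample_frames(video)         # 30 sampled frames
    batches = split_into_batches(training_frames)  # one padded batch of 64 frames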
S102: Extract a feature vector for each training image frame in all batches using the target neural network.

In a specific implementation, the target neural network may use a convolutional neural network (CNN) model to perform feature extraction on the training image frames in each batch, obtaining a feature vector corresponding to each training image frame.

Here, in order to accelerate convergence during event detection model training, the target network model used may be one obtained by inputting the training image frames of the training videos into a target neural network to be trained and training that target neural network in advance.
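A minimal sketch of this extraction step is given below; the choice of a pretrained torchvision ResNet-18 backbone and a 512-dimensional feature vector is an assumption, since the patent does not name a specific CNN architecture.

    import torch
    import torchvision.models as models

    # Assumed backbone: a pretrained ResNet-18 with its classification head
    # removed, so each frame maps to a 512-dimensional feature vector.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    @torch.no_grad()
    def extract_features(batch_frames):
        # batch_frames: float tensor of shape (n, 3, 224, 224), one batch of frames.
        # Returns one feature vector per frame, shape (n, 512).
        return backbone(batch_frames)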
S103: Perform at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network.

In a specific implementation, the attention mechanism is used to learn which parts of the training image frames should be processed: in each current state, the network learns from the preceding state and/or the current input which positions to attend to, and processes the pixels of the attended part rather than all pixels of the image. For example, if a training video A contains two events, diving and a spectator stand, with diving as its main event, the attention mechanism can concentrate more focus on the diving event, strengthening the attention to the diving event and weakening the attention to the spectator-stand event.

Strengthening the attention to the main event while reducing the attention to the secondary events is precisely the process of performing weight assignment on the feature vectors of the training image frames: the weights of the training image frames corresponding to the main event are increased, and the weights of the training image frames corresponding to the secondary events are reduced.

Specifically, as shown in Fig. 2, Embodiment Two of the present application provides a method for performing two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network, including:

S201: With feature vectors as the granularity, perform weight assignment separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network; and

S202: With batches as the granularity, perform weight assignment separately on each batch using the attention-mechanism processing network.

Here, since each training video contains multiple events, dividing it into multiple batches cannot strictly guarantee that each batch contains training image frames corresponding to only one event. Therefore, to strengthen the weight the main event occupies among all events and weaken the weight the secondary events occupy among all events, weight assignment is first performed separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity; that is, within each batch, the weights of the training image frames corresponding to the main event are increased and the weights of the training image frames corresponding to the secondary events are reduced.

Similarly, since the positions and durations of events in a training video are uncertain, some batches contain many training image frames corresponding to the main event while others contain few. To further increase the weights of the training image frames corresponding to the main event and reduce the weights of the training image frames corresponding to the secondary events, after attention-mechanism processing has been performed within each batch, attention-mechanism processing is then performed across all batches; that is, the weights of batches containing more training image frames corresponding to the main event are increased, and the weights of batches containing fewer, or even none, of the training image frames corresponding to the main event are reduced. This further concentrates attention on the training images corresponding to the main event and further weakens the influence of the secondary events on the event detection model. Meanwhile, when the weights of the training image frames corresponding to the secondary events in the training videos are reduced, the values of many elements in the feature vectors of those training image frames are lowered, and some may even be zeroed out. Consequently, when the event detection model is trained on the feature vectors of the down-weighted training image frames, the computational complexity is simplified, which reduces the amount of computation required during training and the consumption of computing resources and training time.
Specifically, when weight assignment is performed separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity, the weight assignment result a(i) of the i-th batch satisfies formula (1):

a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)

where n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st through n-th training image frames in each batch; F1 to Fn denote the feature vectors corresponding to the 1st through n-th training image frames in each batch; c denotes the bias term used by the attention-mechanism processing network when performing weight assignment on the feature vectors with feature vectors as the granularity; and tanh denotes the activation function.

When weight assignment is performed separately on each batch using the attention-mechanism processing network with batches as the granularity, the weight assignment result b(j) of the j-th batch satisfies formula (2):

b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)

where M1 to Mm denote the weights corresponding to the 1st through m-th batches, and d denotes the bias term used by the attention-mechanism processing network when performing weight assignment on each batch with batches as the granularity.

Here, it should be noted that the attention-mechanism processing network may also, according to actual needs, perform further rounds of weight assignment on the feature vectors of the training image frames in each batch, to further increase the weights of the training image frames corresponding to the main event, further reduce the weights of the training image frames corresponding to the secondary events, and further reduce the amount of computation.

In another embodiment, after weight assignment is performed separately on each batch using the attention-mechanism processing network with batches as the granularity, the method further includes: normalizing the weight assignment result of each batch. This makes the feature vectors more compact and further reduces the amount of computation.
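As an illustration only, the following sketch implements formulas (1) and (2) as one module; treating W1 to Wn and M1 to Mm as learnable scalars, returning b as a video-level vector, and normalizing the per-batch results with an L2 norm are all assumptions, since the patent does not pin these choices down.

    import torch
    import torch.nn as nn

    class TwoLevelAttention(nn.Module):
        # Sketch of the two rounds of weight assignment in formulas (1) and (2).
        def __init__(self, n_frames, m_batches):
            super().__init__()
            self.W = nn.Parameter(torch.randn(n_frames))   # frame weights W1..Wn
            self.c = nn.Parameter(torch.zeros(1))          # bias term c
            self.M = nn.Parameter(torch.randn(m_batches))  # batch weights M1..Mm
            self.d = nn.Parameter(torch.zeros(1))          # bias term d

        def forward(self, features):
            # features: (m, n, feat_dim), i.e. m batches of n frame feature vectors.
            # Round 1, formula (1): a(i) = tanh(W1*F1 + ... + Wn*Fn + c)
            a = torch.tanh(torch.einsum('n,mnf->mf', self.W, features) + self.c)
            # Round 2, formula (2): b(j) = M1*a(1) + ... + Mm*a(m) + d
            b = torch.einsum('m,mf->f', self.M, a) + self.d
            # Assumed normalization of each batch's weight assignment result
            a_norm = torch.nn.functional.normalize(a, dim=-1)
            return a_norm, b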
S104: Input the weight-assigned feature vectors of the training image frames in each batch into the target classifier, to obtain the classification results of the training videos.

In a specific implementation, the weight-assigned feature vectors of the training images in each batch are input into the classifier separately, and the classifier can classify, based on each feature vector, the training image frame that the feature vector characterizes. For feature vectors whose weights have been increased, the classifier learns their features more; for feature vectors whose weights have been reduced, the classifier reduces its learning of this part of the feature vectors. The classification result of the entire training video is then obtained from the classification results of the individual feature vectors.

Specifically, as shown in Fig. 3, Embodiment Three of the present application further provides a specific method for inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training videos, including:

S301: Input the weight-assigned feature vectors corresponding to each batch into the target classifier separately, to obtain the classification result corresponding to each batch.

In a specific implementation, the classification result corresponding to each batch can be measured by the classification results of all the training image frames in that batch: the more training image frames in the batch belong to a certain class, the higher the probability that the batch belongs to that class compared with the other classes.

Therefore, the classification result corresponding to each batch may be obtained in the following manner:

the weight-assigned feature vectors corresponding to each batch are input into the target classifier one by one, to obtain the classification result of the training image frame characterized by each weight-assigned feature vector; the classification result corresponding to the largest number of training image frames is taken as the classification result of the batch.

For example, training video A contains a main event and a secondary event; the training image frames of the training video are divided into batches, and the batch numbered 1 contains 64 training image frames. After two rounds of weight assignment are performed on these 64 training images, the 64 weight-assigned feature vectors corresponding to them are input into the classifier one by one. Among these 64 training image frames, 50 are classified as the main event and 14 as the secondary event. The number of training image frames belonging to the main event exceeds the number belonging to the secondary event, so the classification result of batch 1 can be determined to be the main event.

S302: Take the classification result corresponding to the largest number of batches as the classification result of the training video.

After the classification result corresponding to each batch contained in each training video is obtained according to the method of S301 above, the classification result corresponding to the largest number of batches is taken as the classification result of the training video.

For example, training video A contains a main event and a secondary event; when the training image frames in training video A are divided into multiple batches, 20 batches are formed in total. The weight-assigned feature vectors corresponding to each batch are input into the target classifier separately, and the classification result corresponding to each batch is obtained: the classification results of 16 batches are the main event and those of 4 batches are the secondary event, so the classification result of the training video is the main event.
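As an illustration, the two-level majority vote of S301 and S302 could be sketched in Python as follows; the only assumption is that the target classifier returns one class label per weight-assigned feature vector.

    from collections import Counter

    def classify_batch(frame_labels):
        # S301: the class predicted for the most training image frames
        # becomes the classification result of the batch.
        return Counter(frame_labels).most_common(1)[0][0]

    def classify_video(batch_labels):
        # S302: the class assigned to the most batches becomes the
        # classification result of the training video.
        return Counter(batch_labels).most_common(1)[0][0]

    # Usage, mirroring the examples above:
    batch_1 = classify_batch(['main'] * 50 + ['secondary'] * 14)   # -> 'main'
    video_a = classify_video(['main'] * 16 + ['secondary'] * 4)    # -> 'main'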
S105: Train the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos.

Specifically, Embodiment Four of the present application further provides a specific method for training the target neural network and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos, including:

performing the following comparison operation until the classification result of the training video is consistent with the label of the training video.

As shown in Fig. 4, the comparison operation includes:

S401: Compare whether the classification result of the training video is consistent with the label of the training video; if so, go to S402; if not, go to S403.

S402: Complete this round of training of the target neural network, the attention-mechanism processing network, and the target classifier; the flow ends.

S403: Adjust the parameters of the target neural network, the attention-mechanism processing network, and the target classifier.

S404: Based on the adjusted parameters, extract new feature vectors for the training image frames in all batches using the target neural network, and perform at least two rounds of weight assignment anew on the new feature vectors of the training image frames in each batch using the attention-mechanism processing network; input the re-weight-assigned new feature vectors of the training image frames in each batch into the classifier, to obtain new classification results for the training video; and execute S401 again.

In a specific implementation, before the first round of weight assignment is performed on the feature vectors of the training image frames in each batch, the attention-mechanism processing network is given an initial assignment in the form of randomly distributed weights. An attention-mechanism processing network initialized in this way may reduce the weights of training image frames belonging to the main event and increase the weights of training image frames belonging to the secondary events, affecting the final accuracy of the training video classification results. Therefore, the attention-mechanism processing network needs to be trained so that it increasingly tends to increase the weights of the training image frames corresponding to the main event and to reduce the weights of the training image frames corresponding to the secondary events.

Meanwhile, if the target neural network cannot learn the features in the training image frames well, the final accuracy of the training video classification results is also affected, so the target neural network is trained so that it increasingly tends to learn the features in the training image frames better. Likewise, the target classifier also needs to be trained so that it develops toward classifying feature vectors correctly.
In the embodiments of the present application, when the event detection model is trained using the training image frames in the training videos, the training image frames are first divided into multiple batches, and feature vectors are then extracted for the training image frames in all batches using the target neural network. The attention-mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, so as to increase the weights of the feature vectors corresponding to the training image frames that belong to the main event in a training video and reduce the weights of the feature vectors corresponding to the training image frames that do not belong to the main event. When the event detection model is trained on the weight-assigned feature vectors, it can effectively learn the features of the training image frames belonging to the main event, ensuring the accuracy of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to the training image frames that do not belong to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly. As a result, when the event detection model is trained on the feature vectors corresponding to the training image frames that do not belong to the main event, a large amount of computation is saved, which reduces the amount of computation required during event detection model training and the consumption of computing resources and training time.
As shown in Fig. 5, Embodiment Five of the present application further provides another event detection model training method, including:

S501: Obtain labeled training image frames from multiple training videos, and divide the training image frames into multiple batches; each batch contains a preset number of training image frames.

Here, this step is similar to S101; refer to the description of S101, which is not repeated here.

S502: Extract a feature vector for each training image frame in all batches using the target neural network.

Here, this step is similar to S102; refer to the description of S102, which is not repeated here.
S503: Concatenate the feature vectors of the training image frames in each batch respectively, to form concatenated feature vectors.

In a specific implementation, the feature vectors of the training image frames in each batch are concatenated; the concatenated vector thus formed actually uses the feature vectors of multiple training image frames to compose a higher-dimensional concatenated feature vector.

Specifically, since the sizes of the training image frames belonging to the same training video are consistent, the dimensions of the feature vectors obtained for all the training image frames are the same. When the feature vectors of the training image frames in each batch are concatenated to form the concatenated feature vector, the concatenation can be horizontal or vertical. For example, if the dimension of each training image frame's feature vector is 1×512, the result of vertically concatenating the feature vectors of 10 training image frames is 10×512, and the result of horizontally concatenating them is 1×5120.
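The two concatenation layouts in this example could be sketched as follows; the 1×512 and 10-frame figures come from the example above, while the use of PyTorch is an assumption.

    import torch

    frame_feats = [torch.randn(1, 512) for _ in range(10)]  # ten 1x512 frame feature vectors

    vertical = torch.cat(frame_feats, dim=0)    # stacked as rows: shape (10, 512)
    horizontal = torch.cat(frame_feats, dim=1)  # one long row:   shape (1, 5120)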
S504: Perform at least two rounds of weight assignment on the concatenated vector corresponding to each batch using the attention-mechanism processing network.

Here, the method of performing at least two rounds of weight assignment on the concatenated vector corresponding to each batch is similar to S103 above and is not repeated here.

S505: Input the weight-assigned concatenated feature vector corresponding to each batch into the target classifier, to obtain the classification results of the training videos.

Here, the method of S104 above may be used to obtain the classification results of the training videos. Unlike S104, however, since each weight-assigned concatenated feature vector is already one large vector, the target classifier can directly classify, based on the weight-assigned concatenated feature vector, the batch that it characterizes, obtaining the classification result corresponding to the batch. Because the weights of the training image frames belonging to the secondary events have been reduced in S504 above, and the weights of the training image frames belonging to the main event have been increased, the interference of the training image frames belonging to the secondary events with the entire concatenated feature vector is reduced, so the batch can be classified precisely. After the classification results of the batches are obtained, the classification result corresponding to the largest number of batches is taken as the classification result of the training video.
S506: Train the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos.

Here, this step is similar to S105; refer to the description of S105, which is not repeated here.

With Embodiment Five of the present application, the feature vectors of the training image frames in each batch are concatenated to form concatenated feature vectors, and the subsequent operations are all based on the concatenated feature vectors. The concatenated feature vector can better reflect the weight each training image frame occupies within its batch. Classifying batches based on the concatenated feature vectors, rather than classifying individual training image frames based on their feature vectors and then deriving each batch's classification from the classification results of the training image frames in it, reduces the number of classification operations and further reduces the amount of computation.
Based on the same inventive concept, an embodiment of the present application further provides an event detection model training apparatus corresponding to the event detection model training method. Since the principle by which the apparatus in the embodiment of the present application solves the problem is similar to the event detection model training method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated content is not described again.

As shown in Fig. 6, the event detection model training apparatus provided by Embodiment Six of the present application includes:

an acquisition module 61, configured to obtain labeled training image frames from multiple training videos and divide the training image frames into multiple batches, each batch containing a preset number of training image frames;

a feature extraction module 62, configured to extract a feature vector for each training image frame in all batches using the target neural network;

an attention-mechanism processing module 63, configured to perform at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network;

a classification module 64, configured to input the weight-assigned feature vectors of the training image frames in each batch into the target classifier, to obtain the classification results of the training videos;

a training module 65, configured to train the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos.
In the embodiments of the present application, when the event detection model is trained using the training image frames in the training videos, the training image frames are first divided into multiple batches, and feature vectors are then extracted for the training image frames in all batches using the target neural network. The attention-mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, so as to increase the weights of the feature vectors corresponding to the training image frames that belong to the main event in a training video and reduce the weights of the feature vectors corresponding to the training image frames that do not belong to the main event. When the event detection model is trained on the weight-assigned feature vectors, it can effectively learn the features of the training image frames belonging to the main event, ensuring the accuracy of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to the training image frames that do not belong to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly. As a result, when the event detection model is trained on the feature vectors corresponding to the training image frames that do not belong to the main event, a large amount of computation is saved, which reduces the amount of computation required during event detection model training and the consumption of computing resources and training time.
Optionally, the acquisition module 61 is specifically configured to: obtain multiple labeled training videos;

sample the training videos according to a preset sampling frequency; and

take the images sampled from each training video as the training image frames of that training video.
Optionally, the attention-mechanism processing module 63 is specifically configured to perform weight assignment separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity, and to perform weight assignment separately on each batch using the attention-mechanism processing network with batches as the granularity.
Optionally, when weight assignment is performed separately on the feature vectors of the training image frames in each batch using the attention-mechanism processing network with feature vectors as the granularity, the weight assignment result a(i) of the i-th batch satisfies formula (1):

a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)

where n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st through n-th training image frames in each batch; F1 to Fn denote the feature vectors corresponding to the 1st through n-th training image frames in each batch; c denotes the bias term used by the attention-mechanism processing network when performing weight assignment on the feature vectors with feature vectors as the granularity; and tanh denotes the activation function;

when weight assignment is performed separately on each batch using the attention-mechanism processing network with batches as the granularity, the weight assignment result b(j) of the j-th batch satisfies formula (2):

b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)

where M1 to Mm denote the weights corresponding to the 1st through m-th batches, and d denotes the bias term used by the attention-mechanism processing network when performing weight assignment on each batch with batches as the granularity;

the attention-mechanism processing module 63 is further configured to normalize the weight assignment result of each batch after performing weight assignment separately on each batch using the attention-mechanism processing network with batches as the granularity.
Optionally, the classification module 64 is specifically configured to: input the weight-assigned feature vectors corresponding to each batch into the target classifier separately, to obtain the classification result corresponding to each batch; and

take the classification result corresponding to the largest number of batches as the classification result of the training video.

Optionally, the classification module 64 is specifically configured to input the weight-assigned feature vectors corresponding to each batch into the target classifier separately and obtain the classification result corresponding to each batch through the following steps:

inputting the weight-assigned feature vectors corresponding to each batch into the target classifier one by one, to obtain the classification result of the training image frame characterized by each weight-assigned feature vector;

taking the classification result corresponding to the largest number of training image frames as the classification result of the batch.
Optionally, the apparatus further includes: a concatenation module 66, configured to concatenate the feature vectors of the training image frames in each batch respectively, to form concatenated feature vectors;

the attention-mechanism processing module 63 is further configured to perform at least two rounds of weight assignment on the concatenated vector corresponding to each batch using the attention-mechanism processing network;

the classification module 64 is further configured to input the weight-assigned concatenated feature vector corresponding to each batch into the target classifier, to obtain the classification results of the training videos.

Optionally, the classification module 64 is specifically configured to input the weight-assigned concatenated feature vector corresponding to each batch into the target classifier and obtain the classification results of the training videos through the following steps:

inputting the weight-assigned concatenated feature vectors corresponding to each batch into the target classifier separately, to obtain the classification result corresponding to each batch;

taking the classification result corresponding to the largest number of batches as the classification result of the training video.
Optionally, the training module 65 is specifically configured to: perform the following comparison operation until the classification result of the training video is consistent with the label of the training video;

the comparison operation includes:

comparing the classification result of the training video with the label of the training video;

if the classification result of the training video is inconsistent with the label of the training video, adjusting the parameters of the target neural network, the attention-mechanism processing network, and the target classifier;

based on the adjusted parameters, extracting new feature vectors for the training image frames in all batches using the target neural network, and performing at least two rounds of weight assignment anew on the new feature vectors of the training image frames in each batch using the attention-mechanism processing network;

inputting the re-weight-assigned new feature vectors of the training image frames in each batch into the classifier, to obtain new classification results for the training video;

and performing the comparison operation again.
As shown in Fig. 7, Embodiment Seven of the present application further provides an event detection method, including:

S701: Obtain a video to be detected;

S702: Input the video to be detected into an event detection model obtained by the event detection model training method of any one of the embodiments of the present application, to obtain a classification result of the video to be detected;

wherein the event detection model includes: the target neural network, the attention-mechanism processing network, and the target classifier.
Embodiment Eight of the present application further provides an event detection apparatus, including:

a to-be-detected video acquisition module, configured to obtain a video to be detected;

an event detection module, configured to input the video to be detected into an event detection model obtained by the event detection model training method of any one of the embodiments of the present application, to obtain a classification result of the video to be detected;

wherein the event detection model includes: the target neural network, the attention-mechanism processing network, and the target classifier.
Corresponding to the event detection model training method in Fig. 1, the embodiment of the present application nine additionally provides a kind of computer and sets It is standby, as shown in figure 8, the equipment includes memory 1000, processor 2000 and is stored on the memory 1000 and can be at this The computer program run on reason device 2000, wherein above-mentioned processor 2000 realizes above-mentioned thing when executing above computer program The step of part detection model training method.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above event detection model training method can be performed. This solves the problem that, to guarantee model accuracy, training an event detection model directly on training videos requires a large amount of computation and consumes excessive computing resources and training time, thereby reducing the computation required during training, and with it the consumption of computing resources and training time, without compromising model accuracy.
Corresponding to the event detection model training method in Fig. 1, an embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when run by a processor, performs the steps of the above event detection model training method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above event detection model training method can be performed. This solves the problem that, to guarantee model accuracy, training an event detection model directly on training videos requires a large amount of computation and consumes excessive computing resources and training time, thereby reducing the computation required during training, and with it the consumption of computing resources and training time, without compromising model accuracy.
The computer program product of the event detection model training method and the event detection method provided by the embodiments of the present application includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to perform the methods described in the foregoing method embodiments. For specific implementation, refer to the method embodiments; details are not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that would readily occur to those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An event detection model training method, characterized by comprising:
obtaining training image frames from a plurality of labeled training videos, and dividing the training image frames into a plurality of batches, each batch comprising a preset number of training image frames;
extracting feature vectors for the training image frames in all batches using a target neural network;
performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using an attention mechanism processing network;
inputting the weight-assigned feature vectors of the training image frames in each batch into a target classifier to obtain a classification result of the training video;
training the target neural network, the attention mechanism processing network, and the target classifier according to a comparison result between the classification result of the training video and the label of the training video.
2. The method according to claim 1, characterized in that obtaining the training image frames from the plurality of labeled training videos specifically comprises:
obtaining a plurality of labeled training videos;
sampling the training videos according to a preset sampling frequency;
taking the images sampled from each training video as the training image frames of that training video.
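For illustration, a sketch of this sampling step using OpenCV; the one-frame-per-second default and the function name are assumptions, as the preset frequency is not fixed by this application:

```python
import cv2

def sample_training_frames(video_path, frames_per_second=1.0):
    """Sample a training video at a preset frequency; the sampled images
    become the training image frames of that video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if metadata is missing
    step = max(int(round(fps / frames_per_second)), 1)
    frames, index = [], 0
    while True:
        ok, image = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(image)
        index += 1
    cap.release()
    return frames
```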
3. The method according to claim 1, characterized in that performing weight assignment on the feature vectors of the training image frames in each batch using the attention mechanism processing network specifically comprises:
performing weight assignment separately on the feature vectors of the training image frames in each batch using the attention mechanism processing network at the granularity of feature vectors, and performing weight assignment separately on each batch using the attention mechanism processing network at the granularity of batches.
4. The method according to claim 3, characterized in that, when weight assignment is performed separately on the feature vectors of the training image frames in each batch using the attention mechanism processing network at the granularity of feature vectors, the weight assignment result a(i) obtained for the i-th batch satisfies formula (1):
a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)
wherein n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st to n-th training image frames in each batch, respectively; F1 to Fn denote the feature vectors corresponding to the 1st to n-th training image frames in each batch, respectively; c denotes a bias term used when weight assignment is performed at the granularity of feature vectors; and tanh denotes an activation function;
when weight assignment is performed separately on each batch using the attention mechanism processing network at the granularity of batches, the weight assignment result b(j) obtained for the j-th batch satisfies formula (2):
b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)
wherein M1 to Mm denote the weights corresponding to the 1st to m-th batches, respectively; and d denotes a bias term used when weight assignment is performed separately on each batch using the attention mechanism processing network at the granularity of batches;
and, after the weight assignment is performed separately on each batch at the granularity of batches, the method further comprises: normalizing the weight assignment results of the batches.
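Purely as an illustration of formulas (1) and (2) and the final normalization, a NumPy sketch in which the weights W, M and bias terms c, d are assumed to be learned parameters (randomly initialized here):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, dim = 4, 8, 128                    # batches, frames per batch, feature size

F = rng.normal(size=(m, n, dim))         # F[i, k]: feature vector of frame k in batch i
W = rng.normal(size=n)                   # frame-level weights W_1 .. W_n
M = rng.normal(size=m)                   # batch-level weights M_1 .. M_m
c = rng.normal(size=dim)                 # bias term of formula (1)
d = rng.normal(size=dim)                 # bias term of formula (2)

# Formula (1): a(i) = tanh(W_1 F_1 + ... + W_n F_n + c), computed for every batch i.
a = np.tanh(np.einsum("k,ikd->id", W, F) + c)   # shape (m, dim)

# Formula (2): b(j) = M_1 a(1) + ... + M_m a(m) + d.
b = np.einsum("i,id->d", M, a) + d              # shape (dim,)

# Normalization of the batch-level weight assignment result.
b_normalized = b / np.linalg.norm(b)
print(a.shape, b_normalized.shape)
```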
5. The method according to claim 1, characterized in that inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned feature vector corresponding to each batch into the target classifier separately, to obtain a classification result for each batch;
taking the classification result shared by the largest number of batches as the classification result of the training video.
6. The method according to claim 5, characterized in that inputting the weight-assigned feature vector corresponding to each batch into the target classifier separately to obtain the classification result for each batch specifically comprises:
inputting the weight-assigned feature vector corresponding to each batch into the target classifier in turn, to obtain a classification result for each training image frame characterized by a weight-assigned feature vector;
taking the classification result shared by the largest number of training image frames as the classification result of the batch.
7. The method according to claim 1, characterized by further comprising:
splicing the feature vectors of the training image frames in each batch separately to form spliced feature vectors;
wherein performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention mechanism processing network specifically comprises:
performing at least two rounds of weight assignment on the spliced feature vector corresponding to each batch using the attention mechanism processing network;
and inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned spliced feature vector corresponding to each batch into the target classifier to obtain the classification result of the training video.
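As a one-line illustration of the splicing step in claim 7, assuming NumPy and three frame-level feature vectors in one batch:

```python
import numpy as np

frame_features = [np.ones(128), 2 * np.ones(128), 3 * np.ones(128)]
spliced = np.concatenate(frame_features)   # one spliced feature vector per batch
print(spliced.shape)                       # (384,)
```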
8. The method according to claim 7, characterized in that inputting the weight-assigned spliced feature vector corresponding to each batch into the target classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned spliced feature vector corresponding to each batch into the target classifier separately, to obtain a classification result for each batch;
taking the classification result shared by the largest number of batches as the classification result of the training video.
9. The method according to claim 1, characterized in that training the target neural network and the target classifier according to the comparison result between the classification result of the training video and the label of the training video specifically comprises:
performing the following comparison operation until the classification result of the training video is consistent with the label of the training video,
wherein the comparison operation comprises:
comparing the classification result of the training video with the label of the training video;
if the classification result of the training video is inconsistent with the label of the training video, adjusting the parameters of the target neural network, the attention mechanism processing network, and the target classifier;
based on the adjusted parameters, extracting new feature vectors for the training image frames in all batches using the target neural network, and performing at least two rounds of weight assignment again on the new feature vectors of the training image frames in each batch using the attention mechanism processing network;
inputting the re-weighted new feature vectors of the training image frames in each batch into the classifier to obtain a new classification result of the training video;
and performing the comparison operation again.
10. An event detection method, characterized by comprising:
obtaining a video to be detected;
inputting the video to be detected into an event detection model obtained by the event detection model training method according to any one of claims 1 to 9, to obtain a classification result of the video to be detected;
wherein the event detection model comprises: the target neural network, the attention mechanism processing network, and the target classifier.
CN201810297702.9A 2018-03-30 2018-03-30 Event detection model training method and event detection method Active CN108334910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810297702.9A CN108334910B (en) 2018-03-30 2018-03-30 Event detection model training method and event detection method

Publications (2)

Publication Number Publication Date
CN108334910A true CN108334910A (en) 2018-07-27
CN108334910B CN108334910B (en) 2020-11-03

Family

ID=62933866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810297702.9A Active CN108334910B (en) 2018-03-30 2018-03-30 Event detection model training method and event detection method

Country Status (1)

Country Link
CN (1) CN108334910B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066973A (en) * 2017-04-17 2017-08-18 杭州电子科技大学 A kind of video content description method of utilization spatio-temporal attention model
CN107330362A (en) * 2017-05-25 2017-11-07 北京大学 A kind of video classification methods based on space-time notice
CN107463609A (en) * 2017-06-27 2017-12-12 浙江大学 It is a kind of to solve the method for video question and answer using Layered Space-Time notice codec network mechanism
CN107341462A (en) * 2017-06-28 2017-11-10 电子科技大学 A kind of video classification methods based on notice mechanism
CN107818306A (en) * 2017-10-31 2018-03-20 天津大学 A kind of video answering method based on attention model
CN107784293A (en) * 2017-11-13 2018-03-09 中国矿业大学(北京) A kind of Human bodys' response method classified based on global characteristics and rarefaction representation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JEFF DONAHUE et al.: "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description", IEEE Transactions on Pattern Analysis and Machine Intelligence *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101913A (en) * 2018-08-01 2018-12-28 北京飞搜科技有限公司 Pedestrian recognition methods and device again
CN110969066A (en) * 2018-09-30 2020-04-07 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
CN110969066B (en) * 2018-09-30 2023-10-10 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
WO2020107616A1 (en) * 2018-11-26 2020-06-04 深圳云天励飞技术有限公司 Parallel computing method and apparatus
CN111950332B (en) * 2019-05-17 2023-09-05 杭州海康威视数字技术股份有限公司 Video time sequence positioning method, device, computing equipment and storage medium
CN111950332A (en) * 2019-05-17 2020-11-17 杭州海康威视数字技术股份有限公司 Video time sequence positioning method and device, computing equipment and storage medium
CN110738103A (en) * 2019-09-04 2020-01-31 北京奇艺世纪科技有限公司 Living body detection method, living body detection device, computer equipment and storage medium
WO2021077744A1 (en) * 2019-10-25 2021-04-29 浪潮电子信息产业股份有限公司 Image classification method, apparatus and device, and computer readable storage medium
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN110807437A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Video granularity characteristic determination method and device and computer-readable storage medium
CN111428771B (en) * 2019-11-08 2023-04-18 腾讯科技(深圳)有限公司 Video scene classification method and device and computer-readable storage medium
CN111428771A (en) * 2019-11-08 2020-07-17 腾讯科技(深圳)有限公司 Video scene classification method and device and computer-readable storage medium
CN111767985B (en) * 2020-06-19 2022-07-22 深圳市商汤科技有限公司 Neural network training method, video identification method and device
CN111767985A (en) * 2020-06-19 2020-10-13 深圳市商汤科技有限公司 Neural network training method, video identification method and device

Also Published As

Publication number Publication date
CN108334910B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN108334910A (en) A kind of event detection model training method and event detecting method
CN108491817A (en) A kind of event detection model training method, device and event detecting method
CN107909101B (en) Semi-supervised transfer learning character identifying method and system based on convolutional neural networks
US11514694B2 (en) Teaching GAN (generative adversarial networks) to generate per-pixel annotation
CN108399386A (en) Information extracting method in pie chart and device
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN111741330B (en) Video content evaluation method and device, storage medium and computer equipment
CN111160335A (en) Image watermarking processing method and device based on artificial intelligence and electronic equipment
CN107918656A (en) Video front cover extracting method and device based on video title
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
WO2020077858A1 (en) Video description generation method based on neural network, and medium, terminal and apparatus
CN109803180A (en) Video preview drawing generating method, device, computer equipment and storage medium
CN109919252B (en) Method for generating classifier by using few labeled images
CN113780486B (en) Visual question answering method, device and medium
CN113239914B (en) Classroom student expression recognition and classroom state evaluation method and device
CN109902716A (en) A kind of training method and image classification method being aligned disaggregated model
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
CN113761359B (en) Data packet recommendation method, device, electronic equipment and storage medium
CN108717547A (en) The method and device of sample data generation method and device, training pattern
CN114067119A (en) Training method of panorama segmentation model, panorama segmentation method and device
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN112070040A (en) Text line detection method for video subtitles
CN113569852A (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
CN110458600A (en) Portrait model training method, device, computer equipment and storage medium
CN108549857A (en) Event detection model training method, device and event detecting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing

Applicant after: Guoxin Youyi Data Co., Ltd

Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing

Applicant before: SIC YOUE DATA Co.,Ltd.

GR01 Patent grant