CN108334910A - A kind of event detection model training method and event detecting method - Google Patents
- Publication number
- CN108334910A (application number CN201810297702.9A)
- Authority
- CN
- China
- Prior art keywords
- batch
- training
- image frame
- feature vector
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The application provides an event detection model training method and an event detection method. The event detection model training method includes: obtaining labeled training image frames from multiple training videos, and dividing the training image frames into multiple batches; extracting feature vectors for the training image frames in all batches using a target neural network; performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using an attention-mechanism processing network; inputting the weight-assigned feature vectors of the training image frames in each batch into a target classifier to obtain classification results for the training videos; and training the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and their labels. Embodiments of the application can reduce the computation required during training without affecting model accuracy, reducing the consumption of computing resources and training time.
Description
Technical field
This application relates to the field of deep learning, and in particular to an event detection model training method and an event detection method.
Background technology
The rapid development of neural networks in fields such as image, video, speech, and text has driven the launch of a series of intelligent products, and users' accuracy requirements for the various neural-network-based models keep rising. When building an event detection model based on a neural network, a large number of training videos must be input into the neural network so that it can fully learn the features of the images in the videos and improve the classification accuracy of the event detection model.
However, a training video usually contains a very large number of images, so the data volume is enormous. When these training videos are used to train a neural network, the accuracy of the resulting model can be improved, but precisely because the data volume is so large, the computation required during model training is huge, consuming excessive computing resources and training time.
Summary
In view of this, an object of the embodiments of the present application is to provide an event detection model training method and an event detection method that can reduce the computation required during training without affecting model accuracy, reducing the consumption of computing resources and training time.
In a first aspect, an embodiment of the present application provides an event detection model training method, including:
obtaining labeled training image frames from multiple training videos, and dividing the training image frames into multiple batches, each batch containing a preset number of training image frames;
extracting feature vectors for the training image frames in all batches using a target neural network;
performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using an attention-mechanism processing network;
inputting the weight-assigned feature vectors of the training image frames in each batch into a target classifier to obtain classification results for the training videos;
training the target neural network, the attention-mechanism processing network, and the target classifier according to the comparison between the classification results of the training videos and the labels of the training videos.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation of the first aspect, wherein obtaining labeled training image frames from multiple training videos specifically includes:
obtaining multiple labeled training videos;
sampling each training video at a preset sampling frequency;
using the images sampled from each training video as the training image frames of that training video.
With reference to the first aspect, an embodiment of the present application provides a second possible implementation of the first aspect, wherein performing weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network specifically includes:
performing weight assignment on the feature vectors of the training image frames in each batch at the granularity of individual feature vectors, using the attention-mechanism processing network; and performing weight assignment on each batch at the granularity of whole batches, using the attention-mechanism processing network.
With reference to the first aspect, an embodiment of the present application provides a third possible implementation of the first aspect, wherein performing weight assignment on the feature vectors of the training image frames in each batch at the granularity of individual feature vectors yields, for the i-th batch, a weight assignment result a(i) satisfying formula (1):
a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)  (1)
where n is the number of training image frames in the i-th batch; W1 to Wn are the weights corresponding to the 1st to n-th training image frames in each batch; F1 to Fn are the feature vectors corresponding to the 1st to n-th training image frames in each batch; c is the bias term used when the attention-mechanism processing network performs weight assignment at the granularity of individual feature vectors; and tanh is the activation function.
Performing weight assignment on each batch at the granularity of whole batches, using the attention-mechanism processing network, yields, for the j-th batch, a weight assignment result b(j) satisfying formula (2):
b(j) = M1a(1) + M2a(2) + … + Mma(m) + d  (2)
where M1 to Mm are the weights corresponding to the 1st to m-th batches, and d is the bias term used when the attention-mechanism processing network performs weight assignment at the granularity of whole batches.
After performing weight assignment on each batch at the granularity of whole batches, the method further includes: normalizing the weight assignment results of the batches.
With reference to the first aspect, an embodiment of the present application provides a fourth possible implementation of the first aspect, wherein inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training video specifically includes:
inputting the weight-assigned feature vectors corresponding to each batch into the target classifier separately, obtaining a classification result for each batch;
taking the classification result corresponding to the largest number of batches as the classification result of the training video.
With reference to the first aspect, an embodiment of the present application provides a fifth possible implementation of the first aspect, wherein inputting the weight-assigned feature vectors corresponding to each batch into the target classifier separately to obtain the classification result of each batch specifically includes:
inputting the weight-assigned feature vectors corresponding to each batch into the target classifier one by one, obtaining a classification result for the training image frame characterized by each weight-assigned feature vector;
taking the classification result corresponding to the largest number of training image frames as the classification result of the batch.
With reference to the first aspect, an embodiment of the present application provides a sixth possible implementation of the first aspect, further including:
concatenating the feature vectors of the training image frames in each batch to form a concatenated feature vector;
wherein performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network specifically includes: performing at least two rounds of weight assignment on the concatenated feature vector corresponding to each batch using the attention-mechanism processing network;
and wherein inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training video specifically includes: inputting the weight-assigned concatenated feature vector corresponding to each batch into the target classifier to obtain the classification results of the training video.
With reference to the first aspect, an embodiment of the present application provides a seventh possible implementation of the first aspect, wherein inputting the weight-assigned concatenated feature vector corresponding to each batch into the target classifier to obtain the classification results of the training video specifically includes:
inputting the weight-assigned concatenated feature vector corresponding to each batch into the target classifier separately, obtaining a classification result for each batch;
taking the classification result corresponding to the largest number of batches as the classification result of the training video.
With reference to the first aspect, an embodiment of the present application provides an eighth possible implementation of the first aspect, wherein training the target neural network and the target classifier according to the comparison between the classification results of the training video and the label of the training video specifically includes:
performing the following comparison operation until the classification result of the training video is consistent with the label of the training video.
The comparison operation:
comparing the classification result of the training video with the label of the training video;
if the classification result of the training video is inconsistent with its label, adjusting the parameters of the target neural network, the attention-mechanism processing network, and the target classifier;
based on the adjusted parameters, extracting new feature vectors for the training image frames in all batches using the target neural network, and performing at least two rounds of weight assignment again on the new feature vectors of the training image frames in each batch using the attention-mechanism processing network;
inputting the re-weighted new feature vectors of the training image frames in each batch into the classifier, obtaining a new classification result for the training video;
and performing the comparison operation again.
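The comparison operation above can be sketched as a training loop. This is only an illustration of the control flow: all five callables are placeholders for the patent's components, and a real implementation would adjust parameters by gradient-based updates rather than an abstract `update_params`.

```python
def train_event_detector(extract_features, attend, classify, update_params,
                         batches, label, max_rounds=100):
    """Sketch of the comparison loop: classify, compare with the label,
    adjust parameters, and repeat until the prediction matches."""
    for _ in range(max_rounds):
        feats = [extract_features(b) for b in batches]  # target neural network
        weighted = attend(feats)                        # >= 2 rounds of weight assignment
        prediction = classify(weighted)                 # target classifier
        if prediction == label:                         # comparison operation
            return prediction
        update_params()                                 # adjust all three components
    return prediction
```

Under these assumptions the loop terminates as soon as the classification result agrees with the video's label.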
In a second aspect, an embodiment of the present application further provides an event detection method, including:
obtaining a video to be detected;
inputting the video to be detected into an event detection model obtained by the event detection model training method of any one of the implementations of the first aspect above, and obtaining the classification results of the video to be detected;
wherein the event detection model includes the target neural network, the attention-mechanism processing network, and the target classifier.
When training an event detection model using the training image frames in training videos, embodiments of the present application first divide the training image frames into multiple batches, then use the target neural network to extract feature vectors for the training image frames in all batches. The attention-mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, increasing the weights of the feature vectors corresponding to training image frames that belong to the main event in the training video and reducing the weights of those that do not. When the event detection model is trained on the weight-assigned feature vectors, it can learn well the features of the training image frames belonging to the main event, ensuring the accuracy of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to training image frames not belonging to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly; this saves a large amount of computation when those feature vectors are used in training, reducing the computation required during event detection model training and the consumption of computing resources and training time.
To make the above objects, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
It, below will be to needed in the embodiment attached in order to illustrate more clearly of the technical solution of the embodiment of the present application
Figure is briefly described, it should be understood that the following drawings illustrates only some embodiments of the application, therefore is not construed as pair
The restriction of range for those of ordinary skill in the art without creative efforts, can also be according to this
A little attached drawings obtain other relevant attached drawings.
Fig. 1 shows a flowchart of the event detection model training method provided by Embodiment 1 of the present application;
Fig. 2 shows a flowchart of the method provided by Embodiment 2 of the present application for performing two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network;
Fig. 3 shows a flowchart of the specific method provided by Embodiment 3 of the present application for inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification results of the training video;
Fig. 4 shows a flowchart of a comparison operation method provided by Embodiment 4 of the present application;
Fig. 5 shows a flowchart of an event detection model training method provided by Embodiment 5 of the present application;
Fig. 6 shows a structural schematic diagram of the event detection model training apparatus provided by Embodiment 6 of the present application;
Fig. 7 shows a flowchart of the event detection method provided by Embodiment 7 of the present application;
Fig. 8 shows a structural schematic diagram of a computer device provided by Embodiment 9 of the present application.
Detailed description of embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the application, as generally described and illustrated in the drawings here, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative work shall fall within the protection scope of the application.
At present, when an event detection model is trained with training videos, the training videos are input directly into the neural network and the classifier, and the neural network and the classifier are trained. In the actual training process, the neural network and the classifier must perform operations on every image in the training videos. However, a training video generally contains multiple events, and the images of some events make no positive contribution to the classification of the video; they can instead interfere with the normal training of the event detection model. Having the neural network and the classifier learn features from these images that contribute nothing positive to video classification spends much computation in unnecessary places, so that the computation needed during model training is huge and consumes excessive computing resources and training time. Based on this, the present application provides an event detection model training method and an event detection method that can reduce the computation required during training without affecting model accuracy, reducing the consumption of computing resources and training time.
To facilitate understanding of this embodiment, the event detection model training method disclosed in the embodiments of the present application is first described in detail. The event detection model obtained with the training method provided by the embodiments of the present application can efficiently classify the events occurring in unedited videos; it can also effectively realize automatic classification of Internet videos; in addition, it can provide reasonable label support for video recommendation systems, making effective video recommendation convenient.
Referring to Fig. 1, the event detection model training method provided by Embodiment 1 of the present application includes:
S101: Obtain labeled training image frames from multiple training videos, and divide the training image frames into multiple batches, each batch containing a preset number of training image frames.
In specific implementation, a training video is typically a long video, generally containing at least one event. When a training video contains multiple events, one event is generally taken as the main event and the others as secondary events, and the video is labeled according to the main event.
For example, in a video of a swimming competition, besides the swimming competition itself, there may also be shots of the spectator stands and close-ups of the athletes; but the swimming competition occupies the larger proportion of the whole video, so it is taken as the main event and the label of the video is "swimming competition".
Training the event detection model on an entire training video usually slows model convergence because of the large input data volume, and the training process takes a long time and many resources. Therefore, to accelerate model convergence and reduce the time and resources needed for training, training image frames are obtained from the entire training video; the training image frames are a subset of all the images contained in the entire training video. Usually, each of the multiple training videos is sampled at a preset sampling frequency, the images sampled from each training video serve as the training image frames of that video, and the event detection model is then trained on the training image frames of each training video.
Meanwhile it also especially being regarded in training just because of usually including at least one event in each training video
When frequency includes multiple events, different events, which would generally mutually be interted, to be appeared in training video, different events it
Between also have linking.Therefore in order to preferably be positioned to the main matter in training video, strengthen main matter and exist
The weight occupied in all events, and the weight that secondary event occupies in all events is weakened, it can be by training video frame point
At multiple batches, each batch includes preset quantity training image frame, in this way can scheme the training included by different events
As the cutting of frame as possible is opened, different event is divided into different batches.
Here, the number of training image frames in each batch can be chosen according to actual needs. For example, if the events in the training video switch quickly, the number of training image frames in each batch can be set smaller; if the events in the training video switch slowly, the number of training image frames in each batch can be set larger.
In addition, it should be noted that when a training video is cut into multiple batches, the number of training image frames in the video is mostly not an integer multiple of the number of frames per batch, so the last batch obtained by cutting the training video usually cannot reach the preset number of training image frames. A batch that cannot meet the requirement can therefore be padded: transparent frames, all-black frames, or all-white frames are appended after the image sequence formed by the training image frames, so that the number of training image frames in that batch reaches the preset number.
S102: Extract feature vectors for the training image frames in all batches using the target neural network.
In specific implementation, the target neural network may use a convolutional neural network (CNN) model to perform feature extraction on the training image frames in each batch, obtaining a feature vector corresponding to each training image frame.
Here, to accelerate convergence during event detection model training, the target network model used may itself be pre-trained: the training image frames in the training videos are input into a target neural network to be trained, and the target neural network used is obtained by training it.
S103: Perform at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network.
In specific implementation, the attention mechanism learns which parts of the training image frames to process: each current state learns, from the preceding state and/or the current input, which positions of the image to attend to, processing the pixels in the attended part rather than all pixels of the image. For example, a training video A contains two events, diving and spectator stands, with diving as the main event; the attention mechanism can concentrate more focus on the diving event, strengthening attention to it and weakening attention to the spectator-stand event.
This process of strengthening attention to the main event while reducing attention to the secondary events is exactly weight assignment on the feature vectors of the training image frames: the weights of the training image frames corresponding to the main event are increased, and the weights of the training image frames corresponding to the secondary events are reduced.
Specifically, referring to Fig. 2, Embodiment 2 of the present application provides a method for performing two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network, including:
S201: performing weight assignment on the feature vectors of the training image frames in each batch at the granularity of individual feature vectors, using the attention-mechanism processing network; and
S202: performing weight assignment on each batch at the granularity of whole batches, using the attention-mechanism processing network.
Here, since each training video contains multiple events and is divided into multiple batches, this division cannot strictly guarantee that each batch contains the training image frames of only one event. Therefore, to strengthen the weight the main event occupies among all events and weaken the weight the secondary events occupy, the first step is to perform weight assignment on the feature vectors of the training image frames in each batch at the granularity of individual feature vectors; that is, within each batch, the weights of the training image frames corresponding to the main event are increased, and the weights of the training image frames corresponding to the secondary events are reduced.
Similarly, because the positions and durations of the events in a training video are uncertain, some batches contain many training image frames corresponding to the main event while others contain few. To further increase the weight of the training image frames corresponding to the main event and reduce the weight of those corresponding to the secondary events, after the attention-mechanism processing has been performed within each batch, attention-mechanism processing is performed across all batches; that is, the weights of batches containing more training image frames of the main event are increased, and the weights of batches containing fewer, or even no, training image frames of the main event are reduced. This further concentrates attention on the training images corresponding to the main event and further weakens the influence of the secondary events on the event detection model. Meanwhile, when the weights of the training image frames corresponding to the secondary events in the training video are reduced, the values of many elements in their feature vectors are lowered, and some may even be zeroed; when these reduced-weight feature vectors are then used to train the event detection model, the computational complexity is simplified, the computation required during training is reduced, and the consumption of computing resources and training time is reduced.
Specifically, performing weight assignment on the feature vectors of the training image frames in each batch at the granularity of individual feature vectors, using the attention-mechanism processing network, yields, for the i-th batch, a weight assignment result a(i) satisfying formula (1):
a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)  (1)
where n is the number of training image frames in the i-th batch; W1 to Wn are the weights corresponding to the 1st to n-th training image frames in each batch; F1 to Fn are the feature vectors corresponding to the 1st to n-th training image frames in each batch; c is the bias term used when the attention-mechanism processing network performs weight assignment at the granularity of individual feature vectors; and tanh is the activation function.
Performing weight assignment on each batch at the granularity of whole batches, using the attention-mechanism processing network, yields, for the j-th batch, a weight assignment result b(j) satisfying formula (2):
b(j) = M1a(1) + M2a(2) + … + Mma(m) + d  (2)
where M1 to Mm are the weights corresponding to the 1st to m-th batches, and d is the bias term used when the attention-mechanism processing network performs weight assignment at the granularity of whole batches.
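Formulas (1) and (2) can be sketched in plain Python, treating feature vectors as lists and applying the operations element-wise. The weights W, M and bias vectors c, d would be learned parameters of the attention-mechanism processing network; all names here are illustrative assumptions.

```python
import math

def weighted_sum(weights, vectors, bias):
    """Element-wise weighted sum of equal-length vectors plus a bias vector."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) + bias[k]
            for k in range(dim)]

def frame_level_attention(batch_vectors, W, c):
    """Formula (1): a(i) = tanh(W1*F1 + ... + Wn*Fn + c), element-wise."""
    s = weighted_sum(W, batch_vectors, c)
    return [math.tanh(x) for x in s]

def batch_level_attention(a_results, M, d):
    """Formula (2): b(j) = M1*a(1) + ... + Mm*a(m) + d."""
    return weighted_sum(M, a_results, d)

# Two batches of two 3-dimensional feature vectors; W, M, c, d are toy values.
batches = [[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
           [[0.0, 0.5, 1.0], [1.0, 1.0, 1.0]]]
W, c = [0.6, 0.4], [0.0, 0.0, 0.0]
a = [frame_level_attention(b, W, c) for b in batches]   # first round
M, d = [0.7, 0.3], [0.0, 0.0, 0.0]
b = batch_level_attention(a, M, d)                      # second round
print(len(b))  # 3 (same dimensionality as the frame feature vectors)
```

Larger weights Wk (or Mk) let a frame (or batch) contribute more to the result, which is how the main event is emphasized and the secondary events are suppressed.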
Here, it should be noted that, according to actual needs, the attention-mechanism processing network can also perform more rounds of weight assignment on the feature vectors of the training image frames in each batch, to further increase the weights of the training image frames corresponding to the main event, reduce the weights of the training image frames corresponding to the secondary events, and further reduce the computation.
In another embodiment, after performing weight assignment on each batch at the granularity of whole batches using the attention-mechanism processing network, the method further includes: normalizing the weight assignment results of the batches. This further simplifies the feature vectors and further reduces the computation.
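The patent does not specify which normalization is used; as one hedged illustration, a softmax over per-batch scalar scores makes the batch weights non-negative and sum to 1:

```python
import math

def normalize_batch_weights(batch_scores):
    """Softmax normalization of per-batch scores.
    (One common choice of normalization, not necessarily the one
    used in the patent.)"""
    exps = [math.exp(s) for s in batch_scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = normalize_batch_weights([2.0, 1.0, 0.5])
print(round(sum(weights), 6))  # 1.0
```

After normalization, batches with higher scores keep proportionally larger weights while the overall scale of the weights stays bounded.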
S104: The feature vectors of the training image frames in each batch that have undergone weight assignment are input to the object classifier to obtain the classification result of the training video.
In a specific implementation, the weight-assigned feature vectors of the training image frames in each batch are input to the classifier, which classifies the training image frame characterized by each feature vector based on that vector. For feature vectors whose weights have been increased, the classifier learns more from their features; for feature vectors whose weights have been reduced, the classifier learns less from them. The classification result of the whole training video is then obtained from the classification results of the individual feature vectors.
Specifically, as shown in Fig. 3, Embodiment 3 of the present application further provides a specific method for inputting the weight-assigned feature vectors of the training image frames in each batch to the classifier to obtain the classification result of the training video, including:
S301: The weight-assigned feature vectors corresponding to each batch are input to the object classifier to obtain the classification result corresponding to each batch.
In a specific implementation, the classification result corresponding to a batch can be measured by the classification results of all the training image frames in that batch: whichever class the majority of the training image frames in the batch belong to, the probability that the batch belongs to that class is higher than the probability that it belongs to any other class.
Therefore, the classification result corresponding to each batch may be obtained in the following manner: the weight-assigned feature vectors corresponding to each batch are input to the object classifier in turn, yielding the classification result of the training image frame characterized by each weight-assigned feature vector; the classification result corresponding to the greatest number of training image frames is then taken as the classification result of the batch.
For example, training video A contains a main event and a secondary event. The training image frames of the training video are divided into batches, and the batch numbered 1 contains 64 training image frames. After two rounds of weight assignment on these 64 training image frames, the 64 weight-assigned feature vectors are input to the classifier in turn. Among the 64 training image frames, 50 have the classification result "main event" and 14 have the classification result "secondary event". Since the number of training image frames belonging to the main event exceeds the number belonging to the secondary event, the classification result of batch 1 is determined to be the main event.
S302: The classification result corresponding to the greatest number of batches is taken as the classification result of the training video.
After the classification result corresponding to each batch of a training video has been obtained by the method of S301 above, the classification result corresponding to the greatest number of batches is taken as the classification result of the training video.
For example, training video A contains a main event and a secondary event, and its training image frames are divided into 20 batches in total. The weight-assigned feature vectors corresponding to each batch are input to the object classifier to obtain the classification result of each batch: 16 batches are classified as the main event and 4 batches as the secondary event, so the classification result of the training video is the main event.
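The two-level majority vote of S301 and S302 (frames vote for a batch's class, batches vote for the video's class) can be sketched in plain Python; the frame-level classification results are assumed to come from the object classifier.

```python
from collections import Counter

def majority(labels):
    """Return the classification result held by the greatest number of items."""
    return Counter(labels).most_common(1)[0][0]

def classify_video(batches):
    """batches: one list of frame-level classification results per batch."""
    batch_results = [majority(frames) for frames in batches]   # S301
    return majority(batch_results)                             # S302

# Batch 1 of the example: 50 main-event frames, 14 secondary-event frames.
batch1 = ["main"] * 50 + ["secondary"] * 14
print(majority(batch1))        # main

# 20 batches: 16 classified as the main event, 4 as the secondary event.
video = [batch1] * 16 + [["secondary"] * 64] * 4
print(classify_video(video))   # main
```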
S105: The target neural network, the attention mechanism processing network, and the object classifier are trained according to the comparison result between the classification result of the training video and the label of the training video.
Specifically, Embodiment 4 of the present application further provides a specific method for training the target neural network, the attention mechanism processing network, and the object classifier according to the comparison result between the classification result of the training video and the label of the training video, including:
The following comparison operation is executed until the classification result of the training video is consistent with the label of the training video.
As shown in Fig. 4, the comparison operation includes:
S401: Compare whether the classification result of the training video is consistent with the label of the training video; if so, jump to S402; if not, jump to S403.
S402: The current round of training of the target neural network, the attention mechanism processing network, and the object classifier is complete; the flow ends.
S403: The parameters of the target neural network, the attention mechanism processing network, and the object classifier are adjusted.
S404: Based on the adjusted parameters, the target neural network extracts new feature vectors for the training image frames in all batches, the attention mechanism processing network again performs at least two rounds of weight assignment on the new feature vectors of the training image frames in each batch, and the new feature vectors of the re-weighted training image frames in each batch are input to the classifier to obtain a new classification result of the training video; S401 is then executed again.
In a specific implementation, before weight assignment is performed on the feature vectors of the training image frames in each batch for the first time, the attention mechanism processing network is initialized with randomly distributed weights. An attention mechanism processing network initialized in this way may reduce the weights of training image frames belonging to the main event and raise the weights of training image frames belonging to secondary events, which ultimately affects the accuracy of the training video classification result. The attention mechanism processing network therefore needs to be trained so that it increasingly tends to raise the weights of the training image frames corresponding to the main event and to reduce the weights of the training image frames corresponding to secondary events.
Meanwhile if target nerve network cannot learn well to the feature in training image frame, it is final right also to influence
The accuracy of training video classification results, thus target nerve network is trained so that target nerve network is increasingly
It tends to preferably learn to develop to the direction of feature in training image frame.Likewise, be also required to object classifiers into
Row training so that object classifiers are correctly oriented development to when classifying to feature vector towards classification.
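The comparison operation of S401-S404 amounts to a train-until-consistent loop over the three components. The sketch below is a minimal illustration under stated assumptions: toy stand-ins replace the target neural network, the attention mechanism processing network, and the object classifier, and `adjust` re-draws a single parameter at random, whereas a real implementation would adjust the parameters by gradient descent.

```python
import random

random.seed(0)

def train_until_match(extract, attend, classify, adjust, frames, label,
                      max_rounds=1000):
    """Repeat S401-S404 until the classification result matches the label."""
    for _ in range(max_rounds):
        result = classify(attend(extract(frames)))   # S404 pipeline
        if result == label:                          # S401 -> S402: done
            return result
        adjust()                                     # S403: adjust parameters
    raise RuntimeError("training did not converge")

# Toy stand-ins: one threshold plays the role of all trainable parameters.
params = {"t": 10.0}
extract = lambda frames: sum(frames) / len(frames)      # "target neural network"
attend = lambda feat: feat                              # identity "attention"
classify = lambda feat: "main" if feat > params["t"] else "secondary"
adjust = lambda: params.update(t=random.uniform(0.0, 10.0))

print(train_until_match(extract, attend, classify, adjust, [3.0, 4.0, 5.0], "main"))
```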
In the embodiments of the present application, when the event detection model is trained using the training image frames in training videos, the training image frames are first divided into multiple batches, and the target neural network extracts feature vectors for the training image frames in all batches. The attention mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, so as to increase the weights of the feature vectors corresponding to training image frames that belong to the main event of the training video and reduce the weights of the feature vectors corresponding to training image frames that do not. When the event detection model is trained on the weight-assigned feature vectors, it can learn the features in the training image frames belonging to the main event well, which ensures the precision of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to training image frames not belonging to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly; this greatly reduces the amount of computation spent on those feature vectors during training, reducing both the computing resources and the time that training consumes.
As shown in Fig. 5, Embodiment 5 of the present application further provides another event detection model training method, including:
S501: The training image frames in multiple labeled training videos are obtained, and the training image frames are divided into multiple batches; each batch contains a preset number of training image frames.
This is similar to S101; refer to the description of S101, which is not repeated here.
S502: The target neural network is used to extract feature vectors for the training image frames in all batches.
This is similar to S102; refer to the description of S102, which is not repeated here.
S503: The feature vectors of the training image frames in each batch are spliced to form a spliced feature vector.
In a specific implementation, splicing the feature vectors of the training image frames in a batch uses the feature vectors of multiple training image frames to form a single higher-dimensional spliced feature vector.
Specifically, since the training image frames belonging to the same training video have the same size, the feature vectors obtained for all training image frames have the same dimension. When the feature vectors of the training image frames in a batch are spliced to form a spliced feature vector, the splicing may be horizontal or vertical. For example, if the feature vector of each training image frame has dimension 1*512, then splicing the feature vectors of 10 training image frames vertically yields a result of dimension 10*512, and splicing them horizontally yields a result of dimension 1*5120.
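The 1*512 splicing example can be reproduced with NumPy concatenation along either axis (the zero-valued vectors here are placeholders for real extracted features):

```python
import numpy as np

# Ten training-image-frame feature vectors, each of dimension 1*512.
frames = [np.zeros((1, 512)) for _ in range(10)]

vertical = np.concatenate(frames, axis=0)     # longitudinal splicing
horizontal = np.concatenate(frames, axis=1)   # lateral splicing

print(vertical.shape)     # (10, 512)
print(horizontal.shape)   # (1, 5120)
```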
S504: The attention mechanism processing network is used to perform at least two rounds of weight assignment on the spliced vector corresponding to each batch.
Here, the method of performing at least two rounds of weight assignment on the spliced vector corresponding to each batch is similar to S103 above and is not repeated here.
S505: The weight-assigned spliced feature vector corresponding to each batch is input to the object classifier to obtain the classification result of the training video.
Here, the classification result of the training video may be obtained by the method of S104 above. Unlike S104, however, the weight-assigned spliced feature vector is already a single large vector, so the object classifier can directly classify the batch characterized by that vector, obtaining the classification result corresponding to the batch. Since S504 above has already reduced the weights of the training image frames belonging to secondary events and increased the weights of the training image frames belonging to the main event, the interference of the secondary-event frames with the spliced feature vector as a whole is reduced, and the batch can be classified precisely. After the classification results of the batches are obtained, the classification result corresponding to the greatest number of batches is taken as the classification result of the training video.
S506: The target neural network, the attention mechanism processing network, and the object classifier are trained according to the comparison result between the classification result of the training video and the label of the training video.
This is similar to S105; refer to the description of S105, which is not repeated here.
In Embodiment 5 of the present application, the feature vectors of the training image frames in each batch are spliced to form spliced feature vectors, and the subsequent operations are all based on the spliced feature vectors. The spliced feature vector better reflects the weight each training image frame occupies within the batch. Classifying a batch based on its spliced feature vector, rather than classifying the training image frames individually and then deriving the batch classification from the frame-level results, reduces the number of classification operations and further reduces the amount of computation.
Based on the same inventive concept, the embodiments of the present application further provide an event detection model training apparatus corresponding to the event detection model training method. Since the principle by which this apparatus solves the problem is similar to that of the event detection model training method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated description is omitted.
As shown in Fig. 6, Embodiment 6 of the present application provides an event detection model training apparatus, including:
an acquisition module 61, configured to obtain the training image frames in multiple labeled training videos and divide the training image frames into multiple batches, each batch containing a preset number of training image frames;
a feature extraction module 62, configured to use the target neural network to extract feature vectors for the training image frames in all batches;
an attention mechanism processing module 63, configured to use the attention mechanism processing network to perform at least two rounds of weight assignment on the feature vectors of the training image frames in each batch;
a classification module 64, configured to input the weight-assigned feature vectors of the training image frames in each batch to the object classifier to obtain the classification result of the training video;
a training module 65, configured to train the target neural network, the attention mechanism processing network, and the object classifier according to the comparison result between the classification result of the training video and the label of the training video.
In the embodiments of the present application, when the event detection model is trained using the training image frames in training videos, the training image frames are first divided into multiple batches, and the target neural network extracts feature vectors for the training image frames in all batches. The attention mechanism processing network then performs at least two rounds of weight assignment on the feature vectors of the training image frames in each batch, so as to increase the weights of the feature vectors corresponding to training image frames that belong to the main event of the training video and reduce the weights of the feature vectors corresponding to training image frames that do not. When the event detection model is trained on the weight-assigned feature vectors, it can learn the features in the training image frames belonging to the main event well, which ensures the precision of the resulting event detection model. At the same time, because the weights of the feature vectors corresponding to training image frames not belonging to the main event are reduced, the values of the elements in those feature vectors are correspondingly reduced, and some elements may even be zeroed out directly; this greatly reduces the amount of computation spent on those feature vectors during training, reducing both the computing resources and the time that training consumes.
Optionally, the acquisition module 61 is specifically configured to: obtain multiple labeled training videos; sample the training videos according to a preset sample frequency; and take the images sampled from each training video as the training image frames of that training video.
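Sampling the training videos at a preset sample frequency, as the acquisition module 61 does, can be sketched as follows; the frame rate and sample frequency values are illustrative assumptions.

```python
def sample_frame_indices(total_frames, fps, sample_hz):
    """Indices of the frames kept when sampling at the preset frequency."""
    step = max(1, round(fps / sample_hz))   # keep one frame every `step` frames
    return list(range(0, total_frames, step))

# A 300-frame video recorded at 30 fps, sampled at 2 Hz: every 15th frame.
idx = sample_frame_indices(300, fps=30, sample_hz=2)
print(len(idx), idx[:3])   # 20 [0, 15, 30]
```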
Optionally, the attention mechanism processing module 63 is specifically configured to use the attention mechanism processing network to perform weight assignment on the feature vectors of the training image frames in each batch at the feature-vector granularity, and to perform weight assignment on each batch at the batch granularity.
Optionally, taking the feature vector as the granularity, the attention mechanism processing network performs weight assignment on the feature vectors of the training image frames in each batch, and the weight assignment result a(i) of the i-th batch satisfies formula (1):
a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)
where n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st to n-th training image frames in each batch; F1 to Fn denote the corresponding feature vectors; c denotes the bias term used when the attention mechanism processing network performs weight assignment at this granularity; and tanh denotes the activation function.
Taking the batch as the granularity, the attention mechanism processing network performs weight assignment on each batch, and the weight assignment result b(j) of the j-th batch satisfies formula (2):
b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)
where M1 to Mm denote the weights corresponding to the 1st to m-th batches, and d denotes the bias term used when the attention mechanism processing network performs weight assignment at the batch granularity.
The attention mechanism processing module 63 is further configured to normalize the weight assignment result of each batch after the attention mechanism processing network performs weight assignment on each batch at the batch granularity.
Optionally, the classification module 64 is specifically configured to: input the weight-assigned feature vectors corresponding to each batch to the object classifier to obtain the classification result corresponding to each batch; and take the classification result corresponding to the greatest number of batches as the classification result of the training video.
Optionally, the classification module 64 is specifically configured to input the weight-assigned feature vectors corresponding to each batch to the object classifier and obtain the classification result corresponding to each batch through the following steps: inputting the weight-assigned feature vectors corresponding to each batch to the object classifier in turn, to obtain the classification result of the training image frame characterized by each weight-assigned feature vector; and taking the classification result corresponding to the greatest number of training image frames as the classification result of the batch.
Optionally, the apparatus further includes: a splicing module 66, configured to splice the feature vectors of the training image frames in each batch to form spliced feature vectors.
The attention mechanism processing module 63 is further configured to use the attention mechanism processing network to perform at least two rounds of weight assignment on the spliced vector corresponding to each batch.
The classification module 64 is further configured to input the weight-assigned spliced feature vector corresponding to each batch to the object classifier to obtain the classification result of the training video.
Optionally, the classification module 64 is specifically configured to input the weight-assigned spliced feature vectors corresponding to each batch to the object classifier and obtain the classification result of the training video through the following steps: inputting the weight-assigned spliced feature vector corresponding to each batch to the object classifier to obtain the classification result corresponding to each batch; and taking the classification result corresponding to the greatest number of batches as the classification result of the training video.
Optionally, the training module 65 is specifically configured to execute the following comparison operation until the classification result of the training video is consistent with the label of the training video.
The comparison operation includes:
comparing the classification result of the training video with the label of the training video;
if the classification result of the training video is inconsistent with the label of the training video, adjusting the parameters of the target neural network, the attention mechanism processing network, and the object classifier;
based on the adjusted parameters, using the target neural network to extract new feature vectors for the training image frames in all batches, and using the attention mechanism processing network to again perform at least two rounds of weight assignment on the new feature vectors of the training image frames in each batch;
inputting the new feature vectors of the re-weighted training image frames in each batch to the classifier to obtain a new classification result of the training video;
and executing the comparison operation again.
As shown in Fig. 7, Embodiment 7 of the present application further provides an event detection method, including:
S701: obtaining a video to be detected;
S702: inputting the video to be detected into the event detection model obtained by the event detection model training method of any one of the embodiments of the present application, to obtain the classification result of the video to be detected;
wherein the event detection model includes: the target neural network, the attention mechanism processing network, and the object classifier.
Embodiment 8 of the present application further provides an event detection apparatus, including:
a to-be-detected video acquisition module, configured to obtain a video to be detected;
an event detection module, configured to input the video to be detected into the event detection model obtained by the event detection model training method of any one of the embodiments of the present application, to obtain the classification result of the video to be detected;
wherein the event detection model includes: the target neural network, the attention mechanism processing network, and the object classifier.
Corresponding to the event detection model training method in Fig. 1, Embodiment 9 of the present application further provides a computer device, as shown in Fig. 8, including a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and runnable on the processor 2000, wherein the processor 2000 implements the steps of the event detection model training method described above when executing the computer program.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, it can execute the event detection model training method described above. This solves the problem that training an event detection model directly on training videos requires a large amount of computation, and consumes excessive computing resources and training time, in order to ensure model accuracy; it thereby achieves the effect of reducing the computation required during training, and the computing resources and training time consumed, without affecting model accuracy.
Corresponding to the event detection model training method in Fig. 1, the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, the computer program executing the steps of the event detection model training method described above when run by a processor.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the event detection model training method described above can be executed. This solves the problem that training an event detection model directly on training videos requires a large amount of computation, and consumes excessive computing resources and training time, in order to ensure model accuracy; it thereby achieves the effect of reducing the computation required during training, and the computing resources and training time consumed, without affecting model accuracy.
The computer program product of the event detection model training method and the event detection method provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods in the foregoing method embodiments; for specific implementations, refer to the method embodiments, which are not repeated here.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that can be easily conceived by those skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An event detection model training method, characterized by comprising:
obtaining the training image frames in multiple labeled training videos, and dividing the training image frames into multiple batches, each batch containing a preset number of training image frames;
using a target neural network to extract feature vectors for the training image frames in all batches;
using an attention mechanism processing network to perform at least two rounds of weight assignment on the feature vectors of the training image frames in each batch;
inputting the weight-assigned feature vectors of the training image frames in each batch to an object classifier to obtain the classification result of the training video; and
training the target neural network, the attention mechanism processing network, and the object classifier according to the comparison result between the classification result of the training video and the label of the training video.
2. The method according to claim 1, characterized in that obtaining the training image frames in multiple labeled training videos specifically comprises:
obtaining multiple labeled training videos;
sampling the training videos according to a preset sample frequency; and
taking the images sampled from each training video as the training image frames of that training video.
3. The method according to claim 1, characterized in that using the attention mechanism processing network to perform weight assignment on the feature vectors of the training image frames in each batch specifically comprises:
taking the feature vector as the granularity, using the attention mechanism processing network to perform weight assignment on the feature vectors of the training image frames in each batch; and, taking the batch as the granularity, using the attention mechanism processing network to perform weight assignment on each batch.
4. The method according to claim 3, characterized in that, taking the feature vector as the granularity, the attention mechanism processing network performs weight assignment on the feature vectors of the training image frames in each batch, and the weight assignment result a(i) of the i-th batch satisfies formula (1):
a(i) = tanh(W1F1 + W2F2 + … + WnFn + c)    (1)
wherein n denotes the number of training image frames in the i-th batch; W1 to Wn denote the weights corresponding to the 1st to n-th training image frames in each batch; F1 to Fn denote the feature vectors corresponding to the 1st to n-th training image frames in each batch; c denotes the bias term used when the attention mechanism processing network performs weight assignment at this granularity; and tanh denotes the activation function;
taking the batch as the granularity, the attention mechanism processing network performs weight assignment on each batch, and the weight assignment result b(j) of the j-th batch satisfies formula (2):
b(j) = M1a(1) + M2a(2) + … + Mma(m) + d    (2)
wherein M1 to Mm denote the weights corresponding to the 1st to m-th batches, and d denotes the bias term used when the attention mechanism processing network performs weight assignment at the batch granularity;
after the weight assignment is performed on each batch at the batch granularity, the method further comprises: normalizing the weight assignment result of each batch.
5. The method according to claim 1, characterized in that inputting the weight-assigned feature vectors of the training image frames in each batch to the classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned feature vectors corresponding to each batch to the object classifier to obtain the classification result corresponding to each batch; and
taking the classification result corresponding to the greatest number of batches as the classification result of the training video.
6. The method according to claim 5, wherein inputting the weight-assigned feature vectors corresponding to each batch into the object classifier separately to obtain the classification result corresponding to each batch specifically comprises:
inputting the weight-assigned feature vectors corresponding to each batch into the object classifier in turn, to obtain a classification result for the training image frame characterized by each weight-assigned feature vector;
taking the classification result to which the most training image frames correspond as the classification result of the batch.
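The two-level majority vote in claims 5 and 6 (frames vote within a batch, batches vote for the video) can be sketched with Python's `Counter`. The frame labels below are hypothetical stand-ins for per-frame classifier outputs.

```python
from collections import Counter

# Hypothetical per-frame classifier outputs: one label per weight-assigned
# feature vector, grouped by batch.
per_frame_labels = [
    ["goal", "goal", "foul"],   # batch 1
    ["goal", "foul", "foul"],   # batch 2
    ["goal", "goal", "goal"],   # batch 3
]

# Claim 6: the label held by the most frames becomes the batch's result.
batch_results = [Counter(frames).most_common(1)[0][0] for frames in per_frame_labels]

# Claim 5: the label held by the most batches becomes the video's result.
video_result = Counter(batch_results).most_common(1)[0][0]
print(batch_results, video_result)  # ['goal', 'foul', 'goal'] goal
```

`Counter.most_common` breaks ties by insertion order, so a real implementation would need an explicit tie-breaking rule; the claims do not specify one.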
7. The method according to claim 1, further comprising:
concatenating the feature vectors of the training image frames in each batch separately, to form a concatenated feature vector;
wherein performing at least two rounds of weight assignment on the feature vectors of the training image frames in each batch using the attention-mechanism processing network specifically comprises:
performing at least two rounds of weight assignment on the concatenated feature vector corresponding to each batch using the attention-mechanism processing network;
and inputting the weight-assigned feature vectors of the training image frames in each batch into the classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned concatenated feature vector corresponding to each batch into the object classifier, to obtain the classification result of the training video.
8. The method according to claim 7, wherein inputting the weight-assigned concatenated feature vector corresponding to each batch into the object classifier to obtain the classification result of the training video specifically comprises:
inputting the weight-assigned concatenated feature vectors corresponding to each batch into the object classifier separately, to obtain the classification result corresponding to each batch;
taking the classification result to which the most batches correspond as the classification result of the training video.
9. The method according to claim 1, wherein training the target neural network and the object classifier according to the comparison result between the classification result of the training video and the label of the training video specifically comprises:
performing the following comparison operation until the classification result of the training video is consistent with the label of the training video,
the comparison operation comprising:
comparing the classification result of the training video with the label of the training video;
if the classification result of the training video is inconsistent with the label of the training video, adjusting the parameters of the target neural network, the attention-mechanism processing network and the object classifier;
based on the adjusted parameters, extracting new feature vectors for the training image frames in all batches using the target neural network, and performing at least two rounds of weight assignment anew on the new feature vectors of the training image frames in each batch using the attention-mechanism processing network;
inputting the re-weighted new feature vectors of the training image frames in each batch into the classifier, to obtain a new classification result of the training video;
and performing the comparison operation again.
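The comparison loop of claim 9 can be outlined as control flow. In this sketch, `extract`, `attend`, `classify`, and `adjust` are toy placeholders (not the patent's actual networks or update rule), used only to show the compare, adjust, re-extract, re-classify cycle.

```python
# Sketch of claim 9's comparison operation: repeat until the classification
# result matches the label, adjusting parameters whenever it does not.
def train_until_consistent(frames, label, params, max_iters=100):
    result = None
    for _ in range(max_iters):
        feats = extract(frames, params)          # new feature vectors (target network)
        weighted = attend(feats, params)         # at least two rounds of weighting
        result = classify(weighted, params)      # classification of the training video
        if result == label:                      # comparison operation: consistent -> stop
            return params, result
        params = adjust(params)                  # tune all three components' parameters
    return params, result

# Toy stand-ins: one shared scalar "parameter" grown until the response
# crosses a fixed decision threshold.
def extract(frames, params):    return [f * params for f in frames]
def attend(feats, params):      return sum(feats)
def classify(weighted, params): return "event" if weighted > 1.1 else "no_event"
def adjust(params):             return params + 0.5

params, result = train_until_consistent([0.2, 0.3], "event", params=1.0)
print(result)  # "event" once the summed response exceeds the threshold
```

A practical implementation would replace the exact-match stopping condition with a loss threshold or epoch limit, since per-video label equality alone does not guarantee convergence.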
10. An event detection method, comprising:
obtaining a video to be detected;
inputting the video to be detected into an event detection model obtained by the event detection model training method according to any one of claims 1-9, to obtain a classification result of the video to be detected;
wherein the event detection model comprises: the target neural network, the attention-mechanism processing network, and the object classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810297702.9A CN108334910B (en) | 2018-03-30 | 2018-03-30 | Event detection model training method and event detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810297702.9A CN108334910B (en) | 2018-03-30 | 2018-03-30 | Event detection model training method and event detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108334910A true CN108334910A (en) | 2018-07-27 |
CN108334910B CN108334910B (en) | 2020-11-03 |
Family
ID=62933866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810297702.9A Active CN108334910B (en) | 2018-03-30 | 2018-03-30 | Event detection model training method and event detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334910B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101913A (en) * | 2018-08-01 | 2018-12-28 | 北京飞搜科技有限公司 | Pedestrian re-identification method and device |
CN110738103A (en) * | 2019-09-04 | 2020-01-31 | 北京奇艺世纪科技有限公司 | Living body detection method, living body detection device, computer equipment and storage medium |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110807437A (en) * | 2019-11-08 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Video granularity characteristic determination method and device and computer-readable storage medium |
CN110969066A (en) * | 2018-09-30 | 2020-04-07 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
WO2020107616A1 (en) * | 2018-11-26 | 2020-06-04 | 深圳云天励飞技术有限公司 | Parallel computing method and apparatus |
CN111767985A (en) * | 2020-06-19 | 2020-10-13 | 深圳市商汤科技有限公司 | Neural network training method, video identification method and device |
CN111950332A (en) * | 2019-05-17 | 2020-11-17 | 杭州海康威视数字技术股份有限公司 | Video time sequence positioning method and device, computing equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066973A (en) * | 2017-04-17 | 2017-08-18 | 杭州电子科技大学 | A kind of video content description method of utilization spatio-temporal attention model |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN107341462A (en) * | 2017-06-28 | 2017-11-10 | 电子科技大学 | A kind of video classification methods based on notice mechanism |
CN107463609A (en) * | 2017-06-27 | 2017-12-12 | 浙江大学 | It is a kind of to solve the method for video question and answer using Layered Space-Time notice codec network mechanism |
CN107784293A (en) * | 2017-11-13 | 2018-03-09 | 中国矿业大学(北京) | A kind of Human bodys' response method classified based on global characteristics and rarefaction representation |
CN107818306A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of video answering method based on attention model |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066973A (en) * | 2017-04-17 | 2017-08-18 | 杭州电子科技大学 | A kind of video content description method of utilization spatio-temporal attention model |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN107463609A (en) * | 2017-06-27 | 2017-12-12 | 浙江大学 | It is a kind of to solve the method for video question and answer using Layered Space-Time notice codec network mechanism |
CN107341462A (en) * | 2017-06-28 | 2017-11-10 | 电子科技大学 | A kind of video classification methods based on notice mechanism |
CN107818306A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of video answering method based on attention model |
CN107784293A (en) * | 2017-11-13 | 2018-03-09 | 中国矿业大学(北京) | A kind of Human bodys' response method classified based on global characteristics and rarefaction representation |
Non-Patent Citations (1)
Title |
---|
JEFF DONAHUE et al.: "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101913A (en) * | 2018-08-01 | 2018-12-28 | 北京飞搜科技有限公司 | Pedestrian re-identification method and device |
CN110969066A (en) * | 2018-09-30 | 2020-04-07 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
CN110969066B (en) * | 2018-09-30 | 2023-10-10 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
WO2020107616A1 (en) * | 2018-11-26 | 2020-06-04 | 深圳云天励飞技术有限公司 | Parallel computing method and apparatus |
CN111950332B (en) * | 2019-05-17 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Video time sequence positioning method, device, computing equipment and storage medium |
CN111950332A (en) * | 2019-05-17 | 2020-11-17 | 杭州海康威视数字技术股份有限公司 | Video time sequence positioning method and device, computing equipment and storage medium |
CN110738103A (en) * | 2019-09-04 | 2020-01-31 | 北京奇艺世纪科技有限公司 | Living body detection method, living body detection device, computer equipment and storage medium |
WO2021077744A1 (en) * | 2019-10-25 | 2021-04-29 | 浪潮电子信息产业股份有限公司 | Image classification method, apparatus and device, and computer readable storage medium |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110807437A (en) * | 2019-11-08 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Video granularity characteristic determination method and device and computer-readable storage medium |
CN111428771B (en) * | 2019-11-08 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Video scene classification method and device and computer-readable storage medium |
CN111428771A (en) * | 2019-11-08 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Video scene classification method and device and computer-readable storage medium |
CN111767985B (en) * | 2020-06-19 | 2022-07-22 | 深圳市商汤科技有限公司 | Neural network training method, video identification method and device |
CN111767985A (en) * | 2020-06-19 | 2020-10-13 | 深圳市商汤科技有限公司 | Neural network training method, video identification method and device |
Also Published As
Publication number | Publication date |
---|---|
CN108334910B (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334910A (en) | A kind of event detection model training method and event detecting method | |
CN108491817A (en) | A kind of event detection model training method, device and event detecting method | |
CN107909101B (en) | Semi-supervised transfer learning character identifying method and system based on convolutional neural networks | |
US11514694B2 (en) | Teaching GAN (generative adversarial networks) to generate per-pixel annotation | |
CN108399386A (en) | Information extracting method in pie chart and device | |
WO2022057658A1 (en) | Method and apparatus for training recommendation model, and computer device and storage medium | |
CN111741330B (en) | Video content evaluation method and device, storage medium and computer equipment | |
CN111160335A (en) | Image watermarking processing method and device based on artificial intelligence and electronic equipment | |
CN107918656A (en) | Video front cover extracting method and device based on video title | |
CN110990631A (en) | Video screening method and device, electronic equipment and storage medium | |
WO2020077858A1 (en) | Video description generation method based on neural network, and medium, terminal and apparatus | |
CN109803180A (en) | Video preview drawing generating method, device, computer equipment and storage medium | |
CN109919252B (en) | Method for generating classifier by using few labeled images | |
CN113780486B (en) | Visual question answering method, device and medium | |
CN113239914B (en) | Classroom student expression recognition and classroom state evaluation method and device | |
CN109902716A (en) | A kind of training method and image classification method being aligned disaggregated model | |
CN116229056A (en) | Semantic segmentation method, device and equipment based on double-branch feature fusion | |
CN113761359B (en) | Data packet recommendation method, device, electronic equipment and storage medium | |
CN108717547A (en) | The method and device of sample data generation method and device, training pattern | |
CN114067119A (en) | Training method of panorama segmentation model, panorama segmentation method and device | |
CN109583367A (en) | Image text row detection method and device, storage medium and electronic equipment | |
CN112070040A (en) | Text line detection method for video subtitles | |
CN113569852A (en) | Training method and device of semantic segmentation model, electronic equipment and storage medium | |
CN110458600A (en) | Portrait model training method, device, computer equipment and storage medium | |
CN108549857A (en) | Event detection model training method, device and event detecting method |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd. Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co., Ltd. |
 | GR01 | Patent grant | |