CN106599907A - Multi-feature fusion-based dynamic scene classification method and apparatus - Google Patents
Multi-feature fusion-based dynamic scene classification method and apparatus
- Publication number
- CN106599907A CN106599907A CN201611073666.5A CN201611073666A CN106599907A CN 106599907 A CN106599907 A CN 106599907A CN 201611073666 A CN201611073666 A CN 201611073666A CN 106599907 A CN106599907 A CN 106599907A
- Authority
- CN
- China
- Prior art keywords
- feature
- video
- feature information
- sorted
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a multi-feature fusion-based dynamic scene classification method and apparatus. The method comprises the following steps: a video to be classified is obtained; a C3D feature extractor performs feature extraction on the video to be classified to obtain first feature information; an iDT feature extractor performs feature extraction on the video to be classified to obtain second feature information; a VGG feature extractor performs feature extraction on the video to be classified to obtain third feature information; the first feature information, second feature information, and third feature information are fused to obtain a fusion feature; and the video to be classified is classified according to the fusion feature to obtain its classification result. Because the three kinds of feature extractors extract different features of the video to be classified, the method takes into account not only the short-term dynamic features of the video but also its long-term dynamic features and static features, enabling accurate dynamic scene classification.
Description
Technical field
The present invention relates to aviation surveillance technology, and more particularly to a multi-feature fusion-based dynamic scene classification method and apparatus.
Background technology
With the development of unmanned aerial vehicle (UAV) technology and the continuing opening of the country's low-altitude airspace, UAVs are widely used in tasks such as disaster inspection, mountain rescue, goods delivery, and sample collection. A camera-equipped UAV shoots video during flight and returns the footage to a server; the server can automatically perform target detection and tracking according to the image content, and can automatically identify weather, environment, disaster conditions, and the like.
To improve the accuracy of target detection and tracking, those skilled in the art have, in addition to conducting extensive research on and improvement of the algorithms themselves, also recognized that differences in the dynamic scene in which the target is located can severely affect tracking accuracy. It has therefore been proposed to classify the dynamic scene before performing target detection and tracking. However, existing dynamic scene classification methods are generally based only on still images, resulting in poor classification accuracy.
Summary of the invention
The present invention provides a multi-feature fusion-based dynamic scene classification method and apparatus, to solve the problem that existing dynamic scene classification methods, being generally based only on still images, yield poor classification accuracy.
In one aspect, the present invention provides a multi-feature fusion-based dynamic scene classification method, including:
obtaining a video to be classified;
performing feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information; performing feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information; performing feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
fusing the first feature information, the second feature information, and the third feature information to obtain a fusion feature;
classifying the video to be classified according to the fusion feature, to obtain a classification result of the video to be classified.
In the multi-feature fusion-based dynamic scene classification method as described above, fusing the first feature information, the second feature information, and the third feature information to obtain the fusion feature includes:
obtaining first feature data corresponding to first preset dimensions of the first feature information, second feature data corresponding to second preset dimensions of the second feature information, and third feature data corresponding to third preset dimensions of the third feature information;
obtaining the fusion feature from the first feature data, the second feature data, and the third feature data.
In the multi-feature fusion-based dynamic scene classification method as described above, before fusing the first, second, and third feature information to obtain the fusion feature, the method further includes:
obtaining the first feature information, second feature information, and third feature information of every training video in a training video library;
obtaining, from the first, second, and third feature information of all training videos, the Fisher discriminant ratio of every dimension of the first, second, and third feature information;
determining the first preset dimensions of the first feature information from the Fisher discriminant ratios of all of its dimensions, the second preset dimensions of the second feature information from the Fisher discriminant ratios of all of its dimensions, and the third preset dimensions of the third feature information from the Fisher discriminant ratios of all of its dimensions;
wherein the training video library includes at least two training videos belonging to different categories.
In the multi-feature fusion-based dynamic scene classification method as described above, the Fisher discriminant ratio of the i-th dimension of any feature information is obtained by the following formula:

k = S_b / S_i;

where S_i is the within-class variance of the i-th dimension, S_b is the between-class variance of the i-th dimension, J is the total number of categories to which the training videos belong, x_ij is the feature data of the i-th dimension of all training videos of the j-th category, m_ij is the mean of the feature data of the i-th dimension of all training videos of the j-th category, and m_ih is the mean of the feature data of the i-th dimension of all training videos of the h-th category. The value of i is a positive integer from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; the value of j is a positive integer from 1 to J; and the value of h is a positive integer from 1 to J other than j.

In the multi-feature fusion-based dynamic scene classification method as described above, performing feature extraction on the video to be classified using the three-dimensional convolutional neural network feature extractor to obtain the first feature information includes:
segmenting the video to be classified to obtain at least one video segment containing N frames of images;
performing feature extraction on all of the video segments using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
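The segmentation step above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation; the function name and the handling of leftover frames (clips shorter than N are dropped) are assumptions:

```python
def split_into_clips(frames, n=16):
    """Split a list of frames into consecutive clips of n frames each.

    Frames left over at the end (fewer than n) are dropped, so every
    clip fed to the C3D extractor has exactly n frames.
    """
    if n <= 1:
        raise ValueError("n must be a positive integer greater than 1")
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]

clips = split_into_clips(list(range(40)), n=16)
# 40 frames yield two full 16-frame clips; the last 8 frames are dropped
```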
In the multi-feature fusion-based dynamic scene classification method as described above, performing feature extraction on the video to be classified using the improved dense trajectory feature extractor to obtain the second feature information includes:
obtaining the dense trajectory features and the homography matrix of the video to be classified;
correcting the dense trajectory features using the homography matrix to obtain the second feature information.
In the multi-feature fusion-based dynamic scene classification method as described above, performing feature extraction on the video to be classified using the visual geometry neural network feature extractor to obtain the third feature information includes:
extracting at least one key frame from the video to be classified, and performing feature extraction on the at least one key frame using the VGG feature extractor to obtain the third feature information.
In the multi-feature fusion-based dynamic scene classification method as described above, classifying the video to be classified according to the fusion feature to obtain the classification result of the video to be classified includes:
classifying the video to be classified according to the fusion feature using a support vector machine classifier, to obtain the classification result of the video to be classified.
The multi-feature fusion-based dynamic scene classification apparatus provided by the embodiments of the present invention is described below. The apparatus corresponds one-to-one with the method, implements the multi-feature fusion-based dynamic scene classification method of the above embodiments, and has the same technical features and technical effects, which are not repeated here.
In another aspect, the present invention provides a multi-feature fusion-based dynamic scene classification apparatus, including:
a video obtaining module, configured to obtain a video to be classified;
a feature extraction module, configured to perform feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information, perform feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information, and perform feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
a fusion module, configured to fuse the first feature information, the second feature information, and the third feature information to obtain a fusion feature;
a classification module, configured to classify the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
The multi-feature fusion-based dynamic scene classification method and apparatus provided by the present invention use three kinds of feature extractors, namely a C3D feature extractor, an iDT feature extractor, and a VGG feature extractor, to extract different features of the video to be classified, and perform dynamic scene classification after fusing the different features. The method considers not only the short-term dynamic features of the video to be classified but also its long-term dynamic features, and further fuses in its static information, making the dynamic scene classification more accurate.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of Embodiment 1 of the multi-feature fusion-based dynamic scene classification method provided by the present invention;
Fig. 2 is a flow diagram of Embodiment 2 of the multi-feature fusion-based dynamic scene classification method provided by the present invention;
Fig. 3 is a structural diagram of Embodiment 1 of the multi-feature fusion-based dynamic scene classification apparatus provided by the present invention.
Specific embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Before target detection and tracking, determining the type of dynamic scene helps improve the speed and precision of target detection and tracking. The difficulty of the dynamic scene classification problem is that, owing to factors such as illumination and viewpoint changes, the intra-class differences within a single category of dynamic scene are large, while the inter-class differences between different categories are small. For example, videos of the same forest-fire scene may differ greatly depending on the intensity of the fire, the shooting angle, and the density of the smoke; conversely, different combinations of the same objects can constitute different dynamic scenes, shrinking the gap between classes. The inter-class difference between a waterfall scene and a river scene, for instance, is small. When performing dynamic scene classification, existing methods based on recognizing the objects in the scene cannot overcome this technical problem of large intra-class differences and small inter-class differences, resulting in slow and inaccurate classification.
To solve the above problems, an embodiment of the present invention provides a multi-feature fusion-based dynamic scene classification method. Fig. 1 is a flow diagram of Embodiment 1 of the multi-feature fusion-based dynamic scene classification method provided by the present invention. The method is executed by a multi-feature fusion-based dynamic scene classification apparatus, which may be implemented in software or hardware; for example, the apparatus may be a server or a computer. As shown in Fig. 1, the method includes:
S101, obtaining a video to be classified;
S102, performing feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information;
S103, performing feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information;
S104, performing feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
S105, fusing the first feature information, the second feature information, and the third feature information to obtain a fusion feature;
S106, classifying the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
S102, S103, and S104 may be performed simultaneously or in sequence; the present invention places no restriction on this.
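The S101-S106 flow can be sketched as a single function. The extractor, fusion, and classifier callables below are toy stand-ins chosen only to show the data flow, not the patent's actual models:

```python
def classify_video(video, c3d_extract, idt_extract, vgg_extract, fuse, classify):
    """Run the S101-S106 pipeline: extract three feature vectors,
    fuse them, and classify the fused vector."""
    f1 = c3d_extract(video)   # S102: short-term dynamic + some static features
    f2 = idt_extract(video)   # S103: long-term trajectory features
    f3 = vgg_extract(video)   # S104: static scene features
    fused = fuse(f1, f2, f3)  # S105: feature fusion
    return classify(fused)    # S106: final classification

# Toy stand-ins illustrating the data flow only:
result = classify_video(
    "video.mp4",
    c3d_extract=lambda v: [1.0, 2.0],
    idt_extract=lambda v: [3.0],
    vgg_extract=lambda v: [4.0],
    fuse=lambda *fs: [x for f in fs for x in f],      # simple concatenation
    classify=lambda fused: "forest-fire" if sum(fused) > 5 else "river",
)
```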
Specifically, in S101, the video to be classified may, for example, be video shot by a UAV during an inspection flight; video transmitted to the server in real time may be used as the video to be classified.
Specifically, in S102, the first feature information of the video to be classified is extracted using a three-dimensional convolution (Convolution 3D, C3D) neural network feature extractor. The C3D feature extractor is a convolutional neural network (CNN) architecture whose internal convolution kernels are 3 × 3 × 3 three-dimensional kernels. This feature extractor divides the video to be classified into multiple segments for processing while making use of the information in all frames of the video, so it can extract the short-term dynamic information and some of the static information of the input video.
Before the C3D feature extractor is used, it is first trained on the videos in a training video library, which generally contains a large number of labelled short-term motion videos rich in motion information. For example, the videos used in the training stage may come from the Sports-1M database, which consists of one million short motion videos, such as playing basketball and playing football. The first feature information obtained by the C3D feature extractor can therefore characterize the static information and short-term dynamic information latent in the video to be classified.
Optionally, performing feature extraction on the video to be classified using the C3D feature extractor in S102 to obtain the first feature information specifically includes:
S1021, segmenting the video to be classified to obtain at least one video segment containing N frames of images;
S1022, performing feature extraction on all of the video segments using the C3D feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
For example, considering that the C3D network is a feature extractor that processes video segments, the video to be classified can be divided into at least one segment of N frames each, where N is a positive integer greater than 1; for example, N may be 16. Optionally, N is also smaller than the total number of frames T of the video to be classified.
When N is 16, the basic configuration of the C3D feature extractor may, for example, be: five convolutional layers and five pooling layers, with one pooling layer following each convolutional layer, plus two fully connected layers and one classification layer for predicting the classification result. The numbers of neurons in the five convolutional layers are 64, 128, 256, 256, and 256, respectively. All convolutional layers have kernels of the same size, 3 × 3 × 3, and all pooling layers use max pooling with a 2 × 2 × 2 kernel. Each fully connected layer has 4096 neurons. When features are extracted with the C3D convolutional neural network, the features of the second fully connected layer are output as the result.
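As an illustration of the 2 × 2 × 2 max pooling used between the C3D convolutional layers, here is a minimal numpy sketch. The function name and the (T, H, W) single-channel layout are assumptions for illustration; a real C3D layer also carries a channel axis:

```python
import numpy as np

def max_pool_3d(x, k=2):
    """Non-overlapping k x k x k max pooling over a (T, H, W) volume,
    halving the temporal and both spatial dimensions when k = 2."""
    t, h, w = (d // k for d in x.shape)
    x = x[:t * k, :h * k, :w * k]          # drop ragged edges
    x = x.reshape(t, k, h, k, w, k)        # expose k-sized blocks
    return x.max(axis=(1, 3, 5))           # max within each block

vol = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
pooled = max_pool_3d(vol)                  # shape (2, 2, 2)
```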
Specifically, in S103, feature extraction is performed on the video to be classified using an improved dense trajectory (Improved Dense Trajectory, iDT) feature extractor to obtain the second feature information. The iDT feature extractor extracts the trajectory information in the video to be classified.
Optionally, performing feature extraction on the video to be classified using the improved dense trajectory feature extractor in S103 to obtain the second feature information includes:
S1031, obtaining the dense trajectory features and the homography matrix of the video to be classified;
S1032, correcting the dense trajectory features using the homography matrix to obtain the second feature information.
Specifically, an existing dense trajectory extraction algorithm based on the optical flow field can be used to obtain the dense trajectory features of the video to be classified. After the dense trajectories are obtained, all of them are filtered to remove trajectories that are completely stationary and trajectories with abrupt position changes.
Further, after the dense trajectories are obtained, it should be considered that the camera is likely carried in flight by a UAV, so the camera itself moves. The trajectory information caused by camera movement is mixed into the dense trajectories and may affect the dynamic scene classification, so the trajectories produced by camera movement must be filtered out. To eliminate this camera-induced trajectory information, a homography matrix can be generated by modelling and used to remove such trajectories.
To obtain the homography matrix, consecutive frames of the video to be classified are first registered, for example by a method combining speeded-up robust features (SURF) and optical flow; the homography matrix is then obtained with the random sample consensus (RANSAC) algorithm.
After the homography matrix is acquired, the dense trajectories can be corrected with it, removing the erroneous trajectory information caused by camera movement and yielding the second feature information. Because the iDT feature extractor extracts all trajectory information of the video to be classified from beginning to end, the second feature information can characterize the long-term trajectory features of the video, that is, its long-term dynamic information.
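The correction step can be sketched with numpy. The homography H is assumed to have already been estimated (e.g. from SURF matches with RANSAC, which is not shown here); the sketch only illustrates warping the previous frame's trajectory points with H and keeping the residual motion, and the function names are hypothetical:

```python
import numpy as np

def warp_points(points, H):
    """Apply a 3x3 homography to an (N, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    warped = pts @ H.T
    return warped[:, :2] / warped[:, 2:3]                 # back to Cartesian

def camera_compensated_displacement(p_prev, p_curr, H):
    """Displacement of tracked points after removing camera motion:
    warp the previous frame's points into the current frame with H,
    then take the residual as the object's own motion."""
    return p_curr - warp_points(p_prev, H)

# With an identity homography (static camera), nothing is compensated:
p0 = np.array([[10.0, 20.0], [30.0, 40.0]])
p1 = p0 + np.array([1.0, 0.0])            # points moved 1 px to the right
residual = camera_compensated_displacement(p0, p1, np.eye(3))
```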
Specifically, in S104, the visual geometry group (Visual Geometry Group, VGG) neural network feature extractor, proposed by the Visual Geometry Group of the Department of Engineering Science of the University of Oxford, extracts the static information of the video to be classified from some of its frames. The VGG feature extractor is also a CNN architecture; before it is used, it is first trained on a training image library. Unlike the C3D feature extractor, the training image library used by the VGG feature extractor contains a large number of labelled static scene images. For example, the images used in the training stage may come from the Places365 database, which consists of static scene pictures of 365 classes, each class a specific scene. This feature extractor is therefore biased towards extracting the static information describing the scene in the video to be classified, compensating for the loss of static scene information when the C3D feature extractor extracts features.
Optionally, performing feature extraction on the video to be classified using the VGG feature extractor in S104 to obtain the third feature information specifically includes:
extracting at least one key frame from the video to be classified, and performing feature extraction on the at least one key frame using the VGG feature extractor to obtain the third feature information.
Specifically, key frames are first extracted from the video to be classified. A video often consists of hundreds of frames, and especially when the UAV flies slowly, the content of consecutive frames differs little. Extracting features from every frame would be slow and consume considerable resources, so to extract the latent static information of the video more efficiently, key frames can be chosen at the beginning, middle, and end of the video to be classified to represent its static information.
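The beginning/middle/end key-frame choice is a one-liner. A minimal sketch, with the function name assumed for illustration:

```python
def key_frame_indices(num_frames):
    """Pick the first, middle, and last frames of a video as key frames,
    as suggested for the VGG static-feature branch.  Duplicates collapse
    for very short videos (e.g. a 1-frame video yields a single index)."""
    if num_frames < 1:
        raise ValueError("video must contain at least one frame")
    return sorted({0, num_frames // 2, num_frames - 1})

idx = key_frame_indices(300)
```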
The VGG network feature extractor includes 16 convolutional layers and 16 pooling layers, with one pooling layer following each convolutional layer, plus three fully connected layers and one classification layer for outputting the classification result. The convolution kernels of the convolutional layers are 3 × 3 in size. When features are extracted with the VGG convolutional neural network, the features of the second fully connected layer are output as the result.
Specifically, in S105, the first feature information, second feature information, and third feature information acquired by the different feature extractors are fused to obtain the fusion feature, which can characterize the long-term and short-term dynamic information of the video to be classified as well as the different static information extracted by the different feature extractors.
Specifically, in S106, classification is performed according to the fusion feature acquired in S105 using a conventional support vector machine (SVM) linear classifier, yielding the classification information of the video to be classified.
For example, the SVM classifier parameter C is set to 100 and a linear kernel is used. Before the SVM classifier is used, the classifier model must be trained with the training set data and tested with the test set data.
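A minimal scikit-learn sketch with the parameters mentioned above (linear kernel, C = 100). The toy 2-D features stand in for the much higher-dimensional fused feature vectors; the data and labels are invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy fused features: two well-separated classes, so a linear boundary exists.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=100)   # linear kernel, C = 100 as in the text
clf.fit(X_train, y_train)
pred = clf.predict(np.array([[0.1, 0.1], [5.1, 5.0]]))
```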
The multi-feature fusion-based dynamic scene classification method provided by the present invention uses three kinds of feature extractors, namely a C3D feature extractor, an iDT feature extractor, and a VGG feature extractor, to extract different features of the video to be classified, and performs dynamic scene classification after fusing the different features. The method considers not only the short-term dynamic information of the video to be classified but also its long-term dynamic information, and further fuses in its static information, making the dynamic scene classification results more accurate.
Optionally, on the basis of the above embodiment, fusing the first, second, and third feature information in S105 to obtain the fusion feature specifically includes:
S1051, obtaining the first feature data corresponding to the first preset dimensions of the first feature information, the second feature data corresponding to the second preset dimensions of the second feature information, and the third feature data corresponding to the third preset dimensions of the third feature information;
S1052, obtaining the fusion feature from the first feature data, the second feature data, and the third feature data.
Specifically, each piece of feature information is a two-dimensional matrix. For example, when the first feature information is a 1 × 4096 matrix, with 1 row and 4096 columns, the first feature information can be considered to contain 4096 dimensions, the feature data of the first dimension being the first column of the first feature information. The data of different dimensions of a piece of feature information affect the classification of the video to be classified differently; selecting, at fusion time, the data corresponding to the dimensions with larger influence on classification improves the accuracy of dynamic scene classification. Illustratively, the preset dimensions corresponding to different feature data differ, and the numbers of preset dimensions may also differ.
Further, on the basis of any of the above embodiments, the determination of the preset dimensions for feature fusion in S1051 is described in detail with reference to a specific embodiment. Fig. 2 is a flow diagram of Embodiment 2 of the multi-feature fusion-based dynamic scene classification method provided by the present invention. As shown in Fig. 2, before the feature fusion, the method further includes:
S201, obtaining the first feature information, second feature information, and third feature information of every training video in a training video library;
S202, obtaining, from the first, second, and third feature information of all training videos, the Fisher discriminant ratio of every dimension of the first, second, and third feature information;
S203, determining the first preset dimensions of the first feature information from the Fisher discriminant ratios of all of its dimensions, the second preset dimensions of the second feature information from the Fisher discriminant ratios of all of its dimensions, and the third preset dimensions of the third feature information from the Fisher discriminant ratios of all of its dimensions;
wherein the training video library includes at least two training videos belonging to different categories.
Specifically, after the Fisher discriminant ratios of all dimensions of a piece of feature information are obtained, the dimensions can, for example, be ranked by Fisher discriminant ratio and the dimensions whose ratio exceeds a preset value chosen as the preset dimensions. Alternatively, the several dimensions with the highest Fisher discriminant ratios can be chosen as the preset dimensions; illustratively, the number of preset dimensions may differ across feature information. To obtain the dimensions of each piece of feature information with the greatest influence on video classification, the training videos can be analysed to determine the dimensions with higher Fisher discriminant ratios.
Specifically, on the basis of any of the above embodiments, the computation of the Fisher discriminant ratio of any dimension of any feature information is described in detail with a specific embodiment. The Fisher discriminant ratio of the i-th dimension of any feature information is obtained by the following formula:

k = S_b / S_i;

where S_i is the within-class variance of the i-th dimension, S_i = Σ_{j=1…J} (x_ij − m_ij)², and S_b is the between-class variance of the i-th dimension, S_b = Σ_{j=1…J} Σ_{h=1…J, h≠j} (m_ij − m_ih)². J is the total number of categories to which the training videos belong, x_ij is the feature data of the i-th dimension of all training videos of the j-th category, m_ij is the mean of the feature data of the i-th dimension of all training videos of the j-th category, and m_ih is the mean of the feature data of the i-th dimension of all training videos of the h-th category. The value of i is a positive integer from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; the value of j is a positive integer from 1 to J; and the value of h is a positive integer from 1 to J other than j.
A smaller S_i indicates that the dimension is more similar within videos of the same class, and a larger S_b indicates that the dimension is less similar to the videos of other classes, so the larger the value of k, the better, and the more it helps video classification. A larger value of k indicates that the i-th dimension of the feature information has a greater influence on dynamic scene classification. After the k values of all dimensions are obtained, the dimensions with larger k values in each piece of feature information can be combined to obtain the fusion feature.
For example, suppose the training video library contains 9 training videos belonging to 3 categories, with 3 training videos per category. For the first feature information (the C3D feature), the Fisher discriminant ratio of the 1st dimension of this feature information is calculated as follows.

First, each video is divided into 5 short videos of 16 frames each, and the C3D feature extractor extracts a feature from each short video, taking the output of the fc7 layer. This yields five 1×4096-dimensional feature matrices; after averaging them, each video is represented by a single 1×4096-dimensional feature matrix. Thus, after feature extraction by the C3D feature extractor, the 9 training videos are represented by one 9×4096 feature matrix.

To calculate the Fisher discriminant ratio of the 1st of these 4096 dimensions, first take the first column of the 9×4096 feature matrix, which gives a 9×1 matrix. The top 3×1 block contains the features extracted from the videos of the first category, the middle 3×1 block the features of the second category, and the bottom 3×1 block the features of the third category. For example, suppose this 9×1 matrix is $[1, 2, 1, 2, 3, 2, 3, 1, 3]^T$.

The mean matrix of [1, 2, 1] is [1.3, 1.3, 1.3]; the mean matrix of [2, 3, 2] is [2.3, 2.3, 2.3]; and the mean matrix of [3, 1, 3] is [2.3, 2.3, 2.3] (means rounded to one decimal place).

Then the within-class scatter is calculated:

$S_i = (1{-}1.3)^2 + (2{-}1.3)^2 + (1{-}1.3)^2 + (2{-}2.3)^2 + (3{-}2.3)^2 + (2{-}2.3)^2 + (3{-}2.3)^2 + (1{-}2.3)^2 + (3{-}2.3)^2 = 4.01$

and the between-class scatter is calculated:

$S_b = \lVert[1,2,1]{-}[2.3,2.3,2.3]\rVert^2 + \lVert[1,2,1]{-}[2.3,2.3,2.3]\rVert^2 + \lVert[2,3,2]{-}[1.3,1.3,1.3]\rVert^2 + \lVert[2,3,2]{-}[2.3,2.3,2.3]\rVert^2 + \lVert[3,1,3]{-}[1.3,1.3,1.3]\rVert^2 + \lVert[3,1,3]{-}[2.3,2.3,2.3]\rVert^2 = 3.47 + 3.47 + 3.87 + 0.67 + 5.87 + 2.67 = 20.02$

Finally, the Fisher discriminant ratio of the 1st dimension of the first feature information is obtained from $S_i$ and $S_b$:

$k = S_b / S_i = 20.02 / 4.01 = 4.99$
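The worked example above can be checked with a short script. Note that the example rounds the class means to one decimal place before computing the scatters, which yields k = 20.02 / 4.01 ≈ 4.99; with exact means the same data gives exactly k = 5.0. The function below (its name is ours, not from the patent) uses exact means:

```python
def fisher_ratio(column, labels):
    """Fisher discriminant ratio of one feature dimension.

    column: feature values of this dimension, one per training video.
    labels: category label of each training video.
    Computes k = S_b / S_i, where S_i sums squared deviations of each class's
    samples from that class's own mean, and S_b sums squared deviations of
    each class's samples from the means of all *other* classes.
    """
    classes = sorted(set(labels))
    groups = {c: [x for x, l in zip(column, labels) if l == c] for c in classes}
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    s_i = sum((x - means[c]) ** 2 for c in classes for x in groups[c])
    s_b = sum((x - means[h]) ** 2
              for c in classes for x in groups[c]
              for h in classes if h != c)
    return s_b / s_i

# the 9x1 example column from the text: 3 categories, 3 videos per category
column = [1, 2, 1, 2, 3, 2, 3, 1, 3]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(fisher_ratio(column, labels))  # 5.0 (the text gets 4.99 from rounded means)
```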
On the other hand, an embodiment of the present invention provides a multi-feature-fusion dynamic scene classification apparatus for performing the method embodiments described above, with the same technical features and technical effects, which are not repeated here.

Fig. 3 is a schematic structural diagram of embodiment one of the multi-feature-fusion dynamic scene classification apparatus provided by the present invention. As shown in Fig. 3, the apparatus includes:

a to-be-classified video acquisition module 301, configured to obtain a to-be-classified video;
a feature extraction module 302, configured to perform feature extraction on the to-be-classified video by using a three-dimensional convolutional neural network (C3D) feature extractor to obtain first feature information, perform feature extraction on the to-be-classified video by using an improved dense trajectory feature extractor to obtain second feature information, and perform feature extraction on the to-be-classified video by using a visual geometry neural network (VGG) feature extractor to obtain third feature information;
a fusion module 303, configured to fuse the first feature information, the second feature information and the third feature information to obtain a fusion feature; and
a classification module 304, configured to classify the to-be-classified video according to the fusion feature to obtain a classification result of the to-be-classified video.
Further, the fusion module 303 is specifically configured to: obtain first feature data corresponding to the first preset dimensions in the first feature information, obtain second feature data corresponding to the second preset dimensions in the second feature information, and obtain third feature data corresponding to the third preset dimensions in the third feature information; and obtain the fusion feature according to the first feature data, the second feature data and the third feature data.
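A fusion of this kind can be sketched as selecting the preset dimensions from each feature vector and concatenating the results. The dimension indices and vector sizes below are toy values, and plain concatenation is our assumption for how the three groups of feature data are combined into one fusion feature:

```python
import numpy as np

def fuse(first, second, third, dims1, dims2, dims3):
    """Build a fusion feature from three feature vectors by keeping only the
    preset dimensions of each and concatenating the selected values."""
    return np.concatenate([np.asarray(first)[dims1],
                           np.asarray(second)[dims2],
                           np.asarray(third)[dims3]])

# toy vectors standing in for the C3D / improved dense trajectory / VGG features
c3d = np.arange(10.0)         # pretend first feature information
idt = np.arange(10.0, 18.0)   # pretend second feature information
vgg = np.arange(18.0, 24.0)   # pretend third feature information

fused = fuse(c3d, idt, vgg, dims1=[0, 3], dims2=[1], dims3=[2, 4, 5])
print(fused)        # [ 0.  3. 11. 20. 22. 23.]
print(fused.shape)  # (6,)
```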
Further, the apparatus also includes a preset dimension acquisition module, which is specifically configured to:
obtain the first feature information, second feature information and third feature information of each of the training videos in a training video library;
obtain, according to the first feature information, second feature information and third feature information of all of the training videos, the Fisher discriminant ratio of every dimension of the first feature information, the second feature information and the third feature information; and
determine the first preset dimensions of the first feature information according to the Fisher discriminant ratios of all dimensions of the first feature information, determine the second preset dimensions of the second feature information according to the Fisher discriminant ratios of all dimensions of the second feature information, and determine the third preset dimensions of the third feature information according to the Fisher discriminant ratios of all dimensions of the third feature information;
wherein the training video library includes at least two training videos belonging to different categories.
Further, the Fisher discriminant ratio of the i-th dimension of any piece of feature information is obtained as follows:

$k = S_b / S_i$

where $S_i$ is the within-class scatter of the i-th dimension, $S_i = \sum_{j=1}^{J} \lVert x_{ij} - m_{ij} \rVert^2$; $S_b$ is the between-class scatter of the i-th dimension, $S_b = \sum_{j=1}^{J} \sum_{h=1,\, h \neq j}^{J} \lVert x_{ij} - m_{ih} \rVert^2$; $J$ is the total number of categories to which the training videos belong; $x_{ij}$ is the feature data matrix of the i-th dimension of all training videos of the j-th category; $m_{ij}$ is the mean matrix of the feature data matrix of the i-th dimension of all training videos of the j-th category; and $m_{ih}$ is the mean matrix of the feature data matrix of the i-th dimension of all training videos of the h-th category. The value of $i$ is a positive integer from 1 to $I$, where $I$ is the total number of dimensions of the feature information to which the i-th dimension belongs; the value of $j$ is a positive integer from 1 to $J$; and the value of $h$ is a positive integer from 1 to $J$ other than $j$.
Further, the feature extraction module 302 is specifically configured to:
divide the to-be-classified video to obtain at least one video segment containing N frames of images; and
perform feature extraction on all of the video segments by using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
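The division into segments of N frames (N = 16 in the worked example) can be sketched as below. Discarding a trailing segment shorter than N is our assumption, since the C3D extractor expects fixed-length clips:

```python
def split_into_segments(frames, n=16):
    """Divide a video (a list of frames) into segments of exactly n frames,
    discarding any incomplete trailing segment."""
    if n <= 1:
        raise ValueError("N must be a positive integer greater than 1")
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]

# a video of 80 frames yields the 5 segments of 16 frames used in the example
frames = list(range(80))
segments = split_into_segments(frames, n=16)
print(len(segments))     # 5
print(len(segments[0]))  # 16
```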
Further, the feature extraction module 302 is specifically configured to:
obtain the dense trajectory features and a homography matrix of the to-be-classified video; and
correct the dense trajectory features by using the homography matrix to obtain the second feature information.
Further, the feature extraction module 302 is specifically configured to:
extract at least one key frame from the to-be-classified video, and perform feature extraction on the at least one key frame by using the visual geometry neural network feature extractor to obtain the third feature information.
Further, the classification module 304 is specifically configured to classify the to-be-classified video by using a support vector machine classifier according to the fusion feature, to obtain the classification result of the to-be-classified video.
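The final classification step can be sketched with scikit-learn's `SVC`, a standard support vector machine implementation, used here purely as an illustration; the patent does not specify an implementation or kernel, and the fused feature vectors and scene labels below are toy data:

```python
import numpy as np
from sklearn.svm import SVC

# toy fusion features for two scene categories (e.g. label 0 vs label 1)
train_features = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]])
train_labels = np.array([0, 0, 1, 1])

# train a support vector machine classifier on the fused training features
classifier = SVC(kernel="linear")
classifier.fit(train_features, train_labels)

# classify a to-be-classified video by its fusion feature
test_feature = np.array([[0.95, 0.95]])
print(classifier.predict(test_feature))  # [1]
```

In practice the training features would be the fusion features of the training video library, and the predicted label is the classification result of the to-be-classified video.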
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by program instructions on related hardware. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk or an optical disc.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A multi-feature-fusion dynamic scene classification method, characterized by comprising:
obtaining a to-be-classified video;
performing feature extraction on the to-be-classified video by using a three-dimensional convolutional neural network feature extractor to obtain first feature information; performing feature extraction on the to-be-classified video by using an improved dense trajectory feature extractor to obtain second feature information; and performing feature extraction on the to-be-classified video by using a visual geometry neural network feature extractor to obtain third feature information;
fusing the first feature information, the second feature information and the third feature information to obtain a fusion feature; and
classifying the to-be-classified video according to the fusion feature to obtain a classification result of the to-be-classified video.
2. The method according to claim 1, characterized in that fusing the first feature information, the second feature information and the third feature information to obtain the fusion feature comprises:
obtaining first feature data corresponding to first preset dimensions in the first feature information, obtaining second feature data corresponding to second preset dimensions in the second feature information, and obtaining third feature data corresponding to third preset dimensions in the third feature information; and
obtaining the fusion feature according to the first feature data, the second feature data and the third feature data.
3. The method according to claim 2, characterized in that before fusing the first feature information, the second feature information and the third feature information to obtain the fusion feature, the method further comprises:
obtaining the first feature information, second feature information and third feature information of each of the training videos in a training video library;
obtaining, according to the first feature information, second feature information and third feature information of all of the training videos, the Fisher discriminant ratio of every dimension of the first feature information, the second feature information and the third feature information; and
determining the first preset dimensions of the first feature information according to the Fisher discriminant ratios of all dimensions of the first feature information, determining the second preset dimensions of the second feature information according to the Fisher discriminant ratios of all dimensions of the second feature information, and determining the third preset dimensions of the third feature information according to the Fisher discriminant ratios of all dimensions of the third feature information;
wherein the training video library includes at least two training videos belonging to different categories.
4. The method according to claim 3, characterized in that the Fisher discriminant ratio of the i-th dimension of any piece of feature information is obtained as follows:

$k = S_b / S_i$

wherein $S_i$ is the within-class scatter of the i-th dimension, $S_i = \sum_{j=1}^{J} \lVert x_{ij} - m_{ij} \rVert^2$; $S_b$ is the between-class scatter of the i-th dimension, $S_b = \sum_{j=1}^{J} \sum_{h=1,\, h \neq j}^{J} \lVert x_{ij} - m_{ih} \rVert^2$; $J$ is the total number of categories to which all of the training videos belong; $x_{ij}$ is the feature data matrix of the i-th dimension of all training videos of the j-th category; $m_{ij}$ is the mean matrix of the feature data matrix of the i-th dimension of all training videos of the j-th category; $m_{ih}$ is the mean matrix of the feature data matrix of the i-th dimension of all training videos of the h-th category; the value of $i$ is a positive integer from 1 to $I$, where $I$ is the total number of dimensions of the feature information to which the i-th dimension belongs; the value of $j$ is a positive integer from 1 to $J$; and the value of $h$ is a positive integer from 1 to $J$ other than $j$.
5. The method according to claim 1, characterized in that performing feature extraction on the to-be-classified video by using the three-dimensional convolutional neural network feature extractor to obtain the first feature information comprises:
dividing the to-be-classified video to obtain at least one video segment containing N frames of images; and
performing feature extraction on all of the video segments by using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
6. The method according to claim 1, characterized in that performing feature extraction on the to-be-classified video by using the improved dense trajectory feature extractor to obtain the second feature information comprises:
obtaining the dense trajectory features and a homography matrix of the to-be-classified video; and
correcting the dense trajectory features by using the homography matrix to obtain the second feature information.
7. The method according to claim 1, characterized in that performing feature extraction on the to-be-classified video by using the visual geometry neural network feature extractor to obtain the third feature information comprises:
extracting at least one key frame from the to-be-classified video, and performing feature extraction on the at least one key frame by using the visual geometry neural network feature extractor to obtain the third feature information.
8. The method according to any one of claims 1 to 7, characterized in that classifying the to-be-classified video according to the fusion feature to obtain the classification result of the to-be-classified video comprises:
classifying the to-be-classified video by using a support vector machine classifier according to the fusion feature, to obtain the classification result of the to-be-classified video.
9. A multi-feature-fusion dynamic scene classification apparatus, characterized by comprising:
a to-be-classified video acquisition module, configured to obtain a to-be-classified video;
a feature extraction module, configured to perform feature extraction on the to-be-classified video by using a three-dimensional convolutional neural network feature extractor to obtain first feature information, perform feature extraction on the to-be-classified video by using an improved dense trajectory feature extractor to obtain second feature information, and perform feature extraction on the to-be-classified video by using a visual geometry neural network feature extractor to obtain third feature information;
a fusion module, configured to fuse the first feature information, the second feature information and the third feature information to obtain a fusion feature; and
a classification module, configured to classify the to-be-classified video according to the fusion feature to obtain a classification result of the to-be-classified video.
10. The apparatus according to claim 9, characterized in that the fusion module is specifically configured to:
obtain first feature data corresponding to first preset dimensions in the first feature information, obtain second feature data corresponding to second preset dimensions in the second feature information, and obtain third feature data corresponding to third preset dimensions in the third feature information; and
obtain the fusion feature according to the first feature data, the second feature data and the third feature data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611073666.5A CN106599907B (en) | 2016-11-29 | 2016-11-29 | The dynamic scene classification method and device of multiple features fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599907A true CN106599907A (en) | 2017-04-26 |
CN106599907B CN106599907B (en) | 2019-11-29 |
Family
ID=58594055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611073666.5A Active CN106599907B (en) | 2016-11-29 | 2016-11-29 | The dynamic scene classification method and device of multiple features fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599907B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682302A (en) * | 2012-03-12 | 2012-09-19 | 浙江工业大学 | Human body posture identification method based on multi-characteristic fusion of key frame |
CN102902981A (en) * | 2012-09-13 | 2013-01-30 | 中国科学院自动化研究所 | Violent video detection method based on slow characteristic analysis |
CN103077318A (en) * | 2013-01-17 | 2013-05-01 | 电子科技大学 | Classifying method based on sparse measurement |
CN103366181A (en) * | 2013-06-28 | 2013-10-23 | 安科智慧城市技术(中国)有限公司 | Method and device for identifying scene integrated by multi-feature vision codebook |
CN104881655A (en) * | 2015-06-03 | 2015-09-02 | 东南大学 | Human behavior recognition method based on multi-feature time-space relationship fusion |
CN105956572A (en) * | 2016-05-15 | 2016-09-21 | 北京工业大学 | In vivo face detection method based on convolutional neural network |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107393554A (en) * | 2017-06-20 | 2017-11-24 | 武汉大学 | In a kind of sound scene classification merge class between standard deviation feature extracting method |
CN107689035B (en) * | 2017-08-30 | 2021-12-21 | 广州方硅信息技术有限公司 | Homography matrix determination method and device based on convolutional neural network |
CN107689035A (en) * | 2017-08-30 | 2018-02-13 | 广州华多网络科技有限公司 | A kind of homography matrix based on convolutional neural networks determines method and device |
CN107909095A (en) * | 2017-11-07 | 2018-04-13 | 江苏大学 | A kind of image-recognizing method based on deep learning |
CN107909070A (en) * | 2017-11-24 | 2018-04-13 | 天津英田视讯科技有限公司 | A kind of method of road water detection |
US10909380B2 (en) | 2017-12-13 | 2021-02-02 | Beijing Sensetime Technology Development Co., Ltd | Methods and apparatuses for recognizing video and training, electronic device and medium |
CN108229336A (en) * | 2017-12-13 | 2018-06-29 | 北京市商汤科技开发有限公司 | Video identification and training method and device, electronic equipment, program and medium |
CN108229336B (en) * | 2017-12-13 | 2021-06-04 | 北京市商汤科技开发有限公司 | Video recognition and training method and apparatus, electronic device, program, and medium |
CN108090203A (en) * | 2017-12-25 | 2018-05-29 | 上海七牛信息技术有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108090497A (en) * | 2017-12-28 | 2018-05-29 | 广东欧珀移动通信有限公司 | Video classification methods, device, storage medium and electronic equipment |
CN108090497B (en) * | 2017-12-28 | 2020-07-07 | Oppo广东移动通信有限公司 | Video classification method and device, storage medium and electronic equipment |
CN108491856A (en) * | 2018-02-08 | 2018-09-04 | 西安电子科技大学 | A kind of image scene classification method based on Analysis On Multi-scale Features convolutional neural networks |
US11393206B2 (en) * | 2018-03-13 | 2022-07-19 | Tencent Technology (Shenzhen) Company Limited | Image recognition method and apparatus, terminal, and storage medium |
CN110569795B (en) * | 2018-03-13 | 2022-10-14 | 腾讯科技(深圳)有限公司 | Image identification method and device and related equipment |
CN110569795A (en) * | 2018-03-13 | 2019-12-13 | 腾讯科技(深圳)有限公司 | Image identification method and device and related equipment |
WO2019174439A1 (en) * | 2018-03-13 | 2019-09-19 | 腾讯科技(深圳)有限公司 | Image recognition method and apparatus, and terminal and storage medium |
CN108647599A (en) * | 2018-04-27 | 2018-10-12 | 南京航空航天大学 | In conjunction with the Human bodys' response method of 3D spring layers connection and Recognition with Recurrent Neural Network |
CN108510012A (en) * | 2018-05-04 | 2018-09-07 | 四川大学 | A kind of target rapid detection method based on Analysis On Multi-scale Features figure |
CN108510012B (en) * | 2018-05-04 | 2022-04-01 | 四川大学 | Target rapid detection method based on multi-scale feature map |
CN109002766B (en) * | 2018-06-22 | 2021-07-09 | 北京邮电大学 | Expression recognition method and device |
CN109002766A (en) * | 2018-06-22 | 2018-12-14 | 北京邮电大学 | A kind of expression recognition method and device |
CN109115501A (en) * | 2018-07-12 | 2019-01-01 | 哈尔滨工业大学(威海) | A kind of Civil Aviation Engine Gas path fault diagnosis method based on CNN and SVM |
CN109165682A (en) * | 2018-08-10 | 2019-01-08 | 中国地质大学(武汉) | A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics |
CN109165682B (en) * | 2018-08-10 | 2020-06-16 | 中国地质大学(武汉) | Remote sensing image scene classification method integrating depth features and saliency features |
CN109145840A (en) * | 2018-08-29 | 2019-01-04 | 北京字节跳动网络技术有限公司 | video scene classification method, device, equipment and storage medium |
CN109145840B (en) * | 2018-08-29 | 2022-06-24 | 北京字节跳动网络技术有限公司 | Video scene classification method, device, equipment and storage medium |
CN109697453A (en) * | 2018-09-30 | 2019-04-30 | 中科劲点(北京)科技有限公司 | Semi-supervised scene classification recognition methods, system and device based on multimodality fusion |
CN109257622A (en) * | 2018-11-01 | 2019-01-22 | 广州市百果园信息技术有限公司 | A kind of audio/video processing method, device, equipment and medium |
CN109376696B (en) * | 2018-11-28 | 2020-10-23 | 北京达佳互联信息技术有限公司 | Video motion classification method and device, computer equipment and storage medium |
CN109376696A (en) * | 2018-11-28 | 2019-02-22 | 北京达佳互联信息技术有限公司 | Method, apparatus, computer equipment and the storage medium of video actions classification |
CN110033505A (en) * | 2019-04-16 | 2019-07-19 | 西安电子科技大学 | A kind of human action capture based on deep learning and virtual animation producing method |
CN110220585A (en) * | 2019-06-20 | 2019-09-10 | 广东工业大学 | A kind of bridge vibration test method and relevant apparatus |
WO2021031523A1 (en) * | 2019-08-21 | 2021-02-25 | 创新先进技术有限公司 | Document recognition method and device |
CN110516737A (en) * | 2019-08-26 | 2019-11-29 | 南京人工智能高等研究院有限公司 | Method and apparatus for generating image recognition model |
CN110516737B (en) * | 2019-08-26 | 2023-05-26 | 南京人工智能高等研究院有限公司 | Method and device for generating image recognition model |
WO2021093468A1 (en) * | 2019-11-15 | 2021-05-20 | 腾讯科技(深圳)有限公司 | Video classification method and apparatus, model training method and apparatus, device and storage medium |
US11967151B2 (en) | 2019-11-15 | 2024-04-23 | Tencent Technology (Shenzhen) Company Limited | Video classification method and apparatus, model training method and apparatus, device, and storage medium |
CN111145222A (en) * | 2019-12-30 | 2020-05-12 | 浙江中创天成科技有限公司 | Fire detection method combining smoke movement trend and textural features |
WO2021248432A1 (en) * | 2020-06-12 | 2021-12-16 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for performing motion transfer using a learning model |
US20210390713A1 (en) * | 2020-06-12 | 2021-12-16 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for performing motion transfer using a learning model |
US11830204B2 (en) * | 2020-06-12 | 2023-11-28 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for performing motion transfer using a learning model |
CN111563488A (en) * | 2020-07-14 | 2020-08-21 | 成都市映潮科技股份有限公司 | Video subject content identification method, system and storage medium |
CN112687022A (en) * | 2020-12-18 | 2021-04-20 | 山东盛帆蓝海电气有限公司 | Intelligent building inspection method and system based on video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||