CN106599907A - Multi-feature fusion-based dynamic scene classification method and apparatus - Google Patents


Info

Publication number
CN106599907A
CN106599907A (application CN201611073666.5A; granted publication CN106599907B)
Authority
CN
China
Prior art keywords
feature
video
feature information
classified
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611073666.5A
Other languages
Chinese (zh)
Other versions
CN106599907B (en)
Inventor
曹先彬
郑洁宛
黄元骏
潘朝凤
刘俊英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201611073666.5A priority Critical patent/CN106599907B/en
Publication of CN106599907A publication Critical patent/CN106599907A/en
Application granted granted Critical
Publication of CN106599907B publication Critical patent/CN106599907B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-feature fusion-based dynamic scene classification method and apparatus. The method comprises the following steps: a video to be classified is obtained; a C3D feature extractor performs feature extraction on the video to obtain first feature information; an iDT feature extractor performs feature extraction on the video to obtain second feature information; a VGG feature extractor performs feature extraction on the video to obtain third feature information; the first feature information, the second feature information and the third feature information are fused to obtain a fusion feature; and the video is classified according to the fusion feature to obtain a classification result. Because the three feature extractors extract different characteristics of the video to be classified, the method takes into account not only the short-term dynamic features of the video but also its long-term dynamic features and static features, enabling accurate dynamic scene classification.

Description

Multi-feature fusion-based dynamic scene classification method and apparatus
Technical field
The present invention relates to aviation surveillance technology, and more particularly to a multi-feature fusion-based dynamic scene classification method and apparatus.
Background technology
With the development of unmanned aerial vehicle (UAV) technology and the continuing opening of the country's low-altitude airspace, UAVs are widely used in tasks such as disaster inspection, mountain rescue, material delivery and sample collection. A camera-equipped UAV shoots during flight and returns the footage to a server, which can automatically perform target detection and tracking based on the image content and automatically identify weather, environment, disaster conditions and the like.
To improve the accuracy of target detection and tracking, those skilled in the art have, besides extensive research on and improvement of the algorithms themselves, also recognized that differences in the dynamic scene in which the target is located can severely affect tracking accuracy. It has therefore been proposed to classify the dynamic scene before performing target detection and tracking. However, existing dynamic scene classification methods are generally based only on still images, resulting in poor classification accuracy.
Summary of the invention
The present invention provides a multi-feature fusion-based dynamic scene classification method and apparatus, to solve the problem that existing dynamic scene classification methods are generally based only on still images and therefore achieve poor classification accuracy.
In one aspect, the present invention provides a multi-feature fusion-based dynamic scene classification method, comprising:
obtaining a video to be classified;
performing feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information; performing feature extraction on the video using an improved dense trajectory feature extractor to obtain second feature information; performing feature extraction on the video using a visual geometry group neural network feature extractor to obtain third feature information;
fusing the first feature information, the second feature information and the third feature information to obtain a fusion feature;
classifying the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
In the multi-feature fusion-based dynamic scene classification method as described above, fusing the first feature information, the second feature information and the third feature information to obtain the fusion feature comprises:
obtaining first feature data corresponding to first preset dimensions of the first feature information, second feature data corresponding to second preset dimensions of the second feature information, and third feature data corresponding to third preset dimensions of the third feature information;
obtaining the fusion feature from the first feature data, the second feature data and the third feature data.
In the method as described above, before fusing the first, second and third feature information to obtain the fusion feature, the method further comprises:
obtaining the respective first, second and third feature information of all training videos in a training video library;
computing, from the respective first, second and third feature information of all training videos, the Fisher discriminant ratio of every dimension of the first, second and third feature information;
determining the first preset dimensions of the first feature information from the Fisher discriminant ratios of all its dimensions, the second preset dimensions of the second feature information from the Fisher discriminant ratios of all its dimensions, and the third preset dimensions of the third feature information from the Fisher discriminant ratios of all its dimensions;
wherein the training video library contains training videos belonging to at least two different categories.
In the method as described above, the Fisher discriminant ratio of the i-th dimension of any feature information is obtained by the following formula:
k = S_b / S_i
where S_i is the within-class variance of the i-th dimension and S_b is its between-class variance; J is the total number of categories to which the training videos belong; x_ij is the feature matrix of the i-th dimension of all training videos of the j-th category; m_ij is the mean matrix of the feature matrix of the i-th dimension of all training videos of the j-th category; and m_ih is the mean matrix of the feature matrix of the i-th dimension of all training videos of the h-th category. The value of i ranges over the positive integers from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; j ranges over the positive integers from 1 to J; and h ranges over the positive integers from 1 to J excluding j.
In the method as described above, performing feature extraction on the video to be classified using the three-dimensional convolutional neural network feature extractor to obtain the first feature information comprises:
dividing the video to be classified into at least one video segment of N frames each;
performing feature extraction on all video segments using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
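The segmentation step above can be sketched in a few lines; the array layout (frames stacked as a (T, H, W, C) array) and the policy of dropping a trailing remainder shorter than N frames are assumptions for illustration, not specified by the patent.

```python
import numpy as np

def split_into_clips(frames, clip_len=16):
    """Split a (T, H, W, C) frame array into non-overlapping clips of
    clip_len frames; a trailing remainder shorter than clip_len is dropped."""
    n_clips = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

video = np.zeros((40, 112, 112, 3), dtype=np.uint8)  # 40 dummy frames
clips = split_into_clips(video)  # two 16-frame clips, 8 frames dropped
```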
In the method as described above, performing feature extraction on the video to be classified using the improved dense trajectory feature extractor to obtain the second feature information comprises:
obtaining the dense trajectory features and a homography matrix of the video to be classified;
correcting the dense trajectory features using the homography matrix to obtain the second feature information.
In the method as described above, performing feature extraction on the video to be classified using the visual geometry group neural network feature extractor to obtain the third feature information comprises:
extracting at least one key frame from the video to be classified, and performing feature extraction on the at least one key frame using the VGG feature extractor to obtain the third feature information.
In the method as described above, classifying the video to be classified according to the fusion feature to obtain the classification result comprises:
classifying the video to be classified using a support vector machine classifier according to the fusion feature to obtain the classification result of the video to be classified.
The multi-feature fusion-based dynamic scene classification apparatus provided by embodiments of the present invention is described below. The apparatus corresponds one-to-one with the method, implements the multi-feature fusion-based dynamic scene classification method of the above embodiments, and has the same technical features and technical effects, which are not repeated here.
In another aspect, the present invention provides a multi-feature fusion-based dynamic scene classification apparatus, comprising:
a video obtaining module, configured to obtain a video to be classified;
a feature extraction module, configured to perform feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information, perform feature extraction on the video using an improved dense trajectory feature extractor to obtain second feature information, and perform feature extraction on the video using a visual geometry group neural network feature extractor to obtain third feature information;
a fusion module, configured to fuse the first feature information, the second feature information and the third feature information to obtain a fusion feature;
a classification module, configured to classify the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
With the multi-feature fusion-based dynamic scene classification method and apparatus provided by the present invention, three feature extractors, namely a C3D feature extractor, an iDT feature extractor and a VGG feature extractor, extract different characteristics of the video to be classified, and dynamic scene classification is performed after the different features are fused. The method considers not only the short-term dynamic features of the video but also its long-term dynamic features, and further fuses in the static information of the video, so that the dynamic scene classification is more accurate.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these without creative effort.
Fig. 1 is a schematic flowchart of Embodiment 1 of the multi-feature fusion-based dynamic scene classification method provided by the present invention;
Fig. 2 is a schematic flowchart of Embodiment 2 of the multi-feature fusion-based dynamic scene classification method provided by the present invention;
Fig. 3 is a schematic structural diagram of Embodiment 1 of the multi-feature fusion-based dynamic scene classification apparatus provided by the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Determining the dynamic scene type before performing target detection and tracking helps improve the speed and accuracy of the tracking. The difficulty of the dynamic scene classification problem is that, owing to factors such as illumination and viewpoint changes, the intra-class difference within a single dynamic scene category is large, while the inter-class difference between different categories is small. For example, two forest-fire scenes may look very different because the severity of the fire, the shooting angle and the density of the smoke differ; conversely, different combinations of the same objects can form different dynamic scenes with a small inter-class gap, such as a waterfall scene versus a river scene. Existing dynamic scene classification methods based on recognizing the objects in the scene cannot overcome this combination of large intra-class difference and small inter-class difference, resulting in slow and inaccurate classification.
To solve the above problems, an embodiment of the present invention provides a multi-feature fusion-based dynamic scene classification method. Fig. 1 is a schematic flowchart of Embodiment 1 of the method. The method is executed by a multi-feature fusion-based dynamic scene classification apparatus, which may be implemented in software or hardware; illustratively, the apparatus may be a server, a computer, or the like. As shown in Fig. 1, the method comprises:
S101: obtain a video to be classified;
S102: perform feature extraction on the video using a three-dimensional convolutional neural network feature extractor to obtain first feature information;
S103: perform feature extraction on the video using an improved dense trajectory feature extractor to obtain second feature information;
S104: perform feature extraction on the video using a visual geometry group neural network feature extractor to obtain third feature information;
S105: fuse the first, second and third feature information to obtain a fusion feature;
S106: classify the video according to the fusion feature to obtain the classification result of the video.
S102, S103 and S104 may be performed simultaneously or sequentially; the present invention does not limit this.
Specifically, in S101, the video to be classified may, illustratively, be a video shot by a UAV during an inspection flight; video transmitted to the server in real time may serve as the video to be classified.
Specifically, in S102, a three-dimensional convolution (Convolution 3D, C3D) neural network feature extractor extracts the first feature information of the video. The C3D feature extractor is a convolutional neural network (CNN) architecture whose internal kernels are 3 x 3 x 3 three-dimensional convolution kernels. The extractor divides the video into multiple segments for processing while using the information of all frames of the video, so it can extract the short-term dynamic information and some static information of the input video.
Before the C3D feature extractor is used, it is first trained on the videos of a training video library, which usually contains a large number of labeled short-term sports videos rich in motion information. Illustratively, the training videos may come from the Sports-1M database, which consists of one million short sports videos, such as basketball or football clips. The first feature information obtained by the C3D network feature extractor can therefore characterize the static information and the short-term dynamic information hidden in the video to be classified.
Optionally, performing feature extraction on the video with the C3D feature extractor in S102 to obtain the first feature information specifically includes:
S1021: divide the video into at least one video segment of N frames each;
S1022: perform feature extraction on all video segments using the C3D feature extractor to obtain the first feature information;
where N is a positive integer greater than 1.
Illustratively, since the C3D network is a feature extractor that processes a video segment, the video to be classified may be divided into at least one segment of N frames, where N is a positive integer greater than 1; illustratively, N may be 16. Optionally, N is also smaller than the total number of frames T of the video.
When N is 16, an exemplary basic configuration of the C3D feature extractor is: five convolutional layers, each followed by a pooling layer; two fully connected layers; and a classification layer for predicting the classification result. The numbers of neurons of the five convolutional layers are 64, 128, 256, 256 and 256, respectively. All convolutional layers use kernels of the same size, 3 x 3 x 3, and all pooling layers use max pooling with a 2 x 2 x 2 kernel. Each fully connected layer has 4096 neurons. When features are extracted with the C3D convolutional neural network, the output of the second fully connected layer is taken as the result.
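The configuration above can be sanity-checked with a little shape arithmetic. This sketch assumes 'same'-padded 3 x 3 x 3 convolutions (which keep the spatial size) and floors each pooled axis at 1; note the published C3D network uses a 1 x 2 x 2 pool in its first layer, so this is only an illustration of the 2 x 2 x 2 configuration stated here.

```python
def c3d_feature_map_sizes(t=16, h=112, w=112, n_stages=5):
    """Trace a (T, H, W) clip through five conv+pool stages: each
    'same'-padded 3x3x3 convolution keeps the size, and each 2x2x2
    max pool halves every axis (floored, never below 1)."""
    sizes = [(t, h, w)]
    for _ in range(n_stages):
        t, h, w = max(1, t // 2), max(1, h // 2), max(1, w // 2)
        sizes.append((t, h, w))
    return sizes

sizes = c3d_feature_map_sizes()  # input size plus one entry per stage
```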
Specifically, in S103, an improved dense trajectory (iDT) feature extractor performs feature extraction on the video to obtain the second feature information. The iDT feature extractor extracts the trajectory information in the video to be classified.
Optionally, performing feature extraction on the video with the improved dense trajectory feature extractor in S103 to obtain the second feature information includes:
S1031: obtain the dense trajectory features and a homography matrix of the video;
S1032: correct the dense trajectory features using the homography matrix to obtain the second feature information.
Specifically, an existing optical-flow-based dense trajectory algorithm may be used to obtain the dense trajectory features of the video. After the dense trajectories are obtained, all of them are filtered to remove stationary trajectories and trajectories with abrupt position jumps.
Further, after the dense trajectories are obtained, note that the camera is likely carried in flight by the UAV, so the camera itself moves. The trajectory information caused by the camera motion is mixed into the dense trajectories and may affect the dynamic scene classification, so the trajectories produced by camera motion must be filtered out. To eliminate them, a homography matrix can be estimated and used for trajectory elimination.
To obtain the homography matrix, consecutive frames of the video are first registered, illustratively using a combination of speeded-up robust features (SURF) and optical flow. The random sample consensus (RANSAC) algorithm is then used to estimate the homography matrix. Once obtained, the homography matrix is used to correct the dense trajectories and remove the erroneous trajectory information caused by the camera movement, yielding the second feature information. Since the iDT feature extractor extracts all trajectory information of the video from start to end, the second feature information characterizes the long-term trajectory features of the video, i.e. its long-term dynamic information.
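The correction step can be sketched as follows, under two simplifying assumptions: a single 3 x 3 homography H describes the camera motion between every pair of consecutive frames, and "correction" means subtracting the camera-predicted displacement from each observed displacement. The real iDT pipeline estimates H per frame pair from SURF/optical-flow matches with RANSAC; this sketch only illustrates what the estimated H is used for.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 homography H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def correct_trajectory(traj, H):
    """Subtract camera-induced motion from one dense trajectory.

    traj: (L, 2) tracked positions in consecutive frames.
    H:    homography mapping frame t coordinates to frame t+1.
    The corrected per-frame displacement is the observed position minus
    where pure camera motion would have carried the previous point."""
    corrected = [np.asarray(traj[0], dtype=float)]
    for t in range(1, len(traj)):
        camera_only = apply_homography(H, traj[t - 1:t])[0]
        corrected.append(corrected[-1] + (traj[t] - camera_only))
    return np.array(corrected)

# A point that only moves because the camera pans 2 px right per frame:
H_pan = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
traj = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
still = correct_trajectory(traj, H_pan)  # collapses to a stationary track
```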
Specifically, in S104, the visual geometry group (VGG) neural network feature extractor, proposed by the Visual Geometry Group of the Department of Engineering Science of the University of Oxford, extracts the static information of the video from some of its frames. The VGG feature extractor is also a CNN architecture; before being used, it is first trained on a training image library. Unlike the C3D feature extractor, the training image library used by the VGG feature extractor contains a large number of labeled static scene images. Illustratively, the training images may come from the Places365 database, which consists of static scene pictures of 365 classes, each class being a specific scene. This feature extractor is therefore well suited to extracting the static information describing the scene in the video, compensating for the loss of static scene information when the C3D feature extractor extracts features.
Optionally, performing feature extraction on the video with the VGG feature extractor in S104 to obtain the third feature information specifically includes:
extracting at least one key frame from the video, and performing feature extraction on the at least one key frame with the VGG feature extractor to obtain the third feature information.
Specifically, key frames are first extracted from the video. A video segment often consists of hundreds of frames, and especially when the UAV flies slowly, consecutive frames differ little; extracting features from every frame would be slow and resource-intensive. To extract the hidden static information more efficiently, key frames can be selected at the beginning, middle and end of the video as representatives of its static information.
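The beginning/middle/end selection rule above can be sketched in one small function; the handling of very short videos is an assumption added for completeness.

```python
def pick_keyframes(num_frames):
    """Indices of three representative frames: beginning, middle, end.
    Videos shorter than three frames simply use every frame."""
    if num_frames < 3:
        return list(range(num_frames))
    return [0, num_frames // 2, num_frames - 1]

keys = pick_keyframes(300)  # a 300-frame clip yields frames 0, 150, 299
```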
The VGG network feature extractor comprises 16 convolutional layers, each followed by a pooling layer; three fully connected layers; and a classification layer for outputting the classification result. The convolution kernels of the convolutional layers are 3 x 3. When features are extracted with the VGG convolutional neural network, the output of the second fully connected layer is taken as the result.
Specifically, in S105, the first, second and third feature information obtained by the different feature extractors are fused into a fusion feature, which can characterize the long-term and short-term dynamic information of the video as well as the different static information extracted by the different feature extractors.
Specifically, in S106, a conventional support vector machine (SVM) linear classifier classifies the video according to the fusion feature obtained in S105, yielding the classification information of the video.
Illustratively, the SVM classifier parameter C is set to 100 and a linear kernel is used. Before the SVM classifier is used, the classifier model must be trained on the training set data and tested on the test set data.
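As a toy illustration of this classification step, the sketch below uses scikit-learn's SVC with the parameters stated above (linear kernel, C = 100); the two synthetic clusters merely stand in for fused features of two scene classes and are not from the patent.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for 8-dim fused features of two scene classes.
X_train = np.vstack([rng.normal(0.0, 0.3, (20, 8)),
                     rng.normal(2.0, 0.3, (20, 8))])
y_train = np.array([0] * 20 + [1] * 20)

# Linear kernel with C = 100, as in the embodiment described above.
clf = SVC(kernel="linear", C=100)
clf.fit(X_train, y_train)
pred = clf.predict(np.array([[0.1] * 8, [1.9] * 8]))
```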
With the multi-feature fusion-based dynamic scene classification method provided by the present invention, three feature extractors, namely a C3D feature extractor, an iDT feature extractor and a VGG feature extractor, extract different characteristics of the video to be classified, and dynamic scene classification is performed after the different features are fused. The method considers not only the short-term dynamic information of the video but also its long-term dynamic information, and further fuses in the static information of the video, so that the dynamic scene classification result is more accurate.
Optionally, on the basis of the above embodiment, fusing the first, second and third feature information in S105 to obtain the fusion feature specifically includes:
S1051: obtain the first feature data corresponding to the first preset dimensions of the first feature information, the second feature data corresponding to the second preset dimensions of the second feature information, and the third feature data corresponding to the third preset dimensions of the third feature information;
S1052: obtain the fusion feature from the first feature data, the second feature data and the third feature data.
Specifically, each piece of feature information is a two-dimensional matrix of fixed size. For example, when the first feature information is a 1 x 4096 matrix (1 row, 4096 columns), the first feature information can be regarded as containing 4096 dimensions, the feature data of the first dimension being the first column of the matrix. Different dimensions of a piece of feature information affect the classification of the video differently; selecting the data of the more influential dimensions for fusion and classification improves the accuracy of the dynamic scene classification. Illustratively, the preset dimensions of the different feature data may differ, and so may the numbers of preset dimensions.
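A minimal sketch of this selective fusion, assuming the fusion is a simple concatenation of the pre-selected dimensions of each extractor's vector (the patent does not spell out the combination rule, so the concatenation and all the stand-in vectors below are assumptions):

```python
import numpy as np

def fuse_features(feats, selected_dims):
    """Concatenate, per extractor, only the pre-selected dimensions.

    feats:         list of 1-D feature vectors (e.g. C3D, iDT, VGG).
    selected_dims: one list of dimension indices per feature vector."""
    return np.concatenate([np.asarray(f)[idx]
                           for f, idx in zip(feats, selected_dims)])

c3d = np.arange(10.0)        # stand-in 10-dim first feature
idt = np.arange(100.0, 108)  # stand-in 8-dim second feature
vgg = np.arange(200.0, 206)  # stand-in 6-dim third feature
fused = fuse_features([c3d, idt, vgg], [[0, 3, 7], [1, 2], [5]])
```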
Further, on the basis of any of the above embodiments, the determination of the preset dimensions used for feature fusion in S1051 is described in detail with reference to a specific embodiment. Fig. 2 is a schematic flowchart of Embodiment 2 of the multi-feature fusion-based dynamic scene classification method provided by the present invention. As shown in Fig. 2, before the feature fusion, the method further includes:
S201: obtain the respective first, second and third feature information of all training videos in the training video library;
S202: compute, from the respective first, second and third feature information of all training videos, the Fisher discriminant ratio of every dimension of the first, second and third feature information;
S203: determine the first preset dimensions of the first feature information from the Fisher discriminant ratios of all its dimensions, the second preset dimensions of the second feature information from the Fisher discriminant ratios of all its dimensions, and the third preset dimensions of the third feature information from the Fisher discriminant ratios of all its dimensions;
wherein the training video library contains training videos belonging to at least two different categories.
Specifically, after the Fisher discriminant ratios of all dimensions of a piece of feature information are obtained, the dimensions may, illustratively, be sorted by their Fisher discriminant ratios, and the dimensions whose ratios exceed a preset value are chosen as the preset dimensions. Illustratively, the top several dimensions by Fisher discriminant ratio may also be chosen as the preset dimensions, and the number of preset dimensions may differ between pieces of feature information.
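The top-k alternative mentioned here is a one-liner; the ratio values below are invented purely to show the selection rule.

```python
import numpy as np

def top_k_dims(fisher_ratios, k):
    """Indices of the k dimensions with the largest Fisher discriminant ratios,
    in decreasing order of ratio."""
    return np.argsort(fisher_ratios)[::-1][:k]

dims = top_k_dims(np.array([0.2, 3.1, 0.9, 5.0]), 2)  # dimensions 3 and 1
```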
By analyzing the training videos and determining the dimensions with higher Fisher discriminant ratios, the preset dimensions of each piece of feature information that have the greater influence on video classification can be obtained.
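As a sketch of the two selection strategies just described (keeping dimensions whose Fisher discriminant ratio exceeds a preset value, or keeping the several highest-ranked dimensions), the following NumPy snippet may be used; the function names, the sample ratios and the threshold value are illustrative assumptions, not part of the embodiment:

```python
import numpy as np

def select_dims_by_threshold(fisher_ratios, preset_value):
    # Keep every dimension whose Fisher discriminant ratio exceeds the preset value.
    return np.where(fisher_ratios > preset_value)[0]

def select_top_dims(fisher_ratios, num_dims):
    # Keep the num_dims dimensions with the largest Fisher discriminant ratios.
    order = np.argsort(fisher_ratios)[::-1]
    return np.sort(order[:num_dims])

ratios = np.array([0.5, 4.9, 1.2, 3.3])
print(select_dims_by_threshold(ratios, 1.0))  # [1 2 3]
print(select_top_dims(ratios, 2))             # [1 3]
```

Either strategy returns the indices of the preset dimensions, which are then used to slice each descriptor before fusion.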
Specifically, on the basis of any of the above embodiments, the computation of the Fisher discriminant ratio of any dimension of any piece of feature information is described in detail with reference to a specific embodiment. The Fisher discriminant ratio of the i-th dimension of a piece of feature information is obtained by the following formula:
k = Sb / Si
Wherein, Si is the within-class scatter of the i-th dimension:

Si = Σ_{j=1}^{J} (x_ij − m_ij)ᵀ (x_ij − m_ij)

and Sb is the between-class scatter of the i-th dimension:

Sb = Σ_{j=1}^{J} Σ_{h=1, h≠j}^{J} (x_ij − m_ih)ᵀ (x_ij − m_ih)

where J is the total number of categories to which the training videos belong; x_ij is the feature data matrix of the i-th dimension of all training videos of the j-th category; m_ij is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the j-th category; and m_ih is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the h-th category. The value of i is a positive integer ranging from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; j is a positive integer ranging from 1 to J; and h is a positive integer ranging from 1 to J other than j.
Wherein, a smaller Si indicates that the i-th dimension is more consistent among videos of the same category, and a larger Sb indicates that the i-th dimension is less similar between videos of different categories. The larger the value of k, therefore, the more helpful the dimension is for video classification: a larger k indicates that the i-th dimension of the feature information has a greater influence on dynamic scene classification. After the k values of all dimensions have been obtained, the dimensions with larger k values in each piece of feature information can be combined to obtain the fusion feature.
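Assuming the training features of one descriptor are stacked into a (videos × dimensions) matrix with a category label per video, the per-dimension ratio k defined above can be sketched in NumPy as follows; `fisher_ratios` is a hypothetical helper name, not from the patent:

```python
import numpy as np

def fisher_ratios(features, labels):
    """features: (num_videos, num_dims) matrix; labels: (num_videos,) category ids.
    Returns the Fisher discriminant ratio k = Sb / Si for every dimension."""
    classes = np.unique(labels)
    num_dims = features.shape[1]
    s_i = np.zeros(num_dims)
    s_b = np.zeros(num_dims)
    means = {j: features[labels == j].mean(axis=0) for j in classes}
    for j in classes:
        x_j = features[labels == j]
        # Within-class scatter: class-j samples against their own mean.
        s_i += ((x_j - means[j]) ** 2).sum(axis=0)
        # Between-class scatter: class-j samples against every other class mean.
        for h in classes:
            if h != j:
                s_b += ((x_j - means[h]) ** 2).sum(axis=0)
    return s_b / s_i

feats = np.array([1, 2, 1, 2, 3, 2, 3, 1, 3], dtype=float).reshape(9, 1)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
print(fisher_ratios(feats, labels))  # ≈ [5.0]; the text's 4.99 rounds the means first
```

Applied column by column to a 9×4096 training matrix, this yields one k per dimension.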
For example, suppose the training video library contains 9 training videos belonging to 3 categories, with 3 training videos per category. For the first feature information (the C3D feature), the Fisher discriminant ratio of the 1st dimension of this feature information is calculated as follows:
First, each video is divided into 5 short clips of only 16 frames each, and features are extracted from each clip with the C3D feature extractor, taking the output of the fc7 layer. This yields five 1×4096 feature matrices per video; averaging them gives a single 1×4096 feature matrix that represents the video. The 9 training videos can thus be represented, after feature extraction by the C3D feature extractor, by a 9×4096 feature matrix.
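The clip-level aggregation described above (five 16-frame clips per video, fc7 outputs averaged into one 1×4096 descriptor) reduces to a mean over clip features. Assuming the C3D fc7 outputs are already available as an array, a minimal sketch:

```python
import numpy as np

def video_descriptor(clip_features):
    # clip_features: (num_clips, 4096) fc7 outputs, one row per 16-frame clip.
    # Averaging the rows gives a single 1x4096 descriptor for the whole video.
    return clip_features.mean(axis=0)

# Placeholder for the five fc7 feature matrices of one video.
clip_feats = np.random.rand(5, 4096)
desc = video_descriptor(clip_feats)
assert desc.shape == (4096,)
# Stacking the descriptors of 9 training videos then yields the 9x4096 matrix.
```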
To calculate the Fisher discriminant ratio of the 1st dimension of this 4096-dimensional feature, first take the first column of the 9×4096 feature matrix, obtaining a 9×1 matrix. The top 3×1 block contains the features extracted from the videos of the first category, the middle 3×1 block those of the second category, and the bottom 3×1 block those of the third category. For example, suppose this 9×1 matrix is [1, 2, 1, 2, 3, 2, 3, 1, 3]ᵀ.
Wherein, the mean value matrix of [1, 2, 1] is [1.3, 1.3, 1.3]; the mean value matrix of [2, 3, 2] is [2.3, 2.3, 2.3]; and the mean value matrix of [3, 1, 3] is [2.3, 2.3, 2.3] (values rounded to one decimal place).
Then, the within-class scatter is calculated:

Si = (1 − 1.3)² + (2 − 1.3)² + (1 − 1.3)² + (2 − 2.3)² + (3 − 2.3)² + (2 − 2.3)² + (3 − 2.3)² + (1 − 2.3)² + (3 − 2.3)² = 4.01

and the between-class scatter is calculated, summing for the videos of each category the squared distances to the mean value matrices of the other two categories:

Sb = [(1 − 2.3)² + (2 − 2.3)² + (1 − 2.3)²] × 2 + [(2 − 1.3)² + (3 − 1.3)² + (2 − 1.3)²] + [(2 − 2.3)² + (3 − 2.3)² + (2 − 2.3)²] + [(3 − 1.3)² + (1 − 1.3)² + (3 − 1.3)²] + [(3 − 2.3)² + (1 − 2.3)² + (3 − 2.3)²] = 20.02

(the factor of 2 arises because the first category's videos are compared against the identical mean value matrices of the second and third categories).
Finally, the Fisher discriminant ratio k of the 1st dimension of the first feature information is obtained from Si and Sb:

k = Sb / Si = 20.02 / 4.01 = 4.99.
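The arithmetic of this example can be checked directly. With exact category means the ratio is 5.0; the 4.99 above results from rounding the means to 1.3 and 2.3 in the intermediate steps:

```python
col = [1, 2, 1, 2, 3, 2, 3, 1, 3]        # first column of the 9x4096 matrix
groups = [col[0:3], col[3:6], col[6:9]]  # three categories, three videos each
means = [sum(g) / 3 for g in groups]     # exact means: 4/3, 7/3, 7/3

# Within-class scatter: each video against its own category mean.
s_i = sum((x - means[j]) ** 2 for j, g in enumerate(groups) for x in g)

# Between-class scatter: each video against the means of the other categories.
s_b = sum((x - means[h]) ** 2
          for j, g in enumerate(groups) for x in g
          for h in range(3) if h != j)

print(s_b / s_i)  # ≈ 5.0 (the text's 4.99 uses means rounded to 1.3 and 2.3)
```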
In another aspect, an embodiment of the present invention provides a multi-feature fusion-based dynamic scene classification apparatus for performing the method embodiments described above. Since it has the same technical features and technical effects, details are not repeated here.
Fig. 3 is a schematic structural diagram of Embodiment 1 of the multi-feature fusion-based dynamic scene classification apparatus provided by the present invention. As shown in Fig. 3, the apparatus includes:
a to-be-classified video acquiring module 301, configured to acquire a video to be classified;
a feature extraction module 302, configured to perform feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information; perform feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information; and perform feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
a fusion module 303, configured to fuse the first feature information, the second feature information and the third feature information to obtain a fusion feature;
a classification module 304, configured to classify the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
Further, the fusion module 303 is specifically configured to: acquire first feature data corresponding to a first preset dimension in the first feature information, acquire second feature data corresponding to a second preset dimension in the second feature information, and acquire third feature data corresponding to a third preset dimension in the third feature information;
and obtain the fusion feature according to the first feature data, the second feature data and the third feature data.
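One plausible reading of this fusion step is plain concatenation of the selected dimensions of the three descriptors; the sizes and index arrays below are illustrative placeholders for the preset dimensions determined during training, not values fixed by the patent:

```python
import numpy as np

def fuse(feat1, feat2, feat3, dims1, dims2, dims3):
    # Keep only the preset dimensions of each descriptor, then concatenate.
    return np.concatenate([feat1[dims1], feat2[dims2], feat3[dims3]])

# Illustrative sizes: C3D fc7 is 4096-dim; the other two sizes are placeholders.
f1, f2, f3 = np.arange(4096.0), np.arange(512.0), np.arange(4096.0)
fused = fuse(f1, f2, f3, np.array([0, 5]), np.array([1]), np.array([2, 3, 4]))
print(fused)  # [0. 5. 1. 2. 3. 4.]
```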
Further, the apparatus also includes a preset dimension acquisition module, which is specifically configured to:
acquire first feature information, second feature information and third feature information of each training video in a training video library;
obtain, according to the first feature information, the second feature information and the third feature information of each training video, a Fisher discriminant ratio for every dimension of the first feature information, the second feature information and the third feature information;
determine a first preset dimension of the first feature information according to the Fisher discriminant ratios of all dimensions of the first feature information, determine a second preset dimension of the second feature information according to the Fisher discriminant ratios of all dimensions of the second feature information, and determine a third preset dimension of the third feature information according to the Fisher discriminant ratios of all dimensions of the third feature information;
wherein the training video library includes at least two training videos belonging to different categories.
Further, the Fisher discriminant ratio of the i-th dimension of any piece of feature information is obtained by the following formula:
k = Sb / Si
wherein Si is the within-class scatter of the i-th dimension, Si = Σ_{j=1}^{J} (x_ij − m_ij)ᵀ (x_ij − m_ij); Sb is the between-class scatter of the i-th dimension, Sb = Σ_{j=1}^{J} Σ_{h=1, h≠j}^{J} (x_ij − m_ih)ᵀ (x_ij − m_ih); J is the total number of categories to which the training videos belong; x_ij is the feature data matrix of the i-th dimension of all training videos of the j-th category; m_ij is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the j-th category; and m_ih is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the h-th category. The value of i is a positive integer ranging from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; j is a positive integer ranging from 1 to J; and h is a positive integer ranging from 1 to J other than j.
Further, the feature extraction module 302 is specifically configured to:
divide the video to be classified to obtain at least one video clip containing N frames of images;
perform feature extraction on all video clips using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
Further, the feature extraction module 302 is specifically configured to:
acquire dense trajectory features and a homography matrix of the video to be classified;
correct the dense trajectory features using the homography matrix to obtain the second feature information.
Further, the feature extraction module 302 is specifically configured to:
extract at least one key frame from the video to be classified, and perform feature extraction on the at least one key frame using the visual geometry neural network feature extractor to obtain the third feature information.
Further, the classification module 304 is specifically configured to classify the video to be classified using a support vector machine classifier according to the fusion feature, to obtain the classification result of the video to be classified.
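A sketch of this classification step, using scikit-learn's SVC as a stand-in for the embodiment's support vector machine classifier; the fused features, labels, kernel choice and dimensionality here are synthetic placeholders:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic "fused features": 9 training videos, 3 categories, 3 videos each,
# with category j centred at offset 5*j in every dimension.
rng = np.random.default_rng(0)
train_y = np.repeat(np.arange(3), 3)
train_x = rng.normal(size=(9, 64)) + train_y[:, None] * 5.0

clf = SVC(kernel="linear")  # kernel choice is an assumption, not fixed by the patent
clf.fit(train_x, train_y)

# A query video whose fused feature lies near the category-1 centre.
test_x = rng.normal(size=(1, 64)) + 5.0
print(clf.predict(test_x))  # expected to fall in category 1
```

In practice the classifier is trained on the fused features of the training video library and applied to the fused feature of each video to be classified.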
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by program instructions in combination with related hardware. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some or all of the technical features therein, without departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-feature fusion-based dynamic scene classification method, characterized by comprising:
acquiring a video to be classified;
performing feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information; performing feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information; performing feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
fusing the first feature information, the second feature information and the third feature information to obtain a fusion feature;
classifying the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
2. The method according to claim 1, characterized in that fusing the first feature information, the second feature information and the third feature information to obtain the fusion feature comprises:
acquiring first feature data corresponding to a first preset dimension in the first feature information, acquiring second feature data corresponding to a second preset dimension in the second feature information, and acquiring third feature data corresponding to a third preset dimension in the third feature information;
obtaining the fusion feature according to the first feature data, the second feature data and the third feature data.
3. The method according to claim 2, characterized in that, before fusing the first feature information, the second feature information and the third feature information to obtain the fusion feature, the method further comprises:
acquiring first feature information, second feature information and third feature information of each training video in a training video library;
obtaining, according to the first feature information, the second feature information and the third feature information of each training video, a Fisher discriminant ratio for every dimension of the first feature information, the second feature information and the third feature information;
determining a first preset dimension of the first feature information according to the Fisher discriminant ratios of all dimensions of the first feature information, determining a second preset dimension of the second feature information according to the Fisher discriminant ratios of all dimensions of the second feature information, and determining a third preset dimension of the third feature information according to the Fisher discriminant ratios of all dimensions of the third feature information;
wherein the training video library includes at least two training videos belonging to different categories.
4. The method according to claim 3, characterized in that the Fisher discriminant ratio of the i-th dimension of any piece of feature information is obtained by the following formula:
k = Sb / Si
wherein Si is the within-class scatter of the i-th dimension, Si = Σ_{j=1}^{J} (x_ij − m_ij)ᵀ (x_ij − m_ij); Sb is the between-class scatter of the i-th dimension, Sb = Σ_{j=1}^{J} Σ_{h=1, h≠j}^{J} (x_ij − m_ih)ᵀ (x_ij − m_ih); J is the total number of categories to which the training videos belong; x_ij is the feature data matrix of the i-th dimension of all training videos of the j-th category; m_ij is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the j-th category; and m_ih is the mean value matrix of the feature data matrix of the i-th dimension of all training videos of the h-th category. The value of i is a positive integer ranging from 1 to I, where I is the total number of dimensions of the feature information to which the i-th dimension belongs; j is a positive integer ranging from 1 to J; and h is a positive integer ranging from 1 to J other than j.
5. The method according to claim 1, characterized in that performing feature extraction on the video to be classified using the three-dimensional convolutional neural network feature extractor to obtain the first feature information comprises:
dividing the video to be classified to obtain at least one video clip containing N frames of images;
performing feature extraction on all video clips using the three-dimensional convolutional neural network feature extractor to obtain the first feature information;
wherein N is a positive integer greater than 1.
6. The method according to claim 1, characterized in that performing feature extraction on the video to be classified using the improved dense trajectory feature extractor to obtain the second feature information comprises:
acquiring dense trajectory features and a homography matrix of the video to be classified;
correcting the dense trajectory features using the homography matrix to obtain the second feature information.
7. The method according to claim 1, characterized in that performing feature extraction on the video to be classified using the visual geometry neural network feature extractor to obtain the third feature information comprises:
extracting at least one key frame from the video to be classified, and performing feature extraction on the at least one key frame using the visual geometry neural network feature extractor to obtain the third feature information.
8. The method according to any one of claims 1 to 7, characterized in that classifying the video to be classified according to the fusion feature to obtain the classification result of the video to be classified comprises:
classifying the video to be classified using a support vector machine classifier according to the fusion feature, to obtain the classification result of the video to be classified.
9. A multi-feature fusion-based dynamic scene classification apparatus, characterized by comprising:
a to-be-classified video acquiring module, configured to acquire a video to be classified;
a feature extraction module, configured to perform feature extraction on the video to be classified using a three-dimensional convolutional neural network feature extractor to obtain first feature information, perform feature extraction on the video to be classified using an improved dense trajectory feature extractor to obtain second feature information, and perform feature extraction on the video to be classified using a visual geometry neural network feature extractor to obtain third feature information;
a fusion module, configured to fuse the first feature information, the second feature information and the third feature information to obtain a fusion feature;
a classification module, configured to classify the video to be classified according to the fusion feature to obtain a classification result of the video to be classified.
10. The apparatus according to claim 9, characterized in that the fusion module is specifically configured to:
acquire first feature data corresponding to a first preset dimension in the first feature information, acquire second feature data corresponding to a second preset dimension in the second feature information, and acquire third feature data corresponding to a third preset dimension in the third feature information;
obtain the fusion feature according to the first feature data, the second feature data and the third feature data.
CN201611073666.5A 2016-11-29 2016-11-29 The dynamic scene classification method and device of multiple features fusion Active CN106599907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611073666.5A CN106599907B (en) 2016-11-29 2016-11-29 The dynamic scene classification method and device of multiple features fusion

Publications (2)

Publication Number Publication Date
CN106599907A true CN106599907A (en) 2017-04-26
CN106599907B CN106599907B (en) 2019-11-29

Family

ID=58594055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611073666.5A Active CN106599907B (en) 2016-11-29 2016-11-29 The dynamic scene classification method and device of multiple features fusion

Country Status (1)

Country Link
CN (1) CN106599907B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682302A (en) * 2012-03-12 2012-09-19 浙江工业大学 Human body posture identification method based on multi-characteristic fusion of key frame
CN102902981A (en) * 2012-09-13 2013-01-30 中国科学院自动化研究所 Violent video detection method based on slow characteristic analysis
CN103077318A (en) * 2013-01-17 2013-05-01 电子科技大学 Classifying method based on sparse measurement
CN103366181A (en) * 2013-06-28 2013-10-23 安科智慧城市技术(中国)有限公司 Method and device for identifying scene integrated by multi-feature vision codebook
CN104881655A (en) * 2015-06-03 2015-09-02 东南大学 Human behavior recognition method based on multi-feature time-space relationship fusion
CN105956572A (en) * 2016-05-15 2016-09-21 北京工业大学 In vivo face detection method based on convolutional neural network

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107393554A (en) * 2017-06-20 2017-11-24 武汉大学 In a kind of sound scene classification merge class between standard deviation feature extracting method
CN107689035B (en) * 2017-08-30 2021-12-21 广州方硅信息技术有限公司 Homography matrix determination method and device based on convolutional neural network
CN107689035A (en) * 2017-08-30 2018-02-13 广州华多网络科技有限公司 A kind of homography matrix based on convolutional neural networks determines method and device
CN107909095A (en) * 2017-11-07 2018-04-13 江苏大学 A kind of image-recognizing method based on deep learning
CN107909070A (en) * 2017-11-24 2018-04-13 天津英田视讯科技有限公司 A kind of method of road water detection
US10909380B2 (en) 2017-12-13 2021-02-02 Beijing Sensetime Technology Development Co., Ltd Methods and apparatuses for recognizing video and training, electronic device and medium
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video identification and training method and device, electronic equipment, program and medium
CN108229336B (en) * 2017-12-13 2021-06-04 北京市商汤科技开发有限公司 Video recognition and training method and apparatus, electronic device, program, and medium
CN108090203A (en) * 2017-12-25 2018-05-29 上海七牛信息技术有限公司 Video classification methods, device, storage medium and electronic equipment
CN108090497A (en) * 2017-12-28 2018-05-29 广东欧珀移动通信有限公司 Video classification methods, device, storage medium and electronic equipment
CN108090497B (en) * 2017-12-28 2020-07-07 Oppo广东移动通信有限公司 Video classification method and device, storage medium and electronic equipment
CN108491856A (en) * 2018-02-08 2018-09-04 西安电子科技大学 A kind of image scene classification method based on Analysis On Multi-scale Features convolutional neural networks
US11393206B2 (en) * 2018-03-13 2022-07-19 Tencent Technology (Shenzhen) Company Limited Image recognition method and apparatus, terminal, and storage medium
CN110569795B (en) * 2018-03-13 2022-10-14 腾讯科技(深圳)有限公司 Image identification method and device and related equipment
CN110569795A (en) * 2018-03-13 2019-12-13 腾讯科技(深圳)有限公司 Image identification method and device and related equipment
WO2019174439A1 (en) * 2018-03-13 2019-09-19 腾讯科技(深圳)有限公司 Image recognition method and apparatus, and terminal and storage medium
CN108647599A (en) * 2018-04-27 2018-10-12 南京航空航天大学 In conjunction with the Human bodys' response method of 3D spring layers connection and Recognition with Recurrent Neural Network
CN108510012A (en) * 2018-05-04 2018-09-07 四川大学 A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN108510012B (en) * 2018-05-04 2022-04-01 四川大学 Target rapid detection method based on multi-scale feature map
CN109002766B (en) * 2018-06-22 2021-07-09 北京邮电大学 Expression recognition method and device
CN109002766A (en) * 2018-06-22 2018-12-14 北京邮电大学 A kind of expression recognition method and device
CN109115501A (en) * 2018-07-12 2019-01-01 哈尔滨工业大学(威海) A kind of Civil Aviation Engine Gas path fault diagnosis method based on CNN and SVM
CN109165682A (en) * 2018-08-10 2019-01-08 中国地质大学(武汉) A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics
CN109165682B (en) * 2018-08-10 2020-06-16 中国地质大学(武汉) Remote sensing image scene classification method integrating depth features and saliency features
CN109145840A (en) * 2018-08-29 2019-01-04 北京字节跳动网络技术有限公司 video scene classification method, device, equipment and storage medium
CN109145840B (en) * 2018-08-29 2022-06-24 北京字节跳动网络技术有限公司 Video scene classification method, device, equipment and storage medium
CN109697453A (en) * 2018-09-30 2019-04-30 中科劲点(北京)科技有限公司 Semi-supervised scene classification recognition methods, system and device based on multimodality fusion
CN109257622A (en) * 2018-11-01 2019-01-22 广州市百果园信息技术有限公司 A kind of audio/video processing method, device, equipment and medium
CN109376696B (en) * 2018-11-28 2020-10-23 北京达佳互联信息技术有限公司 Video motion classification method and device, computer equipment and storage medium
CN109376696A (en) * 2018-11-28 2019-02-22 北京达佳互联信息技术有限公司 Method, apparatus, computer equipment and the storage medium of video actions classification
CN110033505A (en) * 2019-04-16 2019-07-19 西安电子科技大学 A kind of human action capture based on deep learning and virtual animation producing method
CN110220585A (en) * 2019-06-20 2019-09-10 广东工业大学 A kind of bridge vibration test method and relevant apparatus
WO2021031523A1 (en) * 2019-08-21 2021-02-25 创新先进技术有限公司 Document recognition method and device
CN110516737A (en) * 2019-08-26 2019-11-29 南京人工智能高等研究院有限公司 Method and apparatus for generating image recognition model
CN110516737B (en) * 2019-08-26 2023-05-26 南京人工智能高等研究院有限公司 Method and device for generating image recognition model
WO2021093468A1 (en) * 2019-11-15 2021-05-20 腾讯科技(深圳)有限公司 Video classification method and apparatus, model training method and apparatus, device and storage medium
US11967151B2 (en) 2019-11-15 2024-04-23 Tencent Technology (Shenzhen) Company Limited Video classification method and apparatus, model training method and apparatus, device, and storage medium
CN111145222A (en) * 2019-12-30 2020-05-12 浙江中创天成科技有限公司 Fire detection method combining smoke movement trend and textural features
WO2021248432A1 (en) * 2020-06-12 2021-12-16 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
US20210390713A1 (en) * 2020-06-12 2021-12-16 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
US11830204B2 (en) * 2020-06-12 2023-11-28 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for performing motion transfer using a learning model
CN111563488A (en) * 2020-07-14 2020-08-21 成都市映潮科技股份有限公司 Video subject content identification method, system and storage medium
CN112687022A (en) * 2020-12-18 2021-04-20 山东盛帆蓝海电气有限公司 Intelligent building inspection method and system based on video

Also Published As

Publication number Publication date
CN106599907B (en) 2019-11-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant