CN110363794A - Optical flow prediction method between consecutive video frames - Google Patents

Optical flow prediction method between consecutive video frames

Info

Publication number
CN110363794A
CN110363794A (application CN201910645583.6A)
Authority
CN
China
Prior art keywords
network
optical flow
feature
layer
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910645583.6A
Other languages
Chinese (zh)
Inventor
王传旭
刘帅
丰艳
闫春娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology
Priority to CN201910645583.6A
Publication of CN110363794A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical flow prediction method between consecutive video frames, relating to the field of computer vision technology and comprising the following steps. Step a: extracting adjacent-frame spatial features through a deformable convolution unit; Step b: performing fusion and reconstruction on the adjacent-frame spatial features; Step c: performing deconvolution operations on the fused and reconstructed features and constructing a network stack; Step d: training the network stack with a loss function; Step e: outputting the result. The beneficial effects of the invention are: the convolution kernel is optimized at the level of its basic structure, replacing the fixed square convolution with deformable convolution, which improves prediction accuracy and saves computing resources; through training, the fused features are reconstructed and the parameters and channel weights are redistributed, so that the feature correlation of adjacent frames is retained to the greatest extent at only a small increase in computation cost.

Description

Optical flow prediction method between consecutive video frames
Technical field
The present invention relates to the field of computer vision technology, and in particular to an optical flow prediction method between consecutive video frames.
Background art
Optical flow prediction can be applied to trajectory tracking, foreground/background segmentation, human behavior recognition and other fields; it is a common method and a major issue in computer vision research. Optical flow contains information such as the instantaneous velocity and displacement vector of the pixels of a moving object on the observed imaging plane; an optical flow image represents the motion state of objects in the image through a combination of the color domain and the spatial domain. Compared with other image analysis methods, the emphasis of optical flow lies in "motion": optical flow contains not only the motion information of the observed object but also rich information about the three-dimensional structure of the scene. Optical flow prediction refers to the method of using the temporal variation of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby computing the motion information (the optical flow map) of objects between adjacent frames.
Previous algorithms that perform optical flow prediction with convolutional neural networks usually use square convolution kernels of fixed size and shape in the convolutional feature-extraction part. This form of convolution limits the network's ability to adapt to different image subjects, generalizes poorly to the magnitude of pixel displacements between adjacent frames, and wastes computing resources.
The key of an optical flow prediction algorithm is computing the motion field between adjacent frames, so the association and fusion of the features of the two adjacent frames is the central problem to solve. Existing methods, which stack features directly or compute correlation with a moving window, have obvious shortcomings in accuracy and time complexity: edge details are easily lost during fusion, and accuracy suffers.
Summary of the invention
The purpose of the present invention is to solve the optical flow prediction problem between consecutive video frames, retain the spatial structure and the inter-frame feature correlation information to the greatest extent, and improve prediction accuracy; to this end, an optical flow prediction method between consecutive video frames is designed.
To achieve the above purpose, the technical scheme is as follows: an optical flow prediction method between consecutive video frames, comprising the following steps. Step a: extracting adjacent-frame spatial features through a deformable convolution unit; Step b: performing fusion and reconstruction on the adjacent-frame spatial features; Step c: performing deconvolution operations on the fused and reconstructed features and constructing a network stack; Step d: training the network stack with a loss function; Step e: outputting.
Further, in step a, the first layer of the deformable convolution unit is a 7×7 deformable convolutional layer, the second layer is a 5×5 traditional convolutional layer, the third and fourth layers are 3×3 traditional convolutional layers, and the stride of each convolutional layer is 2. The convolution operation used in the deformable convolutional layer is:
u(P0) = Σ_{Pn∈R} w(Pn) · x(P0 + Pn + ΔPn)
In the formula, P0 is a point of the feature map u output by the deformable convolutional layer, x denotes the input feature map of this layer or the original image, R is the grid of positions covered by the convolution kernel, w is the weight value, Pn enumerates the positions of R over the covered region of x, and ΔPn is the offset.
Further, in step b, the adjacent-frame spatial features are converted by channel-wise global average pooling into a scalar vector whose length equals the number of feature channels; the scalar vector is fed into a fully connected block comprising a fully connected layer, a ReLU activation function, another fully connected layer and a Sigmoid activation function, and the subsequent training operation yields the weight vector of the fused features; the weight vector serves as the selection parameter of the feature channels and is then weighted channel by channel, through multiplication, onto the adjacent-frame spatial features, completing the recalibration of the original features along the channel dimension.
Further, in step c, the deconvolution operation enlarges the fused and reconstructed features toward the optical flow, after which upsampling restores the resolution to the original size.
Further, in step c, the constructed network stack comprises network one, network two and network three; network one uses deformable convolution with SE-net, network two uses traditional convolution kernels without SE-net, and network three uses traditional convolution kernels with SE-net; network one is connected in parallel with network two, and both are connected in series with network three.
Further, network two passes the predicted optical flow and the loss amount to network three.
Further, in step d, the mean endpoint error between the ground truth in the dataset and the output optical flow is used as the loss function, and the loss is computed after one interpolation of the output optical flow.
Further, the strategy for training the network is: first train the first-level network independently, then fix the internal weights of this guiding network and train the lower networks, until all sub-networks are trained; a synthesis module is added at the last layer and fine-tuned with the internal parameters of the upper networks fixed.
Further, step e comprises outputting the resulting optical flow images corresponding to the adjacent video frames.
The beneficial effects of the present invention are: the convolution kernel is optimized at the level of its basic structure, replacing the fixed square convolution with deformable convolution, which improves prediction accuracy and saves computing resources; through training, the fused features are reconstructed and the parameters and channel weights are redistributed, so that the feature correlation of adjacent frames is retained to the greatest extent at only a small increase in computation cost.
Detailed description of the invention
Fig. 1 is the flow chart of the scheme of the present application;
Fig. 2 is a schematic diagram of the operation of a traditional convolutional layer;
Fig. 3 is a schematic diagram of deformable convolution kernels;
Fig. 4 is a schematic diagram of deformable convolution;
Fig. 5 is a schematic diagram of deformable pooling with added offsets;
Fig. 6 is a schematic diagram of the working principle of SE-net;
Fig. 7 is the structure diagram of a complete sub-network;
Fig. 8 is the diagram of the complete network stack.
Specific embodiment
To further illustrate the technical means adopted by the present invention to achieve the intended purpose and their effects, specific embodiments, structures, features and effects of the present invention are described in detail below in conjunction with the accompanying drawings and preferred embodiments:
The present application addresses the optical flow prediction problem between consecutive video frames. First, the application uses a deformable-convolution feature extraction method to extract adaptive features of the two adjacent frames, retaining spatial structure information to the greatest extent. Second, the two sets of spatial features are stacked and input to the SE-net feature reconstruction module, which computes their correlation and redistributes the channel weights, yielding the fused features; the fused features then undergo upsampling and deconvolution operations to obtain a coarse optical flow image, which is afterwards corrected by the network stack and the warp optimization method. The internal network weights are learned and adjusted through sample training.
An optical flow prediction method between consecutive video frames, following the flow shown in Fig. 1, comprises the following steps:
Step a: extracting adjacent-frame spatial features through the deformable convolution unit.
As the core of convolutional neural networks, the convolution operation is usually viewed as an aggregator that, over local receptive fields, fuses spatial information with information along the feature dimension. The concrete operation of a traditional convolutional layer is shown in Fig. 2: the layer takes an image or feature map as input; the kernel slides over the input, computes on the covered region, and maps the result to the next level; this computation is done channel by channel. The working principle of a traditional square kernel can be formulated as:
u(P0) = Σ_{Pn∈R} w(Pn) · x(P0 + Pn)
In the formula, P0 is a point of the output feature map u, x denotes the input feature map of this layer or the original image, R is the grid of positions covered by the convolution kernel, w is the weight value, and Pn enumerates the positions of R over the covered region of x.
The output of this layer is obtained after the above operation is performed over all channels.
In the present application, the first layer of the deformable convolution unit is a 7×7 deformable convolutional layer, the second layer is a 5×5 traditional convolutional layer, the third and fourth layers are 3×3 traditional convolutional layers, and the stride of each convolutional layer is 2.
The convolution operation used in the deformable convolutional layer is:
u(P0) = Σ_{Pn∈R} w(Pn) · x(P0 + Pn + ΔPn)
In the formula, P0 is a point of the feature map u output by the deformable convolutional layer, x denotes the input feature map of this layer or the original image, R is the grid of positions covered by the convolution kernel, w is the weight value, Pn enumerates the positions of R over the covered region of x, and ΔPn is the offset.
Previous convolutions in the optical flow prediction field use square kernels; this square form limits the room the network has to adapt to the image, generalizes poorly to the magnitude of pixel displacements between frames, and computes inefficiently. In Fig. 3, panel (a) shows the traditional convolution kernel, and panels (b), (c) and (d) show deformable convolution kernels. The present application improves the convolution kernel by introducing adaptive deformable convolution: the first layer uses a deformable kernel, whose difference from a traditional kernel is that an offset variable ΔPn is added at the position of each sampling point. Through these variables, the kernel can sample anywhere near its current position instead of being restricted to the former regular grid. In fact, the offsets added in the deformable convolutional layer are part of the network structure: they are computed by another, parallel standard convolutional layer, and can therefore also be learned end to end through gradient back-propagation.
After the offsets are learned, as shown in Fig. 4, the size and position of the deformable kernel adjust dynamically to the image content currently being recognized; the visual effect is that the sampling positions of kernels at different locations vary adaptively with the image content, adapting to geometric deformations such as the shape and size of different objects.
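As a concrete illustration, the following PyTorch sketch assembles a feature-extraction unit matching the layer layout described above (a 7×7 deformable layer followed by a 5×5 and two 3×3 conventional layers, all with stride 2), with the offsets produced by a parallel standard convolution as described. The channel widths are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformFeatureUnit(nn.Module):
    """Sketch of the 4-layer extraction unit: 7x7 deformable conv, then
    5x5 and two 3x3 conventional convs, each with stride 2. The channel
    widths (64/128/256/256) are illustrative assumptions."""
    def __init__(self, in_ch=3):
        super().__init__()
        # Parallel standard conv predicts 2 offsets (dx, dy) per kernel tap.
        self.offset = nn.Conv2d(in_ch, 2 * 7 * 7, kernel_size=7, stride=2, padding=3)
        self.deform = DeformConv2d(in_ch, 64, kernel_size=7, stride=2, padding=3)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        off = self.offset(x)                # learned offsets ΔPn, trained by backprop
        x = self.relu(self.deform(x, off))  # sampling grid deformed per position
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        return self.relu(self.conv4(x))
```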
Step b: performing fusion and reconstruction on the adjacent-frame spatial features.
In this step, the adjacent-frame spatial features are converted by channel-wise global average pooling into a scalar vector whose length equals the number of feature channels. Convolutional neural networks for optical flow prediction usually build two relatively independent network streams for the pair of adjacent frames and merge the two streams at some stage for subsequent processing. Optical flow expresses the motion and three-dimensional information of image objects, so finding and preserving the correlation between the two streams is the key of this step. The SE-net used in this application establishes interdependencies between the channels of the fused feature maps and performs feature recalibration (reconstruction) as the input of the next step. Referring to Fig. 6, given an input X with C′ feature channels and spatial size W′ × H′, a series of convolution transforms Ftr produces a feature map U with C channels and spatial size W × H. Three subsequent operations realize the reconstruction of the feature map; the parameter weights of each operation are obtained through learning and training, finally yielding the importance of the C feature channels, on which basis important features are promoted and minor features suppressed.
Squeeze: the feature map is compressed along the spatial dimensions, turning each two-dimensional feature channel into a single real number. This real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. The essence of this step is a global average pooling: the two-dimensional feature uc of channel c in U is summed over the whole area and averaged, so each channel finally yields one scalar zc; the C channels combine into a one-dimensional vector z of length C, which expresses the global distribution of the responses of the feature map U over the C channels and also gives layers close to the input a global receptive field. The concrete operation can be formulated as:
zc = Fsq(uc) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} uc(i, j)
Excitation: the scalar vector is fed into a block comprising a fully connected layer, a ReLU activation function, another fully connected layer and a Sigmoid activation function, and the subsequent training operation yields the weight vector of the fused features. The Excitation operation is a mechanism similar to the gates in recurrent neural networks. A weight is generated for each feature channel through the parameters W, which are learned to explicitly model the correlation between feature channels. FC–ReLU–FC–Sigmoid produces C scalars between 0 and 1, which act as the per-channel weights; each channel of the original output is then weighted by its corresponding weight (every element of the channel is multiplied by the weight), giving the newly weighted features. The following formula illustrates this step. First z is multiplied by W1, a fully connected (FC, Fully Connected Layer) operation; the dimension of W1 is C/r × C, where r is a scaling parameter, taken as 16 in this scheme, whose purpose is to reduce the number of channels and thus the amount of computation. Since z has dimension C, the result W1z has dimension C/r. A ReLU (Rectified Linear Unit) layer is then applied, whose output dimension stays C/r and which is adjustable during training. Multiplying by W2 is again a fully connected layer; the dimension of W2 is C × C/r, so the output dimension is C. Finally a sigmoid activation function yields the weight vector s of length C:
s = Fex(z, W) = σ(g(z, W)) = σ(W2 δ(W1 z))
where δ denotes the ReLU function and σ the sigmoid function.
Reweight: the weight vector serves as the selection parameter of the feature channels and is weighted channel by channel, through multiplication, onto the adjacent-frame spatial features, completing the recalibration of the original features along the channel dimension. The output weights of Excitation are regarded as the importance of each feature channel after feature selection and are then weighted onto the previous features channel by channel through multiplication:
X̃c = Fscale(uc, sc) = sc · uc
In the formula, uc is the c-th channel feature of U, sc is the c-th component of the weight vector s, and X̃c is the output.
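As a minimal sketch of the squeeze–excitation–reweight pipeline described above (global average pooling, FC–ReLU–FC–Sigmoid with reduction ratio r = 16, then channel-wise re-weighting), assuming fused features of shape (N, C, H, W):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze: global average pool to one scalar per channel.
    Excitation: FC-ReLU-FC-Sigmoid yields per-channel weights in (0, 1).
    Reweight: multiply each channel of the fused features by its weight."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)   # W1: (C/r) x C
        self.fc2 = nn.Linear(channels // r, channels)   # W2: C x (C/r)

    def forward(self, u):
        n, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))                                # squeeze: z of length C
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # excitation: weights s
        return u * s.view(n, c, 1, 1)                         # reweight per channel
```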
Step c: performing deconvolution operations on the fused and reconstructed features, and constructing a network stack.
The deconvolution operation enlarges the fused and reconstructed features toward the optical flow; upsampling then restores the resolution to the original size. Deconvolution is the inverse operation of convolution; its main function is to enlarge the feature maps and raise the image resolution while keeping the content rich. The output F (a coarse optical flow) of the previous deconvolution layer is added into each deconvolution step as a reference; in this way both the high-level information carried by the coarser feature maps and the fine local information provided by the low-level feature maps are retained. As shown in Fig. 7, four deconvolution operations are performed in this step, and the final optical flow map is restored to the clarity of the original image by upsampling (bilinear interpolation). At this point the whole process of one sub-network ends.
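A hedged sketch of one of the four refinement levels, in the spirit of FlowNet-style decoders: a transposed convolution enlarges the fused features, the coarse flow F of the previous level is upsampled and concatenated as the reference, and a small convolution predicts the finer flow. Layer widths and the exact wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineLevel(nn.Module):
    """One deconvolution step: enlarge features, take the coarse flow of
    the previous level as reference, and predict a finer flow field."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.predict = nn.Conv2d(out_ch + 2, 2, kernel_size=3, padding=1)

    def forward(self, feat, coarse_flow):
        up = torch.relu(self.deconv(feat))                    # double the resolution
        flow_ref = F.interpolate(coarse_flow, scale_factor=2,
                                 mode='bilinear', align_corners=False) * 2.0
        # "* 2.0" rescales flow magnitudes to the doubled resolution.
        return self.predict(torch.cat([up, flow_ref], dim=1))
```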
To optimize the final result and improve network performance, this patent adds a network stack and warping in the subsequent part.
The constructed network stack comprises network one, network two and network three — three sub-networks with different structures and internal modules. Network one uses deformable convolution with SE-net; network two uses traditional convolution kernels without SE-net; network three uses traditional convolution kernels with SE-net. Network one is connected in parallel with network two, and both are connected in series with network three. Experiments show that a network using deformable convolution without the SE-net module fails to converge, so that combination is not configured here. Experimental results prove that stacking the sub-networks according to the combined structure of Fig. 8 works best, producing smooth predicted optical flow; a sketch of this wiring follows below.
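To make the wiring concrete, here is a minimal Python sketch of the stack of Fig. 8 under stated assumptions: each sub-network is a callable mapping a frame pair (plus, for network three, the upstream results) to a flow field, and warp_fn stands for the warp operation detailed in the next subsection. The signature of network three is an illustrative assumption, not fixed by the patent.

```python
import torch

def run_stack(net1, net2, net3, frame1, frame2, warp_fn):
    """Sketch of Fig. 8: net1 (deformable conv + SE-net) and net2
    (conventional conv, no SE-net) run in parallel on the frame pair;
    net3 (conventional conv + SE-net) runs in series on their results."""
    flow1 = net1(frame1, frame2)                # parallel branch one
    flow2 = net2(frame1, frame2)                # parallel branch two
    i2_hat = warp_fn(frame1, flow2)             # I'2 = Warp(I1, F)
    loss_amount = torch.norm(frame2 - i2_hat)   # C = ||I2 - I'2||, passed downstream
    return net3(frame1, frame2, flow1, flow2, loss_amount)
```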
Preferably, network two passes the predicted optical flow and the loss amount to network three, making the lower network concentrate on learning the residual motion of the adjacent video frames; the loss is obtained through the warp operation, as shown below:
I′2 = Warp(I1, F)
C = ‖I2 − I′2‖
In the formulas, I1 and I2 are the adjacent video frames, F is the optical flow output by the sub-network, I′2 is the image mapped from the first frame and the output flow (an approximation of the second frame), and C is the loss amount. The warp operation effectively prevents over-fitting of the stacked network while improving accuracy.
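A minimal PyTorch sketch of the warp operation, assuming the flow is given in pixels with shape (N, 2, H, W) and channel order (dx, dy); the grid normalization follows the usual grid_sample convention:

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (N,C,H,W) along flow (N,2,H,W): I'2 = Warp(I1, F)."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=img.dtype, device=img.device),
        torch.arange(w, dtype=img.dtype, device=img.device), indexing='ij')
    grid_x = xs.unsqueeze(0) + flow[:, 0]               # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]               # displaced y coordinates
    grid = torch.stack([2.0 * grid_x / (w - 1) - 1.0,   # normalize to [-1, 1]
                        2.0 * grid_y / (h - 1) - 1.0], dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

def warp_loss(i1, i2, flow):
    return torch.norm(i2 - warp(i1, flow))              # C = ||I2 - I'2||
```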
Step d: training the network stack with a loss function.
The learning mode of the network in this application is supervised learning: the mean endpoint error between the ground truth in the dataset and the output optical flow is used as the loss function to continuously adjust the network parameters, and the loss is computed after one interpolation of the output flow. The dataset and strategy adopted in training therefore largely determine network performance. Although the network stack effectively improves prediction accuracy, it has the following drawbacks: 1. the network structure is complicated and huge; training is slow, and over-fitting or non-convergence occurs easily; 2. the downstream sub-networks of the multi-level, multi-branch structure share information, which not only propagates errors but can also confuse the loss computation; 3. the stacked networks require a large computation cost and easily run out of space on devices with small memory. Hence a step-wise training strategy is adopted here. The concrete strategy is: first train the first-level network independently; then fix the internal weights of this guiding network and train the lower networks, until all sub-networks are trained; finally add a synthesis module at the last layer and fine-tune it with the internal parameters of the upper networks fixed. The original samples used for training should come with ground truth, and suitable sample data should be chosen according to the model's application; the samples should contain occlusion, blur and large displacements, so that the network learns to handle these situations.
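As a hedged sketch of the supervision described above — mean endpoint error against the ground truth after one interpolation of the output flow, plus the step-wise freeze-and-train schedule — with the caveats that the helper names are hypothetical and magnitude rescaling of the interpolated flow is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def epe_loss(pred_flow, gt_flow):
    """Mean endpoint error: the predicted flow is bilinearly interpolated
    to the ground-truth resolution, then the Euclidean distance between
    predicted and true flow vectors is averaged over all pixels."""
    pred = F.interpolate(pred_flow, size=gt_flow.shape[-2:],
                         mode='bilinear', align_corners=False)
    return torch.norm(pred - gt_flow, p=2, dim=1).mean()

def freeze(module):
    """Fix a trained sub-network's internal weights."""
    for p in module.parameters():
        p.requires_grad = False

# Step-wise schedule (illustrative):
#   train(net1)                     # first-level network alone
#   freeze(net1); train(net2)       # lower networks under its guidance
#   freeze(net2); train(net3)
#   freeze(net3); train(synthesis)  # fine-tune only the final module
```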
Step e: outputting.
Step e comprises outputting the resulting optical flow images corresponding to the adjacent video frames.
The present invention has been described above with reference to preferred embodiments, but its protection scope is not limited thereto. Various improvements may be made, and components may be replaced with equivalents, without departing from the scope of the invention. As long as no structural conflict exists, the technical features mentioned in the various embodiments may be combined in any manner. Any reference signs in the claims shall not be construed as limiting the claims concerned; from whatever point of view, the embodiments are to be considered as illustrative and not restrictive. Therefore, all technical solutions falling within the scope of the appended claims fall within the protection scope of the present invention.

Claims (9)

1. An optical flow prediction method between consecutive video frames, characterized by comprising the following steps:
Step a: extracting adjacent-frame spatial features through a deformable convolution unit;
Step b: performing fusion and reconstruction on the adjacent-frame spatial features;
Step c: performing deconvolution operations on the fused and reconstructed features, and constructing a network stack;
Step d: training the network stack with a loss function;
Step e: outputting.
2. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that, in step a, the first layer of the deformable convolution unit is a 7×7 deformable convolutional layer, the second layer is a 5×5 traditional convolutional layer, the third and fourth layers are 3×3 traditional convolutional layers, and the stride of each convolutional layer is 2; the convolution operation used in the deformable convolutional layer is:
u(P0) = Σ_{Pn∈R} w(Pn) · x(P0 + Pn + ΔPn)
In the formula, P0 is a point of the feature map u output by the deformable convolutional layer, x denotes the input feature map of this layer or the original image, R is the grid of positions covered by the convolution kernel, w is the weight value, Pn enumerates the positions of R over the covered region of x, and ΔPn is the offset.
3. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that, in step b, the adjacent-frame spatial features are converted by channel-wise global average pooling into a scalar vector whose length equals the number of feature channels; the scalar vector is fed into a fully connected block comprising a fully connected layer, a ReLU activation function, another fully connected layer and a Sigmoid activation function, and the subsequent training operation yields the weight vector of the fused features; the weight vector serves as the selection parameter of the feature channels and is then weighted channel by channel, through multiplication, onto the adjacent-frame spatial features, completing the recalibration of the original features along the channel dimension.
4. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that, in step c, the deconvolution operation enlarges the fused and reconstructed features toward the optical flow, after which upsampling restores the resolution to the original size.
5. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that, in step c, the constructed network stack comprises network one, network two and network three; network one uses deformable convolution with SE-net, network two uses traditional convolution kernels without SE-net, and network three uses traditional convolution kernels with SE-net; network one is connected in parallel with network two, and both are connected in series with network three.
6. The optical flow prediction method between consecutive video frames according to claim 5, characterized in that network two passes the predicted optical flow and the loss amount to network three.
7. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that, in step d, the mean endpoint error between the ground truth in the dataset and the output optical flow is used as the loss function, and the loss is computed after one interpolation of the output optical flow.
8. The optical flow prediction method between consecutive video frames according to claim 1, characterized in that the strategy for training the network is: first train the first-level network independently, then fix the internal weights of this guiding network and train the lower networks, until all sub-networks are trained; a synthesis module is added at the last layer and fine-tuned with the internal parameters of the upper networks fixed.
9. The optical flow prediction method between consecutive video frames according to claim 1, wherein step e comprises outputting the resulting optical flow images corresponding to the adjacent video frames.
CN201910645583.6A 2019-07-17 2019-07-17 Optical flow prediction method between consecutive video frames Pending CN110363794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910645583.6A CN110363794A (en) 2019-07-17 2019-07-17 Optical flow prediction method between consecutive video frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910645583.6A CN110363794A (en) 2019-07-17 2019-07-17 Optical flow prediction method between consecutive video frames

Publications (1)

Publication Number Publication Date
CN110363794A 2019-10-22

Family

ID=68219906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910645583.6A Pending CN110363794A (en) 2019-07-17 2019-07-17 Light stream prediction technique between video successive frame

Country Status (1)

Country Link
CN (1) CN110363794A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110830808A (en) * 2019-11-29 2020-02-21 合肥图鸭信息科技有限公司 Video frame reconstruction method and device and terminal equipment
CN110944212A (en) * 2019-11-29 2020-03-31 合肥图鸭信息科技有限公司 Video frame reconstruction method and device and terminal equipment
CN111683256A (en) * 2020-08-11 2020-09-18 蔻斯科技(上海)有限公司 Video frame prediction method, video frame prediction device, computer equipment and storage medium
CN112085717A (en) * 2020-09-04 2020-12-15 厦门大学 Video prediction method and system for laparoscopic surgery
CN113838102A (en) * 2021-09-26 2021-12-24 南昌航空大学 Optical flow determination method and system based on anisotropic dense convolution
CN114511485A (en) * 2022-01-29 2022-05-17 电子科技大学 Compressed video quality enhancement method based on cyclic deformable fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710826A (en) * 2018-04-13 2018-10-26 燕山大学 A kind of traffic sign deep learning mode identification method
CN109784150A (en) * 2018-12-06 2019-05-21 东南大学 Video driving behavior recognition methods based on multitask space-time convolutional neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710826A (en) * 2018-04-13 2018-10-26 燕山大学 A kind of traffic sign deep learning mode identification method
CN109784150A (en) * 2018-12-06 2019-05-21 东南大学 Video driving behavior recognition methods based on multitask space-time convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEXEY DOSOVITSKIY: "FlowNet: Learning Optical Flow with Convolutional Networks", IEEE *
EDDY ILG: "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks", IEEE *
JIFENG DAI: "Deformable Convolutional Networks", arXiv:1703.06211v3 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110830808A (en) * 2019-11-29 2020-02-21 合肥图鸭信息科技有限公司 Video frame reconstruction method and device and terminal equipment
CN110944212A (en) * 2019-11-29 2020-03-31 合肥图鸭信息科技有限公司 Video frame reconstruction method and device and terminal equipment
CN111683256A (en) * 2020-08-11 2020-09-18 蔻斯科技(上海)有限公司 Video frame prediction method, video frame prediction device, computer equipment and storage medium
CN112085717A (en) * 2020-09-04 2020-12-15 厦门大学 Video prediction method and system for laparoscopic surgery
CN112085717B (en) * 2020-09-04 2024-03-19 厦门大学 Video prediction method and system for laparoscopic surgery
CN113838102A (en) * 2021-09-26 2021-12-24 南昌航空大学 Optical flow determination method and system based on anisotropic dense convolution
CN113838102B (en) * 2021-09-26 2023-06-06 南昌航空大学 Optical flow determining method and system based on anisotropic dense convolution
CN114511485A (en) * 2022-01-29 2022-05-17 电子科技大学 Compressed video quality enhancement method based on cyclic deformable fusion
CN114511485B (en) * 2022-01-29 2023-05-26 电子科技大学 Compressed video quality enhancement method adopting cyclic deformable fusion

Similar Documents

Publication Publication Date Title
CN110363794A (en) Optical flow prediction method between consecutive video frames
CN113874883A (en) Hand pose estimation
CN110188795A (en) Image classification method, data processing method and device
CN106250931A (en) High-resolution image scene classification method based on random convolutional neural networks
CN105069752B (en) Optical ocean clutter suppression method based on space-time chaos
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN104599290B (en) Video sensing node-oriented target detection method
CN110570363A (en) Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator
CN110717532A (en) Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model
TWI226193B (en) Image segmentation method, image segmentation apparatus, image processing method, and image processing apparatus
CN107680116A (en) Method for monitoring moving objects in video sequences
CN110705344A (en) Crowd counting model based on deep learning and implementation method thereof
CN110222760A (en) Fast image processing method based on the Winograd algorithm
CN112288776B (en) Target tracking method based on multi-time step pyramid codec
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN110909591A (en) Adaptive non-maximum suppression processing method for pedestrian image detection using coding vectors
CN115019302A (en) Improved YOLOX target detection model construction method and application thereof
CN113688765A (en) Attention mechanism-based action recognition method for adaptive graph convolution network
CN111583340A (en) Method for reducing monocular camera pose estimation error rate based on convolutional neural network
CN104408697A (en) Image super-resolution reconstruction method based on genetic algorithm and regular prior model
CN116258757A (en) Monocular image depth estimation method based on multi-scale cross attention
CN116740121A (en) Straw image segmentation method based on special neural network and image preprocessing
CN115546654A (en) Grouping mixed attention-based remote sensing scene image classification method
CN117809200A (en) Multi-scale remote sensing image target detection method based on enhanced small target feature extraction
WO2021057091A1 (en) Viewpoint image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191022