CN107808389A - Unsupervised video segmentation method based on deep learning - Google Patents

Unsupervised video segmentation method based on deep learning

Info

Publication number
CN107808389A
CN107808389A (application CN201711004135.5A)
Authority
CN
China
Prior art keywords
layer
convolutional
convolutional layer
segmentation
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711004135.5A
Other languages
Chinese (zh)
Other versions
CN107808389B (en)
Inventor
宋利
许经纬
解蓉
张文军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201711004135.5A priority Critical patent/CN107808389B/en
Publication of CN107808389A publication Critical patent/CN107808389A/en
Application granted granted Critical
Publication of CN107808389B publication Critical patent/CN107808389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unsupervised video segmentation method based on deep learning, including: establishing an encoder-decoder deep neural network, the encoder-decoder deep neural network comprising a static image segmentation stream network, an inter-frame information segmentation stream network and a fusion network. The static image segmentation stream network performs foreground-background segmentation on the current video frame, and the inter-frame information segmentation stream network performs foreground-background segmentation of moving objects on the optical flow field between the current video frame and the next video frame. After the segmentation images output by the static image segmentation stream network and the inter-frame information segmentation stream network are merged by the fusion network, the video segmentation result is obtained. The static image segmentation stream network of the present invention performs high-quality intra-frame segmentation, the inter-frame information segmentation stream network performs high-quality segmentation of the optical flow field, and the two-stream output is improved by the final fusion operation, so that good segmentation results are obtained from the effective two-stream output and fusion operation.

Description

Unsupervised video segmentation method based on deep learning
Technical field
The present invention relates to the technical field of video processing, and in particular to an unsupervised video segmentation method based on deep learning.
Background art
Video segmentation refers to the process of performing foreground-background segmentation on the objects in every frame of a video to obtain a binary map. Its difficulty lies in ensuring both the density of the segmentation in the spatial domain (within a frame) and the continuity of the segmentation in the time domain (across frames). High-quality video segmentation is the basis of video editing, video object recognition and video semantic analysis, and is therefore of great importance.
Existing video segmentation methods can be roughly divided into the following three classes according to their principles:
1) Unsupervised conventional video segmentation methods
Such methods require no manual annotation of key-frame (e.g. first-frame) information; the general procedure is image segmentation plus inter-frame matching of similar blocks to segment a given video automatically. For example, A. Faktor and M. Irani, in "Video segmentation by non-local consensus voting" (BMVC 2014), process every frame to obtain segmentations that may contain objects (object proposals), then perform inter-frame similarity detection based on these segmentations and select the segmentation with the highest similarity as the result. The advantage of such methods is that no manual intervention is needed, but they must compute a large number of intermediate representations of the segmentation, such as superpixels, which consumes a great deal of time and storage space.
2) Semi-supervised conventional video segmentation methods
Such methods generally require manual annotation of key-frame information (e.g. the first frame or first few frames), and then propagate the annotated segmentation to all subsequent frames by inter-frame propagation. For example, Y.-H. Tsai, M.-H. Yang and M. J. Black, in "Video segmentation via object flow" (CVPR 2016), propose a global-graph method: all frames are placed into one graph whose edges represent inter-frame similarity, and the annotated segmentation of the first frame is propagated to subsequent frames by solving the graph. This method has the highest accuracy among conventional methods, because its optimization considers the information of every frame, but the difficulty of solving the global graph greatly increases the time needed to compute the segmentation. This is the common property of such methods: high segmentation accuracy, but also very high computational complexity.
3) Methods based on deep learning
With the development of deep learning, deep neural networks have achieved rather good results in fields such as image classification, segmentation and recognition, but in the video field, limited by the high redundancy of the time domain, they have not yet fully shown their power. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers and L. Van Gool, in "One-shot video object segmentation" (CVPR 2017), propose that video segmentation only needs per-frame single-frame segmentation, without relying on inter-frame information. They argue that inter-frame information is redundant and unnecessary, and that in many cases no accurate inter-frame information is available as a reference. The scheme they provide is therefore to train a strong image segmentation network and, when segmenting a given video, to accurately annotate the first frame or first few frames, fine-tune (finetune) the large network with these frames, and finally segment the remaining frames of the video with this network. This method has the possibility of over-fitting, and it is not applicable to large-scale video segmentation scenarios.
Summary of the invention
In view of the above defects in the prior art, the object of the present invention is to provide an unsupervised video segmentation method based on deep learning.
The unsupervised video segmentation method based on deep learning provided by the invention includes:
establishing an encoder-decoder deep neural network, the encoder-decoder deep neural network comprising: a static image segmentation stream network, an inter-frame information segmentation stream network and a fusion network; wherein the static image segmentation stream network performs foreground-background segmentation on the current video frame, and the inter-frame information segmentation stream network performs foreground-background segmentation of moving objects on the optical flow field between the current video frame and the next video frame;
after the segmentation images output by the static image segmentation stream network and the inter-frame information segmentation stream network are merged by the fusion network, the video segmentation result is obtained.
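The two-stream-plus-fusion flow described above can be sketched as follows. The three functions are hypothetical stand-ins for the actual sub-networks (the patent publishes no code), and masks are toy 2D lists of foreground probabilities:

```python
# Sketch of the two-stream fusion pipeline (hypothetical stand-in functions).
# Masks are 2D lists of foreground probabilities in [0, 1].

def static_stream(frame):
    # Stand-in: foreground-background segmentation of a single frame.
    return [[0.9 if px > 0.5 else 0.1 for px in row] for row in frame]

def interframe_stream(flow_field):
    # Stand-in: segmentation of moving objects from the optical flow field.
    return [[0.8 if abs(v) > 0.2 else 0.2 for v in row] for row in flow_field]

def fuse(mask_a, mask_b):
    # Stand-in for the fusion network: combine the two per-pixel scores.
    return [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(mask_a, mask_b)]

frame = [[0.7, 0.1], [0.6, 0.2]]   # toy grayscale frame
flow = [[0.5, 0.0], [0.4, 0.1]]    # toy flow magnitudes
result = fuse(static_stream(frame), interframe_stream(flow))
```

In the real method each stand-in is a trained encoder-decoder network; the sketch only shows how the per-frame image, the optical flow field, and the fusion step fit together.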
Optionally, establishing the encoder-decoder deep neural network includes:
establishing the static image segmentation stream network, and training the static image segmentation stream network with images for which static image segmentation has been performed;
establishing the inter-frame information segmentation stream network, and training the inter-frame information segmentation stream network with videos for which inter-frame information segmentation has been performed;
training the encoder-decoder deep neural network with fully annotated video segmentation data.
Optionally, the static image segmentation stream network includes an encoder part and a decoder part formed by fully convolutional networks, wherein:
the encoder fully convolutional network includes five cascaded generalized convolution layers and one dilated convolution layer cascaded after the fifth generalized convolution layer; the dilated convolution layer at the sixth level includes four classes of dilation with different scales, each class forming one output path, and the average of the output results of the four output paths is the output result of the encoder;
the decoder fully convolutional network is a fully convolutional network formed by three recurrent convolution layers and three upsampling layers, and outputs an image segmentation result with the same resolution as the input image.
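The averaging of the four dilated output paths at the sixth encoder level can be written directly; the feature values below are made-up numbers, and only the elementwise-mean operation reflects the text:

```python
# Average the outputs of the four parallel dilated-convolution paths.
# Each path produces a feature map of identical shape; the encoder
# output is their elementwise mean.

def average_branches(branches):
    n = len(branches)
    rows, cols = len(branches[0]), len(branches[0][0])
    return [[sum(b[r][c] for b in branches) / n for c in range(cols)]
            for r in range(rows)]

b1 = [[1.0, 2.0]]; b2 = [[3.0, 4.0]]; b3 = [[5.0, 6.0]]; b4 = [[7.0, 8.0]]
encoder_out = average_branches([b1, b2, b3, b4])  # [[4.0, 5.0]]
```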
Optionally, the five generalized convolution layers in the encoder fully convolutional network include a cascaded first generalized convolution layer, second generalized convolution layer, third generalized convolution layer, fourth generalized convolution layer and fifth generalized convolution layer, wherein:
The first generalized convolution layer includes, in order: convolutional layer A11, activation layer, convolutional layer A12, activation layer, pooling layer;
The second generalized convolution layer includes, in order: convolutional layer A21, activation layer, convolutional layer A22, activation layer, pooling layer;
The third generalized convolution layer includes, in order: convolutional layer A31, activation layer, convolutional layer A32, activation layer, convolutional layer A33, activation layer, pooling layer;
The fourth generalized convolution layer includes, in order: convolutional layer A41, activation layer, convolutional layer A42, activation layer, convolutional layer A43, activation layer, pooling layer;
The fifth generalized convolution layer includes, in order: convolutional layer A51, activation layer, convolutional layer A52, activation layer, convolutional layer A53, activation layer, pooling layer;
The dilated convolution layer cascaded after the fifth generalized convolution layer in the encoder fully convolutional network includes four parallel classes of dilated convolution layers, wherein:
The first class of dilated convolution layer includes, in order: a first-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The second class of dilated convolution layer includes, in order: a second-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The third class of dilated convolution layer includes, in order: a third-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The fourth class of dilated convolution layer includes, in order: a fourth-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer.
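The point of using several dilation scales is that each scale sees a different receptive field at the same parameter cost. Assuming 3x3 kernels (the claims do not state the kernel size) and the dilations 6, 12, 18 and 24 given in the embodiment section, the effective kernel extents are:

```python
# Effective kernel extent of a dilated convolution: d * (k - 1) + 1.
# Kernel size 3 is an assumption; the patent only specifies the dilations
# (6, 12, 18, 24) in the embodiment section.

def effective_extent(kernel_size, dilation):
    return dilation * (kernel_size - 1) + 1

extents = {d: effective_extent(3, d) for d in (6, 12, 18, 24)}
# {6: 13, 12: 25, 18: 37, 24: 49}
```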
Optionally, in the decoder fully convolutional network, each upsampling layer is cascaded with a corresponding recurrent convolution layer, wherein:
the first upsampling layer is cascaded with the third recurrent convolution layer; the first upsampling layer performs twofold upsampling on the output of the previous layer, and the third recurrent convolution layer convolves the output of encoder convolutional layer A33 and performs a recurrent convolution operation with the output of the first upsampling layer;
the second upsampling layer is cascaded with the second recurrent convolution layer; the second upsampling layer performs twofold upsampling on the output of the previous layer, and the second recurrent convolution layer convolves the output of encoder convolutional layer A22 and performs a recurrent convolution operation with the output of the second upsampling layer;
the third upsampling layer is cascaded with the first recurrent convolution layer; the third upsampling layer performs twofold upsampling on the output of the previous layer, and the first recurrent convolution layer convolves the output of encoder convolutional layer A12 and performs a recurrent convolution operation with the output of the third upsampling layer.
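A minimal sketch of the twofold upsampling each decoder stage performs; nearest-neighbour repetition is an assumption, since the patent does not specify the interpolation method:

```python
# Twofold (2x) upsampling of a 2D feature map by nearest-neighbour
# repetition; each decoder upsampling layer doubles height and width
# before the result is combined with the encoder skip connection.

def upsample_2x(fmap):
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

up = upsample_2x([[1, 2], [3, 4]])
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```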
Optionally, training the static image segmentation stream network with images for which static image segmentation has been performed includes:
selecting sample pictures from the ECSSD image segmentation dataset, the MSRA10K image segmentation dataset and the PASCAL VOC 2012 image segmentation dataset;
expanding the sample pictures to a data quantity on the order of 10^4 by random cropping, mirroring, flipping, zooming and affine transformation;
fixing the decoder part and training the encoder part with 60% of the data until the encoder part converges;
training the static image segmentation stream network with 100% of the training data, wherein the encoder part is initialized with the weights at convergence and the decoder part is randomly initialized.
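The two-phase schedule above can be sketched as follows; the dictionaries and the convergence test are hypothetical stand-ins, not a real training loop:

```python
# Two-phase training schedule: (1) freeze the decoder and train the
# encoder on 60% of the data until convergence; (2) train the full
# network on 100% of the data, encoder warm-started, decoder random.

def train_stream(data, encoder, decoder, converged):
    phase1 = data[: int(0.6 * len(data))]      # 60% subset for phase 1
    decoder["frozen"] = True
    while not converged(encoder):
        encoder["steps"] = encoder.get("steps", 0) + 1  # stand-in update
    decoder["frozen"] = False
    decoder["init"] = "random"                 # decoder re-initialized
    encoder["init"] = "converged_weights"      # encoder warm-started
    return {"phase1_size": len(phase1), "phase2_size": len(data)}

log = train_stream(list(range(100)), {}, {},
                   converged=lambda e: e.get("steps", 0) >= 3)
# {'phase1_size': 60, 'phase2_size': 100}
```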
Optionally, the inter-frame information segmentation stream network includes mutually cascaded encoder and decoder parts formed by fully convolutional networks, wherein:
the encoder fully convolutional network includes five cascaded generalized convolution layers and one dilated convolution layer cascaded after the fifth generalized convolution layer; the dilated convolution layer at the sixth level includes four classes of dilation with different scales, each class forming one output path, and the average of the output results of the four output paths is the output result of the encoder;
the five generalized convolution layers in the encoder fully convolutional network include a cascaded first generalized convolution layer, second generalized convolution layer, third generalized convolution layer, fourth generalized convolution layer and fifth generalized convolution layer, wherein:
The first generalized convolution layer includes, in order: convolutional layer B11, activation layer, convolutional layer B12, activation layer, pooling layer;
The second generalized convolution layer includes, in order: convolutional layer B21, activation layer, convolutional layer B22, activation layer, pooling layer;
The third generalized convolution layer includes, in order: convolutional layer B31, activation layer, convolutional layer B32, activation layer, convolutional layer B33, activation layer, pooling layer;
The fourth generalized convolution layer includes, in order: convolutional layer B41, activation layer, convolutional layer B42, activation layer, convolutional layer B43, activation layer, pooling layer;
The fifth generalized convolution layer includes, in order: convolutional layer B51, activation layer, convolutional layer B52, activation layer, convolutional layer B53, activation layer, pooling layer;
The dilated convolution layer cascaded after the fifth generalized convolution layer in the encoder fully convolutional network includes four parallel classes of dilated convolution layers, wherein:
The first class of dilated convolution layer includes, in order: a first-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The second class of dilated convolution layer includes, in order: a second-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The third class of dilated convolution layer includes, in order: a third-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
The fourth class of dilated convolution layer includes, in order: a fourth-scale dilated convolutional layer, activation layer, dropout layer, convolutional layer, activation layer, dropout layer, convolutional layer;
the decoder fully convolutional network is a fully convolutional network formed by three recurrent convolution layers and three upsampling layers, and outputs an image segmentation result with the same resolution as the input image; wherein:
in the decoder fully convolutional network, each upsampling layer is cascaded with a corresponding recurrent convolution layer, wherein:
the first upsampling layer is cascaded with the third recurrent convolution layer; the first upsampling layer performs twofold upsampling on the output of the previous layer, and the third recurrent convolution layer convolves the output of encoder convolutional layer B33 and performs a recurrent convolution operation with the output of the first upsampling layer;
the second upsampling layer is cascaded with the second recurrent convolution layer; the second upsampling layer performs twofold upsampling on the output of the previous layer, and the second recurrent convolution layer convolves the output of encoder convolutional layer B22 and performs a recurrent convolution operation with the output of the second upsampling layer;
the third upsampling layer is cascaded with the first recurrent convolution layer; the third upsampling layer performs twofold upsampling on the output of the previous layer, and the first recurrent convolution layer convolves the output of encoder convolutional layer B12 and performs a recurrent convolution operation with the output of the third upsampling layer.
Optionally, training the inter-frame information segmentation stream network with videos for which inter-frame information segmentation has been performed includes:
collecting the training video set VID of the video object detection task in ILSVRC 2015, wherein the training video set VID carries complete object detection bounding boxes;
performing image segmentation on every frame of the video set VID with the trained static image segmentation stream network to obtain foreground-background segmentation results;
computing the optical flow field between the frames of each video and saving the flow information corresponding to each frame as an RGB image;
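Saving a flow field as an RGB image is typically done by mapping flow direction to hue and magnitude to brightness; the patent does not specify the colour coding, so the HSV mapping below is an assumption:

```python
import colorsys
import math

# Map a single flow vector (dx, dy) to an RGB triple: direction -> hue,
# magnitude (clipped at max_mag) -> value. This HSV coding is a common
# convention, assumed here; the patent does not specify the mapping.

def flow_to_rgb(dx, dy, max_mag=20.0):
    mag = math.hypot(dx, dy)
    hue = (math.atan2(dy, dx) + math.pi) / (2 * math.pi)  # hue in [0, 1)
    val = min(mag / max_mag, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, val)
    return (round(255 * r), round(255 * g), round(255 * b))

px = flow_to_rgb(10.0, 0.0)  # pure horizontal motion at half max magnitude
```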
filtering out correctly segmented results according to a preset screening strategy, with reference to the bounding boxes in the training video set VID, as the initial training images of the inter-frame information segmentation stream network; wherein the screening strategy satisfies the following conditions:
first: the per-frame image segmentation result occupies 75% to 90% of the object detection bounding box;
second: the mean flow magnitude of the computed optical flow RGB image is between 5 and 100;
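The two conditions can be expressed as a single check; using `mask_area / box_area` as the occupancy measure is an illustrative assumption:

```python
# Screening strategy for auto-generated training frames:
# 1) the segmentation must occupy 75% to 90% of the detection bounding box;
# 2) the mean optical-flow magnitude must lie between 5 and 100.

def frame_is_reliable(mask_area, box_area, mean_flow_magnitude):
    occupancy = mask_area / box_area
    good_segmentation = 0.75 <= occupancy <= 0.90
    good_flow = 5.0 <= mean_flow_magnitude <= 100.0
    return good_segmentation and good_flow

keep = frame_is_reliable(mask_area=8000, box_area=10000,
                         mean_flow_magnitude=12.0)   # True: both conditions hold
drop = frame_is_reliable(mask_area=9500, box_area=10000,
                         mean_flow_magnitude=12.0)   # False: occupancy 95% > 90%
```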
expanding the initial training images to a data quantity on the order of 10^4 by random cropping, mirroring, flipping, zooming and affine transformation;
fixing the decoder part and training the encoder part with 60% of the data until the encoder part converges;
training the inter-frame information segmentation stream network with 100% of the training data, wherein the encoder part is initialized with the weights at convergence and the decoder part is randomly initialized.
Optionally, the fusion network includes: a concatenation layer, a convolutional layer, an activation layer, a convolutional layer and an activation layer; wherein:
the concatenation layer connects the static image segmentation stream network and the inter-frame information segmentation stream network, and the output results of the static image segmentation stream network and the inter-frame information segmentation stream network are merged through the convolutional layer, activation layer, convolutional layer and activation layer to obtain the final video segmentation result.
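A minimal sketch of the fusion step: the two single-channel stream outputs are concatenated per pixel and passed through two convolution-plus-activation stages. The 1x1 weights below are illustrative stand-ins for the learned convolutions:

```python
# Fusion network sketch: concatenate the two single-channel stream
# outputs into a 2-channel input, then apply a 1x1 "convolution"
# (per-pixel weighted sum) + ReLU, twice. Weights are illustrative.

def relu(x):
    return max(0.0, x)

def fuse(static_mask, flow_mask, w1=(0.6, 0.4), w2=1.0):
    fused = []
    for row_s, row_f in zip(static_mask, flow_mask):
        row = []
        for s, f in zip(row_s, row_f):
            h = relu(w1[0] * s + w1[1] * f)  # conv1 + activation on [s, f]
            row.append(relu(w2 * h))         # conv2 + activation
        fused.append(row)
    return fused

out = fuse([[1.0, 0.0]], [[1.0, 0.5]])
```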
Optionally, the static image segmentation stream network and the inter-frame information segmentation stream network update their network parameters in real time during the training process.
Compared with the prior art, the present invention has the following beneficial effects:
In the unsupervised video segmentation method based on deep learning provided by the invention, a two-stream video segmentation network comprising a static image segmentation stream network and an inter-frame information segmentation stream network is built, wherein the static image segmentation stream network performs high-quality intra-frame segmentation, the inter-frame information segmentation stream network performs high-quality segmentation of the optical flow field, and the two-stream output is improved by the final fusion operation. When problems such as occlusion and slow motion arise that conventional methods cannot fully solve, the present invention can still obtain good segmentation results from the effective two-stream output and fusion operation.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a schematic diagram of the unsupervised video segmentation method based on deep learning of the present invention;
Fig. 2 is a schematic diagram of the principle of the recurrent convolution layer in the decoding network used by the present invention;
Fig. 3 is a schematic diagram of the effect of the proposed screening strategy for generating the dataset needed to train the inter-frame information segmentation stream network;
Fig. 4 compares the results of the embodiment of the present invention with the current best unsupervised and supervised methods, where Fast Object Segmentation in Unconstrained Video (FST) and Video Segmentation via Object Flow (OFL) are respectively the current best unsupervised and semi-supervised methods.
Embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that, for those of ordinary skill in the art, several changes and improvements can be made without departing from the inventive concept; these all belong to the protection scope of the present invention.
As shown in Fig. 1, this embodiment provides an unsupervised video segmentation method based on deep learning. The specific implementation details are as follows; for parts not described in detail, refer to the summary of the invention.
First, two stream networks are built: the static image segmentation stream and the inter-frame information segmentation stream network. The two networks have identical structure, both based on an encoder-decoder architecture. The encoder part is a fully convolutional network comprising five generalized convolution layers (the first three contain convolutional, pooling and activation layers; the last two have no pooling layer) and a final dilated convolution layer. The last layer is divided into four classes of dilation with different scales, each class forming one path, and the output result of the encoder is the average of these four path outputs. The decoder part is also a fully convolutional network, connected after the encoder, comprising three recurrent convolution layers and three upsampling layers. The final output of each stream has the same size as the input. The details of the encoder and decoder parts are as follows:
The concrete structure of the encoder is as follows (generalized convolution layers 1 to 5 listed below are cascaded; the four paths of the sixth layer are parallel to each other; the sixth layer is cascaded after the fifth):
Generalized convolution layer 1: convolutional layer 1-1 + activation layer + convolutional layer 1-2 + activation layer + pooling layer;
Generalized convolution layer 2: convolutional layer 2-1 + activation layer + convolutional layer 2-2 + activation layer + pooling layer;
Generalized convolution layer 3: convolutional layer 3-1 + activation layer + convolutional layer 3-2 + activation layer + convolutional layer 3-3 + activation layer + pooling layer;
Generalized convolution layer 4: convolutional layer 4-1 + activation layer + convolutional layer 4-2 + activation layer + convolutional layer 4-3 + activation layer + pooling layer;
Generalized convolution layer 5: convolutional layer 5-1 + activation layer + convolutional layer 5-2 + activation layer + convolutional layer 5-3 + activation layer + pooling layer;
Dilated convolutional layer 6-1: dilated convolutional layer (dilation=6) + activation layer + dropout layer + convolutional layer + activation layer + dropout layer + convolutional layer;
Dilated convolutional layer 6-2: dilated convolutional layer (dilation=12) + activation layer + dropout layer + convolutional layer + activation layer + dropout layer + convolutional layer;
Dilated convolutional layer 6-3: dilated convolutional layer (dilation=18) + activation layer + dropout layer + convolutional layer + activation layer + dropout layer + convolutional layer;
Dilated convolutional layer 6-4: dilated convolutional layer (dilation=24) + activation layer + dropout layer + convolutional layer + activation layer + dropout layer + convolutional layer;
The concrete structure of the decoder is as follows (each upsampling layer + recurrent convolution layer pair listed below, from 3 down to 1, is cascaded):
Upsampling layer + recurrent convolution layer 3: the upsampling layer performs twofold upsampling on the output of the previous layer; recurrent convolution layer 3 convolves the output of encoder convolutional layer 3-3 and performs a recurrent convolution operation with the output of the upsampling layer.
Upsampling layer + recurrent convolution layer 2: the upsampling layer performs twofold upsampling on the output of the previous layer; recurrent convolution layer 2 convolves the output of encoder convolutional layer 2-2 and performs a recurrent convolution operation with the output of the upsampling layer.
Upsampling layer + recurrent convolution layer 1: the upsampling layer performs twofold upsampling on the output of the previous layer; recurrent convolution layer 1 convolves the output of encoder convolutional layer 1-2 and performs a recurrent convolution operation with the output of the upsampling layer.
It should be noted that "+" in this embodiment denotes a cascade connection. Subscript 1-1 denotes the first convolutional layer of generalized convolution layer 1, and subscript 1-2 denotes the second convolutional layer of generalized convolution layer 1; subscript i-j denotes the j-th convolutional layer of generalized convolution layer i, where i is 1 to 5 and j is 1 to 3. Subscripts 6-1, 6-2, 6-3 and 6-4 denote the first, second, third and fourth classes of the dilated convolution layer, respectively.
The details of the recurrent convolution layer are shown in Fig. 2. It can be regarded as a convolutional layer with a recurrent connection added along the time dimension. The advantage is that, as training proceeds, each convolutional layer enlarges its local receptive range over the input without increasing the number of parameters, adaptively capturing and fusing local details. As shown in Fig. 2, the number of recurrent steps is set to 3 in the present invention, which balances computational efficiency against the hardware cost of training. After the above networks are built, the two stream networks are trained separately:
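A one-dimensional sketch of a recurrent convolution with 3 recurrent steps: the same kernels are reused at every step, with the feed-forward response added back in, so the receptive field grows without new parameters. The kernels, ReLU nonlinearity, and exact recurrence form are assumptions; only the weight sharing across steps reflects the text:

```python
# 1-D recurrent convolution sketch: h_t = relu(conv(x) + conv_r(h_{t-1})),
# iterated for 3 steps with the SAME kernels at every step, so the
# receptive field grows while the parameter count stays fixed.

def conv1d(signal, kernel):
    k = len(kernel) // 2
    padded = [0.0] * k + list(signal) + [0.0] * k  # zero-pad the borders
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(signal))]

def recurrent_conv(x, kernel, rec_kernel, steps=3):
    feedforward = conv1d(x, kernel)
    h = [max(0.0, v) for v in feedforward]
    for _ in range(steps - 1):                     # same weights every step
        rec = conv1d(h, rec_kernel)
        h = [max(0.0, a + b) for a, b in zip(feedforward, rec)]
    return h

y = recurrent_conv([1.0, 0.0, 0.0, 1.0], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25])
```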
Still-image segmentation stream network: we select three currently published authoritative image segmentation datasets (ECSSD, MSRA 10K and PASCAL VOC 2012), collecting 21,582 pictures from them, and expand the dataset to the order of 10⁴ through operations such as random cropping, mirroring, flipping, zooming and affine transformation, to mitigate the overfitting that may occur during training. When training this network, the decoder part is first fixed and the encoder part is trained with 60% of the data; after the encoder part converges, the whole network is trained with 100% of the training data, where the encoder part is initialized with its weights at convergence and the decoder part is randomly initialized.
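The augmentation operations listed above can be sketched as a paired image/mask transform (zoom and affine warps are omitted here for brevity); the 0.5 probabilities and the 80% crop ratio are illustrative assumptions, not values from the patent:

```python
import numpy as np

def augment(img, mask, rng):
    """Paired augmentation sketch: image and mask get the SAME
    random mirror, flip, and crop so labels stay aligned."""
    if rng.random() < 0.5:                       # horizontal mirror
        img, mask = img[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        img, mask = img[::-1], mask[::-1]
    h, w = mask.shape
    ch, cw = int(h * 0.8), int(w * 0.8)          # crop to 80% (assumed ratio)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw], mask[y:y + ch, x:x + cw]
```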
Inter-frame information segmentation stream network: there is currently no published large-scale video segmentation dataset, so the training set must be produced manually. First, the training video set VID of the video object detection task in ILSVRC2015 is collected; these videos carry complete object-detection bounding boxes that precisely indicate object positions. Then the trained still-image segmentation stream network performs image segmentation on every frame of the video set, yielding foreground-background segmentation results. Next, the optical flow field between frames of each video is computed, and the optical flow information corresponding to every frame is saved as an RGB image. Finally, a screening strategy combined with the existing video detection bounding boxes selects qualified frames and their segmentation results as training data for the inter-frame information segmentation stream network.
The screening strategy has two criteria: 1) reliable segmentation result: the image segmentation result of each video frame must cover between 75% and 90% of the object-detection bounding box; 2) reliable optical flow information: the computed optical-flow RGB image must have a mean flow magnitude between 5 and 100, since motion that is too slow or too fast makes the optical flow information very inaccurate.
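The two screening criteria can be expressed directly in code; the mask, box and flow-magnitude representations below are assumptions for illustration, not the patent's data format:

```python
import numpy as np

def passes_screening(seg_mask, det_box, flow_mag):
    """Keep a frame only if (1) the per-frame segmentation covers
    75%-90% of the detection box and (2) the mean optical-flow
    magnitude lies in [5, 100] (too slow or too fast is unreliable).
    seg_mask: HxW binary foreground mask (assumed format)
    det_box:  (y0, x0, y1, x1) detection bounding box (assumed format)
    flow_mag: HxW array of per-pixel flow magnitudes (assumed format)"""
    y0, x0, y1, x1 = det_box
    box_area = (y1 - y0) * (x1 - x0)
    if box_area == 0:
        return False
    coverage = float(seg_mask[y0:y1, x0:x1].sum()) / box_area
    mean_mag = float(flow_mag.mean())
    return 0.75 <= coverage <= 0.90 and 5 <= mean_mag <= 100
```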
Screening finally yields 24,960 samples available as initial training data (Fig. 3 shows some cases that may occur during screening and how they are handled), and the dataset is expanded to the order of 10⁴ through operations such as random cropping, mirroring, flipping, zooming and affine transformation, to mitigate the overfitting that may occur during training. When training this network, the decoder part is first fixed and the encoder part is trained with 60% of the data; after the encoder part converges, the whole network is trained with 100% of the training data, where the encoder part is initialized with its weights at convergence and the decoder part is randomly initialized.
After the two stream networks are trained, the last part, the fusion network, is built. This network comprises one concatenation layer and two generalized convolutional layers (each comprising a convolutional layer and an activation layer); its concrete structure is: concatenation layer, convolutional layer, activation layer, convolutional layer, activation layer. The concatenation layer directly connects the still-image segmentation stream network with the inter-frame information segmentation stream network, and the fused processing of the two outputs serves as the final segmentation result. Together the three networks constitute the complete video segmentation network.
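A minimal sketch of the fusion structure above (concatenate, then conv, activation, conv, activation). The patent's convolutions are spatial; 1x1 channel-mixing convolutions are used here only to keep the example short, and all weights are hypothetical:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def conv1x1(x, w, b):
    """1x1 convolution as channel mixing; x is C_in x H x W, w is C_out x C_in."""
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

def fuse(static_out, flow_out, w1, b1, w2, b2):
    """Fusion-network sketch: concatenate the two stream outputs along
    the channel axis (the concatenation layer), then conv -> relu -> conv -> relu."""
    x = np.concatenate([static_out, flow_out], axis=0)
    x = relu(conv1x1(x, w1, b1))
    return relu(conv1x1(x, w2, b2))
```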
Finally, the fusion network is trained using part of the fully annotated video segmentation dataset. What participates in this training is the whole formed by the trained still-image segmentation stream network, the trained inter-frame information segmentation stream network, and the fusion network to be trained. During training, the parameters of the still-image and inter-frame information segmentation stream networks are fixed and not updated; part of the training set of the fully annotated video segmentation dataset DAVIS is selected to update the parameters of the fusion network until training converges.
At this point, the deep neural network required by the proposed unsupervised video segmentation method is ready. At test time the network is used directly, with no post-processing of any kind. The testing procedure is as follows: first, compute the optical flow field between video frames and process it to obtain the optical-flow RGB image corresponding to every frame; then feed each video frame and its corresponding optical-flow RGB image synchronously into the still-image segmentation stream network obtained in the second step and the inter-frame information segmentation stream network obtained in the fourth step; finally, the output of the fusion network is the final segmentation result.
To demonstrate the advance of the present invention, the proposed method is compared with currently representative unsupervised and semi-supervised methods. The evaluation metric adopted by most current video segmentation methods is Intersection over Union (IoU), defined as:
IoU = 100 × |S ∩ G| / |S ∪ G|
where S is the segmentation result obtained by each algorithm and G is the corresponding ground-truth segmentation. A larger IoU indicates a better segmentation result.
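The IoU metric above, assuming binary masks for S and G, is a few lines of numpy:

```python
import numpy as np

def iou_score(seg, gt):
    """Intersection over Union between a binary segmentation mask S
    and the ground-truth mask G, scaled to [0, 100] as in the text."""
    seg = seg.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(seg, gt).sum()
    if union == 0:
        return 100.0          # both empty: treated here as a perfect match
    inter = np.logical_and(seg, gt).sum()
    return 100.0 * inter / union
```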
Table 1
Table 1 compares the IoU results of the proposed method with other methods on the DAVIS and SegTrack v2 datasets. The DAVIS dataset is currently the most authoritative: its pictures are 480p and 1080p, its object categories are numerous, and its annotations are clean; the objects in SegTrack v2 are all very small and the video resolution is comparatively low. The results show that on DAVIS the proposed method surpasses all unsupervised and semi-supervised methods, improving on the best unsupervised method FST by 14% and on the best semi-supervised method by nearly two points. Note that semi-supervised methods require accurate annotation of the first frame or of several preceding frames, and their processing time is usually long: the OFL method needs close to 2 minutes to process one 480p picture, while the proposed method needs only 0.2 seconds. On SegTrack v2 the proposed method is slightly worse than OFL, possibly because: (1) the video resolution is low and the objects are all small, which hinders the deep-learning method of the present invention from capturing detailed information; (2) OFL is a parametric method whose parameters were optimized for each video in the experiments to obtain the best result, whereas the method of the present invention performed no such per-domain optimization: the network tested on all videos is the same pre-trained one. Fig. 4 gives a visual comparison of the segmentation results of the proposed method against the FST and OFL methods; the proposed method preserves details best and also has the highest segmentation accuracy.
The specific embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the above particular implementations; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substantive content of the invention. Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.

Claims (10)

  1. An unsupervised video segmentation method based on deep learning, characterized by comprising:
    establishing an encoder-decoder deep neural network, the encoder-decoder deep neural network comprising: a still-image segmentation stream network, an inter-frame information segmentation stream network and a fusion network; wherein the still-image segmentation stream network performs foreground-background segmentation processing on the current video frame, and the inter-frame information segmentation stream network performs moving-object foreground-background segmentation on the optical flow field information between the current video frame and the next video frame;
    fusing, by the fusion network, the segmentation images output by the still-image segmentation stream network and the inter-frame information segmentation stream network to obtain the video segmentation result.
  2. The unsupervised video segmentation method based on deep learning according to claim 1, characterized in that establishing the encoder-decoder deep neural network comprises:
    establishing the still-image segmentation stream network, and training it with images on which still-image segmentation has been performed;
    establishing the inter-frame information segmentation stream network, and training it with videos on which inter-frame information segmentation has been performed;
    training the encoder-decoder deep neural network with fully annotated video segmentation data.
  3. The unsupervised video segmentation method based on deep learning according to claim 2, characterized in that the still-image segmentation stream network comprises an encoder part and a decoder part formed of fully convolutional networks, wherein:
    the fully convolutional network of the encoder part comprises five cascaded generalized convolutional layers and one dilated convolutional layer cascaded with the fifth generalized convolutional layer; the dilated convolutional layer at the sixth level comprises dilations of four different scales, each class forming one output path, and the average of the output results of the four output paths is the output result of the encoder part;
    the fully convolutional network of the decoder part is a fully convolutional network formed of three recurrent convolutional layers and three up-sampling layers, and outputs a picture segmentation result whose resolution is consistent with the input picture.
  4. The unsupervised video segmentation method based on deep learning according to claim 3, characterized in that the five generalized convolutional layers in the fully convolutional network of the encoder part comprise a cascaded first generalized convolutional layer, second generalized convolutional layer, third generalized convolutional layer, fourth generalized convolutional layer and fifth generalized convolutional layer, wherein:
    the first generalized convolutional layer comprises, in order: convolutional layer A11, an activation layer, convolutional layer A12, an activation layer, and a pooling layer;
    the second generalized convolutional layer comprises, in order: convolutional layer A21, an activation layer, convolutional layer A22, an activation layer, and a pooling layer;
    the third generalized convolutional layer comprises, in order: convolutional layer A31, an activation layer, convolutional layer A32, an activation layer, convolutional layer A33, an activation layer, and a pooling layer;
    the fourth generalized convolutional layer comprises, in order: convolutional layer A41, an activation layer, convolutional layer A42, an activation layer, convolutional layer A43, an activation layer, and a pooling layer;
    the fifth generalized convolutional layer comprises, in order: convolutional layer A51, an activation layer, convolutional layer A52, an activation layer, convolutional layer A53, an activation layer, and a pooling layer;
    the dilated convolutional layer cascaded with the fifth generalized convolutional layer in the fully convolutional network of the encoder part comprises four classes of dilated convolutional layers in parallel, wherein:
    the first class of dilated convolutional layer comprises, in order: a first-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the second class of dilated convolutional layer comprises, in order: a second-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the third class of dilated convolutional layer comprises, in order: a third-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the fourth class of dilated convolutional layer comprises, in order: a fourth-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer.
  5. The unsupervised video segmentation method based on deep learning according to claim 4, characterized in that in the fully convolutional network of the decoder part each up-sampling layer is cascaded with a corresponding recurrent convolutional layer, wherein:
    the first up-sampling layer is cascaded with the third recurrent convolutional layer; the first up-sampling layer performs a twofold up-sampling of the output of the previous layer; the third recurrent convolutional layer convolves the output of encoder convolutional layer A33 and performs a recurrent convolution operation with the output of the first up-sampling layer;
    the second up-sampling layer is cascaded with the second recurrent convolutional layer; the second up-sampling layer performs a twofold up-sampling of the output of the previous layer; the second recurrent convolutional layer convolves the output of encoder convolutional layer A22 and performs a recurrent convolution operation with the output of the second up-sampling layer;
    the third up-sampling layer is cascaded with the first recurrent convolutional layer; the third up-sampling layer performs a twofold up-sampling of the output of the previous layer; the first recurrent convolutional layer convolves the output of encoder convolutional layer A12 and performs a recurrent convolution operation with the output of the third up-sampling layer.
  6. The unsupervised video segmentation method based on deep learning according to claim 3, characterized in that training the still-image segmentation stream network with images on which still-image segmentation has been performed comprises:
    selecting sample pictures from the ECSSD, MSRA 10K and PASCAL VOC 2012 image segmentation datasets;
    expanding the data quantity to the order of 10⁴ by applying random cropping, mirroring, flipping, zooming and affine transformation to the sample pictures;
    fixing the decoder part and training the encoder part with 60% of the data until the encoder part converges;
    training the still-image segmentation stream network with 100% of the training data, wherein the encoder part is initialized with its weights at convergence and the decoder part is randomly initialized.
  7. The unsupervised video segmentation method based on deep learning according to claim 2, characterized in that the inter-frame information segmentation stream network comprises a mutually cascaded encoder part and decoder part formed of fully convolutional networks, wherein:
    the fully convolutional network of the encoder part comprises five cascaded generalized convolutional layers and one dilated convolutional layer cascaded with the fifth generalized convolutional layer; the dilated convolutional layer at the sixth level comprises dilations of four different scales, each class forming one output path, and the average of the output results of the four output paths is the output result of the encoder part;
    the five generalized convolutional layers in the fully convolutional network of the encoder part comprise a cascaded first generalized convolutional layer, second generalized convolutional layer, third generalized convolutional layer, fourth generalized convolutional layer and fifth generalized convolutional layer, wherein:
    the first generalized convolutional layer comprises, in order: convolutional layer B11, an activation layer, convolutional layer B12, an activation layer, and a pooling layer;
    the second generalized convolutional layer comprises, in order: convolutional layer B21, an activation layer, convolutional layer B22, an activation layer, and a pooling layer;
    the third generalized convolutional layer comprises, in order: convolutional layer B31, an activation layer, convolutional layer B32, an activation layer, convolutional layer B33, an activation layer, and a pooling layer;
    the fourth generalized convolutional layer comprises, in order: convolutional layer B41, an activation layer, convolutional layer B42, an activation layer, convolutional layer B43, an activation layer, and a pooling layer;
    the fifth generalized convolutional layer comprises, in order: convolutional layer B51, an activation layer, convolutional layer B52, an activation layer, convolutional layer B53, an activation layer, and a pooling layer;
    the dilated convolutional layer cascaded with the fifth generalized convolutional layer in the fully convolutional network of the encoder part comprises four classes of dilated convolutional layers in parallel, wherein:
    the first class of dilated convolutional layer comprises, in order: a first-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the second class of dilated convolutional layer comprises, in order: a second-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the third class of dilated convolutional layer comprises, in order: a third-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the fourth class of dilated convolutional layer comprises, in order: a fourth-scale dilated convolutional layer, an activation layer, a dropout layer, a convolutional layer, an activation layer, a dropout layer, and a convolutional layer;
    the fully convolutional network of the decoder part is a fully convolutional network formed of three recurrent convolutional layers and three up-sampling layers, and outputs a picture segmentation result whose resolution is consistent with the input picture; wherein:
    in the fully convolutional network of the decoder part, each up-sampling layer is cascaded with a corresponding recurrent convolutional layer, wherein:
    the first up-sampling layer is cascaded with the third recurrent convolutional layer; the first up-sampling layer performs a twofold up-sampling of the output of the previous layer; the third recurrent convolutional layer convolves the output of encoder convolutional layer B33 and performs a recurrent convolution operation with the output of the first up-sampling layer;
    the second up-sampling layer is cascaded with the second recurrent convolutional layer; the second up-sampling layer performs a twofold up-sampling of the output of the previous layer; the second recurrent convolutional layer convolves the output of encoder convolutional layer B22 and performs a recurrent convolution operation with the output of the second up-sampling layer;
    the third up-sampling layer is cascaded with the first recurrent convolutional layer; the third up-sampling layer performs a twofold up-sampling of the output of the previous layer; the first recurrent convolutional layer convolves the output of encoder convolutional layer B12 and performs a recurrent convolution operation with the output of the third up-sampling layer.
  8. The unsupervised video segmentation method based on deep learning according to claim 7, characterized in that training the inter-frame information segmentation stream network with videos on which inter-frame information segmentation has been performed comprises:
    collecting the training video set VID of the video object detection task in ILSVRC2015, wherein the training video set VID carries complete object-detection bounding boxes;
    performing image segmentation on every frame of the video set VID with the trained still-image segmentation stream network to obtain foreground-background segmentation results;
    computing the optical flow field between frames of each video and saving the optical flow information corresponding to every frame as an RGB image;
    screening out correctly segmented image segmentation results, according to a preset screening strategy combined with the bounding boxes in the training video set VID, as initial training images for the inter-frame information segmentation stream network, wherein the screening strategy satisfies the following conditions:
    first: the per-frame image segmentation result covers 75% to 90% of the object-detection bounding box;
    second: the mean optical-flow magnitude of the computed optical-flow RGB image is between 5 and 100;
    expanding the initial training images to a data quantity of the order of 10⁴ by random cropping, mirroring, flipping, zooming and affine transformation;
    fixing the decoder part and training the encoder part with 60% of the data until the encoder part converges;
    training the inter-frame information segmentation stream network with 100% of the training data, wherein the encoder part is initialized with its weights at convergence and the decoder part is randomly initialized.
  9. The unsupervised video segmentation method based on deep learning according to claim 1, characterized in that the fusion network comprises: a concatenation layer, a convolutional layer, an activation layer, a convolutional layer and an activation layer, wherein:
    the concatenation layer connects the still-image segmentation stream network and the inter-frame information segmentation stream network, and the output results of the still-image segmentation stream network and the inter-frame information segmentation stream network are fused through the convolutional layer, activation layer, convolutional layer and activation layer to obtain the final video segmentation result.
  10. The unsupervised video segmentation method based on deep learning according to any one of claims 2-9, characterized in that the still-image segmentation stream network and the inter-frame information segmentation stream network update their network parameters in real time during the training process.
CN201711004135.5A 2017-10-24 2017-10-24 Unsupervised video segmentation method based on deep learning Active CN107808389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711004135.5A CN107808389B (en) 2017-10-24 2017-10-24 Unsupervised video segmentation method based on deep learning


Publications (2)

Publication Number Publication Date
CN107808389A true CN107808389A (en) 2018-03-16
CN107808389B CN107808389B (en) 2020-04-17

Family

ID=61585461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711004135.5A Active CN107808389B (en) 2017-10-24 2017-10-24 Unsupervised video segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN107808389B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108712630A (en) * 2018-04-19 2018-10-26 安凯(广州)微电子技术有限公司 A kind of internet camera system and its implementation based on deep learning
CN108805898A (en) * 2018-05-31 2018-11-13 北京字节跳动网络技术有限公司 Method of video image processing and device
CN108876792A (en) * 2018-04-13 2018-11-23 北京迈格威科技有限公司 Semantic segmentation methods, devices and systems and storage medium
CN109034162A (en) * 2018-07-13 2018-12-18 南京邮电大学 A kind of image, semantic dividing method
CN109068174A (en) * 2018-09-12 2018-12-21 上海交通大学 Video frame rate upconversion method and system based on cyclic convolution neural network
CN109086807A (en) * 2018-07-16 2018-12-25 哈尔滨工程大学 A kind of semi-supervised light stream learning method stacking network based on empty convolution
CN109118490A (en) * 2018-06-28 2019-01-01 厦门美图之家科技有限公司 A kind of image segmentation network generation method and image partition method
CN109785327A (en) * 2019-01-18 2019-05-21 中山大学 The video moving object dividing method of the apparent information of fusion and motion information
CN109961095A (en) * 2019-03-15 2019-07-02 深圳大学 Image labeling system and mask method based on non-supervisory deep learning
CN110147763A (en) * 2019-05-20 2019-08-20 哈尔滨工业大学 Video semanteme dividing method based on convolutional neural networks
CN110246142A (en) * 2019-06-14 2019-09-17 深圳前海达闼云端智能科技有限公司 A kind of method, terminal and readable storage medium storing program for executing detecting barrier
WO2019218826A1 (en) * 2018-05-17 2019-11-21 腾讯科技(深圳)有限公司 Image processing method and device, computer apparatus, and storage medium
CN110555805A (en) * 2018-05-31 2019-12-10 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and storage medium
CN111275518A (en) * 2020-01-15 2020-06-12 中山大学 Video virtual fitting method and device based on mixed optical flow
US10762629B1 (en) 2019-11-14 2020-09-01 SegAI LLC Segmenting medical images
CN112016406A (en) * 2020-08-07 2020-12-01 青岛科技大学 Video key frame extraction method based on full convolution network
CN112085760A (en) * 2020-09-04 2020-12-15 厦门大学 Prospect segmentation method of laparoscopic surgery video
CN112784750A (en) * 2021-01-22 2021-05-11 清华大学 Fast video object segmentation method and device based on pixel and region feature matching
WO2021139625A1 (en) * 2020-01-07 2021-07-15 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related apparatus
CN113469146A (en) * 2021-09-02 2021-10-01 深圳市海清视讯科技有限公司 Target detection method and device
CN114358144A (en) * 2021-12-16 2022-04-15 西南交通大学 Image segmentation quality evaluation method
US11423544B1 (en) 2019-11-14 2022-08-23 Seg AI LLC Segmenting medical images


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1532812A2 (en) * 2002-04-26 2005-05-25 The Trustees Of Columbia University In The City Of New York Method and system for optimal video transcoding based on utility function descriptors
CN106204597A (en) * 2016-07-13 2016-12-07 西北工业大学 Video object segmentation method based on self-paced weakly supervised learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KATERINA FRAGKIADAKI et al.: "Learning to Segment Moving Objects in Videos", CVPR 2015 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876792B (en) * 2018-04-13 2020-11-10 北京迈格威科技有限公司 Semantic segmentation method, device and system and storage medium
CN108876792A (en) * 2018-04-13 2018-11-23 北京迈格威科技有限公司 Semantic segmentation methods, devices and systems and storage medium
CN108712630A (en) * 2018-04-19 2018-10-26 安凯(广州)微电子技术有限公司 A kind of internet camera system and its implementation based on deep learning
WO2019218826A1 (en) * 2018-05-17 2019-11-21 腾讯科技(深圳)有限公司 Image processing method and device, computer apparatus, and storage medium
US11373305B2 (en) 2018-05-17 2022-06-28 Tencent Technology (Shenzhen) Company Limited Image processing method and device, computer apparatus, and storage medium
CN110555805A (en) * 2018-05-31 2019-12-10 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and storage medium
CN110555805B (en) * 2018-05-31 2022-05-31 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and storage medium
CN108805898B (en) * 2018-05-31 2020-10-16 北京字节跳动网络技术有限公司 Video image processing method and device
CN108805898A (en) * 2018-05-31 2018-11-13 北京字节跳动网络技术有限公司 Method of video image processing and device
CN109118490A (en) * 2018-06-28 2019-01-01 厦门美图之家科技有限公司 A kind of image segmentation network generation method and image partition method
CN109118490B (en) * 2018-06-28 2021-02-26 厦门美图之家科技有限公司 Image segmentation network generation method and image segmentation method
CN109034162B (en) * 2018-07-13 2022-07-26 南京邮电大学 Image semantic segmentation method
CN109034162A (en) * 2018-07-13 2018-12-18 南京邮电大学 A kind of image, semantic dividing method
CN109086807A (en) * 2018-07-16 2018-12-25 哈尔滨工程大学 A kind of semi-supervised light stream learning method stacking network based on empty convolution
CN109086807B (en) * 2018-07-16 2022-03-18 哈尔滨工程大学 Semi-supervised optical flow learning method based on void convolution stacking network
CN109068174A (en) * 2018-09-12 2018-12-21 上海交通大学 Video frame rate upconversion method and system based on cyclic convolution neural network
CN109785327A (en) * 2019-01-18 2019-05-21 中山大学 The video moving object dividing method of the apparent information of fusion and motion information
CN109961095A (en) * 2019-03-15 2019-07-02 深圳大学 Image labeling system and mask method based on non-supervisory deep learning
CN110147763B (en) * 2019-05-20 2023-02-24 哈尔滨工业大学 Video semantic segmentation method based on convolutional neural network
CN110147763A (en) * 2019-05-20 2019-08-20 哈尔滨工业大学 Video semanteme dividing method based on convolutional neural networks
CN110246142A (en) * 2019-06-14 2019-09-17 深圳前海达闼云端智能科技有限公司 A kind of method, terminal and readable storage medium storing program for executing detecting barrier
US11423544B1 (en) 2019-11-14 2022-08-23 Seg AI LLC Segmenting medical images
US10762629B1 (en) 2019-11-14 2020-09-01 SegAI LLC Segmenting medical images
WO2021139625A1 (en) * 2020-01-07 2021-07-15 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related apparatus
CN111275518B (en) * 2020-01-15 2023-04-21 中山大学 Video virtual fitting method and device based on mixed optical flow
CN111275518A (en) * 2020-01-15 2020-06-12 中山大学 Video virtual fitting method and device based on mixed optical flow
CN112016406B (en) * 2020-08-07 2022-12-02 青岛科技大学 Video key frame extraction method based on full convolution network
CN112016406A (en) * 2020-08-07 2020-12-01 青岛科技大学 Video key frame extraction method based on full convolution network
CN112085760A (en) * 2020-09-04 2020-12-15 厦门大学 Prospect segmentation method of laparoscopic surgery video
CN112085760B (en) * 2020-09-04 2024-04-26 厦门大学 Foreground segmentation method for laparoscopic surgery video
CN112784750B (en) * 2021-01-22 2022-08-09 清华大学 Fast video object segmentation method and device based on pixel and region feature matching
CN112784750A (en) * 2021-01-22 2021-05-11 清华大学 Fast video object segmentation method and device based on pixel and region feature matching
CN113469146A (en) * 2021-09-02 2021-10-01 深圳市海清视讯科技有限公司 Target detection method and device
CN114358144A (en) * 2021-12-16 2022-04-15 西南交通大学 Image segmentation quality evaluation method
CN114358144B (en) * 2021-12-16 2023-09-26 西南交通大学 Image segmentation quality assessment method

Also Published As

Publication number Publication date
CN107808389B (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN107808389A (en) Unsupervised video segmentation method based on deep learning
CN105869178B (en) Unsupervised segmentation method for complex-target dynamic scenes based on multi-scale combined-feature convex optimization
CN107292247A (en) Human behavior recognition method and device based on residual networks
WO2022011681A1 (en) Method for fusing knowledge graph based on iterative completion
CN103824272B (en) Face super-resolution reconstruction method based on k-nearest-neighbor re-identification
CN110263603A (en) Face recognition method and device based on center loss and a residual visual simulation network
Zhai et al. FPANet: feature pyramid attention network for crowd counting
Liu et al. Multi-stage context refinement network for semantic segmentation
CN106504219B (en) Road enhancement method for high-resolution remote sensing images based on constrained-path morphology
CN113888399B (en) Face age synthesis method based on style fusion and domain selection structure
CN111709443A (en) Calligraphy character style classification method based on a rotation-invariant convolutional neural network
Chen et al. Coupled Global–Local object detection for large VHR aerial images
Cheng et al. A survey on image semantic segmentation using deep learning techniques
CN106846377A (en) Target tracking algorithm based on color attributes and active feature extraction
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
Zhou et al. An enhancement model based on dense atrous and inception convolution for image semantic segmentation
Mo et al. Attention-guided collaborative counting
Fu et al. Cooperative attention generative adversarial network for unsupervised domain adaptation
Zhao et al. Probability-based channel pruning for depthwise separable convolutional networks
Zhang et al. Multi-granularity semantic alignment distillation learning for remote sensing image semantic segmentation
Zhang et al. SiamMBFAN: Siamese tracker with multi-branch feature aggregation network
Li et al. HNSR: Highway networks based deep convolutional neural networks model for single image super-resolution
Cai et al. Rgb road scene material segmentation
Fang et al. Collaborative learning in bounding box regression for object detection
CN113129237A (en) Depth image deblurring method based on multi-scale fusion coding network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant