CN109785327A - Video moving object segmentation method fusing appearance information and motion information - Google Patents

Video moving object segmentation method fusing appearance information and motion information

Info

Publication number
CN109785327A
Authority
CN
China
Prior art keywords
video
information
apparent
motion information
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910048996.6A
Other languages
Chinese (zh)
Inventor
赖剑煌 (Lai Jianhuang)
陈子轩 (Chen Zixuan)
郭春超 (Guo Chunchao)
谢晓华 (Xie Xiaohua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201910048996.6A
Publication of CN109785327A
Legal status: Pending

Landscapes

  • Compression Or Coding Systems Of TV Signals (AREA)

Abstract

The invention discloses a video moving object segmentation method fusing appearance information and motion information. The method first extracts the object appearance information and motion information of a video using deep features; it then interactively encodes the deep appearance information and deep motion information to obtain appearance-motion information and motion-appearance information, fuses the two, and obtains an initial segmentation map after interactive encoding. The video sequence is segmented frame by frame to obtain a video segmentation sequence, and an energy equation is constructed and optimized with the objective of minimizing the total energy of the entire video segmentation sequence, thereby generating a video moving object segmentation model; the initial segmentation map is further refined according to the segmentation model to obtain the final segmentation result. The method of the present invention has stronger generalization ability and achieves substantial improvements in both image quality and segmentation accuracy.

Description

Video moving object segmentation method fusing appearance information and motion information
Technical field
The present invention relates to the field of object segmentation in video images, and in particular to a video moving object segmentation method fusing appearance information and motion information.
Background technique
Segmenting an object in a video can, generally speaking, draw on two kinds of information: the appearance information of the object itself, and the motion information describing how the object changes over time. Appearance information is the most basic information of a video; it reflects the current appearance of the object. Motion information records, along the timeline of the entire video, the changes an object undergoes (displacement, deformation, and so on). Traditional video object segmentation methods generally use only one of these two kinds of information. Models that extract only appearance information often perform poorly when the object's color is hard to distinguish from the background or when the object is occluded; models that extract only motion information often fail when the object deforms strongly or key frames are missing. How to model the problem with both kinds of information simultaneously, so as to obtain better segmentation results, is therefore a major difficulty of video object segmentation techniques.
Recently, with the rapid development of deep learning techniques, many researchers have attempted to model the problem directly with deep models in order to solve video object segmentation. Although such methods have achieved novel experimental results, problems such as insufficient generalization ability remain. This is because, for the segmentation task, video is in fact relatively redundant information: a whole video usually contains only a handful of segmentation target objects, and this decisive shortage of samples makes deep networks prone to over-fitting when they learn video information directly.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a video moving object segmentation method fusing appearance information and motion information; the method has the advantages of being practical, having high segmentation accuracy, and having strong generalization ability.
The purpose of the present invention is achieved by the following technical solution: a video moving object segmentation method fusing appearance information and motion information, comprising the steps of:
(1) extracting the object appearance information and motion information of a video using deep features;
(2) interactively encoding the deep appearance information and deep motion information to obtain appearance-motion information and motion-appearance information, then fusing the two to obtain an initial segmentation map after interactive encoding;
(3) segmenting the video sequence frame by frame to obtain a video segmentation sequence, and constructing an energy equation that is optimized with the objective of minimizing the total energy of the entire video segmentation sequence, thereby generating a video moving object segmentation model; the initial segmentation map is further refined according to the segmentation model to obtain the final segmentation result.
Through the interactive encoding of object appearance information and object motion information, the present invention fuses the two kinds of information and, via the optimization of an energy equation, generates a video moving object segmentation model that segments accurately and generalizes strongly. Compared with general deep learning models, this method has stronger generalization ability; compared with conventional methods, it achieves substantial improvements in both image quality and segmentation accuracy.
Preferably, in step (1), the object appearance information of the video is extracted by a deep saliency segmentation network.
Preferably, in step (1), the object motion information of the video is extracted by a deep optical flow network.
Preferably, in step (1), the object appearance information is generated from a single frame, and the object motion information is generated from two adjacent frames.
Preferably, in step (2), the appearance-motion information is obtained by using the object appearance information to correct the object motion information in the video, as follows: the previous frame of the video is first displaced with the optical flow field, and the displaced video frame is then input to the deep saliency segmentation network; the resulting deep saliency segmentation result is the appearance-motion information.
Preferably, in step (2), the motion-appearance information is obtained by using the object motion information to correct the object appearance information in the video, as follows: the segmentation map of the previous frame is first displaced with the optical flow field, and the displaced segmentation result map is then used to correct the deep saliency segmentation map of the current frame.
Preferably, in step (3), the energy equation is constructed as follows: the video segmentation sequence and its energies are formed into a graph model, and the energy equation is then constructed by modeling three parts: each node itself, its spatially adjacent nodes, and its temporally adjacent nodes.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention extracts the object appearance information and object motion information of a video using deep features and then interactively encodes the appearance information and motion information; this improves video moving object segmentation while also enhancing the model's generalization and transfer ability on this task.
2. The present invention proposes the interactive encoding of deep appearance information and deep motion information, which yields a considerable improvement in image quality and segmentation results over existing conventional methods.
3. The method proposed by the present invention of correcting the initial segmentation result with temporal information has stronger generalization ability than existing deep learning methods.
Brief description of the drawings
Fig. 1 is the flow chart of the method of this embodiment.
Fig. 2 is a schematic diagram of the deep saliency segmentation network of this embodiment.
Fig. 3 is a schematic flow diagram of the interactive encoding of this embodiment.
Fig. 4 is a schematic diagram of the energy equation optimization flow of this embodiment.
Fig. 5 shows partial segmentation results of the method of this embodiment on DAVIS2016.
Fig. 6 shows partial segmentation results of the method of this embodiment on SegTrack-v2.
Specific embodiments
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
Referring to Figs. 1-4, the video moving object segmentation method fusing appearance information and motion information of this embodiment mainly comprises four steps: extracting the object appearance information of the video, extracting the object motion information of the video, interactive encoding, and optimizing the segmentation result by constructing an energy equation. These steps are described in detail below with reference to the accompanying drawings and examples.
1. Extracting the object appearance information of the video
In this step, a single-frame-based deep saliency segmentation network is built and trained to obtain the deep appearance information of objects. As shown in Fig. 2, a deep saliency segmentation network generally consists of two parts, an encoder and a decoder, and converts an input natural image into a 0-1 saliency segmentation map. This embodiment may use a deep learning model based on the U-Net structure.
1.1 The encoder is usually a multi-layer deep feature extraction network that, like a feature pyramid, hierarchically encodes the input natural image through successive down-sampling. Low-level features mainly contain the sharp local information of the image, while high-level features mainly carry the overall semantic information of the image. For example, a pre-trained network such as VGG16, with a total of 16 layers, can be used as the structure.
1.2 The decoder is usually a multi-layer deep feature fusion network presenting an inverted-pyramid up-sampling structure. Its main role is to fuse the local and global image features encoded by the encoder and then perform a decoding-like operation, gradually up-sampling to a saliency segmentation image of the same size as the original image. As shown in Fig. 2, the decoder part is a network of the same size as the encoder, likewise composed of 16 layers; the up-sampling method corresponding to the pooling layers is nearest-neighbor interpolation, and the dashed arrows indicate that the encoder feature map of the corresponding size is fused into the decoder.
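As a concrete illustration of the encoder-decoder structure described above, the following is a minimal PyTorch sketch of a U-Net-style saliency segmentation network. The layer counts, channel widths, and names (conv_block, SaliencyUNet) are illustrative assumptions for exposition, not the exact VGG16-based network of this embodiment.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # two 3x3 convolutions, as in a typical U-Net stage
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class SaliencyUNet(nn.Module):
    """Illustrative U-Net-style saliency segmentation network (assumed sizes)."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        # nearest-neighbor interpolation for up-sampling, as described above
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.dec2 = conv_block(256 + 128, 128)
        self.dec1 = conv_block(128 + 64, 64)
        self.head = nn.Conv2d(64, 1, 1)  # single-channel saliency map

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        # skip connections: fuse same-size encoder features into the decoder
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))  # 0-1 saliency segmentation map
```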
Because saliency detection is typically two-class segmentation, its loss function is the cross-entropy. For the segmentation result of a single-frame image, the cross-entropy is computed as follows (the formula images are missing from the text; the standard binary form implied by the surrounding definitions is reconstructed here):

$$\mathcal{L}_{ce} = -\sum_{i,j}\Big[Y_j^i \log M_Y(X_j^i) + \big(1 - Y_j^i\big)\log\big(1 - M_Y(X_j^i)\big)\Big] \qquad (1)$$

where $X_j^i$ and $Y_j^i$ respectively represent the input natural image and the corresponding segmentation label, i and j index the i-th frame of the j-th video, and $M_Y$ is the saliency segmentation network. The current video frame $X_j^i$ is input to the trained saliency segmentation network $M_Y$ to obtain the deep appearance information:

$$S^i = M_Y(X_j^i) \qquad (2)$$

where $S^i$ is the saliency segmentation result of the single frame.
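Under the same assumptions, a single training step with the per-pixel binary cross-entropy of formula (1) might look as follows (model stands in for M_Y; frame and label are one image X and its label Y):

```python
import torch.nn.functional as F

def training_step(model, frame, label, optimizer):
    """One cross-entropy training step on a single frame (illustrative)."""
    pred = model(frame)                         # M_Y(X): 0-1 saliency map
    loss = F.binary_cross_entropy(pred, label)  # formula (1) for one frame
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```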
2. Extracting the object motion information of the video
In this step, the deep motion information of objects is obtained by building and training a deep optical flow network. The deep optical flow network here is similar to the deep saliency segmentation network in that it also has a down-sampling and an up-sampling process. Specifically, k adjacent consecutive video frames are first input into the network, and deep features are extracted through a series of convolutional and pooling layers to obtain image features along the time dimension. The up-sampling process then successively fuses features of the same size together to obtain the final deep optical flow result. The k adjacent video frames $(X^{i-k+1}, \ldots, X^{i})$ are input to the trained optical flow network $M_p$ to obtain the deep motion information (formula reconstructed from the surrounding definitions):

$$O_i = M_p\big(X^{i-k+1}, \ldots, X^{i}\big) \qquad (3)$$

where $M_p$ is the deep optical flow network model and $O_i$ is the optical flow map of the k adjacent frames.
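The flow-extraction step can be sketched as below; flow_net is a hypothetical stand-in for the trained network M_p (any deep optical flow model with this two-frame call pattern would do), and the two-frame case k = 2 is assumed:

```python
import torch

def extract_motion(flow_net, frames):
    """Estimate optical flow O_i over adjacent frame pairs (k = 2 assumed).

    frames: list of (3, H, W) tensors; returns a list of (2, H, W) flow maps,
    one per adjacent pair, giving per-pixel (dx, dy) displacements.
    """
    flows = []
    for prev, cur in zip(frames[:-1], frames[1:]):
        # flow_net plays the role of M_p: two stacked RGB frames in, flow out
        flow = flow_net(torch.cat([prev, cur], dim=0).unsqueeze(0))[0]
        flows.append(flow)
    return flows
```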
3. Interactive encoding of the deep appearance information and deep motion information
As shown in Fig. 3, interactive encoding is broadly divided into the following steps:
3.1 Using the optical flow $O_i$ obtained above, the displacement of objects in the moving scene is computed, and this displacement is applied to the saliency segmentation map of the previous frame, $S^{i-1}$, yielding a segmentation result corrected by the motion information in the flow, denoted here $S^{i}_{ma}$ (the original symbols were rendered as images; these names are reconstructed). The overlapping parts of $S^{i}_{ma}$ and $S^{i}$ are retained, and a threshold decides whether the non-overlapping parts between them are retained. This realizes the encoding of appearance information by motion information: the motion-appearance information.
3.2 The displacement obtained from $O_i$ is applied to the previous frame of the video (the (i-1)-th frame of the j-th video), yielding a new RGB image $\widetilde{X}^{i}$.
3.3 $\widetilde{X}^{i}$ is input to the deep saliency segmentation model $M_Y$, yielding the optical-flow information constrained by the appearance information, denoted $S^{i}_{am}$. This realizes the encoding of motion information by appearance information: the appearance-motion information.
3.4 Finally, $S^{i}_{ma}$ and $S^{i}_{am}$ are fused together, and the segmentation map after interactive encoding, $\hat{S}^{i}$, is obtained as shown in formula (4) (the original formula image is missing; a generic fusion operator is written here):

$$\hat{S}^{i} = \mathrm{Fuse}\big(S^{i}_{ma},\, S^{i}_{am}\big) \qquad (4)$$
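Steps 3.1-3.3 both rest on warping by the flow field, which can be sketched with a standard grid-sample backward warp; the fusion of step 3.4 is shown as a simple average-and-threshold, which is only one plausible reading of the overlap-and-threshold rule above, not the exact operator of this embodiment:

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B, C, H, W) by flow (B, 2, H, W) via grid_sample.

    flow is assumed to map current-frame pixel positions back to the
    previous frame (backward flow); sign conventions are an assumption.
    """
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(flow.device)
    grid = base + flow
    # normalize pixel coordinates to [-1, 1] as grid_sample expects
    grid[:, 0] = 2 * grid[:, 0] / (w - 1) - 1
    grid[:, 1] = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(img, grid.permute(0, 2, 3, 1), align_corners=True)

def interactive_encode(model, prev_frame, prev_seg, cur_seg, flow, thr=0.5):
    ma = warp(prev_seg, flow)           # step 3.1: motion-appearance info
    am = model(warp(prev_frame, flow))  # steps 3.2-3.3: appearance-motion info
    fused = (ma + am + cur_seg) / 3     # step 3.4: fuse (illustrative rule)
    return (fused > thr).float()        # initial segmentation map, formula (4)
```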
4. Optimizing the segmentation result by constructing an energy equation
As shown in Fig. 4, in order to optimize and enhance the segmentation result by combining the spatial and temporal information of the video, an energy equation is constructed over the video sequence. First, an energy graph g = (v, e) is built: each point in v corresponds to a pixel in the segmentation maps, and e corresponds to the edges between a point and the other points, representing the energy differences between them. The purpose of this step is to make the total energy of the entire video segmentation sequence minimal. The energy equation constructed in this embodiment has the following form (the formula images are missing; the standard unary-plus-pairwise form consistent with the terms described below is reconstructed here):

$$E(L) = \sum_{p} U_p(l_p) + \sum_{(p,q)\in N_s} V_{p,q}(l_p, l_q) + \sum_{(p,r)\in N_t} W_{p,r}(l_p, l_r) \qquad (5)$$

where p denotes a node of the constructed graph g, q denotes a node spatially adjacent to p, and r denotes a node temporally adjacent to p. Let $L = \{l_p\}$ be the set of initial labels of all nodes, with $l_p, l_q, l_r$ respectively denoting the labels possessed by p, q, and r. $N_s$ and $N_t$ respectively denote the sets of spatially and temporally adjacent node pairs of the segmentation maps.
$$U_p(l_p) = \begin{cases} -\log \hat{S}_p, & l_p = 1 \\ -\log\big(1 - \hat{S}_p\big), & l_p = 0 \end{cases} \qquad (6)$$

This models the segmentation map itself, where $\hat{S}_p$ is the initial segmentation confidence at pixel p (a standard unary form, reconstructed); optimizing $U_p$ can eliminate part of the noise.
$$V_{p,q}(l_p, l_q) = C(p,q)\,\big[l_p \neq l_q\big] \qquad (7)$$

This models the structure of the segmentation map so that spatial information enhances the segmentation.
$$C(p,q) = \exp\!\Big(-\frac{\lVert I_p - I_q \rVert^2}{2\sigma^2}\Big) \qquad (8)$$

where C (reconstructed above as a standard contrast-sensitive weight on the pixel colors $I_p, I_q$) is mainly used to complete details.
$$W_{p,r}(l_p, l_r) = \big[l_p \neq l_r\big] \qquad (9)$$

This models the sequence of segmentation maps so that temporal information corrects the segmentation result.
Finally, by optimizing the energy equation E(L) and solving it with an iterative algorithm, the final segmentation result map L* is obtained:

$$L^* = \arg\min_{L} E(L) \qquad (10)$$

where L* gives the label of each corresponding pixel in the optimized segmentation map: 0 represents background and 1 represents the segmented object.
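Formula (10) can be attacked with any discrete-MRF solver. As a compact, hedged illustration, the NumPy sketch below evaluates an energy of the form (5) with Potts-style pairwise terms and refines labels greedily (ICM-style); boundaries wrap and flow correspondences are omitted for brevity, and a graph-cut solver would give stronger optimality guarantees:

```python
import numpy as np

def icm_refine(unary, spatial_w=1.0, temporal_w=1.0, iters=5):
    """Greedy label refinement for a (T, H, W) video energy (illustrative).

    unary: (T, H, W, 2) array with unary[t, y, x, l] = U_p(l), as in (6).
    Pairwise terms are Potts penalties on the 4-neighborhood (spatial, N_s)
    and on the same pixel in adjacent frames (temporal, N_t).
    """
    labels = np.argmin(unary, axis=-1)
    for _ in range(iters):
        for l in (0, 1):
            cost = unary[..., l].copy()
            # axes 1, 2 are spatial neighbors; axis 0 is the temporal neighbor
            for axis, w in ((1, spatial_w), (2, spatial_w), (0, temporal_w)):
                for shift in (1, -1):
                    # Potts penalty: pay w for each disagreeing neighbor
                    cost += w * (np.roll(labels, shift, axis=axis) != l)
            if l == 0:
                cost0 = cost
            else:
                labels = (cost < cost0).astype(int)  # pick the cheaper label
    return labels
```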
The present invention achieves good segmentation performance on mainstream video object segmentation datasets. Table 1 shows the performance of the present invention on DAVIS2016 and SegTrack-v2, where state-of-the-art results are achieved on all metrics.
Table 1. Segmentation results of the present invention on mainstream datasets
DAVIS (Densely Annotated VIdeo Segmentation) is now a widely recognized and authoritative video segmentation dataset containing 50 different videos in total; in its 2016 release, each video contains only one instance. SegTrack-v2 is an older video segmentation dataset containing 14 different videos, with a varying number of instances per video. mIoU (mean intersection-over-union) and F-measure (F1 score) are the two most important evaluation functions for segmentation results: mIoU mainly measures the accuracy of the segmented region, while F-measure mainly measures the accuracy of the contour of the segmented image. Their formulas are as follows (reconstructed in their standard forms):

$$\mathrm{mIoU} = \frac{|M \cap G|}{|M \cup G|}, \qquad F = \frac{2PR}{P + R}$$

where, in mIoU, M and G respectively represent the obtained segmentation region and the segmentation region in the label, and, in F-measure, P and R respectively represent the precision and recall of the segmentation map.
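Both metrics are straightforward to compute for binary masks; a minimal sketch (pred playing the role of M, gt the role of G):

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union of two binary masks (the mIoU formula above)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def f_measure(pred, gt):
    """F1 score from segmentation precision P and recall R."""
    tp = np.logical_and(pred, gt).sum()
    p = tp / pred.sum() if pred.sum() else 0.0
    r = tp / gt.sum() if gt.sum() else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0
```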
Figs. 5 and 6 respectively show partial segmentation results of the method of this embodiment on DAVIS2016 and SegTrack-v2. As can be seen from the result figures, this method basically achieves accurate segmentation and also segments well in difficult situations such as objects whose color is hard to distinguish from the background, occlusion, and large object deformation.
The techniques described in the present invention may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For hardware implementations, the processing modules may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, electronic devices, other electronic units designed to perform the functions described in the present invention, or a combination thereof.
For firmware and/or software implementations, the techniques may be implemented with modules (e.g., procedures, steps, processes, etc.) that perform the functions described herein. The firmware and/or software code may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiment may be completed by program instructions and associated hardware; the aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiment. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by the above embodiment. Any other changes, modifications, substitutions, combinations, and simplifications made without departing from the spirit and principles of the present invention shall be equivalent substitutions and are included within the protection scope of the present invention.

Claims (7)

1. A video moving object segmentation method fusing appearance information and motion information, characterized by comprising the steps of:
(1) extracting the object appearance information and motion information of a video using deep features;
(2) interactively encoding the deep appearance information and deep motion information to obtain appearance-motion information and motion-appearance information, then fusing the two to obtain an initial segmentation map after interactive encoding;
(3) segmenting the video sequence frame by frame to obtain a video segmentation sequence, and constructing an energy equation that is optimized with the objective of minimizing the total energy of the entire video segmentation sequence, thereby generating a video moving object segmentation model; and further refining the initial segmentation map according to the segmentation model to obtain the final segmentation result.
2. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (1), the object appearance information of the video is extracted by a deep saliency segmentation network.
3. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (1), the object motion information of the video is extracted by a deep optical flow network.
4. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (1), the object appearance information is generated from a single frame, and the object motion information is generated from two adjacent frames.
5. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (2), the appearance-motion information is obtained by using the object appearance information to correct the object motion information in the video, as follows: the previous frame of the video is first displaced with the optical flow field, and the displaced video frame is then input to the deep saliency segmentation network; the resulting deep saliency segmentation result is the appearance-motion information.
6. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (2), the motion-appearance information is obtained by using the object motion information to correct the object appearance information in the video, as follows: the segmentation map of the previous frame is first displaced with the optical flow field, and the displaced segmentation result map is then used to correct the deep saliency segmentation map of the current frame.
7. The video moving object segmentation method fusing appearance information and motion information according to claim 1, characterized in that, in step (3), the energy equation is constructed as follows: the video segmentation sequence and its energies are formed into a graph model, and the energy equation is then constructed by modeling three parts: each node itself, its spatially adjacent nodes, and its temporally adjacent nodes.
CN201910048996.6A 2019-01-18 2019-01-18 Video moving object segmentation method fusing appearance information and motion information Pending CN109785327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910048996.6A CN109785327A (en) 2019-01-18 2019-01-18 Video moving object segmentation method fusing appearance information and motion information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910048996.6A CN109785327A (en) 2019-01-18 2019-01-18 Video moving object segmentation method fusing appearance information and motion information

Publications (1)

Publication Number Publication Date
CN109785327A (en) 2019-05-21

Family

ID=66501016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910048996.6A Pending CN109785327A (en) 2019-01-18 2019-01-18 Video moving object segmentation method fusing appearance information and motion information

Country Status (1)

Country Link
CN (1) CN109785327A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591868A (en) * 2021-07-30 2021-11-02 南开大学 Video target segmentation method and system based on full-duplex strategy
CN117315056A (en) * 2023-11-27 2023-12-29 支付宝(杭州)信息技术有限公司 Video editing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134217A (en) * 2014-07-29 2014-11-05 中国科学院自动化研究所 Video salient object segmentation method based on super voxel graph cut
CN107808389A (en) * 2017-10-24 2018-03-16 上海交通大学 Unsupervised methods of video segmentation based on deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134217A (en) * 2014-07-29 2014-11-05 中国科学院自动化研究所 Video salient object segmentation method based on super voxel graph cut
CN107808389A (en) * 2017-10-24 2018-03-16 上海交通大学 Unsupervised methods of video segmentation based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHUNCHAO GUO ET AL: "Motion-Appearance Interactive Encoding for Object Segmentation in Unconstrained Videos", arXiv *
SUYOG DUTT JAIN ET AL: "FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos", arXiv *
WANG JUN ET AL: "Video saliency detection based on deep spatio-temporal feature encoding", 《通信技术》 (Communications Technology) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591868A (en) * 2021-07-30 2021-11-02 南开大学 Video target segmentation method and system based on full-duplex strategy
CN113591868B (en) * 2021-07-30 2023-09-01 南开大学 Video target segmentation method and system based on full duplex strategy
CN117315056A (en) * 2023-11-27 2023-12-29 支付宝(杭州)信息技术有限公司 Video editing method and device
CN117315056B (en) * 2023-11-27 2024-03-19 支付宝(杭州)信息技术有限公司 Video editing method and device

Similar Documents

Publication Publication Date Title
US20210390700A1 (en) Referring image segmentation
CN108804397B (en) Chinese character font conversion generation method based on small amount of target fonts
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
CN113012172B (en) AS-UNet-based medical image segmentation method and system
CN103413286B (en) United reestablishing method of high dynamic range and high-definition pictures based on learning
CN110751111B (en) Road extraction method and system based on high-order spatial information global automatic perception
CN112634296B (en) RGB-D image semantic segmentation method and terminal for gate mechanism guided edge information distillation
CN115049936A (en) High-resolution remote sensing image-oriented boundary enhancement type semantic segmentation method
CN111488932B (en) Self-supervision video time-space characterization learning method based on frame rate perception
CN112329780B (en) Depth image semantic segmentation method based on deep learning
WO2022133627A1 (en) Image segmentation method and apparatus, and device and storage medium
CN109785327A (en) Video moving object segmentation method fusing appearance information and motion information
CN109902808B (en) Method for optimizing convolutional neural network based on floating point digital variation genetic algorithm
CN115391563B (en) Knowledge graph link prediction method based on multi-source heterogeneous data fusion
CN111127472A (en) Multi-scale image segmentation method based on weight learning
CN109597998A (en) A kind of characteristics of image construction method of visual signature and characterizing semantics joint insertion
CN115984933A (en) Training method of human face animation model, and voice data processing method and device
CN114219968A (en) MA-Xnet-based pavement crack segmentation method
CN116109920A (en) Remote sensing image building extraction method based on transducer
CN112651360A (en) Skeleton action recognition method under small sample
CN116485867A (en) Structured scene depth estimation method for automatic driving
CN116091288A (en) Diffusion model-based image steganography method
CN114067162A (en) Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN117474796B (en) Image generation method, device, equipment and computer readable storage medium
CN112712855B (en) Joint training-based clustering method for gene microarray containing deletion value

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-05-21