CN107564032A - A video tracking object segmentation method based on an appearance network - Google Patents

A video tracking object segmentation method based on an appearance network

Info

Publication number
CN107564032A
Authority
CN
China
Prior art keywords
network
appearance
frame
bounding box
segmentation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710780214.9A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-09-01
Filing date
2017-09-01
Publication date
2018-01-09
Application filed by Shenzhen Vision Technology Co Ltd
Priority to CN201710780214.9A
Publication of CN107564032A
Legal status: Withdrawn

Landscapes

  • Image Analysis (AREA)

Abstract

The invention proposes a video tracking object segmentation method based on an appearance network. Its main components are an appearance network, an object detection network, bounding box filtering, and training. Each input frame is first passed through a trained appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered with a set of filters, and the final segmentation image is obtained. The invention combines the outputs of an appearance network that is trained once and of a semantic instance detection network, and applies temporal constraints to the result. This speeds up training of the appearance network and improves the precision and accuracy of detection and segmentation.

Description

A video tracking object segmentation method based on an appearance network
Technical field
The present invention relates to the field of video object segmentation, and more particularly to a video tracking object segmentation method based on an appearance network.
Background
Video object segmentation is a basic problem in computer vision and one of the frontier topics and focal points of current video signal processing research. It refers to dividing a video, over the spatio-temporal domain, into a combination of video semantic objects; that is, each frame is divided into distinct semantic object regions so that the video can be processed flexibly. Video object segmentation has broad application prospects, including video coding, video retrieval, multimedia operations, image processing, pattern recognition, video compression coding, and video database operations. It can also be used in practical settings such as traffic flow video monitoring, industrial automation monitoring, security, and networked multimedia interaction. Because segmentation quality directly affects later processing stages, research on video object segmentation is both important and challenging. The single-network approach used by conventional methods has a drawback: when a video contains multiple instances of the same object class as the annotated object, the network may mistakenly identify all or several of those instances as part of the object, which lowers segmentation precision and accuracy.
The present invention proposes a video tracking object segmentation method based on an appearance network. Each input frame is first passed through a trained appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered with a set of filters, and the final segmentation image is obtained. The invention combines the outputs of an appearance network that is trained once and of a semantic instance detection network, and applies temporal constraints to the result. This speeds up training of the appearance network and improves the precision and accuracy of detection and segmentation.
Summary of the invention
In view of the problem of low segmentation precision and accuracy, the object of the present invention is to provide a video tracking object segmentation method based on an appearance network. Each input frame is first passed through a trained appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered with a set of filters, and the final segmentation image is obtained.
To solve the above problem, the present invention provides a video tracking object segmentation method based on an appearance network, whose main components are:
(1) an appearance network;
(2) an object detection network;
(3) bounding box filtering;
(4) training.
The appearance network: first, each input frame is passed through a trained appearance network that performs class-independent object segmentation. The network is based on the VGG16 convolutional architecture, converted into a fully convolutional network. Unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution.
Skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation accuracy on object contour details. More specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1×1 kernel, yielding a grayscale segmentation probability map at the resolution of the current downsampling stage, which is then upsampled to the original image size with a bilinear filter.
Finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-resolution grayscale segmentation probability map. To achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer, which provides a binary classification mask.
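As an illustration of the class-balanced sigmoid cross-entropy mentioned above, the following is a minimal sketch only. The text does not give the weighting formula; the choice of the background fraction as the foreground weight (`beta`) is an assumption borrowed from common edge-detection practice, and the function name `class_balanced_bce` is hypothetical.

```python
import math

def class_balanced_bce(logits, labels):
    """Class-balanced sigmoid cross-entropy over a flat list of pixels.

    logits: raw network scores, one per pixel.
    labels: 1 for foreground, 0 for background.
    Assumption: beta is the background fraction, so the (usually rarer)
    foreground pixels receive the larger weight.
    """
    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n
    loss = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable log(sigmoid(z)).
        if z >= 0:
            log_p = -math.log1p(math.exp(-z))
        else:
            log_p = z - math.log1p(math.exp(z))
        # log(1 - sigmoid(z)) = log(sigmoid(z)) - z.
        log_not_p = log_p - z
        loss -= beta * y * log_p + (1.0 - beta) * (1 - y) * log_not_p
    return loss / n
```

With this weighting, one foreground pixel among three background pixels contributes as much to the loss as the three background pixels together, which keeps a small object from being ignored during training.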
The object detection network: the frame is next passed through an instance-level semantic object detection network. This network takes the original RGB image as input and produces a set of bounding boxes for any objects it finds that belong to its set of supported classes. The object detection network can separate instances of the same object class, which makes it possible to select the correct instance in a video in which at least one similar instance would otherwise be picked up by the appearance network.
The bounding box filtering comprises an appearance-based filter, a temporal filter, and a connected-component filter.
Further, for the appearance-based filter: after the input frame has been passed through both networks, one initial segmentation prediction map is obtained from the single appearance network, together with several bounding box proposals for the objects recognized by the semantic detection network. A method is proposed for combining the results of the two networks in order to refine the final predicted object segmentation map of each frame of the video.
Further, in the method for combining the results of the two networks, the ground truth of the first frame is first used to select the bounding box that belongs to the annotated object. The correct bounding box is then selected in subsequent frames by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal consistency on these detections.
For the first frame, the semantic detection (bounding box) that has the best overlap with the object segmentation given by the first-frame ground truth is selected. The selected class is stored in memory so that it can be searched for in subsequent frames.
For all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded. Among the remaining detection proposals, the detection that best fits the appearance map prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
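The selection rule for subsequent frames can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the appearance map is assumed to have been binarized already, boxes are assumed to be `(x0, y0, x1, y1)` with exclusive maxima and non-negative coordinates inside the image, and the proposal dictionary keys `cls` and `box` are hypothetical.

```python
def box_mask_iou(box, mask):
    """IoU between a box (x0, y0, x1, y1) and a binary mask (list of rows)."""
    x0, y0, x1, y1 = box
    box_area = max(0, x1 - x0) * max(0, y1 - y0)
    mask_area = sum(sum(row) for row in mask)
    # Foreground pixels of the mask that fall inside the box.
    inter = sum(sum(row[x0:x1]) for row in mask[y0:y1])
    union = box_area + mask_area - inter
    return inter / union if union else 0.0

def select_proposal(proposals, target_class, mask):
    """Keep proposals of the first-frame class; return the best-IoU one or None."""
    candidates = [p for p in proposals if p["cls"] == target_class]
    if not candidates:
        return None
    return max(candidates, key=lambda p: box_mask_iou(p["box"], mask))
```

For the first frame the same `box_mask_iou` function could be applied with the ground-truth mask in place of the appearance map, ignoring the class filter, to find the detection whose class is then stored.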
Further, for the temporal filter: after the correct bounding box of a semantic object has been selected in the previous frame, the selection may switch to another object instance whose appearance prediction overlaps heavily with its semantic bounding box. To further ensure that the correct bounding box is selected, only detections whose IoU between the object position in the current frame and in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time.
If semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box. Then, for every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is chosen from the selected connected components. At the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object have been obtained.
Further, for the connected-component filter: in the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network. The appearance map is filtered with the bounding box, which removes background noise.
To obtain the final predicted (i.e., binary) segmentation mask, the appearance segmentation map is thresholded twice, with a low threshold and a high threshold. Each resulting mask is then split into its connected components.
Further, for the low and high thresholds: in the first pass, the high-threshold mask is used, and all components that do not intersect the bounding box selected in the previous steps are deleted. This restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise.
In the second pass, connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask.
This enhancement step provides a looser threshold within the selected bounding box. It follows the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge. Likewise, strong (high-confidence) and weak (low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
The training: only the appearance network is trained. Offline training uses stochastic gradient descent with a momentum of 0.9. The data is augmented by mirroring, rotation, and resizing. Deep supervision is also applied during training by connecting each side output to a cross-entropy segmentation loss function.
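For reference, stochastic gradient descent with momentum is commonly written as the update below. The momentum coefficient 0.9 comes from the text; the learning rate \eta and the exact form of the update are not stated in the source and are shown here only as the standard formulation.

```latex
% w_t: network weights at step t, v_t: velocity, L: segmentation loss
% \mu = 0.9 (from the text); \eta is an unspecified learning rate (assumption)
\begin{aligned}
v_{t+1} &= \mu\, v_t - \eta\, \nabla_w L(w_t), \qquad \mu = 0.9 \\
w_{t+1} &= w_t + v_{t+1}
\end{aligned}
```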
Brief description of the drawings
Fig. 1 is a system framework diagram of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 2 is a flow chart of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 3 shows the temporal filter of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 4 shows the connected-component filter of the video tracking object segmentation method based on an appearance network according to the present invention.
Detailed description of embodiments
It should be noted that, provided there is no conflict, the embodiments in this application and the features of those embodiments may be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system framework diagram of the video tracking object segmentation method based on an appearance network according to the present invention. The method mainly comprises an appearance network, an object detection network, bounding box filtering, and training.
The appearance network: first, each input frame is passed through a trained appearance network that performs class-independent object segmentation. The network is based on the VGG16 convolutional architecture, converted into a fully convolutional network. Unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution.
Skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation accuracy on object contour details. More specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1×1 kernel, yielding a grayscale segmentation probability map at the resolution of the current downsampling stage, which is then upsampled to the original image size with a bilinear filter.
Finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-resolution grayscale segmentation probability map. To achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer, which provides a binary classification mask.
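The side-output computation described in this embodiment (a 1×1 convolution followed by bilinear upsampling) can be illustrated in plain Python as follows. This is a sketch for exposition only: the function names are hypothetical, the weights would in practice be learned, and the corner-aligned interpolation convention is an assumption, since the text only says that a bilinear filter is used.

```python
def conv1x1(features, weights, bias=0.0):
    """Collapse C channel maps (each H x W) into one H x W score map.

    A 1x1 convolution is a per-pixel weighted sum over channels.
    """
    h, w = len(features[0]), len(features[0][0])
    return [[bias + sum(wc * ch[y][x] for wc, ch in zip(weights, features))
             for x in range(w)] for y in range(h)]

def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2-D grid to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row (corner-aligned).
        fy = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        row = []
        for j in range(out_w):
            fx = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
            bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out
```

Each VGG16 stage would produce one such upsampled map; the fusion layer then combines the stacked maps into the full-resolution prediction.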
The object detection network: the frame is next passed through an instance-level semantic object detection network. This network takes the original RGB image as input and produces a set of bounding boxes for any objects it finds that belong to its set of supported classes. The object detection network can separate instances of the same object class, which makes it possible to select the correct instance in a video in which at least one similar instance would otherwise be picked up by the appearance network.
The bounding box filtering comprises an appearance-based filter, a temporal filter, and a connected-component filter.
The appearance-based filter: after the input frame has been passed through both networks, one initial segmentation prediction map is obtained from the single appearance network, together with several bounding box proposals for the objects recognized by the semantic detection network. A method is proposed for combining the results of the two networks in order to refine the final predicted object segmentation map of each frame of the video.
First, the ground truth of the first frame is used to select the bounding box that belongs to the annotated object. The correct bounding box is then selected in subsequent frames by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal consistency on these detections.
For the first frame, the semantic detection (bounding box) that has the best overlap with the object segmentation given by the first-frame ground truth is selected. The selected class is stored in memory so that it can be searched for in subsequent frames.
For all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded. Among the remaining detection proposals, the detection that best fits the appearance map prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
Training: only the appearance network is trained. Offline training uses stochastic gradient descent with a momentum of 0.9. The data is augmented by mirroring, rotation, and resizing. Deep supervision is also applied during training by connecting each side output to a cross-entropy segmentation loss function.
Fig. 2 is a flow chart of the video tracking object segmentation method based on an appearance network according to the present invention. Each input frame is first passed through a trained appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered with a set of filters, and the final segmentation image is obtained.
Fig. 3 shows the temporal filter of the video tracking object segmentation method based on an appearance network according to the present invention. After the correct bounding box of a semantic object has been selected in the previous frame, the selection may switch to another object instance whose appearance prediction overlaps heavily with its semantic bounding box. To further ensure that the correct bounding box is selected, only detections whose IoU between the object position in the current frame and in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time.
If semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box. Then, for every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is chosen from the selected connected components. At the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object have been obtained.
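The IoU test used by the temporal filter can be sketched as below. The threshold value is not given in the text; the default of 0.5 and the function names are assumptions made for illustration, and boxes are taken to be `(x0, y0, x1, y1)` tuples.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def temporal_filter(candidates, prev_box, iou_threshold=0.5):
    """Keep only candidate boxes that overlap the previous frame's box enough."""
    return [c for c in candidates if box_iou(c, prev_box) >= iou_threshold]
```

A candidate that sits on a different instance of the same class typically has little overlap with the box tracked in the previous frame, so it is discarded before the appearance-based selection is applied.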
Fig. 4 shows the connected-component filter of the video tracking object segmentation method based on an appearance network according to the present invention. In the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network. The appearance map is filtered with the bounding box, which removes background noise.
To obtain the final predicted (i.e., binary) segmentation mask, the appearance segmentation map is thresholded twice, with a low threshold and a high threshold. Each resulting mask is then split into its connected components.
In the first pass, the high-threshold mask is used, and all components that do not intersect the bounding box selected in the previous steps are deleted. This restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise.
In the second pass, connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask.
This enhancement step provides a looser threshold within the selected bounding box. It follows the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge. Likewise, strong (high-confidence) and weak (low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
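The two-pass thresholding described above can be sketched in plain Python as follows. The threshold values 0.3 and 0.7, the use of 4-connectivity, and the function names are assumptions chosen for illustration; the text specifies only that a low and a high threshold are used.

```python
from collections import deque

def components(mask):
    """Yield the 4-connected components of a binary mask as sets of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen = set()
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or (sy, sx) in seen:
                continue
            comp, queue = set(), deque([(sy, sx)])
            seen.add((sy, sx))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            yield comp

def two_pass_mask(prob, box, low=0.3, high=0.7):
    """Refine a probability map with a selected box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    high_mask = [[v >= high for v in row] for row in prob]
    low_mask = [[v >= low for v in row] for row in prob]
    # Pass 1: keep high-confidence components that intersect the box.
    strong = set()
    for comp in components(high_mask):
        if any(y0 <= y < y1 and x0 <= x < x1 for y, x in comp):
            strong |= comp
    # Pass 2: add low-confidence components that touch a kept strong pixel.
    final = set(strong)
    for comp in components(low_mask):
        if comp & strong:
            final |= comp
    return [[(y, x) in final for x in range(len(prob[0]))]
            for y in range(len(prob))]
```

In this sketch a high-confidence blob outside the box is dropped in the first pass, and a low-confidence pixel survives only if its component also contains a surviving high-confidence pixel, mirroring the hysteresis idea of the Canny detector.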
It will be apparent to those skilled in the art that the present invention is not limited to the details of the above embodiments and can be realized in other specific forms without departing from its spirit or scope. Those skilled in the art may also make various changes and modifications to the invention without departing from its spirit and scope, and such improvements and modifications shall likewise be regarded as falling within the scope of protection of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (10)

1. A video tracking object segmentation method based on an appearance network, characterized in that it mainly comprises: (1) an appearance network; (2) an object detection network; (3) bounding box filtering; and (4) training.
2. The appearance network (1) according to claim 1, characterized in that, first, each input frame is passed through a trained appearance network that performs class-independent object segmentation; the network is based on the VGG16 convolutional architecture, converted into a fully convolutional network; unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution;
skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation accuracy on object contour details; more specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1×1 kernel, yielding a grayscale segmentation probability map at the resolution of the current downsampling stage, which is then upsampled to the original image size with a bilinear filter;
finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-resolution grayscale segmentation probability map; to achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer, which provides a binary classification mask.
3. The object detection network (2) according to claim 1, characterized in that the frame is next passed through an instance-level semantic object detection network; this network takes the original RGB image as input and produces a set of bounding boxes for any objects it finds that belong to its set of supported classes; the object detection network can separate instances of the same object class, which makes it possible to select the correct instance in a video in which at least one similar instance would otherwise be picked up by the appearance network.
4. The bounding box filtering (3) according to claim 1, characterized in that it comprises an appearance-based filter, a temporal filter, and a connected-component filter.
5. The appearance-based filter according to claim 4, characterized in that, after the input frame has been passed through both networks, one initial segmentation prediction map is obtained from the single appearance network, together with several bounding box proposals for the objects recognized by the semantic detection network; a method is proposed for combining the results of the two networks in order to refine the final predicted object segmentation map of each frame of the video.
6. The method for combining the results of the two networks according to claim 5, characterized in that the ground truth of the first frame is first used to select the bounding box that belongs to the annotated object; the correct bounding box is then selected in subsequent frames by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal consistency on these detections;
for the first frame, the semantic detection (bounding box) that has the best overlap with the object segmentation given by the first-frame ground truth is selected; the selected class is stored in memory so that it can be searched for in subsequent frames;
for all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded; among the remaining detection proposals, the detection that best fits the appearance map prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
7. The temporal filter according to claim 4, characterized in that, after the correct bounding box of a semantic object has been selected in the previous frame, the selection may switch to another object instance whose appearance prediction overlaps heavily with its semantic bounding box; to further ensure that the correct bounding box is selected, only detections whose IoU between the object position in the current frame and in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time;
if semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box; then, for every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is chosen from the selected connected components; at the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object have been obtained.
8. The connected-component filter according to claim 4, characterized in that, in the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network; the appearance map is filtered with the bounding box, which removes background noise;
to obtain the final predicted (i.e., binary) segmentation mask, the appearance segmentation map is thresholded twice, with a low threshold and a high threshold; each resulting mask is then split into its connected components.
9. The low threshold and high threshold according to claim 8, characterized in that, in the first pass, the high-threshold mask is used, and all components that do not intersect the bounding box selected in the previous steps are deleted; this restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise;
in the second pass, connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask;
this enhancement step provides a looser threshold within the selected bounding box; it follows the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge; likewise, strong (high-confidence) and weak (low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
10. The training (4) according to claim 1, characterized in that only the appearance network is trained, and offline training uses stochastic gradient descent with a momentum of 0.9; the data is augmented by mirroring, rotation, and resizing; deep supervision is also applied during training by connecting each side output to a cross-entropy segmentation loss function.
CN201710780214.9A 2017-09-01 2017-09-01 A video tracking object segmentation method based on an appearance network Withdrawn CN107564032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710780214.9A CN107564032A (en) 2017-09-01 2017-09-01 A video tracking object segmentation method based on an appearance network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710780214.9A CN107564032A (en) 2017-09-01 2017-09-01 A video tracking object segmentation method based on an appearance network

Publications (1)

Publication Number Publication Date
CN107564032A true CN107564032A (en) 2018-01-09

Family

ID=60978742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710780214.9A Withdrawn CN107564032A (en) 2017-09-01 2017-09-01 A video tracking object segmentation method based on an appearance network

Country Status (1)

Country Link
CN (1) CN107564032A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170228617A1 (en) * 2016-02-04 2017-08-10 Nec Laboratories America, Inc. Video monitoring using semantic segmentation based on global optimization
CN106296728A (en) * 2016-07-27 2017-01-04 昆明理工大学 A kind of Segmentation of Moving Object method in unrestricted scene based on full convolutional network
CN106682108A (en) * 2016-12-06 2017-05-17 浙江大学 Video retrieval method based on multi-modal convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GILAD SHARIR et al.: "Video Object Segmentation using Tracked Object Proposals", arXiv:1707.06545v1 [cs.CV] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784164A (en) * 2018-12-12 2019-05-21 北京达佳互联信息技术有限公司 Prospect recognition methods, device, electronic equipment and storage medium
CN109784164B (en) * 2018-12-12 2020-11-06 北京达佳互联信息技术有限公司 Foreground identification method and device, electronic equipment and storage medium
WO2020125495A1 (en) * 2018-12-17 2020-06-25 中国科学院深圳先进技术研究院 Panoramic segmentation method, apparatus and device
CN109800657A (en) * 2018-12-25 2019-05-24 天津大学 A kind of convolutional neural networks face identification method for fuzzy facial image
CN110097568A (en) * 2019-05-13 2019-08-06 中国石油大学(华东) A kind of the video object detection and dividing method based on the double branching networks of space-time
CN110097568B (en) * 2019-05-13 2023-06-09 中国石油大学(华东) Video object detection and segmentation method based on space-time dual-branch network
CN112312203A (en) * 2020-08-25 2021-02-02 北京沃东天骏信息技术有限公司 Video playing method, device and storage medium
CN112312203B (en) * 2020-08-25 2023-04-07 北京沃东天骏信息技术有限公司 Video playing method, device and storage medium
CN113421280A (en) * 2021-05-31 2021-09-21 江苏大学 Method for segmenting reinforcement learning video object by integrating precision and speed
CN113421280B (en) * 2021-05-31 2024-05-14 江苏大学 Reinforced learning video object segmentation method integrating precision and speed

Similar Documents

Publication Publication Date Title
CN107564032A (en) A kind of video tracking object segmentation methods based on outward appearance network
CN109636795B (en) Real-time non-tracking monitoring video remnant detection method
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
CN108985169B (en) Shop cross-door operation detection method based on deep learning target detection and dynamic background modeling
CN109918987B (en) Video subtitle keyword identification method and device
CN105574524B (en) Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies
CN103714181B (en) A kind of hierarchical particular persons search method
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN110705412A (en) Video target detection method based on motion history image
CN107977592B (en) Image text detection method and system, user terminal and server
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN110008953A (en) Potential target Area generation method based on the fusion of convolutional neural networks multilayer feature
CN111931572B (en) Target detection method for remote sensing image
Conde et al. Exploring vision transformers for fine-grained classification
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN111738338B (en) Defect detection method applied to motor coil based on cascaded expansion FCN network
Sumari et al. Towards practical implementations of person re-identification from full video frames
Diers et al. A survey of methods for automated quality control based on images
Xiu et al. Dynamic-scale graph convolutional network for semantic segmentation of 3d point cloud
CN114419006A (en) Method and system for removing watermark of gray level video characters changing along with background
CN111882545B (en) Fabric defect detection method based on bidirectional information transmission and feature fusion
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN114463800A (en) Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union
CN109871903B (en) Target detection method based on end-to-end deep network and adversarial learning
CN110929632A (en) Complex scene-oriented vehicle target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180109
