CN107564032A - A video tracking object segmentation method based on an appearance network - Google Patents
A video tracking object segmentation method based on an appearance network
- Publication number
- CN107564032A (application CN201710780214.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- appearance
- frame
- bounding box
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Image Analysis (AREA)
Abstract
The present invention proposes a video tracking object segmentation method based on an appearance network. Its main components are an appearance network, an object detection network, bounding box filtering, and training. Each input frame is first passed through an appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered against it, and the final segmentation image is obtained. The invention combines the output of an appearance network that is trained only once with that of a semantic instance detection network and applies temporal constraints to the result, which speeds up training of the appearance network and improves the accuracy of detection and segmentation.
Description
Technical field
The present invention relates to the field of video object segmentation, and more particularly to a video tracking object segmentation method based on an appearance network.
Background
Video object segmentation is a fundamental problem in computer vision and one of the frontiers and active topics of current video signal processing research. It refers to partitioning a video in the spatio-temporal domain into a combination of semantic video objects, that is, dividing each video frame into several distinct semantic object regions so that the video can be processed flexibly. Video object segmentation has broad application prospects in video coding, video retrieval, multimedia operations, image processing, pattern recognition, video compression coding and video database operations, and it can also be used in practical settings such as traffic-flow video monitoring, industrial automation monitoring, security, and network multimedia interaction. Because the quality of the segmentation directly affects later processing, research on video object segmentation is both important and challenging. When a video contains multiple instances of the same object class as the annotated object, the single network used by conventional methods may mistakenly identify all or several of those instances as part of the object, which lowers segmentation precision and accuracy.
The present invention proposes a video tracking object segmentation method based on an appearance network. Each input frame is first passed through an appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered against it, and the final segmentation image is obtained. The invention combines the output of an appearance network that is trained only once with that of a semantic instance detection network and applies temporal constraints to the result, which speeds up training of the appearance network and improves the accuracy of detection and segmentation.
Summary of the invention
To address the problem of low segmentation precision and accuracy, the object of the present invention is to provide a video tracking object segmentation method based on an appearance network. Each input frame is first passed through an appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered against it, and the final segmentation image is obtained.
To solve the above problem, the present invention provides a video tracking object segmentation method based on an appearance network, which mainly comprises:
(1) an appearance network;
(2) an object detection network;
(3) bounding box filtering;
(4) training.
In the appearance network, each input frame is first passed through an appearance network that performs class-independent object segmentation. The network is based on the VGG16 convolutional architecture, converted into a fully convolutional network. Unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution.
Skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation precision on object contour details. More specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1 × 1 kernel, giving a grayscale segmentation probability map of the same size as the current downsampling stage, which is then upsampled to the original image size with a bilinear filter.
Finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-size grayscale segmentation probability map. To achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer that yields a binary classification mask.
In the object detection network, the frame is then passed through an instance-level semantic object detection network. The network takes the original RGB image as input and produces a set of bounding boxes for every object it finds that belongs to its set of supported classes. The object detection network can separate instances of the same object class, which makes it possible to select the correct instance in the video when at least one similar-looking instance has also been picked up by the appearance network.
Bounding box filtering comprises an appearance-based filter, a temporal filter, and a connected-component filter.
For the appearance-based filter, after the input frame has passed through both networks, there is one initial segmentation prediction map from the single appearance network and a number of bounding box proposals for the objects recognized by the semantic detection network. A method for combining the results of the two networks is proposed in order to refine the final predicted object segmentation map of each frame of the video.
In the method for combining the results of the two networks, the ground truth annotated on the first image is first used to select the bounding box that belongs to the annotated object. In subsequent frames the correct bounding box is then selected by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal continuity on these detections.
For the first image, the semantic detection (bounding box) that best overlaps the object segmentation given by the first-frame ground truth is selected, and the selected class is stored in memory so that it can be searched for in subsequent frames.
For all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded. Among the remaining detection proposals, the detection that best fits the appearance prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
In the temporal filter, even when the correct bounding box of a semantic object was selected in the previous frame, the selection may switch in the next frame to another object instance whose semantic bounding box overlaps heavily with the appearance prediction. To further ensure that the correct bounding box is selected, only detections for which the intersection over union (IoU) between the object position in the current frame and that in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time.
If semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box. For every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is selected from the chosen connected components. At the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object are available.
In the connected-component filter, which is the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network. The appearance map is filtered with the bounding box, and background noise is removed.
To obtain the final predicted (binary) segmentation mask, the appearance segmentation map is thresholded twice, once with a low threshold and once with a high threshold, and each resulting mask is split into its connected components.
With the low and high thresholds, the first pass uses the high-threshold mask and deletes every component that does not intersect the bounding box selected in the previous steps. This restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise.
In the second pass, the connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask.
This enhancement provides a looser threshold within the selected bounding box, by analogy with the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge. Strong and weak (high- and low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
In training, only the appearance network is trained, and offline training uses stochastic gradient descent with a momentum of 0.9. The data are augmented by mirroring, rotation, and resizing. Deep supervision is applied during training by attaching each side output to a cross-entropy segmentation loss function.
Brief description of the drawings
Fig. 1 is a system framework diagram of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 2 is a flow diagram of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 3 shows the temporal filter of the video tracking object segmentation method based on an appearance network according to the present invention.
Fig. 4 shows the connected-component filter of the video tracking object segmentation method based on an appearance network according to the present invention.
Detailed description
It should be noted that, provided they do not conflict, the embodiments of the present application and the features of those embodiments may be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system framework diagram of the video tracking object segmentation method based on an appearance network according to the present invention. The method mainly comprises an appearance network, an object detection network, bounding box filtering, and training.
Appearance network: each input frame is first passed through an appearance network that performs class-independent object segmentation. The network is based on the VGG16 convolutional architecture, converted into a fully convolutional network. Unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution.
Skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation precision on object contour details. More specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1 × 1 kernel, giving a grayscale segmentation probability map of the same size as the current downsampling stage, which is then upsampled to the original image size with a bilinear filter.
Finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-size grayscale segmentation probability map. To achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer that yields a binary classification mask.
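As an illustration of a class-balanced sigmoid cross-entropy loss of the kind named above, the following Python sketch computes such a loss over a flat list of pixels. The balancing scheme (foreground terms weighted by the fraction of background pixels, and background terms by the fraction of foreground pixels) is an assumption, since the text does not specify the exact weighting, and the function name is hypothetical.

```python
import math


def class_balanced_sigmoid_ce(logits, labels):
    """Mean class-balanced sigmoid cross-entropy over a non-empty pixel list.

    logits: raw (pre-sigmoid) scores, one per pixel.
    labels: 0/1 ground-truth values, one per pixel.
    The weighting below is an assumed balancing scheme, not taken from the text.
    """
    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n  # weight applied to foreground (label 1) pixels
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable log(sigmoid(z)) and log(1 - sigmoid(z)).
        softplus = math.log1p(math.exp(-abs(z)))
        log_p = min(z, 0.0) - softplus
        log_not_p = min(-z, 0.0) - softplus
        total -= beta * y * log_p + (1.0 - beta) * (1 - y) * log_not_p
    return total / n
```

With this weighting, a frame in which the object covers only a small fraction of the pixels still gives the foreground and the background comparable influence on the loss.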
Object detection network: the frame is then passed through an instance-level semantic object detection network. The network takes the original RGB image as input and produces a set of bounding boxes for every object it finds that belongs to its set of supported classes. The object detection network can separate instances of the same object class, which makes it possible to select the correct instance in the video when at least one similar-looking instance has also been picked up by the appearance network.
Bounding box filtering comprises an appearance-based filter, a temporal filter, and a connected-component filter.
Appearance-based filter: after the input frame has passed through both networks, there is one initial segmentation prediction map from the single appearance network and a number of bounding box proposals for the objects recognized by the semantic detection network. A method for combining the results of the two networks is proposed in order to refine the final predicted object segmentation map of each frame of the video.
First, the ground truth annotated on the first image is used to select the bounding box that belongs to the annotated object. In subsequent frames the correct bounding box is then selected by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal continuity on these detections.
For the first image, the semantic detection (bounding box) that best overlaps the object segmentation given by the first-frame ground truth is selected, and the selected class is stored in memory so that it can be searched for in subsequent frames.
For all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded. Among the remaining detection proposals, the detection that best fits the appearance prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
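The selection of the detection that best fits the appearance prediction can be sketched as follows. This is a minimal illustration, assuming that the appearance map has already been binarized into a 0/1 mask, that boxes are (x0, y0, x1, y1) pixel coordinates with exclusive upper bounds lying inside the image, and that each detection is a dictionary with the hypothetical keys `cls` and `box`.

```python
def box_mask_iou(box, mask):
    """Intersection over union between a bounding box and a binary mask."""
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    mask_area = sum(sum(row) for row in mask)
    # Foreground pixels that fall inside the box.
    inter = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    union = box_area + mask_area - inter
    return inter / union if union else 0.0


def select_detection(detections, mask, target_class):
    """Keep detections of the first-frame class and return the best match.

    Returns None when no detection of the target class is present.
    """
    candidates = [d for d in detections if d["cls"] == target_class]
    if not candidates:
        return None
    return max(candidates, key=lambda d: box_mask_iou(d["box"], mask))
```

The same `box_mask_iou` helper could also serve for the first frame, with the ground-truth annotation taking the place of the appearance mask.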
Training: only the appearance network is trained, and offline training uses stochastic gradient descent with a momentum of 0.9. The data are augmented by mirroring, rotation, and resizing. Deep supervision is applied during training by attaching each side output to a cross-entropy segmentation loss function.
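For reference, one step of stochastic gradient descent with momentum can be written as the short Python sketch below. Only the momentum value of 0.9 comes from the text; the learning rate, the function name, and the particular (classical) form of the update are assumptions made for illustration.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One classical momentum update over flat lists of floats.

    lr is an assumed value; the text specifies only momentum = 0.9.
    Returns the updated parameters and the updated velocity.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```

The velocity accumulates a decaying sum of past gradients, so repeated gradients in the same direction produce progressively larger steps.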
Fig. 2 is a flow diagram of the video tracking object segmentation method based on an appearance network according to the present invention. Each input frame is first passed through an appearance network that performs class-independent object segmentation. The final pooling layer and the fully connected layers are removed, and skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, where the side outputs are concatenated and passed through a fusion convolutional layer that outputs the network prediction. The frame is then passed through an instance-level semantic object detection network. The foreground appearance segmentation yields an appearance map, the bounding boxes are filtered against it, and the final segmentation image is obtained.
Fig. 3 shows the temporal filter of the video tracking object segmentation method based on an appearance network according to the present invention. Even when the correct bounding box of a semantic object was selected in the previous frame, the selection may switch in the next frame to another object instance whose semantic bounding box overlaps heavily with the appearance prediction. To further ensure that the correct bounding box is selected, only detections for which the intersection over union (IoU) between the object position in the current frame and that in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time.
If semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box. For every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is selected from the chosen connected components. At the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object are available.
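The temporal check on bounding boxes can be illustrated with the Python sketch below. It assumes boxes given as (x0, y0, x1, y1) tuples and an IoU threshold of 0.5; the threshold value and the function names are assumptions, since the text does not state them.

```python
def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def temporal_filter(boxes, prev_box, iou_threshold=0.5):
    """Keep only boxes that overlap the previous frame's selected box enough.

    iou_threshold is an assumed value used for illustration.
    """
    return [b for b in boxes if box_iou(b, prev_box) >= iou_threshold]
```

Candidates that survive this check could then be ranked against the appearance map, so that a sudden jump to a neighbouring instance is rejected before the best match is chosen.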
Fig. 4 shows the connected-component filter of the video tracking object segmentation method based on an appearance network according to the present invention. In the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network. The appearance map is filtered with the bounding box, and background noise is removed.
To obtain the final predicted (binary) segmentation mask, the appearance segmentation map is thresholded twice, once with a low threshold and once with a high threshold, and each resulting mask is split into its connected components.
The first pass uses the high-threshold mask and deletes every component that does not intersect the bounding box selected in the previous steps. This restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise.
In the second pass, the connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask.
This enhancement provides a looser threshold within the selected bounding box, by analogy with the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge. Strong and weak (high- and low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
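The two-pass thresholding described above can be sketched in plain Python as follows. The threshold values (0.3 and 0.7), the use of 4-connectivity, and the function names are assumptions made for illustration; the probability map is a list of rows, and the box is (x0, y0, x1, y1) with exclusive upper bounds.

```python
from collections import deque


def connected_components(binary):
    """Return the 4-connected components of a 2D boolean grid as sets of (row, col)."""
    h, w = len(binary), len(binary[0])
    seen, comps = set(), []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or (r, c) in seen:
                continue
            comp, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            comps.append(comp)
    return comps


def two_pass_mask(prob, box, low=0.3, high=0.7):
    """Binary mask from a probability map using the two-pass scheme.

    low and high are assumed threshold values.
    """
    x0, y0, x1, y1 = box
    strong = [[p >= high for p in row] for row in prob]
    weak = [[p >= low for p in row] for row in prob]
    # Pass 1: keep high-threshold components that intersect the selected box.
    kept = set()
    for comp in connected_components(strong):
        if any(y0 <= r < y1 and x0 <= c < x1 for r, c in comp):
            kept |= comp
    # Pass 2: add low-threshold components that touch the pass-1 pixels.
    final = set()
    for comp in connected_components(weak):
        if comp & kept:
            final |= comp
    return [[1 if (r, c) in final else 0 for c in range(len(prob[0]))]
            for r in range(len(prob))]
```

Because every pixel above the high threshold is also above the low threshold, the second pass can only grow the first-pass mask, never shrink it.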
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it may be realized in other specific forms without departing from its spirit and scope. Furthermore, those skilled in the art may make various changes and modifications to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
1. A video tracking object segmentation method based on an appearance network, characterized in that it mainly comprises an appearance network (1); an object detection network (2); bounding box filtering (3); and training (4).
2. The appearance network (1) according to claim 1, characterized in that each input frame is first passed through an appearance network that performs class-independent object segmentation; the network is based on the VGG16 convolutional architecture, converted into a fully convolutional network; unlike a standard fully convolutional network, the final pooling layer and the fully connected layers are removed entirely in order to preserve spatial resolution;
skip connections allow multi-resolution spatial information to flow from the shallow layers to the end of the network, which improves segmentation precision on object contour details; more specifically, the final feature map of each VGG16 stage is taken before its pooling layer and convolved with a single 1 × 1 kernel, giving a grayscale segmentation probability map of the same size as the current downsampling stage, which is then upsampled to the original image size with a bilinear filter;
finally, these side outputs are concatenated at the end of the network and passed through a fusion convolutional layer that outputs the network prediction: a full-size grayscale segmentation probability map; to achieve pixel-level segmentation, the softmax classifier is replaced by a class-balanced sigmoid cross-entropy loss layer that yields a binary classification mask.
3. The object detection network (2) according to claim 1, characterized in that the frame is then passed through an instance-level semantic object detection network; the network takes the original RGB image as input and produces a set of bounding boxes for every object it finds that belongs to its set of supported classes; the object detection network can separate instances of the same object class, which makes it possible to select the correct instance in the video when at least one similar-looking instance has also been picked up by the appearance network.
4. The bounding box filtering (3) according to claim 1, characterized in that it comprises an appearance-based filter, a temporal filter, and a connected-component filter.
5. The appearance-based filter according to claim 4, characterized in that, after the input frame has passed through both networks, there is one initial segmentation prediction map from the single appearance network and a number of bounding box proposals for the objects recognized by the semantic detection network; a method for combining the results of the two networks is proposed in order to refine the final predicted object segmentation map of each frame of the video.
6. The method for combining the results of the two networks according to claim 5, characterized in that the ground truth annotated on the first image is first used to select the bounding box that belongs to the annotated object; in subsequent frames the correct bounding box is then selected by searching for the bounding box proposal that best matches the appearance map and by enforcing temporal continuity on these detections;
for the first image, the semantic detection (bounding box) that best overlaps the object segmentation given by the first-frame ground truth is selected, and the selected class is stored in memory so that it can be searched for in subsequent frames;
for all subsequent frames, only detections of the class found in the first frame are of interest, and the rest are discarded; among the remaining detection proposals, the detection that best fits the appearance prediction is selected according to the intersection over union (IoU) between each bounding box proposal and the appearance map.
7. The temporal filter according to claim 4, characterized in that, even when the correct bounding box of a semantic object was selected in the previous frame, the selection may switch in the next frame to another object instance whose semantic bounding box overlaps heavily with the appearance prediction; to further ensure that the correct bounding box is selected, only detections for which the intersection over union (IoU) between the object position in the current frame and that in the previous frame passes a threshold are kept, so that the correct bounding box is tracked over time;
if semantic object detection fails to detect any object in the first frame, the first-frame annotation is used instead to define the bounding box; for every subsequent frame, the connected components that intersect the previous bounding box are found, all other segments are deleted, and a new bounding box is selected from the chosen connected components; at the end of this step, an appearance map and the correct semantic bounding box detection of the annotated object are available.
8. The connected-component filter according to claim 4, characterized in that, in the final step of the algorithm, the detection selected in the previous steps is used to restrict and enhance the segmentation map obtained from the appearance network; the appearance map is filtered with the bounding box, and background noise is removed;
to obtain the final predicted (binary) segmentation mask, the appearance segmentation map is thresholded twice, once with a low threshold and once with a high threshold, and each resulting mask is split into its connected components.
9. The low threshold and high threshold according to claim 8, characterized in that the first pass uses the high-threshold mask and deletes every component that does not intersect the bounding box selected in the previous steps; this restriction filters out wrongly segmented instances that resemble the annotated object, or simply filters out noise;
in the second pass, the connected components of the low-threshold mask that intersect the mask obtained in the first pass are added to the final segmentation mask;
this enhancement provides a looser threshold within the selected bounding box, by analogy with the Canny edge detector, which distinguishes strong and weak edges and keeps a weak edge only when it is connected to a strong edge; strong and weak (high- and low-confidence) segmentation pixels are found within the selected bounding region of the segmentation map, and weak pixels are kept when their connected component intersects strong pixels.
10. The training (4) according to claim 1, characterized in that only the appearance network is trained, and offline training uses stochastic gradient descent with a momentum of 0.9; the data are augmented by mirroring, rotation, and resizing; deep supervision is applied during training by attaching each side output to a cross-entropy segmentation loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710780214.9A CN107564032A (en) | 2017-09-01 | 2017-09-01 | A video tracking object segmentation method based on an appearance network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710780214.9A CN107564032A (en) | 2017-09-01 | 2017-09-01 | A video tracking object segmentation method based on an appearance network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107564032A (en) | 2018-01-09 |
Family
ID=60978742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710780214.9A Withdrawn CN107564032A (en) | 2017-09-01 | 2017-09-01 | A kind of video tracking object segmentation methods based on outward appearance network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107564032A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170228617A1 (en) * | 2016-02-04 | 2017-08-10 | Nec Laboratories America, Inc. | Video monitoring using semantic segmentation based on global optimization |
CN106296728A (en) * | 2016-07-27 | 2017-01-04 | 昆明理工大学 | A moving object segmentation method for unconstrained scenes based on a fully convolutional network |
CN106682108A (en) * | 2016-12-06 | 2017-05-17 | 浙江大学 | Video retrieval method based on multi-modal convolutional neural network |
Non-Patent Citations (1)
Title |
---|
Gilad Sharir et al.: "Video Object Segmentation using Tracked Object Proposals", arXiv:1707.06545v1 [cs.CV] * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784164A (en) * | 2018-12-12 | 2019-05-21 | 北京达佳互联信息技术有限公司 | Foreground recognition method, device, electronic equipment and storage medium |
CN109784164B (en) * | 2018-12-12 | 2020-11-06 | 北京达佳互联信息技术有限公司 | Foreground identification method and device, electronic equipment and storage medium |
WO2020125495A1 (en) * | 2018-12-17 | 2020-06-25 | 中国科学院深圳先进技术研究院 | Panoramic segmentation method, apparatus and device |
CN109800657A (en) * | 2018-12-25 | 2019-05-24 | 天津大学 | A convolutional neural network face recognition method for blurred face images |
CN110097568A (en) * | 2019-05-13 | 2019-08-06 | 中国石油大学(华东) | A video object detection and segmentation method based on a space-time dual-branch network |
CN110097568B (en) * | 2019-05-13 | 2023-06-09 | 中国石油大学(华东) | Video object detection and segmentation method based on space-time dual-branch network |
CN112312203A (en) * | 2020-08-25 | 2021-02-02 | 北京沃东天骏信息技术有限公司 | Video playing method, device and storage medium |
CN112312203B (en) * | 2020-08-25 | 2023-04-07 | 北京沃东天骏信息技术有限公司 | Video playing method, device and storage medium |
CN113421280A (en) * | 2021-05-31 | 2021-09-21 | 江苏大学 | Method for segmenting reinforcement learning video object by integrating precision and speed |
CN113421280B (en) * | 2021-05-31 | 2024-05-14 | 江苏大学 | Reinforced learning video object segmentation method integrating precision and speed |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107564032A (en) | A video tracking object segmentation method based on an appearance network | |
CN109636795B (en) | Real-time non-tracking monitoring video remnant detection method | |
CN111104903B (en) | Depth perception traffic scene multi-target detection method and system | |
CN108985169B (en) | Shop cross-door operation detection method based on deep learning target detection and dynamic background modeling | |
CN109918987B (en) | Video subtitle keyword identification method and device | |
CN105574524B (en) | Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies | |
CN103714181B (en) | A kind of hierarchical particular persons search method | |
CN111401293B (en) | Gesture recognition method based on Head lightweight Mask scanning R-CNN | |
CN110705412A (en) | Video target detection method based on motion history image | |
CN107977592B (en) | Image text detection method and system, user terminal and server | |
CN111091101B (en) | High-precision pedestrian detection method, system and device based on one-step method | |
CN110008953A (en) | Potential target Area generation method based on the fusion of convolutional neural networks multilayer feature | |
CN111931572B (en) | Target detection method for remote sensing image | |
Conde et al. | Exploring vision transformers for fine-grained classification | |
CN110852199A (en) | Foreground extraction method based on double-frame coding and decoding model | |
CN111738338B (en) | Defect detection method applied to motor coil based on cascaded expansion FCN network | |
Sumari et al. | Towards practical implementations of person re-identification from full video frames | |
Diers et al. | A survey of methods for automated quality control based on images | |
Xiu et al. | Dynamic-scale graph convolutional network for semantic segmentation of 3d point cloud | |
CN114419006A (en) | Method and system for removing watermark of gray level video characters changing along with background | |
CN111882545B (en) | Fabric defect detection method based on bidirectional information transmission and feature fusion | |
CN110111358B (en) | Target tracking method based on multilayer time sequence filtering | |
CN114463800A (en) | Multi-scale feature fusion face detection and segmentation method based on generalized intersection-parallel ratio | |
CN109871903B (en) | Target detection method based on end-to-end deep network and counterstudy | |
CN110929632A (en) | Complex scene-oriented vehicle target detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 2018-01-09 |