CN110796662A - Real-time semantic video segmentation method - Google Patents

Real-time semantic video segmentation method

Info

Publication number
CN110796662A
CN110796662A
Authority
CN
China
Prior art keywords
frame
segmentation
video
current
semantic
Prior art date
Legal status
Granted
Application number
CN201910859421.2A
Other languages
Chinese (zh)
Other versions
CN110796662B (en)
Inventor
冯君逸
李颂元
李玺
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201910859421.2A
Publication of CN110796662A
Application granted
Publication of CN110796662B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a real-time semantic video segmentation method that greatly accelerates semantic segmentation of video. The method comprises the following steps: 1) acquiring a plurality of groups of data sets for training semantic segmentation and defining the algorithm target; 2) training a lightweight image semantic segmentation CNN model; 3) decoding the original video to obtain residual maps, motion vectors and RGB images; 4) if the current frame is an I frame, feeding it to the segmentation model obtained in step 2) to obtain a complete segmentation result; 5) if the current frame is a P frame, propagating the segmentation result of the previous frame to the current frame using the motion vectors, and selecting a sub-block of the current frame for correction using the residual map; 6) repeating steps 4) and 5) until all video frames have been segmented. The method makes full use of the correlation between adjacent frames in a video; its accelerated processing based on compressed-domain information completes the complex segmentation task rapidly while maintaining high accuracy, improving efficiency by tens of times over conventional frame-by-frame segmentation.

Description

Real-time semantic video segmentation method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a real-time semantic video segmentation method.
Background
Semantic video segmentation is a computer vision task that assigns a semantic category to each pixel of every frame of a video. Real-time semantic video segmentation additionally imposes a speed requirement, generally more than 24 frames per second. Current state-of-the-art semantic video segmentation methods are machine learning methods based on convolutional neural networks (CNNs), and can be broadly divided into two categories: those based on sequences of image frames and those operating on the video directly. Methods of the first category treat the video as a sequence of independent image frames and trade a small amount of segmentation accuracy for real-time performance by reducing the input resolution or pruning the network; they do not exploit the inter-frame coherence implied by the video. Methods of the second category extract inter-frame coherent features from the video through techniques such as optical flow, 3D CNNs and RNNs, but these techniques are time-consuming and can become the bottleneck of semantic video segmentation.
In fact, compressed video itself already contains inter-frame coherence information, namely motion vectors (Mv) and residuals (Res). This information is very fast to obtain, and with it the speed of semantic video segmentation can be greatly increased. However, the inter-frame coherence information provided by compressed video is noisier than that produced by optical flow and similar techniques, so how to exploit the compressed-domain information while still guaranteeing accurate segmentation is the key problem addressed by this method.
Disclosure of Invention
In order to solve the above problems, the present invention provides a real-time semantic video segmentation method. The method builds on a deep image semantic segmentation network and further exploits the strong correlation between adjacent frames in a video, together with the multi-modal motion information in the video compression domain, to perform fast inference, thereby achieving real-time semantic video segmentation.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a real-time semantic video segmentation method comprises the following steps:
S1, acquiring a plurality of groups of videos for training semantic segmentation, and defining the algorithm target;
S2, training a lightweight image semantic segmentation CNN model;
S3, decoding the video to obtain residual maps, motion vectors and RGB images;
S4, for the current frame in the video, if it is an I frame, feeding its RGB image into the image semantic segmentation CNN model trained in S2 to obtain a complete segmentation result;
S5, for the current frame in the video, if it is a P frame, propagating the segmentation result of the previous frame to the current frame using the motion vectors, and selecting a sub-block of the current frame for correction using the residual map;
S6, repeating steps S3 to S5 for all frames in the video until all video frames have been segmented.
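For illustration only, the per-frame dispatch of steps S3 to S6 can be sketched as follows. The decoder interface `decode_compressed_video` and the per-frame record (fields `type`, `rgb`, `mv`, `res`) are hypothetical stand-ins for an MPEG-4 decoder that exposes compressed-domain data; they are assumptions of this sketch, not part of the patent's disclosure:

```python
def segment_video(frames, segment_fn, p_frame_fn):
    """Steps S3-S6: full CNN inference on I frames (S4), compressed-domain
    propagation and correction on P frames (S5), repeated for every frame (S6).

    frames: iterable of decoded records from step S3, each carrying the frame
            type ('I' or 'P'), the RGB image and, for P frames, Mv(t) and Res(t).
    segment_fn: the trained model from step S2, mapping an RGB image to a label map.
    p_frame_fn: handler implementing step S5 from (F(t-1), I(t), Mv(t), Res(t)).
    """
    seg = None
    for frame in frames:
        if frame.type == 'I':
            seg = segment_fn(frame.rgb)                            # S4
        else:  # 'P'
            seg = p_frame_fn(seg, frame.rgb, frame.mv, frame.res)  # S5
        yield seg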
Further, in step S1, for each video V used for video semantic segmentation, the algorithm target is defined as: predicting the class of every pixel in each frame of the video V.
Further, in step S2, the training of the lightweight image semantic segmentation CNN model specifically includes:
S21, classifying each pixel of the image with a convolutional neural network φ operating on a single picture, the classification prediction for an image I processed by the network φ being φ(I);
S22, computing the cross-entropy loss between the prediction and the given class labels to optimize the parameters of the network φ.
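A minimal sketch of steps S21-S22, assuming PyTorch and assuming that φ maps a (B, 3, H, W) image batch to per-pixel class logits of shape (B, C, H, W); the patent does not prescribe a particular framework or network:

```python
import torch.nn.functional as F_nn

def train_step(phi, optimizer, image, label):
    """One optimization step for the pixel classifier phi (steps S21-S22).

    image: (B, 3, H, W) float tensor; label: (B, H, W) long tensor of class ids.
    """
    optimizer.zero_grad()
    logits = phi(image)                       # classification prediction phi(I)
    loss = F_nn.cross_entropy(logits, label)  # pixel-wise cross-entropy (S22)
    loss.backward()                           # back-propagate to optimize phi
    optimizer.step()
    return loss.item()
```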
Further, in step S3, the video is encoded and decoded using the MPEG-4 video coding standard, with the group-of-pictures (GOP) parameter g and the B-frame ratio β set in advance. Denoting the current frame time by t, the decoding process is as follows:
S31, if the current t-th frame is an I frame, directly decoding it to obtain the RGB image I(t) of the current t-th frame;
S32, if the current t-th frame is a P frame, first partially decoding it to obtain the motion vectors Mv(t) and the residual map Res(t), then completing the decoding to obtain the RGB image I(t) by pixel-domain translation and compensation transformations.
Further, in step S4, if the current t-th frame is an I-frame, the current t-th frame is semantically segmented according to the following algorithm:
S41, feeding the current RGB image I(t) into the image semantic segmentation CNN model trained in S2 for prediction, obtaining the semantic segmentation result F(t) = φ(I(t)).
Further, in step S5, if the current t-th frame is a P frame, the current t-th frame is semantically segmented according to the following algorithm:
S51, performing a pixel-domain translation of the previous frame's segmentation result F(t-1) using the motion vectors Mv(t) of the current frame, to obtain the segmentation result of the current frame:
F(t)[p] = F(t-1)[p - Mv(t)[p]]
wherein F(t)[p] denotes the value at pixel position p in the segmentation result F(t) of the current t-th frame obtained after translation; p is a pixel coordinate; and Mv(t)[p] denotes the value at pixel position p in the motion-vector map Mv(t) of the current t-th frame;
S52, using the residual map Res(t) of the current frame, selecting from all candidate sub-regions R_i of the current frame the sub-region containing the most pixels whose residual values exceed the threshold, as the sub-region R(t) to be re-segmented:
R(t) = argmax_{R_i} Σ_{p∈R_i} Indicator(|Res(t)[p]| > THR)
wherein R_i denotes the i-th candidate sub-region; Res(t)[p] denotes the residual value at pixel position p in the residual map Res(t); THR is a manually set threshold; and Indicator denotes an indicator function whose value is 1 if |Res(t)[p]| > THR holds and 0 otherwise;
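A sketch of this selection rule, assuming (as in step S52 of the detailed description below) that the frame is split into an equal grid of sub-blocks; the grid size and THR values here are illustrative, not prescribed by the patent:

```python
import numpy as np

def select_subregion(res, grid=4, thr=20.0):
    """Step S52: R(t) = argmax over R_i of the count of pixels p in R_i
    with |Res(t)[p]| > THR.

    res: (H, W) residual magnitude map; returns (row_slice, col_slice) for R(t).
    """
    h, w = res.shape
    mask = np.abs(res) > thr                      # Indicator(|Res(t)[p]| > THR)
    best, best_count = None, -1
    for i in range(grid):
        for j in range(grid):
            rs = slice(i * h // grid, (i + 1) * h // grid)
            cs = slice(j * w // grid, (j + 1) * w // grid)
            count = int(mask[rs, cs].sum())       # pixels exceeding the threshold
            if count > best_count:
                best, best_count = (rs, cs), count
    return best
```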
S53, feeding the sub-region R(t) obtained in S52 into the image semantic segmentation CNN model trained in S2 for re-segmentation, obtaining a new semantic segmentation result F_R(t) for the sub-region:
F_R(t) = φ(I(t)[R(t)])
wherein I(t)[R(t)] denotes the RGB image of the R(t) sub-region;
S54, updating the segmentation result of the R(t) sub-region in the current frame with the sub-region result obtained in step S53:
F(t)[R(t)] = F_R(t)
wherein F(t)[R(t)] denotes the segmentation result of the R(t) sub-region in the current t-th frame.
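Putting S51-S54 together yields a compact P-frame handler; warp_segmentation and select_subregion are the sketches given above, and segment_fn stands for the model φ applied to the sub-region:

```python
def handle_p_frame(prev_seg, rgb, mv, res, segment_fn, grid=4, thr=20.0):
    """Step S5 in full: propagate F(t-1) with Mv(t), then re-segment R(t)."""
    seg = warp_segmentation(prev_seg, mv)        # S51: motion-compensated F(t)
    rs, cs = select_subregion(res, grid, thr)    # S52: region R(t) to correct
    seg[rs, cs] = segment_fn(rgb[rs, cs])        # S53-S54: F(t)[R(t)] = F_R(t)
    return seg
```

In practice segment_fn, grid and thr would be bound (for example with functools.partial) before passing the handler to the dispatch loop sketched earlier.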
The non-key-frame segmentation algorithm of step S5 is far more efficient than running the CNN on every frame: by avoiding redundant feature extraction on highly similar images, the method processes P frames tens of times faster than frame-by-frame segmentation.
The method makes full use of the correlation between adjacent frames in a video; its accelerated processing based on compressed-domain information completes the complex segmentation task rapidly while maintaining high accuracy, improving efficiency by tens of times over conventional frame-by-frame segmentation.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
As shown in fig. 1, a real-time semantic video segmentation method includes the following steps:
S1, acquiring a plurality of groups of videos for training semantic segmentation, and defining the algorithm target. In this step, for each video V used for video semantic segmentation, the algorithm target is defined as: predicting the class of every pixel in each frame of the video V.
S2, training a lightweight image semantic segmentation CNN model. In this step, training the lightweight image semantic segmentation CNN model specifically includes:
S21, classifying each pixel of the image with a convolutional neural network φ operating on a single picture, the classification prediction for an image I processed by the network φ being φ(I);
S22, computing the cross-entropy loss between the prediction and the given class labels to optimize the parameters of the network φ.
S3, decoding the video to obtain residual maps, motion vectors and RGB images. In this step, the video is encoded and decoded using the MPEG-4 video coding standard, with the group-of-pictures (GOP) parameter g and the B-frame ratio β set in advance. Denoting the current frame time by t, the decoding process distinguishes whether the current frame is an I frame or a P frame, as follows:
S31, if the current t-th frame is an I frame, directly decoding it to obtain the RGB image I(t) of the current t-th frame;
S32, if the current t-th frame is a P frame, first partially decoding it to obtain the motion vectors Mv(t) and the residual map Res(t), then completing the decoding to obtain the RGB image I(t) by pixel-domain translation and compensation transformations.
S4, for the current frame in the video, if it is an I frame, feeding its RGB image into the image semantic segmentation CNN model trained in S2 to obtain a complete segmentation result.
In this step, if the current t-th frame is an I-frame, semantic segmentation is performed on the t-th frame according to the following algorithm:
S41, feeding the current RGB image I(t) into the image semantic segmentation CNN model trained in S2 for prediction, obtaining the semantic segmentation result F(t) = φ(I(t)).
S5, for the current frame in the video, if it is a P frame, propagating the segmentation result of the previous frame to the current frame using the motion vectors, and selecting a sub-block of the current frame for correction using the residual map.
In this step, if the current t-th frame is a P frame, semantic segmentation is performed on the current t-th frame according to the following algorithm:
S51, performing a pixel-domain translation of the previous frame's segmentation result F(t-1) using the motion vectors Mv(t) of the current frame, to obtain the segmentation result of the current frame:
F(t)[p] = F(t-1)[p - Mv(t)[p]]
wherein F(t)[p] denotes the value at pixel position p in the segmentation result F(t) of the current t-th frame obtained after translation; p is a pixel coordinate; and Mv(t)[p] denotes the value at pixel position p in the motion-vector map Mv(t) of the current t-th frame;
S52, the current frame image is first partitioned into a grid: its height and width are divided equally, forming a number of sub-blocks, i.e. sub-regions. Using the residual map Res(t) of the current frame, the sub-region containing the most pixels whose residual values exceed the threshold is selected from all candidate sub-regions R_i as the sub-region R(t) to be re-segmented:
R(t) = argmax_{R_i} Σ_{p∈R_i} Indicator(|Res(t)[p]| > THR)
wherein R_i denotes the i-th candidate sub-region; Res(t)[p] denotes the residual value at pixel position p in the residual map Res(t); THR is a manually set threshold; and Indicator denotes an indicator function whose value is 1 if |Res(t)[p]| > THR holds and 0 otherwise;
S53, the sub-region R(t) obtained in S52 is considered to have changed substantially relative to the previous frame in a way that the motion vectors cannot describe, so it is re-segmented: its RGB image is fed into the image semantic segmentation CNN model trained in S2, yielding a new semantic segmentation result F_R(t) for the sub-region:
F_R(t) = φ(I(t)[R(t)])
wherein I(t)[R(t)] denotes the RGB image of the R(t) sub-region;
S54, the segmentation result of the R(t) sub-region in the current frame is updated with the sub-region result obtained in step S53:
F(t)[R(t)] = F_R(t)
wherein F(t)[R(t)] denotes the segmentation result of the R(t) sub-region in the current t-th frame. The segmentation results of all sub-regions other than R(t) remain unchanged.
The non-key-frame segmentation algorithm of the above steps is far more efficient than running the CNN on every frame: by avoiding redundant feature extraction on highly similar images, the method processes P frames tens of times faster than frame-by-frame processing.
S6, repeating steps S3 to S5 for all frames in the video until the video stream ends and all video frames have been semantically segmented.
In this embodiment, the semantic video segmentation method first trains a convolutional neural network model for semantic segmentation of static images. On this basis, it exploits the strong correlation between consecutive video frames and fully explores the motion information of the video compression domain, converting the feature extraction and classification problem into a problem of pixel motion between adjacent frames, and re-segments the sub-regions likely to incur larger errors according to the principles of the compression model, thereby maintaining high accuracy while running at high speed.
The method has very strong generalization capability: the framework can be applied to other pixel-level video recognition tasks, including video object detection, video instance segmentation, video panoptic segmentation and the like. The speed-up does not depend on a specific CNN network structure; both high-accuracy and lightweight models are accelerated by several to tens of times.
Examples
The following simulation experiment is based on the above method. The implementation is as described above, so the specific steps are not repeated here; only the experimental results are shown below.
In this embodiment, ICNet is used as the lightweight image semantic segmentation CNN model. Multiple experiments were carried out on the public semantic segmentation dataset Cityscapes, which comprises 5000 short video clips; they demonstrate that the method significantly improves the efficiency of semantic video segmentation while preserving accuracy. In the algorithm, the group-of-pictures (GOP) parameter g is set to 12 and the B-frame ratio β to 0.
Compared with the traditional approach of segmenting frame by frame with a CNN, the method of the invention differs mainly in whether the compressed-domain operations of S3-S5 are performed. The effects of the two methods are shown in Table 1.
Table 1. Effect of the invention on the Cityscapes dataset
[Table 1 is reproduced as an image in the original publication.]
Through the above technical scheme, the embodiment of the invention realizes a real-time semantic video segmentation method based on deep learning. The invention makes full use of the motion information in the video compression domain to model the correlation between adjacent frames, and uses this correlation to reduce redundant computation, thereby greatly accelerating video semantic segmentation.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A real-time semantic video segmentation method is characterized by comprising the following steps:
S1, acquiring a plurality of groups of videos for training semantic segmentation, and defining the algorithm target;
S2, training a lightweight image semantic segmentation CNN model;
S3, decoding the video to obtain residual maps, motion vectors and RGB images;
S4, for the current frame in the video, if it is an I frame, feeding its RGB image into the image semantic segmentation CNN model trained in S2 to obtain a complete segmentation result;
S5, for the current frame in the video, if it is a P frame, propagating the segmentation result of the previous frame to the current frame using the motion vectors, and selecting a sub-block of the current frame for correction using the residual map;
S6, repeating steps S3 to S5 for all frames in the video until all video frames have been segmented.
2. The real-time semantic video segmentation method according to claim 1, wherein in step S1, for each video V used for video semantic segmentation, the algorithm target is defined as: predicting the class of every pixel in each frame of the video V.
3. The real-time semantic video segmentation method according to claim 2, wherein in step S2, training the lightweight image semantic segmentation CNN model specifically includes:
S21, classifying each pixel of the image with a convolutional neural network φ operating on a single picture, the classification prediction for an image I processed by the network φ being φ(I);
S22, computing the cross-entropy loss between the prediction and the given class labels to optimize the parameters of the network φ.
4. The real-time semantic video segmentation method according to claim 3, wherein in step S3 the video is encoded and decoded using the MPEG-4 video coding standard with the group-of-pictures (GOP) parameter g and the B-frame ratio β set in advance, and, denoting the current frame time by t, the decoding process is as follows:
S31, if the current t-th frame is an I frame, directly decoding it to obtain the RGB image I(t) of the current t-th frame;
S32, if the current t-th frame is a P frame, first partially decoding it to obtain the motion vectors Mv(t) and the residual map Res(t), then completing the decoding to obtain the RGB image I(t) by pixel-domain translation and compensation transformations.
5. The real-time semantic video segmentation method according to claim 4, wherein in step S4, if the current t-th frame is an I-frame, the current t-th frame is semantically segmented according to the following algorithm:
S41, feeding the current RGB image I(t) into the image semantic segmentation CNN model trained in S2 for prediction, obtaining the semantic segmentation result F(t) = φ(I(t)).
6. The real-time semantic video segmentation method according to claim 5, wherein in step S5, if the current t-th frame is a P-frame, the current t-th frame is semantically segmented according to the following algorithm:
S51, performing a pixel-domain translation of the previous frame's segmentation result F(t-1) using the motion vectors Mv(t) of the current frame, to obtain the segmentation result of the current frame:
F(t)[p] = F(t-1)[p - Mv(t)[p]]
wherein F(t)[p] denotes the value at pixel position p in the segmentation result F(t) of the current t-th frame obtained after translation; p is a pixel coordinate; and Mv(t)[p] denotes the value at pixel position p in the motion-vector map Mv(t) of the current t-th frame;
S52, using the residual map Res(t) of the current frame, selecting from all candidate sub-regions R_i of the current frame the sub-region containing the most pixels whose residual values exceed the threshold, as the sub-region R(t) to be re-segmented:
R(t) = argmax_{R_i} Σ_{p∈R_i} Indicator(|Res(t)[p]| > THR)
wherein R_i denotes the i-th candidate sub-region; Res(t)[p] denotes the residual value at pixel position p in the residual map Res(t); THR is a manually set threshold; and Indicator denotes an indicator function whose value is 1 if |Res(t)[p]| > THR holds and 0 otherwise;
S53, feeding the sub-region R(t) obtained in S52 into the image semantic segmentation CNN model trained in S2 for re-segmentation, obtaining a new semantic segmentation result F_R(t) for the sub-region:
F_R(t) = φ(I(t)[R(t)])
wherein I(t)[R(t)] denotes the RGB image of the R(t) sub-region;
S54, updating the segmentation result of the R(t) sub-region in the current frame with the sub-region result obtained in step S53:
F(t)[R(t)] = F_R(t)
wherein F(t)[R(t)] denotes the segmentation result of the R(t) sub-region in the current t-th frame.
CN201910859421.2A 2019-09-11 2019-09-11 Real-time semantic video segmentation method Active CN110796662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910859421.2A CN110796662B (en) 2019-09-11 2019-09-11 Real-time semantic video segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910859421.2A CN110796662B (en) 2019-09-11 2019-09-11 Real-time semantic video segmentation method

Publications (2)

Publication Number Publication Date
CN110796662A (en) 2020-02-14
CN110796662B CN110796662B (en) 2022-04-19

Family

ID=69427102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910859421.2A Active CN110796662B (en) 2019-09-11 2019-09-11 Real-time semantic video segmentation method

Country Status (1)

Country Link
CN (1) CN110796662B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985456A (en) * 2020-09-10 2020-11-24 上海交通大学 Video real-time identification, segmentation and detection architecture
CN112084949A (en) * 2020-09-10 2020-12-15 上海交通大学 Video real-time identification segmentation and detection method and device
CN112364822A (en) * 2020-11-30 2021-02-12 重庆电子工程职业学院 Automatic driving video semantic segmentation system and method
CN112990273A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Compressed domain-oriented video sensitive character recognition method, system and equipment
CN113486697A (en) * 2021-04-16 2021-10-08 成都思晗科技股份有限公司 Forest smoke and fire monitoring method based on space-based multi-modal image fusion
CN115294489A (en) * 2022-06-22 2022-11-04 太原理工大学 Semantic segmentation method and system for disaster video data
CN115713625A (en) * 2022-11-18 2023-02-24 盐城众拓视觉创意有限公司 Method for rapidly combining teaching real-recorded video and courseware background into film
WO2023154007A3 (en) * 2022-02-11 2023-10-26 脸萌有限公司 Feature extraction method and apparatus for video, slicing method and apparatus for video, and electronic device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120294530A1 (en) * 2010-01-22 2012-11-22 Malavika Bhaskaranand Method and apparatus for video object segmentation
US20130155228A1 (en) * 2011-12-19 2013-06-20 Industrial Technology Research Institute Moving object detection method and apparatus based on compressed domain
US20150256850A1 (en) * 2014-03-10 2015-09-10 Euclid Discoveries, Llc Continuous Block Tracking For Temporal Prediction In Video Encoding
CN108256511A (en) * 2018-03-15 2018-07-06 太原理工大学 Body movement detection method based on Video coding code stream

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
FEDERICO PERAZZI ET AL: "Learning Video Object Segmentation from Static Images", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
JAIN S ET AL: "Fast Semantic Segmentation on Video Using Block Motion-Based Feature Interpolation", 15th European Conference on Computer Vision (ECCV) *
XIZHOU ZHU ET AL: "Towards High Performance Video Object Detection", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
ZHENGTAO TAN ET AL: "Real Time Compressed Video Object Segmentation", 2019 IEEE International Conference on Multimedia and Expo (ICME) *
ZOUWU NING ET AL: "Visual Attention Based Video Object Segmentation in MPEG Compressed Domain", 2007 IET Conference on Wireless, Mobile and Sensor Networks (CCWMSN07) *
冯杰 (FENG JIE): "Research on Video Segmentation and Feature Extraction Methods Based on the H.264 Compressed Domain", China Doctoral Dissertations Full-text Database, Information Science and Technology *
孔祥鹏 (KONG XIANGPENG): "Research on Moving Object Segmentation and Extraction Methods Based on the H.264 Compressed Domain", China Master's Theses Full-text Database, Information Science and Technology *
孙涛 (SUN TAO): "Research on Moving Object Segmentation Techniques Based on the Compressed Domain", China Master's Theses Full-text Database, Information Science and Technology *
杨高波 等 (YANG GAOBO ET AL): "Video Object Segmentation under the MPEG-4 Framework and Analysis of Its Key Technologies", Journal on Communications *
陆宇 (LU YU): "Video Object Segmentation Based on the H.264 Compressed Domain", China Doctoral Dissertations Full-text Database, Information Science and Technology *
陈薇薇 (CHEN WEIWEI): "Densification of Motion Vectors in the MPEG-2 Compressed Domain and Research on Moving Object Segmentation Algorithms", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985456A (en) * 2020-09-10 2020-11-24 上海交通大学 Video real-time identification, segmentation and detection architecture
CN112084949A (en) * 2020-09-10 2020-12-15 上海交通大学 Video real-time identification segmentation and detection method and device
CN112084949B (en) * 2020-09-10 2022-07-19 上海交通大学 Video real-time identification segmentation and detection method and device
CN112364822A (en) * 2020-11-30 2021-02-12 重庆电子工程职业学院 Automatic driving video semantic segmentation system and method
CN112990273A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Compressed domain-oriented video sensitive character recognition method, system and equipment
CN113486697A (en) * 2021-04-16 2021-10-08 成都思晗科技股份有限公司 Forest smoke and fire monitoring method based on space-based multi-modal image fusion
CN113486697B (en) * 2021-04-16 2024-02-13 成都思晗科技股份有限公司 Forest smoke and fire monitoring method based on space-based multimode image fusion
WO2023154007A3 (en) * 2022-02-11 2023-10-26 脸萌有限公司 Feature extraction method and apparatus for video, slicing method and apparatus for video, and electronic device and storage medium
CN115294489A (en) * 2022-06-22 2022-11-04 太原理工大学 Semantic segmentation method and system for disaster video data
CN115713625A (en) * 2022-11-18 2023-02-24 盐城众拓视觉创意有限公司 Method for rapidly combining teaching real-recorded video and courseware background into film

Also Published As

Publication number Publication date
CN110796662B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN110796662B (en) Real-time semantic video segmentation method
Tu et al. Action-stage emphasized spatiotemporal VLAD for video action recognition
US11398037B2 (en) Method and apparatus for performing segmentation of an image
US8983178B2 (en) Apparatus and method for performing segment-based disparity decomposition
CN106331723B (en) Video frame rate up-conversion method and system based on motion region segmentation
CN108615241B (en) Rapid human body posture estimation method based on optical flow
CN111310594B (en) Video semantic segmentation method based on residual error correction
JP2018507477A (en) Method and apparatus for generating initial superpixel label map for image
CN108200432A (en) A kind of target following technology based on video compress domain
US20040062440A1 (en) Sprite recognition in animated sequences
CN108764177B (en) Moving target detection method based on low-rank decomposition and representation joint learning
Zhao et al. Transformer-based self-supervised monocular depth and visual odometry
CN104202606B (en) One kind determines method based on HEVC estimation starting points
CN111292357B (en) Video inter-frame rapid motion estimation method based on correlation filtering
Jing et al. Video prediction: a step-by-step improvement of a video synthesis network
Sheng et al. VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision
CN110853040B (en) Image collaborative segmentation method based on super-resolution reconstruction
CN113920170A (en) Pedestrian trajectory prediction method and system combining scene context and pedestrian social relationship and storage medium
CN114419729A (en) Behavior identification method based on light-weight double-flow network
Nemcev et al. Modified EM-algorithm for motion field refinement in motion compensated frame interpoliation
Luo et al. Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization
Chu et al. A basis-background subtraction method using non-negative matrix factorization
Gao et al. Object-Centric Voxelization of Dynamic Scenes via Inverse Neural Rendering
US11967083B1 (en) Method and apparatus for performing segmentation of an image
Xiang et al. A CNNs-based method for optical flow estimation with prior constraints and stacked U-Nets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant