WO2016016033A1 - Method and apparatus for interactive video segmentation - Google Patents
- Publication number
- WO2016016033A1 (PCT/EP2015/066540)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- superpixels
- frames
- sequence
- superpixel
- information related
- Prior art date
Classifications
- G06T7/11—Region-based segmentation
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
- G06T2200/24—Indexing scheme for image data processing or generation, in general, involving graphical user interfaces [GUIs]
- G06T2207/10016—Video; Image sequence
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20101—Interactive definition of point of interest, landmark or seed
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
A method and an apparatus for generating segmentation masks for a sequence of frames based on temporally consistent superpixels are described. A sequence of frames is retrieved (10) via an input (21). A superpixel unit (22) obtains (11) temporally consistent superpixels for the sequence of frames. Via a display unit (23) temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames are displayed (12) to a user. A user interface (24) captures (13) a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the displayed superpixels. Using the selected one or more superpixels, a segmentation mask generator (25) generates (14) segmentation masks for the sequence of frames.
Description
METHOD AND APPARATUS FOR INTERACTIVE VIDEO SEGMENTATION
FIELD
The present solution relates to a method and an apparatus for interactive video segmentation. More specifically, a method and an apparatus for generating segmentation masks for a sequence of frames based on temporally consistent superpixels are described.
BACKGROUND
Video segmentation is complex and often time- and memory-consuming, especially for high-resolution images. Superpixel algorithms represent a very useful and increasingly popular preprocessing step for video segmentation, but also for a wide range of other computer vision applications, such as tracking, multi-view object segmentation, scene flow, 3D layout estimation of indoor scenes, interactive scene modeling, image parsing, and semantic segmentation. Grouping similar pixels into so-called superpixels leads to a major reduction of the image primitives. This results in an increased computational efficiency for subsequent processing steps, allows for more complex algorithms that would be computationally infeasible on pixel level, and creates a spatial support for region-based features.
Temporally consistent superpixels, as described in [1], help to reduce the complexity.
SUMMARY
It is an object of the present solution to provide an efficient tool for interactive video segmentation based on temporally consistent superpixels.
According to one embodiment, a method for generating segmentation masks for a sequence of frames based on temporally consistent superpixels comprises:
- retrieving a sequence of frames;
- obtaining temporally consistent superpixels for the sequence of frames;
- displaying temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capturing a user input selecting one or more of the displayed superpixels or modifying at least part of the further
information related to the selected superpixels; and
- generating segmentation masks for the sequence of frames using the selected one or more superpixels and the further information related to the selected superpixels.
Accordingly, a computer readable storage medium has stored therein instructions enabling generating segmentation masks for a sequence of frames based on temporally consistent
superpixels, which when executed by a computer, cause the computer to:
- retrieve a sequence of frames;
- obtain temporally consistent superpixels for the sequence of frames;
- display temporally consistent superpixels and further
information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capture a user input selecting one or more of the displayed superpixels or modifying at least part of the further
information related to the selected superpixels; and
- generate segmentation masks for the sequence of frames using the selected one or more superpixels and the further
information related to the selected superpixels.
Also, in one embodiment an apparatus configured to generate segmentation masks for a sequence of frames based on temporally consistent superpixels comprises:
- an input configured to retrieve a sequence of frames;
- a superpixel unit configured to obtain temporally consistent superpixels for the sequence of frames;
- a display unit configured to display temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- a user interface configured to capture a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected
superpixels; and
- a segmentation mask generator configured to generate
segmentation masks for the sequence of frames using the
selected one or more superpixels and the further information related to the selected superpixels.

In another embodiment, an apparatus configured to generate segmentation masks for a sequence of frames based on temporally consistent superpixels comprises a processing device and a memory device having stored therein instructions, which, when executed by the processing device, cause the apparatus to:
- retrieve a sequence of frames;
- obtain temporally consistent superpixels for the sequence of frames;
- display temporally consistent superpixels and further
information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capture a user input selecting one or more of the displayed superpixels or modifying at least part of the further
information related to the selected superpixels; and
- generate segmentation masks for the sequence of frames using the selected one or more superpixels and the further
information related to the selected superpixels.

Preferably, information on selected superpixels is provided in a superpixel table. This table gives an easily accessible overview on the selected superpixels to the user and can be used to manipulate the selection of superpixels. The proposed solution introduces a fast way to interactively segment video sequences and generate segmentation masks. The selection and tracking of regions in frame sequences is based on temporally consistent superpixels, which are obtained, for example, by applying a superpixel algorithm to the sequence of frames or by retrieving existing temporally consistent
superpixels provided for the sequence of frames. The region selection using the displayed superpixels is very intuitive and easy to handle by the user. The video segmentation process can be split into two steps, i.e. an automatic offline-processing (batch-processing) for superpixel generation and a real-time interactive video segmentation using these superpixels.
Segmentation masks for frames of the sequence of frames other than the selected frame are generated using label identifiers of the selected superpixels. In this way the temporal
consistency of the superpixels is used to propagate the
selected regions across the subsequent frames of the sequence.
In one embodiment, one or more start frames and end frames of the sequence of frames are set for a superpixel to limit tracking of the superpixel to selected ranges of frames. This allows the user to restrict tracking to a subsequence of the sequence of frames. In this way the user may accurately specify
which superpixel shall be considered at which point in time for generating a segmentation mask.
In one embodiment, user inputs to select a further superpixel for a frame of the sequence of frames other than the selected frame or to remove a selected superpixel are captured. Each further selected superpixel is added to the superpixel table with the start frame set to the current frame. Thereby, the solution allows the user to interactively refine the
tracked/propagated regions on frame level. Removing a
superpixel will completely remove it from tracking.
In one embodiment, user inputs to group two or more of the selected superpixels are captured. By grouping selected
superpixels it becomes possible to distinguish different regions during the generation of the segmentation masks.
Preferably, information on selected superpixels is stored in a file. This information can be used as input for subsequent processing steps and allows resuming the superpixel selection at a later time. Alternatively or in addition, the generated segmentation masks are made available via an output or stored, e.g. as image files. Also these segmentation masks can be used as input for subsequent processing steps.
For a better understanding the present solution shall now be explained in more detail in the following description with reference to the figures. It is understood that the solution is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present solution as defined in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows an example of a frame;
Fig. 2 depicts a superpixel label map corresponding to the frame of Fig. 1;
Fig. 3 depicts the main elements of a graphical user interface of a video segmentation tool;
Fig. 4 illustrates the GUI showing the first frame of a sequence with the superpixel boundaries as overlay;
Fig. 5 depicts the GUI of Fig. 4 toggled to an original view;
Fig. 6 shows navigation and zoom buttons of the GUI;
Fig. 7 shows a superpixel table with exemplary superpixels;
Fig. 8 depicts highlighted selected superpixels in a frame, whose end frame number is identical to the current frame number;
Fig. 9 illustrates grouping of superpixels;
Fig. 10 shows the grouped superpixels in the superpixel table;
Fig. 11 depicts a segmentation mask resulting from two selected groups of superpixels;
Figs. 12 to 16 illustrate the selection of an object in a frame;
Fig. 17 shows a selected region after setting an end frame;
Fig. 18 depicts a group resulting from grouping the superpixels of the selected region of Fig. 17;
Fig. 19 shows exemplary segmentation masks obtained for the group of Fig. 18;
Fig. 20 schematically illustrates one embodiment of a method for video segmentation;
Fig. 21 shows a first embodiment of an apparatus configured to implement the method of Fig. 20; and
Fig. 22 schematically illustrates a second embodiment of an apparatus configured to perform the method of Fig. 20.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
In the following an exemplary implementation of the proposed solution shall be described. The implementation is an interactive video segmentation tool programmed in Python with a Qt graphical user interface (GUI). The tool is suited both for computers with a mouse and a keyboard and for tablet computers with touchscreens, using touch gestures instead of mouse clicks.
The implemented solution requires as input a frame sequence and a corresponding sequence of superpixel label maps. These superpixel label maps can be generated using, for example, the
algorithm described in [1], either beforehand in an independent superpixel generating step or upon reception of the frame sequence by the interactive video segmentation tool. Fig. 1 shows an example of a frame, whereas Fig. 2 depicts a
corresponding superpixel label map. The superpixel labels are coded by grey values.
Fig. 3 depicts the main elements of the GUI 1 of the tool. The largest part of the GUI 1 is occupied by a frame area 2.
Located above the frame area 2 is a button area 3 comprising a variety of buttons. On the right side of the frame area 2 there is a superpixel table 4, which shows information about selected superpixels.

After loading the frame sequence and the corresponding
superpixel maps into the tool the frame area 2 shows the first frame of the sequence with the superpixel boundaries as
overlay. This is illustrated in Fig. 4. It is possible to toggle between the overlay view and an original view depicted in Fig. 5 by pressing a specific key on a keyboard or clicking a button in the GUI 1.
As can be seen in Fig. 6, the tool allows the user to play back, pause, or step through the sequence by clicking the appropriate buttons 5 or using keyboard shortcuts. For navigating through the sequence, a slider 6 below the navigation buttons 5 can also be used. Furthermore, with the zoom buttons 7 it is possible to zoom in and out, bring the view back to the original size of the frame, or fit it to the current window size.
After navigating to the right frame in the sequence the user can start with the interactive video segmentation. To segment an object the user just has to select the region of the object.
The selection of a frame region is based on the selection of superpixels. There are two ways to select superpixels. The first is to left-click (click with the left mouse button) on a superpixel. Selected superpixels are highlighted in white and added to a superpixel table on the right side of the tool. The second way is helpful for continuous selection: dragging the mouse over the superpixels with the left mouse button held down selects them continuously. The selected superpixels are highlighted in white and added to the superpixel table 4 on the right side after release of the mouse button.
If a wrong superpixel has been selected, it can be deselected. Deselecting superpixels works similarly to selecting them; the only difference is that it is additionally necessary to press the shift key and then left-click the superpixel to remove it from the selection. Continuous deselection also works: with the shift key held down, the mouse is dragged with the left mouse button pressed over the superpixels that should be deselected. Deselected superpixels are also removed from the superpixel table 4 on the right side.
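Under the hood, such a selection boils down to looking up the clicked pixel in the superpixel label map of the current frame. A minimal sketch of this pick operation (illustrative names and data layout, not the tool's actual code):

```python
import numpy as np

def pick_superpixel(label_map: np.ndarray, x: int, y: int) -> int:
    """Return the label of the superpixel under a click at frame pixel (x, y).

    label_map is assumed to be a 2D integer array of per-pixel superpixel
    labels for the currently displayed frame, with (x, y) already mapped
    back from view coordinates to frame coordinates.
    """
    return int(label_map[y, x])

def toggle_selection(selected: set[int], label: int, shift_pressed: bool) -> None:
    """Left-click adds a superpixel to the selection; shift+left-click removes it."""
    if shift_pressed:
        selected.discard(label)
    else:
        selected.add(label)
```

Dragging with the button held down then simply repeats this lookup for every pixel position visited.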
For a precise selection the user can zoom into the frame or toggle between the original view and the overlay view. For each selected superpixel, the group identifier, the label identifier and the start as well as the end frame are indicated in the superpixel table 4. Fig. 7 shows the superpixel table 4 with exemplary superpixels in more detail. It contains the following information about the selected superpixels:
- group number;
- label identifier of the superpixel;
- start and end frame numbers.
The label of the superpixel is an identifier for the temporally consistent superpixel. It is calculated, for example, using the unique RGB color of the superpixel in the superpixel label map.
The start and end frame number for a superpixel indicate the (sub)sequence of frames, i.e. the time slot, for which the superpixel should be tracked. When selecting a superpixel, it is automatically added to the superpixel table 4 and its start and end frame numbers are set in the following way:
- the start frame number is set to the current frame number; and
- the end frame number is set to the frame number of the last frame in the sequence.
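A hedged sketch of how a new table entry could be assembled from these rules (the 24-bit RGB packing is only one plausible reading of "unique RGB color", and all names here are illustrative, not taken from the tool):

```python
from dataclasses import dataclass

def label_from_rgb(r: int, g: int, b: int) -> int:
    # One possible encoding: pack the superpixel's unique RGB colour in the
    # label map into a single integer identifier. The actual tool may use a
    # different scheme.
    return (r << 16) | (g << 8) | b

@dataclass
class TableEntry:
    group: int        # group number (defaults to 1)
    label: int        # temporally consistent superpixel identifier
    start_frame: int  # set to the current frame number on selection
    end_frame: int    # set to the last frame number of the sequence

def new_entry(label: int, current_frame: int, last_frame: int) -> TableEntry:
    """Create a superpixel table entry with the default start/end frames."""
    return TableEntry(group=1, label=label,
                      start_frame=current_frame, end_frame=last_frame)
```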
To change the start frame number of a selected superpixel the user can simply navigate to a different frame in the sequence and left-click the superpixel. The new start frame number will be set. By holding the left mouse button down and dragging the mouse over selected superpixels the user can change the start frame numbers of multiple superpixels at once. Changing the end frame numbers works in the same way as
changing the start frame numbers; the only difference is that the user has to right-click the superpixel.
It is likewise possible to edit the start and end frame numbers directly in the superpixel table.
As a support for the user, selected superpixels in the frame whose end frame number is identical to the current frame number are highlighted using the unique label grey value of the superpixels. An example is depicted in Fig. 8, where the highlighted superpixels are those in the hat of the mannequin visible in the area identified by the white rectangle. The label identifier of the superpixels is used to propagate the selected region across subsequent frames of the frame sequence. Thus, in subsequent frames the superpixels with the same identifier are also selected. Stepping forward, using play, or using the slider to navigate to a subsequent frame shows the propagation of the selected region. The start and end frame can be used to refine the selection in the subsequent frames.
Setting the end frame for a superpixel to frame number k excludes it from tracking for the frames with frame number k+1 and higher. Moreover, it is possible to add new superpixels in subsequent frames. This is done in the same way as the initial selection. With these two methods the user has full control to refine the propagated region. A superpixel can have multiple time slots, each time slot having its own start and end frame number. Thus, it is not only possible to exclude a superpixel from the tracking at frame k+1; it is also possible to re-include it in the tracking at frame k+1+l with l>0. This is especially advantageous, for example, for superpixels that erroneously happen to switch from one object to another and back. Using multiple time slots, these tracking errors can be handled.
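The per-frame activity test implied by these time slots can be sketched as follows (an illustrative helper; start and end frames are treated as inclusive, as in the superpixel table):

```python
def is_tracked(time_slots: list[tuple[int, int]], frame: int) -> bool:
    """True if a superpixel with the given time slots is tracked at this frame.

    Each time slot is an inclusive (start_frame, end_frame) pair, as shown
    in the superpixel table.
    """
    return any(start <= frame <= end for start, end in time_slots)

# A superpixel excluded after frame 9 and re-included from frame 14 onwards:
slots = [(2, 9), (14, 19)]
assert is_tracked(slots, 5) and not is_tracked(slots, 11) and is_tracked(slots, 15)
```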
The video segmentation tool is not restricted to handling only one region. Different regions can be identified using the group number. The group number is preferably an integer value between 1 and 255. It is used, for example, during the generation of the segmentation masks to distinguish the different regions. By default, the group number is 1. If the user does not change the group number, all selected superpixels will have a grey value of 1 in the generated segmentation masks. If the user wants to create segmentation masks with multiple separate regions, the group feature of the tool should be used. Figs. 9 to 11 show an example in which the hats of the two mannequins on the right are tracked and each region gets its own group number. Figs. 9 and 10 show the process of setting the group identifier for the hat of the mannequin in the middle. To create a group the user selects appropriate superpixels in the superpixel table 4 and clicks the 'Group' button below the superpixel table 4. As a visual help the superpixels selected in the table are highlighted in light grey in the view, as visible in the area identified by the white rectangle. In a group dialog that appears when the 'Group' button is selected, the user enters an integer value between 1 and 255. This group identifier can be used, for example, as a grey value in the segmentation mask. As visible in Fig. 10, the grouped superpixels are subsequently identified by their associated group number in the superpixel table 4. Fig. 11 depicts the segmentation mask for the displayed frame resulting from the two selected groups. In order to remove all previously selected superpixels from the superpixel table 4, the user can click a 'Reset' button below the superpixel table 4.
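Conceptually, the mask generation described here maps every pixel of a selected, currently tracked superpixel to its group number. A minimal numpy sketch under assumed data layouts (not the tool's actual implementation):

```python
import numpy as np

def make_mask(label_map: np.ndarray, frame: int,
              selection: dict[int, tuple[int, list[tuple[int, int]]]]) -> np.ndarray:
    """Grey-scale segmentation mask for one frame.

    selection maps a superpixel label to (group_number, time_slots). Pixels
    of superpixels tracked at this frame receive their group number (1..255)
    as grey value; all other pixels stay 0.
    """
    mask = np.zeros(label_map.shape, dtype=np.uint8)
    for label, (group, slots) in selection.items():
        if any(start <= frame <= end for start, end in slots):
            mask[label_map == label] = group
    return mask
```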
The segmentation tool provides the functionality to assess the propagation of the selected regions. Thereby, the navigation features (play, pause, step, and slider) play a central role. The user can (re-)play the complete sequence and pause at frames in which the tracked regions need further inspection, or simply step directly through the complete sequence. For the inspection, the zoom feature as well as the switching of the views is helpful.
After reviewing and potentially refining the tracked regions, the user can export them as either text files or segmentation
masks, which are generated as grey-scale images. The generated text file will contain information about the selected
superpixels. For example, for each selected superpixel a new line with group number, label identifier, and start and end frame numbers for each time slot is added. A text file with one selected superpixel in two time slots would thus look as follows:
# group label start1 end1 start2 end2
100 77136081 2 9 14 19
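A hedged parser for this export format (the column layout follows the sample line above; skipping blank and '#' comment lines is an assumption):

```python
def load_selection(path: str) -> dict[int, tuple[int, list[tuple[int, int]]]]:
    """Parse an exported region file into {label: (group, [(start, end), ...])}.

    Assumes whitespace-separated integer columns and a variable number of
    start/end pairs per row (one pair per time slot).
    """
    selection: dict[int, tuple[int, list[tuple[int, int]]]] = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.lstrip().startswith('#'):
                continue  # skip blank and comment lines
            fields = [int(v) for v in line.split()]
            group, label, bounds = fields[0], fields[1], fields[2:]
            slots = list(zip(bounds[0::2], bounds[1::2]))
            selection[label] = (group, slots)
    return selection
```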
The regions exported as text files can be loaded into the tool again. This is especially useful if the user wants to resume the segmentation work at a later point in time, share the work with others, or create multiple differing versions.
For exporting the selected regions, i.e. the selected
superpixels, as a sequence of segmentation masks, the user has to click the 'Image' button below the superpixel table 4. In a dialogue that opens, the user then chooses an output directory and a bit depth for the grey-scale images (8 bit or 24 bit) and then clicks 'Start'. After successful completion of the processing, the generated images are available in the output directory.
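The 8-bit export path could look roughly like this (Pillow is one way to write the grey-scale images; the file naming scheme is an assumption, not taken from the tool):

```python
from pathlib import Path
import numpy as np
from PIL import Image

def export_masks(masks: list[np.ndarray], out_dir: str) -> None:
    """Write one 8-bit grey-scale mask image per frame to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, mask in enumerate(masks):
        # Group numbers 1..255 map directly to 8-bit grey values.
        Image.fromarray(mask.astype(np.uint8), mode="L").save(out / f"mask_{i:04d}.png")
```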
In the following a brief workflow example shall be described. A short sequence with 20 original frames and the corresponding superpixel label maps are used. In this example, the selection of the region should begin in the third frame and the tracking should stop in frame 17. After loading the project the view depicted in Fig. 4 appears. Using the navigation buttons (or the slider) the user navigates to the third frame. The user then selects the superpixels covering the dress of the
mannequin in the middle. The selection process is depicted in
Figs. 12 to 16. The selected superpixels are automatically added to the superpixel table. As the internal frame numbers start with 0, their start frame number is automatically set to 2 and their end frame number to the end of sequence, which in this example is 19.
Based on the superpixel labels, this selection is now
propagated across the subsequent frames until the end of the sequence. In order to control whether the selection is
correctly propagated, i.e. if the right superpixels are also selected in the subsequent frames, the user can navigate through the sequence. Thereby, it is possible to refine the selection as described further above. After the inspection the end frame number should be set as intended to stop the tracking of the selected superpixels in frame 17. The end frame number is either set by right-clicking the superpixels in frame 17 or by directly editing their end frame number in the superpixel table. As a visual help, the superpixels whose end frame number is equal to the frame number of the displayed frame are highlighted using the unique label grey value. After setting the end frame to frame 17 the selected region looks as shown in Fig. 17. Once the selected region is correct over the frames, groups are created and unique numbers are set for different selected regions. In the present example, however, there is only one region. The user selects the lines with the superpixels belonging to a region in the superpixel table 4 and clicks the 'Group' button. In the present case these are all lines in the table. The user then enters a group number and clicks OK. The resulting group is depicted in Fig. 18.
When each region has a unique group number, the segmentation masks can be exported as described above. In the present case, the generated segmentation masks look as illustrated in
Fig. 19. In this figure not all segmentation masks are shown.
One embodiment of a method for generating segmentation masks for a sequence of frames based on temporally consistent
superpixels is schematically illustrated in Fig. 20. In a first step a sequence of frames is retrieved 10, e.g. from a network or from a local storage. Temporally consistent superpixels for the sequence of frames are then obtained 11, e.g. by applying a superpixel algorithm to the sequence of frames or by retrieving existing temporally consistent superpixels provided for the sequence of frames. Once the temporally consistent superpixels for the sequence of frames are available, the superpixels for a selected frame and further information related to the displayed superpixels are displayed 12 to a user. The method proceeds with capturing 13 a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels. Finally, using the selected one or more superpixels and the further
information related to the selected superpixels segmentation masks are generated 14 for the sequence of frames. Fig. 21 schematically illustrates one embodiment of an
apparatus 20 for generating segmentation masks for a sequence of frames based on temporally consistent superpixels. The apparatus 20 comprises an input 21 for retrieving 10 a sequence of frames, e.g. from a network or from a local storage. A superpixel unit 22 obtains 11 temporally consistent superpixels for the sequence of frames, e.g. by applying a superpixel algorithm to the sequence of frames or by retrieving existing temporally consistent superpixels provided for the sequence of frames. Via a display unit 23, e.g. a display device or an
output connected to a display device, temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames are displayed 12 to a user. The apparatus further comprises a user interface 24 for capturing 13 a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels. Using the selected one or more superpixels and the further information related to the selected superpixels a segmentation mask generator 25 generates 14 segmentation masks for the sequence of frames. The resulting segmentation masks are preferably stored on a local storage 26 or made available at an output 27. The superpixel unit 22, the segmentation mask generator 25, and the user interface 24 may likewise be fully or partially combined into a single unit or implemented as software running on a processor. In addition, the user
interface 24 may be part of the display unit 23, e.g. in the form of a touch screen. Also, the input 21 and the output 27 can likewise form a single bi-directional interface.
Another embodiment of an apparatus 30 configured to perform the method for generating segmentation masks for a sequence of frames based on temporally consistent superpixels is
schematically illustrated in Fig. 22. The apparatus 30
comprises a processing device 31 and a memory device 32 storing instructions that, when executed, cause the apparatus to perform steps according to one of the described methods.
For example, the processing device 31 can be a processor adapted to perform the steps according to one of the described methods. In an embodiment said adaptation comprises that the processor is configured, e.g. programmed, to perform steps according to one of the described methods.
A processor as used herein may include one or more processing units, such as microprocessors, digital signal processors, or a combination thereof. The local storage and the memory device 32 may include volatile and/or non-volatile memory regions and storage devices such as hard disk drives and DVD drives. A part of the memory is a non-transitory program storage device readable by the processing device 31, tangibly embodying a program of instructions
executable by the processing device 31 to perform program steps as described herein according to the present principles.
References
[1] M. Reso et al.: "Temporally Consistent Superpixels", International Conference on Computer Vision (ICCV), 2013, pp. 385-392.
Claims
1. A method for generating segmentation masks for a sequence of frames based on temporally consistent superpixels, the method comprising:
- retrieving (10) a sequence of frames;
- obtaining (11) temporally consistent superpixels for the sequence of frames;
- displaying (12) temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capturing (13) a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels; and
- generating (14) segmentation masks for the sequence of frames using the selected one or more superpixels and the further information related to the selected superpixels.
2. The method according to claim 1, further comprising
providing information on selected superpixels in a
superpixel table (4).
3. The method according to claim 1 or 2, wherein segmentation masks for frames of the sequence of frames other than the selected frame are generated using label identifiers of the selected superpixels.
4. The method according to one of claims 1 to 3, further
comprising setting one or more start frames and end frames in the sequence of frames for a superpixel to limit tracking of the superpixel to selected ranges of frames.
5. The method according to one of the preceding claims, further comprising capturing a user input to select a further
superpixel for a frame of the sequence of frames other than the selected frame or to remove a selected superpixel.
6. The method according to one of the preceding claims, wherein the temporally consistent superpixels for the sequence of frames are retrieved (11) by applying a superpixel algorithm to the sequence of frames or by retrieving existing
temporally consistent superpixels provided for the sequence of frames.
7. The method according to one of the preceding claims, further comprising capturing a user input to group two or more of the selected superpixels.
8. The method according to one of the preceding claims, further comprising storing information on selected superpixels in a file and/or storing the generated segmentation masks as image files.
9. A computer readable storage medium having stored therein
instructions enabling generating segmentation masks for a sequence of frames based on temporally consistent
superpixels, which when executed by a computer, cause the computer to:
- retrieve (10) a sequence of frames;
- obtain (11) temporally consistent superpixels for the sequence of frames;
- display (12) temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capture (13) a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels; and
- generate (14) segmentation masks for the sequence of
frames using the selected one or more superpixels and the further information related to the selected superpixels.
10. An apparatus (20) configured to generate segmentation masks for a sequence of frames based on temporally consistent superpixels, wherein the apparatus (20) comprises:
- an input (21) configured to retrieve (10) a sequence of frames;
- a superpixel unit (22) configured to obtain (11)
temporally consistent superpixels for the sequence of frames;
- a display unit (23) configured to display (12) temporally consistent superpixels and further information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- a user interface (24) configured to capture (13) a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels; and
- a segmentation mask generator (25) configured to generate
(14) segmentation masks for the sequence of frames using the selected one or more superpixels and the further information related to the selected superpixels.
11. An apparatus (30) configured to generate segmentation masks for a sequence of frames based on temporally consistent superpixels, the apparatus (30) comprising a processing device (31) and a memory device (32) having stored therein instructions, which, when executed by the processing device (31), cause the apparatus (30) to:
- retrieve (10) a sequence of frames;
- obtain (11) temporally consistent superpixels for the sequence of frames;
- display (12) temporally consistent superpixels and further
information related to the displayed superpixels for a selected frame from the sequence of frames to a user;
- capture (13) a user input selecting one or more of the displayed superpixels or modifying at least part of the further information related to the selected superpixels; and
- generate (14) segmentation masks for the sequence of frames using the selected one or more superpixels and the further information related to the selected superpixels.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14306224.8 | 2014-07-31 | | |
EP14306224 | 2014-07-31 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016016033A1 (en) | 2016-02-04 |
Family
ID=51301230
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2015/066540 WO2016016033A1 (en) | 2014-07-31 | 2015-07-20 | Method and apparatus for interactive video segmentation |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW201610916A (en) |
WO (1) | WO2016016033A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033944A (en) * | 2018-06-07 | 2018-12-18 | 西安电子科技大学 | A kind of all-sky aurora image classification and crucial partial structurtes localization method and system |
CN109919159A (en) * | 2019-01-22 | 2019-06-21 | 西安电子科技大学 | A kind of semantic segmentation optimization method and device for edge image |
CN110096961A (en) * | 2019-04-04 | 2019-08-06 | 北京工业大学 | A kind of indoor scene semanteme marking method of super-pixel rank |
CN111199547A (en) * | 2018-11-20 | 2020-05-26 | Tcl集团股份有限公司 | Image segmentation method and device and terminal equipment |
CN112801068A (en) * | 2021-04-14 | 2021-05-14 | 广东众聚人工智能科技有限公司 | Video multi-target tracking and segmenting system and method |
WO2022252366A1 (en) * | 2021-06-02 | 2022-12-08 | 中国科学院分子植物科学卓越创新中心 | Processing method and apparatus for whole-spike image |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063723B (en) * | 2018-06-11 | 2020-04-28 | 清华大学 | Weak supervision image semantic segmentation method based on common features of iteratively mined objects |
2015
- 2015-07-20 WO PCT/EP2015/066540 patent/WO2016016033A1/en active Application Filing
- 2015-07-24 TW TW104123963A patent/TW201610916A/en unknown
Non-Patent Citations (5)
Title |
---|
"Using Using Adobe Photoshop CS4 for Windows and Mac OS", 1 October 2010, ADOBE SYSTEMS INCORPORATED, article "Using Using Adobe Photoshop CS4 for Windows and Mac OS - Chapters 2 and 18", XP055212771 * |
DONDERA RADU ET AL: "Interactive video segmentation using occlusion boundaries and temporally coherent superpixels", IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, IEEE, 24 March 2014 (2014-03-24), pages 784 - 791, XP032609947, DOI: 10.1109/WACV.2014.6836023 * |
JOHANNES FURCH ET AL: "D4.3.2 Hybrid Scene Analysis Algorithms", 30 October 2013 (2013-10-30), pages 1 - 60, XP055176149, Retrieved from the Internet <URL:http://3d-scene.eu/outcomes.htm> [retrieved on 20150312] * |
LIU Z ET AL: "Semi-automatic video object segmentation using seeded region merging and bidirectional projection", PATTERN RECOGNITION LETTERS, ELSEVIER, AMSTERDAM, NL, vol. 26, no. 5, 1 April 2005 (2005-04-01), pages 653 - 662, XP025292474, ISSN: 0167-8655, [retrieved on 20050401], DOI: 10.1016/J.PATREC.2004.09.017 * |
M. RESO ET AL.: "Temporally Consistent Superpixels", INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, pages 385-392, XP032572909, DOI: 10.1109/ICCV.2013.55
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033944A (en) * | 2018-06-07 | 2018-12-18 | 西安电子科技大学 | A kind of all-sky aurora image classification and crucial partial structurtes localization method and system |
CN109033944B (en) * | 2018-06-07 | 2021-09-24 | 西安电子科技大学 | Method and system for classifying all-sky aurora images and positioning key local structure |
CN111199547A (en) * | 2018-11-20 | 2020-05-26 | Tcl集团股份有限公司 | Image segmentation method and device and terminal equipment |
CN111199547B (en) * | 2018-11-20 | 2024-01-23 | Tcl科技集团股份有限公司 | Image segmentation method and device and terminal equipment |
CN109919159A (en) * | 2019-01-22 | 2019-06-21 | 西安电子科技大学 | A kind of semantic segmentation optimization method and device for edge image |
CN110096961A (en) * | 2019-04-04 | 2019-08-06 | 北京工业大学 | A kind of indoor scene semanteme marking method of super-pixel rank |
CN110096961B (en) * | 2019-04-04 | 2021-03-02 | 北京工业大学 | Indoor scene semantic annotation method at super-pixel level |
CN112801068A (en) * | 2021-04-14 | 2021-05-14 | 广东众聚人工智能科技有限公司 | Video multi-target tracking and segmenting system and method |
WO2022252366A1 (en) * | 2021-06-02 | 2022-12-08 | 中国科学院分子植物科学卓越创新中心 | Processing method and apparatus for whole-spike image |
Also Published As
Publication number | Publication date |
---|---|
TW201610916A (en) | 2016-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016016033A1 (en) | Method and apparatus for interactive video segmentation | |
US9530195B2 (en) | Interactive refocusing of electronic images | |
US10515143B2 (en) | Web-based system for capturing and sharing instructional material for a software application | |
US8874525B2 (en) | Hierarchical display and navigation of document revision histories | |
US8533595B2 (en) | Hierarchical display and navigation of document revision histories | |
US8533593B2 (en) | Hierarchical display and navigation of document revision histories | |
US11074940B2 (en) | Interface apparatus and recording apparatus | |
US11317028B2 (en) | Capture and display device | |
US20120272151A1 (en) | Hierarchical display and navigation of document revision histories | |
KR101528312B1 (en) | Method for editing video and apparatus therefor | |
US9639330B2 (en) | Programming interface | |
KR20140098009A (en) | Method and system for creating a context based camera collage | |
US20140210944A1 (en) | Method and apparatus for converting 2d video to 3d video | |
Tang et al. | GrabAR: Occlusion-aware grabbing virtual objects in AR | |
US7596764B2 (en) | Multidimensional image data processing | |
US11514651B2 (en) | Utilizing augmented reality to virtually trace cables | |
US11003467B2 (en) | Visual history for content state changes | |
CN111970560A (en) | Video acquisition method and device, electronic equipment and storage medium | |
JP3907344B2 (en) | Movie anchor setting device | |
JP2001111957A (en) | Interactive processing method for video sequence, its storage medium and system | |
CN114025237B (en) | Video generation method and device and electronic equipment | |
US11557065B2 (en) | Automatic segmentation for screen-based tutorials using AR image anchors | |
US20170069354A1 (en) | Method, system and apparatus for generating a position marker in video images | |
JP2020101922A (en) | Image processing apparatus, image processing method and program | |
EP2816458A1 (en) | Method and apparatus for controlling example-based image manipulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15736850; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 15736850; Country of ref document: EP; Kind code of ref document: A1 |