CN110189333B

CN110189333B - Semi-automatic marking method and device for semantic segmentation of picture

Info

Publication number: CN110189333B
Application number: CN201910430851.2A
Authority: CN
Inventors: 杨文龙; P·尼古拉斯
Original assignee: Hubei Ecarx Technology Co Ltd
Current assignee: Ecarx Hubei Tech Co Ltd
Priority date: 2019-05-22
Filing date: 2019-05-22
Publication date: 2022-03-15
Anticipated expiration: 2039-05-22
Also published as: CN110189333A

Abstract

The invention provides a semi-automatic labeling method and a semi-automatic labeling device for semantic segmentation of pictures, wherein the method comprises the steps of obtaining a group of images comprising a plurality of frames of pictures, and selecting a first frame of picture from the obtained images; semantically dividing a first frame of picture into a plurality of image blocks, and receiving class marking operation of marking personnel on each image block in the first frame of picture; taking a first frame of picture as an initial picture, taking a next frame of picture adjacent to the initial picture in the multiple frames of pictures as a tracked picture, tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area as a corresponding category; and taking the currently labeled tracked picture as a new initial picture, and continuously tracking and labeling the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple pictures are labeled. The invention can effectively improve the marking efficiency and reduce the manual marking cost.

Description

Semi-automatic marking method and device for semantic segmentation of picture

Technical Field

The invention relates to the technical field of image processing, in particular to a semi-automatic annotation method and device for semantic segmentation of a picture.

Background

At present, target detection and scene semantic segmentation based on deep learning become main visual perception methods in the field of automatic driving. The semantic segmentation can segment different types of contents such as road surfaces, automobiles, lane lines and the like in the image aiming at the input image, and the essence of the semantic segmentation is the classification aiming at each pixel point in the image.

No matter the target detection based on deep learning or the scene semantic segmentation, a large amount of image and image labeling operations are required, especially the scene semantic segmentation, and the current popular segmentation mode is a polygon segmentation mode, but the segmentation mode is time-consuming and labor-consuming, has huge labor cost, has an unsatisfactory labeling effect, and is easy to cause the problems of rough edges of segmented objects and the like.

Disclosure of Invention

In view of the above problems, the present invention is proposed to provide a semi-automatic labeling method and device for semantic segmentation of pictures, which overcome the above problems or at least partially solve the above problems.

According to one aspect of the invention, a semi-automatic labeling method for semantic segmentation of pictures is provided, which comprises the following steps:

acquiring a group of images containing multiple frames of pictures, and selecting a first frame of picture from the acquired images;

semantically dividing the first frame of picture into a plurality of image blocks, and receiving class marking operation of marking personnel on each image block in the first frame of picture;

taking the first frame of picture as an initial picture, taking the next frame of picture adjacent to the initial picture in the multiple frames of pictures as a tracked picture, tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area as a corresponding category;

and taking the currently labeled tracked picture as a new initial picture, and continuously tracking and labeling the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the plurality of pictures are labeled.

Optionally, the method further comprises:

after the tracked picture is labeled at any time, judging whether a labeling error exists in a labeling result of the tracked picture;

if so, displaying a plurality of modification mode options for marking and modifying the tracked picture;

and receiving any option selected by the annotation personnel, and carrying out annotation modification on the tracked picture according to a modification mode corresponding to the selected option.

Optionally, the tracked picture is subjected to multiple modification mode options of annotation modification, including:

carrying out manual labeling modification on the tracked picture;

and an option of performing retracing annotation on the tracked picture.

Optionally, semantically dividing the first frame of picture into a plurality of image blocks, and receiving a class tagging operation of a tagging person on each image block in the first frame of picture, the method includes:

semantically dividing the first frame of picture into a plurality of polygonal blocks as the image blocks;

and receiving the category labeling operation of a labeling person on any polygonal block in the first frame of picture.

semantically dividing the first frame of picture into a plurality of superpixel blocks as the image blocks;

and receiving the class marking operation of a marking person on any super pixel block in the first frame of picture.

Optionally, the method further comprises: merging the super-pixel blocks marked with the same category in the first frame of picture into a combined block;

tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area into the corresponding category comprises the following steps:

and tracking an image area corresponding to the combined block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area as the corresponding category.

Optionally, the method further comprises: selecting a typical superpixel block from the superpixel blocks of each annotation category in the first frame picture, wherein the periphery of the typical superpixel block comprises a plurality of superpixel blocks belonging to the same annotation category;

tracking an image area corresponding to the typical superpixel block in the tracked picture by adopting a preset tracking algorithm, and marking the tracked image area as a corresponding category;

and receiving continuous selection operation of marking personnel by taking the typical superpixel block marked with the category as a reference, and marking the superpixel block corresponding to the continuous selection operation as the category the same as the category of the typical superpixel block.

Optionally, after labeling the tracked image regions as corresponding categories, the method further includes:

and receiving continuous selection operation of marking personnel by taking any super-pixel block after the category is marked as a reference, and marking the super-pixel block corresponding to the continuous selection operation as the category the same as the category of the reference super-pixel block.

Optionally, the method further comprises:

if an untracked image region still exists in the tracked picture, clustering the untracked image region and the image block marked with the category in the tracked picture according to a preset clustering algorithm;

and carrying out category marking on the untracked image area according to the category of the image block which belongs to the same cluster with the untracked image area.

According to another aspect of the present invention, there is also provided a semi-automatic labeling apparatus for semantic segmentation of pictures, comprising:

the selection module is suitable for sequentially selecting multiple frames of pictures from a pre-collected video according to the frame number interval of a specified picture, and acquiring a first frame of picture in the multiple frames of pictures;

the segmentation labeling module is suitable for semantically segmenting the first frame of picture into a plurality of image blocks and receiving class labeling operation of labeling personnel on each image block in the first frame of picture;

the tracking and labeling module is suitable for taking the first frame of picture as an initial picture, taking the next frame of picture adjacent to the initial picture in the multiple frames of pictures as a tracked picture, tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area as a corresponding category;

and the tracking and labeling module is also suitable for taking the currently labeled tracked picture as a new initial picture, and continuously tracking and labeling the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple frames of pictures are labeled.

According to another aspect of the present invention, there is also provided a computer storage medium storing computer program code, which when run on a computing device, causes the computing device to execute the semi-automatic annotation method for semantic segmentation of pictures according to any of the above embodiments.

In the embodiment of the invention, a first frame picture is selected from a group of acquired images, after class labeling is carried out on each image block in the first frame picture by a labeling person, the first frame picture is used as an initial picture, and the next frame picture adjacent to the initial picture in a plurality of frames of pictures is used as a tracked picture, so that an image area corresponding to the image block with the labeled class is tracked in the tracked picture by adopting a preset tracking algorithm, and the tracked image area is labeled as the corresponding class. And then, taking the currently labeled tracked picture as a new initial picture, and continuing to perform tracking labeling on the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple pictures are labeled. Therefore, the embodiment of the invention can realize semi-automatic semantic segmentation and labeling of the pictures by adopting the preset tracking algorithm based on the pictures with labeled categories for the multiple frames of pictures in the continuous video, thereby greatly improving the labeling efficiency of the pictures and ensuring the accuracy of the labeling of the pictures. Particularly, for two adjacent front and back pictures or a plurality of front and back pictures with similar scenes in the pictures to be labeled which are sampled in sequence in the video, the scheme of the invention greatly saves the repeated or similar labor labeling cost.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flow chart of a semi-automatic annotation method for semantic segmentation of a picture according to an embodiment of the invention;

FIG. 2 is a schematic structural diagram illustrating a semi-automatic labeling apparatus for semantic segmentation of pictures according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a semi-automatic labeling apparatus for semantic segmentation of pictures according to another embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a semi-automatic labeling apparatus for semantic segmentation of pictures according to still another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a semi-automatic labeling apparatus for semantic segmentation of pictures according to another embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Usually, when marking actually collected pictures, a continuous video is collected firstly, then pictures with certain differences are selected from the continuous video at certain frame number intervals for marking, although the videos are sampled at certain intervals, the frame frequency of a camera is usually fast, and the images collected by the same camera in a certain time under the same scene generally have no great difference, so that the pictures cleaned by sampling at intervals in the same section of video still have great similarity, which undoubtedly brings great repetitive work to the picture marking work. In order to effectively improve the image annotation efficiency, the embodiment of the invention provides a semi-automatic annotation method for semantic segmentation of an image. FIG. 1 is a flowchart illustrating a semi-automatic annotation method for semantic segmentation of pictures according to an embodiment of the invention. Referring to fig. 1, the method includes at least steps S102 to S108.

Step S102, a group of images containing multiple frames of pictures are obtained, and a first frame of picture is selected from the obtained images.

And step S104, semantically dividing the first frame of picture into a plurality of image blocks, and receiving class marking operation of a marking person on each image block in the first frame of picture.

And step S106, taking the first frame of picture as an initial picture, taking the next frame of picture adjacent to the initial picture in the multiple frames of pictures as a tracked picture, tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm, and labeling the tracked image area as the corresponding category.

And S108, taking the currently labeled tracked picture as a new initial picture, and continuing to perform tracking labeling on the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple pictures are labeled.

For multiple frames of pictures in a continuous video, the first frame of picture is taken as an initial picture, the next frame of picture adjacent to the initial picture is taken as a tracked picture, and the tracked picture is subjected to category labeling by tracking an image area corresponding to an image block with a labeled category in the tracked picture. Then, the tracked picture which is just marked is used as a new initial picture, the next frame picture which is adjacent to the new initial picture is used as a new tracked picture, an image area which corresponds to an image block with a marked category in the new initial picture is tracked in the new tracked picture, category marking is carried out on the tracked image area, and the operation is continuously carried out until all the multiple frames of pictures are marked, so that semi-automatic semantic segmentation marking of the pictures is carried out by adopting a preset tracking algorithm based on the pictures with the marked categories, the marking efficiency of the pictures is greatly improved, and the marking accuracy of the pictures is also ensured. Particularly, for adjacent pictures with similar scenes in the pictures sampled in sequence in the video, the scheme of the invention can greatly save the repeated or similar labor labeling cost.

Referring to step S102, in an embodiment of the present invention, the acquired group of images including the multi-frame pictures may be a combination of multi-frame pictures sequentially selected from a pre-acquired continuous segment of video according to a specified picture frame number interval. The number of the designated picture frames can be any number of frames such as three frames, five frames and the like, and the purpose is to select pictures with certain difference in the video and carry out corresponding marking. It should be noted that the first frame picture is not necessarily the first frame picture of the complete video, but the first picture is arranged in video time in a plurality of frame pictures sampled and selected according to the specified frame number interval of the pictures.

In an embodiment of the present invention, after the tracked picture is labeled at any time, it may be further determined whether a labeling error exists in a labeling result of the currently labeled picture. If the labeling error exists, various modification mode options for performing labeling modification on the tracked picture can be displayed. And then, by receiving any option selected by the annotation personnel, carrying out annotation modification on the tracked picture according to the modification mode corresponding to the selected option.

In this embodiment, the multiple modification options for performing annotation modification on the tracked picture may include an option for performing manual annotation modification on the tracked picture, and an option for performing retracing annotation on the tracked picture. The annotation personnel can correspondingly select according to the requirements, if the annotation result has fewer annotation errors, the manual annotation modification can be selected, and if the annotation result has more annotation errors, the tracked picture can be selected to be retraced and annotated, so that the workload of annotation modification of the annotation personnel is reduced.

Of course, if it is known through the judgment that there is no annotation error in the annotation result, if there is still a picture which is not tracked and annotated in the group of acquired images, the tracking and annotation of the next frame of picture is continued, and if all the pictures in the group of acquired images are annotated, the final annotation result can be displayed.

After all the multiple frames of pictures in the video collected in advance are marked, the marked results can be displayed to marking personnel, the marking personnel can check the marked results, if the marking is detected to be correct, the marking is finished, if a small number of marking errors exist, the marking personnel can correct the error parts, and the pictures do not need to be marked repeatedly.

Specifically, after all the multiple frames of pictures in the pre-collected video are labeled, the labeling result of the tracking picture is displayed, prompt information for judging whether the labeling result is modified or not is sent out, and if a modification instruction of a labeling person on the labeling result is received, corresponding modification operation is carried out on the labeling result according to the modification instruction.

Referring to step S104 above, in an embodiment of the present invention, the first frame of picture semantics may be divided into a plurality of polygonal blocks as image blocks, that is, the picture semantics may be divided into a plurality of polygonal areas, and each polygonal area is defined as one polygonal block and corresponds to one image block. Furthermore, the class labeling operation of a labeling person on any polygonal block in the first frame picture can be received.

In another embodiment of the present invention, the semantic division of the first frame of picture into a plurality of super pixel blocks as image blocks may be performed, that is, the semantic division of the picture into a plurality of super pixel blocks, each of which serves as an image block. Furthermore, the method can receive the class marking operation of a marking person on any super pixel block in the first frame picture.

Of course, other division methods may be adopted in the embodiment of the present invention to divide the first frame picture into a plurality of sub-units as the image blocks, which is not specifically limited in the embodiment of the present invention.

The following describes a process of labeling a category of each image block in the first frame picture by taking a super pixel block as an image block as an example.

In order to improve the labeling efficiency, a labeling person can label only a small part of the superpixel blocks in the picture in the process of labeling the superpixel blocks of the picture, and realize the unified labeling of other superpixel blocks by adopting a continuous selection operation mode. Specifically, firstly, receiving class marking operation of a marking person on any super pixel block, and distributing a corresponding marking class for the any super pixel block. And then, marking the super-pixel blocks corresponding to the continuous selection operation as the same categories as the basic super-pixel blocks by the marking personnel through the continuous selection operation with the super-pixel blocks with the marked categories as the basic references. And when the super pixel blocks corresponding to the continuous selection operation are monitored to form a closed loop, the super pixel blocks in the closed loop can be marked as the same type as the super pixel blocks corresponding to the selection operation.

In addition, the operation of adjusting the size of the super-pixel threshold in the super-pixel block by a marking person can be received, so that the super-pixel threshold meets the marking requirement of the marking person. For example, if the mouse wheel is rotated from bottom to top, the super pixel threshold is adjusted from large to small, and if the mouse wheel is rotated from top to bottom, the super pixel threshold is adjusted from small to large. For another example, if the mouse wheel is rotated from top to bottom, the super pixel threshold is adjusted from large to small, and if the mouse wheel is rotated from bottom to top, the super pixel threshold is adjusted from small to large.

In this embodiment, when adjusting the threshold size of the superpixel, if the overlapping area of the area occupied by a superpixel block and the area occupied by the superpixel block of the labeled category is greater than 1/2 of the area occupied by the superpixel block, the superpixel block is labeled as the same category as the superpixel block category of the overlapping area.

With reference to the above step S106 and the above embodiment, based on the labeling manner of the superpixel blocks, if all the superpixel blocks in the tracked picture are selected for tracking, the number of the superpixel blocks to be tracked is large, which results in problems of large algorithm computation amount, slow computer operation speed, and the like.

Firstly, before tracking and labeling the tracked picture, the superpixel blocks with the same labeling type in the first frame picture are merged into a combined block, namely the superpixel blocks with the labeled type are merged into a plurality of irregular large combined blocks according to the same type. Furthermore, a preset tracking algorithm can be subsequently adopted to track the image area corresponding to the combined block with the labeled category in the tracked picture, and the tracked image area is labeled as the corresponding category.

And secondly, before tracking and labeling the tracked picture, screening out a typical superpixel block from the superpixel blocks of each labeling type in the first frame picture as a tracking object. And then, tracking an image area corresponding to the typical super-pixel block in the tracked picture by adopting a preset tracking algorithm, and marking the tracked image area as a corresponding category.

In the screening of the typical superpixel block, a clustering method may be used to select a typical pixel block of the same category, for example, if most or all of the superpixel blocks around a certain superpixel block belong to the same category, the superpixel block may be selected as the typical superpixel block. In addition, in this embodiment, when a typical super pixel block is screened out, one typical super pixel block may be selected from super pixel block regions connected with the same class, and certainly, a plurality of typical super pixels may also be selected.

In this method, since only the typical super pixel block is tracked, after the tracked image region is labeled as the corresponding category, the remaining super pixel blocks can be also labeled as the categories by the labeling person. For example, a continuous selection operation is received by the annotating personnel based on the typical superpixel block after the class is annotated, and the superpixel block corresponding to the continuous selection operation is annotated as the same class as the typical superpixel block. Of course, when it is monitored that the super-pixel blocks corresponding to the continuous selection operation form a closed loop, the super-pixel blocks within the closed loop may also be labeled as the same category as the typical super-pixel block.

In the embodiment of the present invention, the preset tracking algorithm may adopt an existing tracking algorithm, and the embodiment of the present invention is not specifically limited in this respect. For example, the image blocks obtained by segmentation are super pixel blocks, and super pixel block tracking can be realized by constructing an absorption state markov chain graph (AMC graph). For another example, the image block obtained by the segmentation is a superpixel block, and the superpixel block tracking can be realized by steps of establishing a decision appearance model (i.e., an appearance model capable of representing and distinguishing a target and a background), obtaining a region confidence map (i.e., a confidence map obtained according to confidence values of all superpixels in a frame of picture), establishing an object tracker observation model, and the like.

In an embodiment of the present invention, if an image area corresponding to an image block with a label type cannot be tracked in a current tracked picture or tracking fails due to a large change in an image scene, an object position, or an object size in a picture, clustering may be performed on the tracked picture of the current frame again, and a type of a labeled image block (an image block which is tracked successfully) closest to the image area which cannot be tracked is directly applied to the image area which cannot be tracked.

Specifically, if an untracked image region still exists in the tracked picture, the untracked image region and the image block marked with the category in the tracked picture can be clustered according to a preset clustering algorithm, and then the untracked image region is subjected to category marking according to the category of the image block which belongs to the same cluster as the untracked image region. In this embodiment, the preset clustering algorithm may include a k-means method, and of course, other clustering algorithms may also be adopted, which is not specifically limited in this embodiment of the present invention.

Based on the same inventive concept, the embodiment of the invention also provides a semi-automatic labeling device for semantic segmentation of pictures, and fig. 2 shows a schematic structural diagram of the semi-automatic labeling device for semantic segmentation of pictures according to an embodiment of the invention. Referring to fig. 2, the semi-automatic annotation device 200 for semantic segmentation of pictures comprises a selecting module 210, a segmentation annotation module 220 and a tracking annotation module 230.

The functions of the components or devices of the semi-automatic image semantic segmentation labeling device 200 and the connection relationship between the components will be described:

the selecting module 210 is adapted to acquire a group of images including multiple frames of pictures, and select a first frame of picture from the acquired images;

the segmentation labeling module 220 is coupled with the selection module 210 and is adapted to semantically segment the first frame of picture into a plurality of image blocks and receive class labeling operation of a labeling person on each image block in the first frame of picture;

the tracking and labeling module 230, coupled to the segmentation and labeling module 220, is adapted to use a first frame of picture as an initial picture, use a next frame of picture adjacent to the initial picture in the multiple frames of pictures as a tracked picture, track an image area corresponding to the image block with a labeled category in the tracked picture by using a preset tracking algorithm, and label the tracked image area as a corresponding category;

the tracking and labeling module 230 is further adapted to use the currently labeled tracked picture as a new initial picture, and continue to perform tracking and labeling on the tracked picture adjacent to the new initial picture by using a preset tracking algorithm until all the multiple pictures are labeled.

In an embodiment of the present invention, the segmentation labeling module 220 is further adapted to semantically segment the first frame of picture into a plurality of polygonal blocks as image blocks, and receive a class labeling operation performed by a labeling person on any polygonal block in the first frame of picture.

In another embodiment of the present invention, the segmentation labeling module 220 is further adapted to semantically segment the first frame of picture into a plurality of super-pixel blocks as image blocks, and receive a class labeling operation performed by a labeling person on any super-pixel block in the first frame of picture.

If the segmentation labeling module 220 semantically segments the first frame of picture into a plurality of super-pixel blocks as image blocks, in an embodiment, referring to fig. 3, the semi-automatic labeling device for picture semantic segmentation 200 may further include a combining module 240, respectively coupled to the segmentation labeling module 220 and the tracking labeling module 230, wherein the combining module 240 is adapted to combine the super-pixel blocks with the same labeling category in the first frame of picture into a combined block. The tracking and labeling module 230 is further adapted to track an image region corresponding to the combined block with a labeled category in the tracked picture by using a preset tracking algorithm, and label the tracked image region as the corresponding category.

If the segmentation labeling module 220 semantically segments the first frame of picture into a plurality of superpixel blocks as image blocks, in another embodiment, referring to fig. 4, the semi-automatic labeling device for picture semantic segmentation 200 may further include a selection module 250, respectively coupled to the segmentation labeling module 220 and the tracking labeling module 230, wherein the selection module 250 is adapted to select a typical superpixel block from the superpixel blocks of each labeling category in the first frame of picture, and the periphery of the typical superpixel block includes a plurality of superpixel blocks belonging to the same labeling category. The tracking labeling module 230 is further adapted to track an image region corresponding to a typical super pixel block in the tracked picture by using a preset tracking algorithm, and label the tracked image region as a corresponding category.

With continued reference to fig. 4, in this embodiment, the semi-automatic image semantic segmentation labeling apparatus 200 may further include a continuous selection module 260, which is adapted to receive a continuous selection operation based on the typical superpixel block labeled by the labeling person after the tracking labeling module 230 labels the tracked image region as the corresponding category, and label the superpixel block corresponding to the continuous selection operation as the same category as the typical superpixel block category.

The embodiment of the invention also provides another semi-automatic marking device for semantic segmentation of pictures, and fig. 5 shows a schematic structural diagram of the semi-automatic marking device for semantic segmentation of pictures according to one embodiment of the invention. Referring to fig. 5, the semi-automatic annotation device 200 for semantic segmentation of pictures includes a selecting module 210, a segmentation annotation module 220, a tracking annotation module 230, a determining module 271, a displaying module 270, a modifying module 280, and a clustering module 290. For the introduction of the selecting module 210, the segmentation labeling module 220, and the tracking labeling module 230, please refer to the above embodiments.

The determining module 271, coupled to the tracking labeling module 230, is adapted to determine whether there is a labeling error in the labeling result of the tracked picture after any labeling of the tracked picture.

The display module 270, coupled to the determining module 271, is adapted to display a plurality of modification mode options for performing annotation modification on the tracked picture if there is an annotation error in the annotation result of the tracked picture.

The method comprises the following steps of selecting a plurality of modification modes for carrying out annotation modification on a tracked picture, wherein the modification modes comprise an option for carrying out manual annotation modification on the tracked picture and an option for carrying out retracing annotation on the tracked picture.

And the modification module 280 is coupled with the presentation module 270 and is adapted to receive any option selected by the annotating person and perform annotation modification on the tracked picture according to the modification mode corresponding to the selected option.

And the clustering module 290 is coupled with the tracking and labeling module 230, and if an untracked image region exists in the tracked picture, clustering the untracked image region and the image block labeled with the category in the tracked picture according to a preset clustering algorithm. And carrying out class marking on the untracked image areas according to the classes of the image blocks which belong to the same cluster with the untracked image areas.

The invention also provides a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the semi-automatic annotation method for semantic segmentation of pictures according to any of the above embodiments.

According to any one or a combination of the above preferred embodiments, the following advantages can be achieved by the embodiments of the present invention:

in the embodiment of the invention, a plurality of frames of pictures are sequentially selected from a pre-collected video according to the specified picture frame number interval, after class labeling is carried out on each image block in a first frame of picture by a labeling person, the first frame of picture is taken as an initial picture, and the next frame of picture adjacent to the initial picture in the plurality of frames of pictures is taken as a tracked picture, so that an image area corresponding to the image block with the labeled class is tracked in the tracked picture by adopting a preset tracking algorithm, and the tracked image area is labeled as the corresponding class. And then, taking the currently labeled tracked picture as a new initial picture, and continuing to perform tracking labeling on the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple pictures are labeled. Therefore, the embodiment of the invention can realize semi-automatic semantic segmentation and labeling of the pictures by adopting the preset tracking algorithm based on the pictures with labeled categories for the multiple frames of pictures in the continuous video, thereby greatly improving the labeling efficiency of the pictures and ensuring the accuracy of the labeling of the pictures. Particularly, for two adjacent front and back pictures or a plurality of front and back pictures with similar scenes in the pictures to be labeled which are sampled in sequence in the video, the scheme of the invention greatly saves the repeated or similar labor labeling cost.

It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.

In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.

Those of ordinary skill in the art will understand that: the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions, so that a computing device (for example, a personal computer, a server, or a network device) executes all or part of the steps of the method according to the embodiments of the present invention when the instructions are executed. And the aforementioned storage medium includes: u disk, removable hard disk, Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disk, and other various media capable of storing program code.

Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.

Claims

1. A semi-automatic labeling method for semantic segmentation of pictures comprises the following steps:

taking the currently labeled tracked picture as a new initial picture, and continuing to perform tracking labeling on the tracked picture adjacent to the new initial picture by adopting a preset tracking algorithm until all the multiple frames of pictures are labeled; wherein

The semantic division of the first frame of picture into a plurality of image blocks, and the step of receiving the category labeling operation of a labeling person on each image block in the first frame of picture comprises the following steps:

receiving category marking operation of marking personnel on any super pixel block in the first frame of picture;

selecting a typical superpixel block from the superpixel blocks of each annotation category in the first frame picture, wherein the periphery of the typical superpixel block comprises a plurality of superpixel blocks belonging to the same annotation category; and is

The step of tracking an image area corresponding to the image block with the labeled category in the tracked picture by adopting a preset tracking algorithm and labeling the tracked image area as the corresponding category comprises the following steps:

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein the tracked picture is subjected to a plurality of modification options including:

carrying out manual labeling modification on the tracked picture;

and an option of performing retracing annotation on the tracked picture.

4. The method according to any one of claims 1-3, further comprising:

5. A semi-automatic labeling device for semantic segmentation of pictures comprises the following components:

the selection module is suitable for acquiring a group of images containing multiple frames of pictures and selecting a first frame of picture from the acquired images;

6. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the semi-automatic labeling method for semantic segmentation of pictures according to any one of claims 1 to 4.