WO2022033513A1 - Target segmentation method and apparatus, and computer-readable storage medium and computer device

Info

Publication number: WO2022033513A1
Application number: PCT/CN2021/112044
Authority: WO
Inventors: 贾配洋; 林晓帆; 蔡锦霖
Original assignee: 影石创新科技股份有限公司
Priority date: 2020-08-11
Filing date: 2021-08-11
Publication date: 2022-02-17
Also published as: CN111968134A; CN111968134B

Abstract

The present application is applicable to the field of image processing. Provided are a target segmentation method and apparatus, and a computer-readable storage medium and a computer device. The method comprises: obtaining a first target frame in a target image by using a preset target acquisition model; enlarging the first target frame according to a first preset proportion, so as to obtain an enlarged first target frame; by using a preset target segmentation model, processing an image area corresponding to the enlarged first target frame, so as to obtain a mask image after target segmentation, and a second target frame located in the mask image after target segmentation; mapping the mask image after target segmentation to the size of a target image, so as to obtain a first mask image of the target image; and enlarging the second target frame according to a second preset proportion, so as to obtain an enlarged second target frame, and fusing an image corresponding to the enlarged second target frame with the first mask image, so as to obtain a second mask image of the target image. The present application facilitates correcting a wrongly segmented pixel outside a target instance, thereby improving the accuracy of target segmentation.

Description

Target segmentation method, apparatus, computer-readable storage medium, and computer device

Technical field

The present application belongs to the field of image processing, and in particular, relates to a target segmentation method, apparatus, computer-readable storage medium, and computer equipment.

Background technique

Target segmentation refers to performing target segmentation and detection on the part of the image containing the target, separating the target and the background in the image, and obtaining a mask map after target segmentation for subsequent processing of the target. The target can be anything like a portrait, an animal, a car, etc. For example, when the target is a portrait, the subsequent processing of the target may be to perform processing such as beautifying and blurring the target.

However, the target segmentation method in the prior art usually adopts a preset segmentation model, and obtains a mask image after target segmentation according to the enlarged first target frame. The segmentation accuracy of this method is not high, and the edge of the target cannot be accurately segmented.

Technical problem

The purpose of the embodiments of the present application is to provide a target segmentation method, apparatus, computer-readable storage medium, computer equipment, and camera, aiming to solve one of the above problems.

Technical solutions

In a first aspect, the present application provides a target segmentation method, the method comprising:

S101, using a preset target acquisition model to obtain a first target frame in a target image, where the target image has a target to be segmented;

S102, expanding the first target frame according to a first preset ratio to obtain an enlarged first target frame;

S103, using a preset target segmentation model to process the image area corresponding to the enlarged first target frame, to obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation;

S104, mapping the mask image after the target segmentation to the size of the target image to obtain a first mask image of the target image; enlarging the second target frame according to a second preset ratio to obtain an enlarged second target frame, and fusing the image corresponding to the enlarged second target frame with the first mask image to obtain a second mask image of the target image.

In a second aspect, the present application provides a target segmentation device, the device comprising:

a first target frame acquisition module, configured to use a preset target acquisition model to obtain a first target frame in a target image, where the target image has a target to be segmented;

a first enlargement module, configured to enlarge the first target frame according to a first preset ratio to obtain an enlarged first target frame;

a target segmentation module, configured to use a preset target segmentation model to process the image area corresponding to the enlarged first target frame, to obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation;

a second enlargement module, configured to map the mask image after the target segmentation to the size of the target image to obtain a first mask image of the target image, to enlarge the second target frame according to a second preset ratio to obtain an enlarged second target frame, and to fuse the image corresponding to the enlarged second target frame with the first mask image to obtain a second mask image of the target image.

In a third aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the target segmentation method as described above are implemented.

In a fourth aspect, the present application provides a computer device, comprising:

one or more processors;

memory; and

one or more computer programs, the processor and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the processor implements the steps of the target segmentation method when executing the computer program.

In a fifth aspect, the application provides a camera, including:

one or more processors;

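memory; and

one or more computer programs, the processor and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the processor implements the steps of the target segmentation method when executing the computer program.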

Beneficial effect

In the embodiments of the present application, the preset target segmentation model is used to process the image area corresponding to the enlarged first target frame, yielding a mask image after target segmentation and a second target frame located in that mask image. The mask image after target segmentation is then mapped to the size of the target image to obtain a first mask image, the second target frame is enlarged according to the second preset ratio to obtain an enlarged second target frame, and the image corresponding to the enlarged second target frame is fused with the first mask image to obtain a second mask image of the target image. This helps correct wrongly segmented pixels outside the target instance and improves the accuracy of target segmentation.

Description of drawings

FIG. 1 is a schematic diagram of an application scenario of a target segmentation method provided by an embodiment of the present application.

FIG. 2 is a flowchart of a target segmentation method provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of a target image being a plane image.

FIG. 4 is a schematic diagram of enlarging each side of the first target frame by a first preset ratio to obtain an enlarged first target frame.

FIG. 5 is a schematic diagram of a mask map after object segmentation.

FIG. 6 is a schematic diagram of a first mask map of a target image.

FIG. 7 is a schematic diagram of an enlarged second target frame.

FIG. 8 is a schematic diagram of a second mask map of the target image.

FIG. 9 is a schematic diagram of a person segmentation map of a target image.

FIG. 10 is a schematic diagram of an object image with multiple objects to be segmented.

FIG. 11 is a schematic diagram of a third mask map of the target image.

FIG. 12 is a schematic diagram of a person segmentation map of a target image.

FIG. 13 is a schematic diagram of a target segmentation apparatus provided by an embodiment of the present application.

FIG. 14 is a specific structural block diagram of a computer device provided by an embodiment of the present application.

FIG. 15 is a specific structural block diagram of a camera provided by an embodiment of the present application.

Embodiments of the present invention

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:

An application scenario of the target segmentation method provided by an embodiment of the present application may be a computer device or a camera, which executes the target segmentation method to obtain a mask image of a target image. The application scenario may also include a computer device 100 and a camera 200 connected to each other (as shown in FIG. 1). At least one application program may run on the computer device 100 and on the camera 200. The computer device 100 may be a server, a desktop computer, a mobile terminal, or the like; mobile terminals include mobile phones, tablet computers, notebook computers, personal digital assistants, and the like. The camera 200 may be an ordinary camera, a panoramic camera, or the like. An ordinary camera is a photographing device for taking flat images and flat videos. In this scenario, the computer device 100 or the camera 200 executes the target segmentation method to obtain the mask image of the target image.

Please refer to FIG. 2, which is a flowchart of a target segmentation method provided by an embodiment of the present application. This embodiment mainly takes the application of the target segmentation method to a computer device or a camera as an example for illustration. The target segmentation method provided by an embodiment of the present application includes the following steps:

S101. Use a preset target acquisition model to obtain a first target frame in a target image, where the target image has a target to be segmented.

In an embodiment of the present application, the target image may be a plane image (as shown in FIG. 3 ), a panoramic image, or an image corresponding to the target to be segmented captured from the panoramic image.

The target to be segmented can be any target such as people, animals, cars, etc. The image corresponding to the target to be segmented may include a part of the target to be segmented. For example, when the target to be segmented is a person, the image corresponding to the target to be segmented may be a human face, a human upper body, or a complete human body.

The preset target acquisition model may be a target detection model or a target tracking model. The target detection model may be a detection model based on a classical machine learning method or a deep learning target detection model trained on a target detection data set. Target tracking models include the stick figure model (Stick Figures Model), the two-dimensional contour model (2D Contours Model), the three-dimensional model (Volumetric Model), and so on.

If the target in the video is segmented in real time, the preset target acquisition model may be a target tracking model or a target detection model.

In an embodiment of the present application, the target detection model may be a deep learning target detection model trained on a target detection data set. When the first target frame in the target image is obtained with such a model and the target detection data set includes panoramic images, the image segmentation effect on panoramic images, on images corresponding to the target to be segmented that are cropped from panoramic images, and on flat images in which the target is slightly deformed is better than with a detection model based on a classical machine learning method. Because the target detection data set includes panoramic images, in which objects are deformed, the trained model produces better results than other algorithms on deformed targets and generalizes better to deformed images. It is compatible with slightly deformed images, and the segmentation accuracy of the target is higher.

In an embodiment of the present application, the deep learning target detection model trained on the target detection data set is a single-stage target detection model or a two-stage target detection model, and model compression is used to reduce the amount of computation while preserving accuracy, so its inference speed is better than that of a model without compression. Model compression refers to compressing the structure and parameters of the deep learning network, that is, using a deep convolutional neural network with fewer layers and fewer channels to achieve similar accuracy or meet the task requirements.

S102: Enlarge the first target frame according to a first preset ratio to obtain an enlarged first target frame.

In an embodiment of the present application, S102 may specifically be:

Each side of the first target frame is enlarged by a first preset ratio to obtain an enlarged first target frame (as shown in FIG. 4 ). The range of the first preset ratio can be between 5% and 30%, that is, each side of the enlarged first target frame is between 105% and 130% of each side of the original first target frame. Other ratios are possible.

Since the edge expansion may exceed the boundary of the image, after the expansion of each edge of the first target frame by a first preset ratio to obtain the enlarged first target frame, the method may further include:

Determine whether each side of the enlarged first target frame exceeds the range covered by the target image, and if so, move the side that exceeds the range covered by the target image to the position of the corresponding edge of the target image. This prevents the enlarged first target frame from exceeding the boundary of the image.

S103, using a preset target segmentation model to process the image area corresponding to the enlarged first target frame to obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation (as shown in FIG. 5).
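The enlargement and boundary check of S102 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `enlarge_and_clip`, the `(x0, y0, x1, y1)` box layout, and the choice to grow the box symmetrically about its center are assumptions made for the example.

```python
def enlarge_and_clip(box, ratio, image_width, image_height):
    """Grow each side of a box (x0, y0, x1, y1) by `ratio`, then clip it to the image.

    Hypothetical helper: the box layout and the symmetric growth about the
    box center are assumptions, not details stated in the text.
    """
    x0, y0, x1, y1 = box
    # Lengthening a side by `ratio` adds half of the extra length at each end.
    dx = (x1 - x0) * ratio / 2.0
    dy = (y1 - y0) * ratio / 2.0
    x0, y0, x1, y1 = x0 - dx, y0 - dy, x1 + dx, y1 + dy
    # A side that leaves the image is moved onto the matching image edge.
    return (
        max(0.0, x0),
        max(0.0, y0),
        min(float(image_width), x1),
        min(float(image_height), y1),
    )


# A 80x40 box enlarged by 25% in a 105x80 image; the right side is clipped.
enlarged = enlarge_and_clip((20, 20, 100, 60), 0.25, 105, 80)
```

With a ratio of 0.25, each side of the unclipped box is 125% of the original side, which falls inside the 105% to 130% range given above.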

The second target frame in the mask image after the target segmentation can be specifically obtained by: identifying the target in the mask image after the target segmentation, and using the rectangular frame formed by the boundary of the target as the second target frame.
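As a rough sketch of how the second target frame might be derived, the snippet below takes the tightest rectangle around all non-zero mask pixels. It assumes the mask is a nested list of 0/1 values and that every non-zero pixel belongs to the target; the function name `mask_bounding_box` is hypothetical, and a real pipeline would more likely use an array library.

```python
def mask_bounding_box(mask):
    """Return (x0, y0, x1, y1) around the non-zero pixels of a 2D mask, or None.

    Assumption: every non-zero pixel is treated as part of the target.
    x1 and y1 are exclusive.
    """
    if not mask or not mask[0]:
        return None
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
second_frame = mask_bounding_box(example_mask)
```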

The preset target segmentation model may be a classic machine learning segmentation algorithm, such as YOLACT (You Only Look At CoefficienTs), or a deep learning target segmentation model trained on the obtained target segmentation data set, so as to achieve real-time instance segmentation. The target segmentation data set in this embodiment of the present application may include flat images and panoramic images. Including panoramic images gives the model better generalization ability, which improves the segmentation of objects. Moreover, the target contour annotation is more accurate, and the polygon used to annotate each target instance consists of more pixels.

Algorithms such as YOLACT (You Only Look At CoefficienTs) are single-stage instance segmentation algorithm models that do not require multi-scale feature information of the corresponding region (multi-scale feature information is extracted from different convolutional layers in the neural network structure). The single-stage instance segmentation model is faster, and further model compression is performed for the deep neural network to further improve the speed of the instance segmentation model.

The deep learning target segmentation model that uses the obtained target segmentation data set for learning can be a single-stage target segmentation model or a two-stage target segmentation model, and the amount of calculation is reduced through model compression and other methods under the condition of ensuring accuracy. Compared with the method without model compression, the model inference speed is more advantageous. Model compression refers to the compression of the deep learning network structure and parameters, that is, using fewer network layers and fewer channels of the deep convolutional neural network to achieve similar accuracy or meet the task requirements.

For the target segmentation model, current open-source data sets do not offer sufficient instance segmentation accuracy for persons and cannot accurately segment the edges of persons. For example, the COCO (Common Objects in Context) data set cannot meet the requirements of high-precision person segmentation because its labeling is rough. The Supervisely data set has high labeling accuracy, but the amount of data is small and the image scenes it contains are limited, so it cannot be used directly for training and applied to high-precision person segmentation products. This application can use a deep learning target segmentation model trained on the target segmentation data set. Compared with the COCO data set, the target segmentation data set used in this application labels human body contours more accurately. The target segmentation model trained on this data set has a better segmentation effect and higher segmentation accuracy for human body edges and portable items (items that may be carried or held by the human body, such as mobile phones, hats, helmets, backpacks, rackets, umbrellas, etc.).

In an embodiment of the present application, the mask image obtained after the target segmentation is mapped to the size of the target image to obtain the first mask image of the target image is specifically:

Expand the edge of the mask image after the target segmentation, so that the size of the mask image after the expanded edge is the same as the size of the target image, and obtain the first mask image of the target image, as shown in Figure 6.
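One way to picture this edge expansion is to paste the cropped mask into an all-zero canvas of the target image size. The sketch below assumes nested-list masks and that the top-left corner of the enlarged first target frame is known in target-image coordinates; the name `paste_mask` and the `origin` parameter are hypothetical.

```python
def paste_mask(crop_mask, origin, image_width, image_height):
    """Place a cropped mask into a zero-filled mask of the full image size.

    `origin` is the assumed (x, y) of the crop's top-left corner in the
    target image, i.e. the corner of the enlarged first target frame.
    """
    x0, y0 = origin
    full = [[0] * image_width for _ in range(image_height)]
    for dy, row in enumerate(crop_mask):
        for dx, value in enumerate(row):
            y, x = y0 + dy, x0 + dx
            # Ignore anything that would land outside the target image.
            if 0 <= y < image_height and 0 <= x < image_width:
                full[y][x] = value
    return full


first_mask = paste_mask([[1, 1], [0, 1]], origin=(1, 1), image_width=4, image_height=3)
```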

Since the first target frame is enlarged according to the first preset ratio in S102, the enlarged first target frame may include more background pixels, the image may contain more than one person, and there may be interference from other people or misclassification of the background. Therefore, the second target frame (shown as the inner frame in FIG. 7) is enlarged according to the second preset ratio to obtain an enlarged second target frame (as shown in FIG. 7).

The range of the second preset ratio can be between 5% and 30%, that is, each side of the enlarged second target frame is between 105% and 130% of each side of the original second target frame. Other ratios are possible.
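The text does not spell out the fusion operator used in S104. One plausible reading, sketched below purely as an assumption, is that mask pixels inside the enlarged second target frame are kept and pixels outside it are cleared, which would remove wrongly segmented pixels away from the target. The function name `fuse_with_frame` is hypothetical, and the frame is assumed to be already expressed in target-image coordinates with exclusive right and bottom edges.

```python
def fuse_with_frame(first_mask, frame):
    """Keep mask values inside `frame` (x0, y0, x1, y1) and zero the rest.

    Assumption: "fusing" is read here as restricting the first mask to the
    enlarged second target frame; the source does not define the operator.
    """
    x0, y0, x1, y1 = frame
    return [
        [value if (x0 <= x < x1 and y0 <= y < y1) else 0 for x, value in enumerate(row)]
        for y, row in enumerate(first_mask)
    ]


mask_with_stray_pixels = [
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
second_mask = fuse_with_frame(mask_with_stray_pixels, (1, 1, 3, 3))
```

In this toy case the two stray corner pixels fall outside the frame and are cleared, while the target region is left unchanged.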

After S104, the method may further include the following steps:

The second mask image of the target image (as shown in Figure 8) is fused with the target image to obtain a person segmentation map of the target image (as shown in Figure 9).
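A simple way to illustrate this last fusion is to keep the image pixels where the mask is set and replace the rest with a background color. This is only a sketch: the nested lists of `(r, g, b)` tuples, the black background, and the name `apply_mask` are assumptions.

```python
def apply_mask(image, mask, background=(0, 0, 0)):
    """Keep image pixels where the mask is non-zero, else use `background`.

    `image` is assumed to be rows of (r, g, b) tuples with the same shape
    as `mask`.
    """
    return [
        [pixel if keep else background for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


tiny_image = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
tiny_mask = [[1, 0], [0, 1]]
person_map = apply_mask(tiny_image, tiny_mask)
```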

In S101, if the target image has multiple targets to be segmented, there are multiple first target frames obtained.

As shown in FIG. 10, when the target image has a plurality of targets to be segmented, S102 to S104 are performed for each target to be segmented to obtain a plurality of second mask images of the target image. The plurality of second mask images are then fused to obtain a third mask image of the target image, as shown in FIG. 11. Finally, the third mask image of the target image is fused with the target image to obtain the person segmentation map of the target image, as shown in FIG. 12.
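The fusion of several second mask images into a third mask image could, for instance, be a pixel-wise maximum, so that a pixel belongs to the third mask if it belongs to any per-target mask. The source does not name the operator, so the union below and the name `merge_masks` are assumptions.

```python
def merge_masks(masks):
    """Combine same-sized 2D masks by taking the pixel-wise maximum.

    Assumption: the third mask is the union of the per-target second masks.
    """
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [max(mask[y][x] for mask in masks) for x in range(width)]
        for y in range(height)
    ]


mask_person_a = [[1, 0, 0], [1, 0, 0]]
mask_person_b = [[0, 0, 1], [0, 1, 1]]
third_mask = merge_masks([mask_person_a, mask_person_b])
```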

In an embodiment of the present application, before S101, the method may further include the following steps:

The target image is normalized to obtain the normalized target image. Normalization is a way of down-sampling, and normalizing the target image refers to down-sampling the target image to improve the calculation speed.

S101 is specifically as follows: using a preset target acquisition model to obtain a first target frame in a normalized target image, where the target image has a target to be segmented.

In an embodiment of the present application, between S102 and S103, the method may further include the following steps:

The image area corresponding to the enlarged first target frame is normalized to obtain a normalized image. That is, the input of the preset target segmentation model is normalized (subtracting the mean, dividing by the variance, and so on). Normalizing the model input is a routine operation for most current deep learning models: the image pixel values are mapped from 0-255 to a range such as [-1, +1], centered at 0, which can speed up model convergence, among other benefits.
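As a small worked example of the 0-255 to [-1, +1] mapping, the sketch below uses the linear rescaling x / 127.5 - 1. The exact constants a given model expects are not stated in the text, so this formula and the name `normalize_pixels` are assumptions.

```python
def normalize_pixels(pixels):
    """Map pixel values from [0, 255] to [-1.0, +1.0], centered at 0.

    Assumption: a plain linear rescale; per-channel mean and variance
    statistics of a specific model are not covered here.
    """
    return [[value / 127.5 - 1.0 for value in row] for row in pixels]


normalized = normalize_pixels([[0, 127.5, 255]])
```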

S103 is specifically as follows: using a preset target segmentation model to process the normalized image to obtain a mask image after target segmentation and a second target frame located in the mask image after target segmentation.

Between S103 and S104, the method may further include the following steps:

Perform an upsampling operation on the mask image after target segmentation.

Upsampling refers to enlarging the mask image after target segmentation to obtain a mask image after target segmentation that is consistent with the image size of the input preset target segmentation model.
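Upsampling a mask back to the model input size can be illustrated with nearest-neighbor interpolation. The interpolation method is not specified in the text, so nearest-neighbor and the name `upsample_nearest` are assumptions chosen to keep the example short.

```python
def upsample_nearest(mask, out_width, out_height):
    """Resize a 2D mask to (out_width, out_height) by nearest-neighbor lookup.

    Assumption: nearest-neighbor is used only for illustration; the source
    does not state which interpolation is applied.
    """
    in_height, in_width = len(mask), len(mask[0])
    return [
        [mask[y * in_height // out_height][x * in_width // out_width] for x in range(out_width)]
        for y in range(out_height)
    ]


upsampled = upsample_nearest([[1, 0], [0, 1]], out_width=4, out_height=4)
```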

Referring to FIG. 13, the target segmentation device provided by an embodiment of the present application may be a computer program or a piece of program code running in a computer device or a camera; for example, the target segmentation device is application software. The target segmentation device can be used to perform the corresponding steps in the target segmentation method provided by the embodiment of the present application. The target segmentation device provided by an embodiment of the present application includes:

The first target frame acquisition module 11 is used to obtain the first target frame in the target image by adopting a preset target acquisition model, and the target image has the target to be segmented;

The first enlargement module 12 is used to enlarge the first target frame according to a first preset ratio to obtain the enlarged first target frame;

The target segmentation module 13 is configured to use a preset target segmentation model to process the image area corresponding to the enlarged first target frame, and obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation;

The second enlargement module 14 is used to map the mask image after the target segmentation to the size of the target image to obtain the first mask image of the target image, to enlarge the second target frame according to the second preset ratio to obtain the enlarged second target frame, and to fuse the image corresponding to the enlarged second target frame with the first mask image to obtain a second mask image of the target image.

The target segmentation device provided by an embodiment of the present application and the target segmentation method provided by an embodiment of the present application belong to the same concept, and the specific implementation process thereof is detailed in the full text of the specification, which will not be repeated here.

An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the target segmentation method provided by an embodiment of the present application are implemented.

FIG. 14 shows a specific structural block diagram of a computer device provided by an embodiment of the present application. The computer device may be the computer device shown in FIG. 1. A computer device 100 includes one or more processors 101, a memory 102, and one or more computer programs, wherein the processor 101 and the memory 102 are connected by a bus, and the one or more computer programs are stored in the memory 102 and are configured to be executed by the one or more processors 101. When the processor 101 executes the computer program, the steps of the target segmentation method provided by an embodiment of the present application are implemented.

The computer equipment may be a desktop computer, a mobile terminal, etc., and the mobile terminal includes a mobile phone, a tablet computer, a notebook computer, a personal digital assistant, and the like.

FIG. 15 shows a specific structural block diagram of a camera provided by an embodiment of the present application. The camera may be the camera shown in FIG. 1. A camera 200 includes one or more processors 201, a memory 202, and one or more computer programs, wherein the processor 201 and the memory 202 are connected by a bus, and the one or more computer programs are stored in the memory 202 and are configured to be executed by the one or more processors 201. When the processor 201 executes the computer program, the steps of the target segmentation method provided by an embodiment of the present application are implemented.

In the present application, the preset target segmentation model is used to process the image area corresponding to the enlarged first target frame, yielding the mask image after target segmentation and the second target frame located in that mask image. The mask image after target segmentation is then mapped to the size of the target image to obtain the first mask image, the second target frame is enlarged according to the second preset ratio to obtain the enlarged second target frame, and the image corresponding to the enlarged second target frame is fused with the first mask image to obtain a second mask image of the target image. This helps correct wrongly segmented pixels outside the target instance, that is, pixels of other non-instance targets, pixels of other similar targets, and so on, and improves the accuracy of target segmentation.

It should be understood that the steps in the embodiments of the present application are not necessarily executed sequentially in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily executed one after another; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a non-volatile computer-readable storage medium, and when the program is executed, it may include the flow of the above method embodiments. Any reference to memory, storage, database or other medium used in the various embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope described in this specification.

The above examples only represent several embodiments of the present invention, and the descriptions thereof are specific and detailed, but should not be construed as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of the present invention, several modifications and improvements can also be made, which all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention should be subject to the appended claims.

Claims

A target segmentation method, characterized in that the method comprises:

S101, using a preset target acquisition model to obtain a first target frame in a target image, where the target image has a target to be segmented;

S102, expanding the first target frame according to a first preset ratio to obtain an enlarged first target frame;

S103, using a preset target segmentation model to process the image area corresponding to the enlarged first target frame, to obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation;

S104, mapping the mask image after the target segmentation to the size of the target image to obtain a first mask image of the target image; enlarging the second target frame according to a second preset ratio to obtain an enlarged second target frame, and fusing the image corresponding to the enlarged second target frame with the first mask image to obtain a second mask image of the target image.
The target segmentation method according to claim 1, wherein the preset target acquisition model is a target detection model or a target tracking model.
The target segmentation method according to claim 1, wherein the S102 is specifically:

enlarging each side of the first target frame by the first preset ratio to obtain the enlarged first target frame; after enlarging each side of the first target frame by the first preset ratio to obtain the enlarged first target frame, the method further comprises:

determining whether each side of the enlarged first target frame exceeds the range covered by the target image, and if so, moving the side that exceeds the range covered by the target image to the position of the corresponding edge of the target image.
The target segmentation method according to claim 1, wherein after the S104, the method further comprises:

The second mask image of the target image is fused with the target image to obtain a person segmentation map of the target image.
The target segmentation method according to any one of claims 1 to 4, wherein, in S101, if the target image has multiple targets to be segmented, there are multiple first target frames obtained;

S102 to S104 are executed for each target to be segmented to obtain a plurality of second mask images of the target image, the plurality of second mask images are then fused to obtain a third mask image of the target image, and finally the third mask image of the target image is fused with the target image to obtain the person segmentation map of the target image.
The target segmentation method according to claim 1, wherein before the step S101, the method further comprises: normalizing the target image to obtain a normalized target image;

S101 is specifically: using a preset target acquisition model to obtain a first target frame in the normalized target image;

Between S102 and S103, the method further includes: normalizing the image area corresponding to the enlarged first target frame to obtain a normalized image;

S103 is specifically: using a preset target segmentation model to process the normalized image to obtain a mask image after the target segmentation and a second target frame located in the mask image after the target segmentation;

Between S103 and S104, the method further includes: performing an upsampling operation on the mask image after the target segmentation.
A target segmentation device, characterized in that the device comprises:

a first target frame acquisition module, configured to use a preset target acquisition model to obtain a first target frame in a target image, where the target image has a target to be segmented;

a first enlargement module, configured to enlarge the first target frame according to a first preset ratio to obtain an enlarged first target frame;

a target segmentation module, configured to use a preset target segmentation model to process the image area corresponding to the enlarged first target frame, to obtain a mask image after target segmentation and a second target frame located in the mask image after the target segmentation;

a second enlargement module, configured to map the mask image after the target segmentation to the size of the target image to obtain a first mask image of the target image, to enlarge the second target frame according to a second preset ratio to obtain an enlarged second target frame, and to fuse the image corresponding to the enlarged second target frame with the first mask image to obtain a second mask image of the target image.
A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the target segmentation method according to any one of claims 1 to 6 are implemented.
A computer device comprising:

one or more processors;

memory; and

one or more computer programs, the processor and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, characterized in that, when the processor executes the computer program, the steps of the target segmentation method according to any one of claims 1 to 6 are implemented.
A camera comprising:

one or more processors;

memory; and

one or more computer programs, the processor and the memory being connected by a bus, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, characterized in that, when the processor executes the computer program, the steps of the target segmentation method according to any one of claims 1 to 6 are implemented.