CN114757957A - Image segmentation method, image segmentation device and electronic equipment - Google Patents


Info

Publication number
CN114757957A
CN114757957A (application CN202210209570.6A)
Authority
CN
China
Prior art keywords
human body
image
segmented
detection frame
segmentation
Prior art date
Legal status
Pending
Application number
CN202210209570.6A
Other languages
Chinese (zh)
Inventor
胡淑萍
庞建新
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority application: CN202210209570.6A
Publication: CN114757957A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image segmentation method, an image segmentation device, an electronic device and a computer-readable storage medium. The method comprises the following steps: performing human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented; performing matting processing on the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame; performing human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result for each human body region image; and fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented. With this scheme, the segmentation effect on smaller human body regions can be improved during image segmentation.

Description

Image segmentation method, image segmentation device and electronic equipment
Technical Field
The present application relates to image processing technologies, and in particular, to an image segmentation method, an image segmentation apparatus, an electronic device, and a computer-readable storage medium.
Background
An image segmentation algorithm is essentially a pixel-level classification algorithm: it determines which class of object each pixel in an image belongs to. In practical algorithm development, many segmentation algorithms based on the Fully Convolutional Network (FCN) have been proposed. An FCN-based segmentation algorithm first scales the image to a fixed size, then obtains high-dimensional features of the image through successive convolution and pooling layers, then restores the high-dimensional features through successive deconvolution operations, and finally obtains a segmentation result of the same size as the original input image.
Such a common FCN-based image segmentation algorithm can generally segment the approximate outline of an object in an image. However, most deep neural networks currently used in human body segmentation algorithms scale (generally shrink) the image before input, and the pooling layers further shrink the feature map while extracting high-dimensional features, so relatively small human body regions in the image become even harder to segment after scaling. That is, the traditional FCN-based human body segmentation algorithm has a poor segmentation effect on small human body regions.
Disclosure of Invention
The application provides an image segmentation method, an image segmentation device, an electronic device and a computer-readable storage medium, which can improve the segmentation effect on smaller human body regions during image segmentation.
In a first aspect, the present application provides an image segmentation method, including:
carrying out human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
performing image matting processing on the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
respectively carrying out human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result of each human body region image;
and fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented.
In a second aspect, the present application provides an image segmentation apparatus, comprising:
the detection module is used for carrying out human body detection on the image to be segmented through the trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
the image matting module is used for matting the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
the segmentation module is used for respectively carrying out human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result of each human body region image;
and the fusion module is used for fusing the human body segmentation results of the human body region images to obtain the human body segmentation results of the images to be segmented.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of the first aspect as described above.
Compared with the prior art, the application has the following beneficial effects: the application provides a human body segmentation method based on two cascaded stages, which divides the human body segmentation process of an image into two steps: first, framing out the human bodies in the image to be segmented based on the human body detection frames obtained by the human body detection model; and second, performing human body segmentation, through the human body segmentation model, on the matting result (namely, the human body region image) corresponding to each human body detection frame. In the second step, the input of the human body segmentation model changes from the original full image (namely, the image to be segmented) to a single human body region image, so that when the input is scaled to a uniform size, a smaller human body region in the full image is no longer scaled down to an even smaller size, and the segmentation effect on smaller human body regions is improved. Meanwhile, since the image to be segmented is processed by the human body detection model and the human body segmentation model in sequence, background pixels in the image to be segmented are filtered twice, which eliminates the erroneous segmentation of background pixels seen in traditional human body segmentation methods and further improves the segmentation effect.
It is to be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, and details are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an implementation of an image segmentation method provided in an embodiment of the present application;
FIG. 2 is a diagram illustrating an algorithm flow of an image segmentation method according to an embodiment of the present application;
fig. 3 is an exemplary diagram of a matting process of a human body region image in an image segmentation method provided in the embodiment of the present application;
fig. 4 is a diagram illustrating another matting flow of a human body region image in an image segmentation method provided in an embodiment of the present application;
fig. 5 is a block diagram of an image segmentation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution proposed in the present application, the following description will be given by way of specific examples.
The following is a description of the image segmentation method proposed in the embodiment of the present application. Referring to fig. 1, the implementation flow of the image segmentation method is detailed as follows:
step 101, performing human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented.
In the embodiment of the present application, a human body detection model may be obtained through pre-training. The human body detection model is used to detect the human body information contained in an image and to output a corresponding detection box (bounding box), recorded as a human body detection frame, for each piece of human body information detected. That is, if the human body detection model detects several pieces of human body information in the image, it correspondingly outputs several human body detection frames.
Since the image segmentation method proposed in the embodiment of the present application is mainly intended to better segment the human bodies contained in an image, the image to be segmented usually contains at least one piece of human body information. Obviously, for an image containing no human body information, the human body detection model outputs no human body detection frame, and the subsequent steps proposed in the embodiments of the present application are not executed. That is, an image containing no human body information is not a main processing target of the image segmentation method proposed in the embodiment of the present application.
The electronic equipment performs human body detection on the image to be segmented through the human body detection model, and then N human body detection frames of the image to be segmented output by the human body detection model can be obtained. Wherein N is a positive integer.
102, performing image matting processing on the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame.
In the embodiment of the present application, a human body detection frame can generally be expressed by the parameters [x, y, w, h]. Here, x and y represent the position of the human body detection frame in the image to be segmented, specifically the coordinates of the top-left vertex of the frame, where x is the abscissa and y is the ordinate; w and h represent the size of the frame, where w is the width and h is the height. It should be noted that all of the above parameters are expressed in the pixel coordinate system of the image to be segmented.
Considering that each human body detection frame frames out a piece of detected human body information in the image to be segmented, the electronic device can perform matting processing on the image to be segmented based on the position and size of each human body detection frame, thereby obtaining the human body region image corresponding to each human body detection frame. At this point, the original image to be segmented has been cropped into N human body region images, and in theory each human body region image contains a single piece of human body information detected by the human body detection model.
And 103, respectively carrying out human body segmentation on each human body region image through the trained human body segmentation model to obtain a human body segmentation result of each human body region image.
In the embodiment of the application, a human body segmentation model can be obtained through pre-training, and the human body segmentation model is used for classifying each pixel point in an image, specifically classifying the pixel points belonging to a human body into one class, and classifying the pixel points not belonging to the human body (namely, the pixel points belonging to the background) into another class, so that image segmentation is realized. For example only, the human body segmentation model may mark the pixel points belonging to the human body as a first preset category "1", and mark the pixel points belonging to the background as a second preset category "0". Of course, it is also understood that: the human body segmentation model sets the pixel value of the pixel point belonging to the human body to be 1, namely white; the pixel value of the pixel point belonging to the background is set to "0", that is, black.
It can be understood that, when performing human body segmentation, the electronic device actually considers each human body region image separately. That is, the input of the human body segmentation model is not the image to be segmented but a human body region image. Therefore, when the human body segmentation model performs human body segmentation, it is no longer the image to be segmented that is scaled to a uniform size, but the human body region image directly, which avoids scaling a smaller human body region of the image to be segmented down to an even smaller size and thus improves the segmentation effect.
And step 104, fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented.
In the embodiment of the present application, the respective human body segmentation result of each human body region image can be obtained through step 103, but considering that each human body region image is only a part of the image to be segmented, the electronic device should finally fuse the human body segmentation results of the respective human body region images together to obtain the final human body segmentation result of the image to be segmented.
For an understanding of the above process, please refer to fig. 2. Fig. 2 shows an example of the algorithm flow proposed in the embodiment of the present application. It can be seen that the embodiment of the present application actually proposes a cascaded two-stage human body segmentation process, which includes two deep learning models, namely a human body detection model and a human body segmentation model. Firstly, inputting an image to be segmented into a human body detection model by electronic equipment to obtain a human body detection frame; then, the electronic equipment extracts image areas respectively corresponding to all the human body detection frames to obtain corresponding human body area images; then, the electronic equipment respectively inputs the human body region images into the human body segmentation model to obtain the human body segmentation result of each human body region image; and finally, the electronic equipment fuses all the obtained human body segmentation results to obtain a complete human body segmentation result.
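The cascaded flow of fig. 2 can be sketched as follows in a minimal pure-Python illustration. Here `detect_bodies` and `segment_body` are hypothetical stand-ins for the trained human body detection and segmentation models, images are nested lists rather than real tensors, and the scaling, normalization, and frame-expansion details described later are omitted for brevity.

```python
def segment_image(image, detect_bodies, segment_body):
    """Two-stage cascade: detect, crop each box, segment each crop, fuse."""
    h, w = len(image), len(image[0])
    # Stage 1: human body detection on the full image -> list of [x, y, bw, bh]
    boxes = detect_bodies(image)
    # The fused result starts out as all background (0)
    fused = [[0] * w for _ in range(h)]
    for x, y, bw, bh in boxes:
        # Matting: crop the human body region image for this detection frame
        crop = [row[x:x + bw] for row in image[y:y + bh]]
        # Stage 2: segment the region image in isolation (0/1 mask, crop-sized)
        mask = segment_body(crop)
        # Fusion: copy foreground pixels back at their original coordinates
        for dy in range(bh):
            for dx in range(bw):
                if mask[dy][dx] == 1:
                    fused[y + dy][x + dx] = 1
    return fused
```

Because each crop is segmented on its own, a small person occupies the full model input rather than a small corner of it, which is the point of the cascade.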
In some embodiments, step 101 specifically includes:
and A1, zooming the image to be segmented to a first preset size.
The first predetermined size is determined based on a human detection model. For example only, the first predetermined size may be 320 x 320. It can be understood that, if the size of the image to be segmented is larger than the first preset size, the electronic device will perform reduction processing on the image to be segmented until the size of the image to be segmented is the first preset size; if the size of the image to be segmented is smaller than the first preset size, the electronic device performs amplification processing on the image to be segmented until the size of the image to be segmented is the first preset size. If the size of the image to be segmented is equal to the first preset size, the electronic equipment does not need to adjust the size of the image to be segmented.
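The scaling in step A1 can be illustrated with a minimal nearest-neighbour resize. This is only a sketch: the text does not specify the interpolation the preprocessing uses, and real pipelines typically use bilinear interpolation instead.

```python
def resize_nearest(image, target_w, target_h):
    """Nearest-neighbour resize of a nested-list image to the model's
    fixed input size (e.g. 320 x 320); works for shrinking and enlarging."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // target_h][x * src_w // target_w]
             for x in range(target_w)]
            for y in range(target_h)]
```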
And A2, normalizing the zoomed image to be segmented.
By way of example only, the normalization process specifically refers to: processing the pixel value (usually in the range 0–255) of each pixel in the scaled image to be segmented, for example by subtracting the mean and dividing by the standard deviation, so that the pixel values fall within the range [0, 1].
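A hedged sketch of this normalization step follows; the mean and standard deviation below are illustrative placeholders rather than values used by any particular model, and a simple divide-by-255 variant (which maps 0–255 exactly into [0, 1]) is shown alongside.

```python
def normalize(pixels, mean=127.5, std=127.5):
    """Mean/std normalization: (p - mean) / std per pixel.
    The placeholder values map 0-255 to roughly [-1, 1]."""
    return [[(p - mean) / std for p in row] for row in pixels]

def to_unit_range(pixels):
    """Alternative: scale 0-255 pixel values into [0, 1]."""
    return [[p / 255.0 for p in row] for row in pixels]
```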
And A3, inputting the normalized image to be segmented into the trained human body detection model to obtain at least one human body detection frame of the image to be segmented.
It can be understood that what is finally input into the human body detection model is not the original image to be segmented, but the image to be segmented after the scaling and normalization processes.
It should be noted that, after obtaining the human body detection frame output by the human body detection model, the electronic device maps the parameters of the human body detection frame back to the original image to be segmented, that is, the electronic device finally expresses the parameters of the position and the size of each human body detection frame on the basis of the original image to be segmented, for the following reasons:
since the image actually input into the human body detection model is scaled, the size of the original image to be segmented is assumed to be 640 × 480; for the scaled image to be segmented, which is actually input into the human body detection model, the size of the scaled image to be segmented is assumed to be 320 × 320; for convenience of understanding, the original image to be segmented is denoted as I, and the zoomed image to be segmented is denoted as I ', it is obvious that, for the same human body information a to be detected, the position and size of the same human body information a in the image I are different from the position and size of the same human body information a in the image I', which results in that parameters of a human body detection frame are different when the human body detection frame is expressed based on the original image to be segmented and the zoomed image to be segmented. For example, the position and size of the human body detection frame corresponding to the human body information a in the original image I to be segmented are [64,48,64,48], and the position and size of the human body detection frame corresponding to the human body information a in the scaled image I' to be segmented are [32,32,32,32 ]. Since the scaled image I' to be segmented is actually input to the human body detection model, the position and size of the human body detection frame actually output by the human body detection model are [32,32,32,32 ]. During the subsequent processing and actual use, the electronic device preferably knows the position and size of the human body detection frame of the human body information a in the original image I to be segmented, so the electronic device can map the parameters of the position and size of the human body detection frame output by the human body detection model back to the original image I to be segmented based on the scaling operation performed on the image to be segmented in step a 1. 
That is, the position and size of the human body detection frame mentioned below refer to the position and size of the human body detection frame in the original image I to be segmented, unless otherwise specified.
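The coordinate mapping in the worked example above can be sketched as follows; the [x, y, w, h] box format follows the description in the text, while the function name itself is illustrative.

```python
def map_box_to_original(box, model_size, original_size):
    """Map a [x, y, w, h] box from the scaled model input (e.g. 320x320)
    back to the original image (e.g. 640x480)."""
    x, y, w, h = box
    sx = original_size[0] / model_size[0]   # horizontal scale factor
    sy = original_size[1] / model_size[1]   # vertical scale factor
    return [x * sx, y * sy, w * sx, h * sy]

# The worked example from the text:
print(map_box_to_original([32, 32, 32, 32], (320, 320), (640, 480)))
# -> [64.0, 48.0, 64.0, 48.0]
```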
In some embodiments, step 102 comprises:
and B1, expanding the size of the human body detection frame based on a preset expansion ratio for each human body detection frame.
Considering that a human body detection frame output by the human body detection model may be slightly too small, the embodiment of the present application expands the frame outwards to a certain extent, so as to avoid the frame failing to completely enclose the limbs and other parts of the human body in the image to be segmented. This ensures that the human body region image subsequently input into the human body segmentation model contains complete human body information. Obviously, since the size of the frame is being expanded, the expansion ratio is necessarily greater than 1. By way of example only, a typical value of the expansion ratio may be 120%; that is, the frame is expanded to 1.2 times its original size. During expansion, the centre point of the frame usually remains unchanged, and the vertices move outwards in equal proportion.
And B2, dividing the human body area image corresponding to the human body detection frame from the image to be divided based on the position and the expanded size of the human body detection frame in the image to be divided.
It can be understood that the position of the expanded human body detection frame is unchanged relative to the frame before expansion, where "position" refers to the position of the centre point rather than the positions of the vertices; that is, the coordinates of the vertices do change. Denoting the changed top-left vertex as (x', y'), the parameters of the human body detection frame can be considered to change from [x, y, w, h] to [x', y', αw, αh], where α is the expansion ratio.
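The centre-preserving expansion of step B1 can be sketched as follows, using the 1.2 expansion ratio from the text as the default; the function name is illustrative.

```python
def expand_box(box, alpha=1.2):
    """Expand a [x, y, w, h] frame by ratio alpha about its centre point,
    so the centre is unchanged and the vertices move outwards equally."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # centre point stays fixed
    new_w, new_h = w * alpha, h * alpha    # size grows by the ratio
    return [cx - new_w / 2, cy - new_h / 2, new_w, new_h]

# A 50x100 frame at (100, 100) keeps its centre (125, 150) and grows to 60x120.
print(expand_box([100, 100, 50, 100]))
```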
In some embodiments, considering that the position of the human body detection frame in the image to be segmented is uncertain, if the human body detection frame exists at the edge position of the original image to be segmented, a situation that the extended human body detection frame exceeds the original image to be segmented may occur. To cope with such a possible situation, step B2 specifically includes:
and B21, determining candidate detection frames in the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size.
In fact, the candidate detection frame is simply the expanded human body detection frame [x', y', αw, αh].
And B22, judging whether the candidate detection frame exceeds the image range of the image to be segmented.
And B23, if the candidate detection frame does not exceed the image range of the image to be segmented, segmenting the area image corresponding to the candidate detection frame from the image to be segmented as the human body area image corresponding to the human body detection frame.
And B24, if the candidate detection frame exceeds the image range of the image to be segmented, padding the image to be segmented based on the candidate detection frame, and cropping the region image corresponding to the candidate detection frame from the padded image to be segmented as the human body region image corresponding to the human body detection frame.
The electronic device performs the padding operation with the first preset pixel value; that is, the electronic device pads the image to be segmented based on the candidate detection frame and the first preset pixel value. To facilitate understanding of steps B21 to B24, the following examples are given:
referring to fig. 3 and 4, fig. 3 shows an example of a process of matting based on a human body detection frame corresponding to human body information located in the middle of an image to be segmented, and fig. 4 shows an example of a process of matting based on a human body detection frame corresponding to human body information located at an edge of an image to be segmented.
Fig. 3 shows in sequence: a human body detection frame before expansion, a human body detection frame after expansion (namely a candidate detection frame) and a human body area image obtained based on the candidate detection frame. It can be seen that, because the extended human body detection frame (i.e. the candidate detection frame) is still within the image range of the image to be segmented, the region image corresponding to the extended human body detection frame can be directly segmented as the human body region image.
Fig. 4 shows, in sequence: the human body detection frame before expansion, the expanded human body detection frame (namely the candidate detection frame), the image to be segmented padded based on the candidate detection frame, and the human body region image obtained based on the candidate detection frame. It can be seen that, since the expanded human body detection frame (namely the candidate detection frame) exceeds the image range of the image to be segmented, the out-of-range portion is padded with black (namely, pixel value 0) to complete the padding operation. After padding, the region image corresponding to the expanded frame can be cropped normally as the human body region image. Obviously, a human body region image obtained through the padding operation has the following characteristic: it contains a black padded area, usually at the edge of the human body region image.
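Steps B21 to B24 together amount to a crop that pads out-of-range pixels with black, as in the fig. 4 example. A minimal sketch, assuming images are nested lists of pixel values:

```python
def crop_with_padding(image, box, pad_value=0):
    """Crop box = [x, y, w, h] from image; any pixel of the candidate
    detection frame that falls outside the image range is padded with
    pad_value (0, i.e. black), matching steps B23/B24."""
    x, y, w, h = box
    ih, iw = len(image), len(image[0])
    crop = []
    for yy in range(y, y + h):
        row = []
        for xx in range(x, x + w):
            if 0 <= yy < ih and 0 <= xx < iw:
                row.append(image[yy][xx])   # inside the image range
            else:
                row.append(pad_value)       # outside: fill with black
        crop.append(row)
    return crop
```

When the candidate frame lies entirely inside the image (the fig. 3 case), no padding pixel is ever produced and this reduces to a plain crop.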
In some embodiments, step 103 specifically includes:
c1, for each body region image, scaling the body region image to a second preset size.
The second preset size is determined based on the body segmentation model. For example only, the second predetermined size may be the same as the first predetermined size, e.g., also 320 × 320; alternatively, the second predetermined size may be different from the first predetermined size, and is not limited herein. It is to be understood that step C1 is similar to step a1, and reference may be made to the description of step a1, which is not repeated herein.
And C2, normalizing the scaled human body region image.
It is to be understood that step C2 is similar to step a2, and reference may be made to the description of step a2, which is not repeated herein.
And C3, inputting the human body area image after the normalization processing into the trained human body segmentation model to obtain the human body segmentation result of the human body area image.
It can be understood that the body region image that is finally input into the body segmentation model is not the original body region image any more, but the body region image after the scaling and normalization processing.
It should be noted that, after obtaining the human body segmentation result output by the human body segmentation model, the electronic device will map each parameter of the human body segmentation result back to the original human body region image, and the reason for this operation is similar to the reason described in step a3, and is not described herein again. That is, the human body segmentation result finally obtained by the electronic device expresses the category to which each pixel point in the original human body region image belongs.
In some embodiments, step 104 specifically includes:
D1, creating a new image.
The size of the new image is the same as that of the original image to be segmented, and the pixel values of all pixel points of the new image are a first preset pixel value, where the first preset pixel value is 0. That is, an all-black image of the same size as the original image to be segmented is created.
D2, traversing the human body segmentation results of all the human body region images, and determining the pixel point coordinates of at least one first target pixel point mapped in the image to be segmented.
The first target pixel point refers to: a pixel point whose pixel value in the human body segmentation result of any human body region image is a second preset pixel value, where the second preset pixel value is 1. Of course, the first target pixel point may also be understood as: a pixel point in a human body region image that is marked as the first preset category based on the corresponding human body segmentation result, that is, a pixel point determined to belong to the human body based on the corresponding human body segmentation result.
And D3, updating the pixel value of at least one second target pixel point in the new image to be a second preset pixel value.
Since each human body region image is obtained by matting from the image to be segmented, each pixel point in a human body region image can be mapped back into the image to be segmented. Based on this, the electronic device can determine the second target pixel points in the new image according to the first target pixel points, where the pixel point coordinates of each second target pixel point in the new image are respectively the same as the pixel point coordinates of each first target pixel point mapped in the image to be segmented. After the second target pixel points are determined, the electronic device updates the pixel values of all second target pixel points in the new image from the first preset pixel value (i.e., 0) to the second preset pixel value (i.e., 1).
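Steps D1 to D3 can be sketched as follows (the helper `fuse_masks` and the representation of each region by its top-left-corner offset are assumptions for illustration):

```python
import numpy as np

def fuse_masks(image_h, image_w, region_results):
    """Fuse per-region segmentation results into a full-image mask.
    region_results is a list of (mask, x0, y0), where mask marks human
    pixels with 1 and (x0, y0) is the region's top-left corner in the
    image to be segmented."""
    # D1: new all-black image (first preset pixel value 0).
    fused = np.zeros((image_h, image_w), dtype=np.uint8)
    for mask, x0, y0 in region_results:
        # D2: first target pixel points, mapped into full-image coordinates.
        ys, xs = np.nonzero(mask)
        ys, xs = ys + y0, xs + x0
        # Ignore padded pixels that fall outside the image to be segmented.
        keep = (ys >= 0) & (ys < image_h) & (xs >= 0) & (xs < image_w)
        # D3: set second target pixel points to the second preset value 1.
        fused[ys[keep], xs[keep]] = 1
    return fused

# Example: two 2x2 human regions fused into a 5x5 image.
m = np.ones((2, 2), dtype=np.uint8)
fused = fuse_masks(5, 5, [(m, 0, 0), (m, 3, 3)])
```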
It should be noted that the part of a human body region image that exceeds the image to be segmented, i.e., the part padded by the electronic device, can be directly ignored by the electronic device. In fact, since this part was filled with the first preset pixel value (i.e., 0) during padding, it is actually black, and under normal conditions the human body segmentation model will not mark its pixel points as belonging to the human body.
In some embodiments, the training process of the human body detection model is briefly as follows:
After the original training images are acquired, human body detection frame labels are produced based on the human body regions contained in each training image. The original training image is input into the human body detection model to be trained to obtain a human body detection frame result. A loss is calculated based on the human body detection frame result and the previously produced human body detection frame labels, and back-propagation optimization is performed on the human body detection model to be trained using a Stochastic Gradient Descent (SGD) optimization algorithm. Through multiple iterations, the trained human body detection model is finally obtained.
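The iterative SGD optimisation described above can be illustrated on a toy least-squares problem standing in for the detection loss (everything below is a stand-in: the actual detection model, loss function, and data are not specified in this embodiment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))            # stand-in for training inputs
true_w = np.array([1.0, -2.0, 0.5])     # source of the stand-in labels
y = X @ true_w                          # stand-in for detection frame labels

w = np.zeros(3)                         # parameters of the model to be trained
lr = 0.1
for _ in range(200):                    # multiple iterations
    idx = rng.integers(0, 64, size=8)   # stochastic mini-batch
    residual = X[idx] @ w - y[idx]      # error against the labels
    grad = X[idx].T @ residual / len(idx)  # gradient of 0.5*||Xw - y||^2
    w -= lr * grad                      # SGD parameter update
```

After the iterations, `w` ends up close to `true_w`, mirroring how repeated loss computation and back-propagation drive the detection model toward its labels.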
In some embodiments, the training process of the human body segmentation model is briefly as follows:
After the original training images are obtained, human body segmentation labels are produced based on the human body information contained in each training image, and human body detection frames are annotated in the training images. To keep the input consistent with that of the human body segmentation model to be trained, the annotated human body detection frames in the training images can be expanded outwards based on the preset expansion ratio α, and all human body regions in each training image are extracted based on the expanded human body detection frames. If the area corresponding to an expanded human body detection frame exceeds the boundary of the training image, it is padded with black (i.e., pixel value 0). The extracted images are input into the human body segmentation model to be trained to obtain human body segmentation results. A loss is calculated based on the human body segmentation results and the previously produced human body segmentation labels, and back-propagation optimization is performed on the human body segmentation model to be trained using the stochastic gradient descent optimization algorithm. Through multiple iterations, the trained human body segmentation model is finally obtained.
It should be noted that the electronic device trains the human body detection model and the human body segmentation model independently, and the embodiment of the present application does not limit the training sequence of the human body detection model and the human body segmentation model.
From the above, in the embodiments of the present application, the process of human body segmentation of an image is divided into two steps: first, human body regions are matted out of the image to be segmented based on the human body detection frames obtained by the human body detection model; second, human body segmentation is performed, based on the human body segmentation model, on the matting result (i.e., the human body region image) corresponding to each human body detection frame. In the second step, the input of the human body segmentation model is changed from the original full image (i.e., the image to be segmented) to a single human body region image, so that when the input is scaled to a uniform size, a smaller human body region in the full image is no longer scaled to an even smaller size, which improves the segmentation effect for smaller human body regions. Meanwhile, since the image to be segmented is processed by the human body detection model and the human body segmentation model in sequence, the background pixels in the image to be segmented are filtered twice, which eliminates the mis-segmentation of background pixels common in traditional human body segmentation methods and further improves the segmentation effect.
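The two-step process above can be sketched end to end, with the trained models replaced by hypothetical stubs `detect_fn` and `segment_fn` (clipping the expanded frame to the image range here stands in for the padding/ignoring of out-of-range pixels described earlier):

```python
import numpy as np

def segment_image(image, detect_fn, segment_fn, alpha=1.2):
    """Step 1: detect human frames; step 2: segment each matted region;
    finally fuse the per-region masks into a full-image mask."""
    h, w = image.shape[:2]
    fused = np.zeros((h, w), dtype=np.uint8)
    for (x1, y1, x2, y2) in detect_fn(image):
        # Expand the detection frame by alpha about its centre, then clip
        # the candidate frame to the image range.
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw, hh = (x2 - x1) * alpha / 2, (y2 - y1) * alpha / 2
        ex1, ey1 = max(int(cx - hw), 0), max(int(cy - hh), 0)
        ex2, ey2 = min(int(cx + hw), w), min(int(cy + hh), h)
        region = image[ey1:ey2, ex1:ex2]        # matting result
        mask = segment_fn(region)               # per-region segmentation
        fused[ey1:ey2, ex1:ex2] |= mask.astype(np.uint8)
    return fused
```

Because `segment_fn` only ever sees a single matted region, a small human body is not shrunk further when the region is scaled to the model's uniform input size.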
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the image segmentation method provided above, the embodiment of the present application further provides an image segmentation apparatus. As shown in fig. 5, the image segmentation apparatus 500 includes:
the detection module 501 is configured to perform human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
a matting module 502, configured to perform matting processing on the to-be-segmented image based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
a segmentation module 503, configured to perform human body segmentation on each of the human body region images through a trained human body segmentation model, to obtain a human body segmentation result of each of the human body region images;
a fusion module 504, configured to fuse the human body segmentation results of the human body region images to obtain a human body segmentation result of the image to be segmented.
Optionally, the detecting module 501 includes:
the first zooming unit is used for zooming the image to be segmented to a first preset size;
the first normalization unit is used for performing normalization processing on the zoomed image to be segmented;
and the human body detection unit is used for inputting the normalized image to be segmented into a trained human body detection model to obtain at least one human body detection frame of the image to be segmented.
Optionally, the image matting module 502 includes:
an expansion unit, configured to expand, for each human body detection frame, a size of the human body detection frame based on a preset expansion ratio, where the expansion ratio is greater than 1;
and a dividing unit, configured to divide the human body region image corresponding to the human body detection frame from the image to be divided based on the position of the human body detection frame in the image to be divided and the expanded size.
Optionally, the dividing unit includes:
a candidate detection frame determining subunit, configured to determine a candidate detection frame in the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size;
an image range judging subunit, configured to judge whether the candidate detection frame exceeds an image range of the image to be segmented;
a first dividing subunit, configured to, if the candidate detection frame does not exceed the image range of the image to be divided, divide an area image corresponding to the candidate detection frame from the image to be divided as a human body area image corresponding to the human body detection frame;
and a second dividing subunit, configured to, if the candidate detection frame is out of the image range of the image to be divided, complement the image to be divided based on the candidate detection frame, and divide an area image corresponding to the candidate detection frame from the complemented image to be divided as a human body area image corresponding to the human body detection frame.
Optionally, the second partitioning subunit includes:
and the filling subunit is used for filling the image to be segmented based on the candidate detection frame and the first preset pixel value.
Optionally, the dividing module 503 includes:
a second scaling unit, configured to scale the human body region image to a second preset size for each human body region image;
the second normalization unit is used for performing normalization processing on the zoomed human body area image;
and the human body segmentation unit is used for inputting the human body region image after the normalization processing into a trained human body segmentation model to obtain a human body segmentation result of the human body region image.
Optionally, the fusion module 504 includes:
the image segmentation device comprises a creation unit, a segmentation unit and a segmentation unit, wherein the creation unit is used for creating a new image, the size of the new image is the same as that of the image to be segmented, and all pixel values of the new image are first preset pixel values;
a traversing unit, configured to traverse the human body segmentation results of all the human body region images, and determine a pixel coordinate of at least one first target pixel mapped in the image to be segmented, where the first target pixel is: a pixel point whose pixel value in the human body segmentation result of any human body region image is a second preset pixel value;
and an updating unit, configured to update a pixel value of at least one second target pixel point in the new image to the second preset pixel value, where a pixel coordinate of each second target pixel point in the new image is the same as a pixel coordinate of each first target pixel point mapped in the image to be segmented.
From the above, in the embodiments of the present application, the process of human body segmentation of an image is divided into two steps: first, human body regions are matted out of the image to be segmented based on the human body detection frames obtained by the human body detection model; second, human body segmentation is performed, based on the human body segmentation model, on the matting result (i.e., the human body region image) corresponding to each human body detection frame. In the second step, the input of the human body segmentation model is changed from the original full image (i.e., the image to be segmented) to a single human body region image, so that when the input is scaled to a uniform size, a smaller human body region in the full image is no longer scaled to an even smaller size, which improves the segmentation effect for smaller human body regions. Meanwhile, since the image to be segmented is processed by the human body detection model and the human body segmentation model in sequence, the background pixels in the image to be segmented are filtered twice, which eliminates the mis-segmentation of background pixels common in traditional human body segmentation methods and further improves the segmentation effect.
Corresponding to the image segmentation method provided above, an embodiment of the present application further provides an electronic device. Referring to fig. 6, the electronic device 6 in the embodiment of the present application includes: a memory 601, one or more processors 602 (only one is shown in fig. 6), and a computer program stored on the memory 601 and executable on the processors. The memory 601 is used for storing software programs and units, and the processor 602 executes various functional applications and data processing by running the software programs and units stored in the memory 601, so as to acquire resources corresponding to preset events. Specifically, the processor 602 implements the following steps by running the computer program stored in the memory 601:
carrying out human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
performing image matting processing on the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
respectively carrying out human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result of each human body region image;
and fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented.
Assuming that the above is the first possible implementation manner, in a second possible implementation manner provided based on the first possible implementation manner, the above detecting a human body of the image to be segmented by the trained human body detection model to obtain at least one human body detection frame of the image to be segmented includes:
zooming the image to be segmented to a first preset size;
normalizing the zoomed image to be segmented;
and inputting the normalized image to be segmented into a trained human body detection model to obtain at least one human body detection frame of the image to be segmented.
In a third possible embodiment based on the first possible embodiment, the matting the to-be-segmented image based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame includes:
for each human body detection frame, expanding the size of the human body detection frame based on a preset expansion proportion, wherein the expansion proportion is more than 1;
and segmenting the human body area image corresponding to the human body detection frame from the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size.
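The expansion step can be sketched as follows (the helper name `expand_box` is hypothetical, and expansion about the frame's centre is an assumption, since the embodiment does not fix the exact geometry):

```python
def expand_box(x1, y1, x2, y2, alpha):
    """Expand a detection frame (x1, y1, x2, y2) by ratio alpha > 1
    while keeping its centre fixed."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * alpha / 2.0
    half_h = (y2 - y1) * alpha / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# A 10x10 frame expanded with alpha = 1.5 becomes a 15x15 candidate frame
# with the same centre; it may now exceed the image range.
box = expand_box(20, 20, 30, 30, 1.5)  # -> (17.5, 17.5, 32.5, 32.5)
```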
In a fourth possible embodiment based on the third possible embodiment, the segmenting the human body region image corresponding to the human body detection frame from the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size includes:
determining candidate detection frames in the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size;
judging whether the candidate detection frame exceeds the image range of the image to be segmented or not;
if the candidate detection frame does not exceed the image range of the image to be segmented, segmenting an area image corresponding to the candidate detection frame from the image to be segmented as a human body area image corresponding to the human body detection frame;
and if the candidate detection frame exceeds the image range of the image to be segmented, the image to be segmented is supplemented based on the candidate detection frame, and the area image corresponding to the candidate detection frame is segmented from the supplemented image to be segmented as the human body area image corresponding to the human body detection frame.
In a fifth possible embodiment based on the fourth possible embodiment, the registering the image to be segmented based on the candidate detection frame includes:
and completing the image to be segmented based on the candidate detection frame and the first preset pixel value.
In a sixth possible embodiment based on the first possible embodiment, the obtaining of the human body segmentation result of each human body region image by performing human body segmentation on each human body region image by using a trained human body segmentation model includes:
for each human body area image, zooming the human body area image to a second preset size;
normalizing the zoomed human body region image;
and inputting the human body region image after normalization processing into a trained human body segmentation model to obtain a human body segmentation result of the human body region image.
In a seventh possible embodiment based on the first possible embodiment, the fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented includes:
creating a new image, wherein the size of the new image is the same as that of the image to be segmented, and all pixel values of the new image are first preset pixel values;
traversing the human body segmentation results of all the human body region images, and determining pixel point coordinates of at least one first target pixel point mapped in the image to be segmented, wherein the first target pixel point is: a pixel point whose pixel value in the human body segmentation result of any human body region image is a second preset pixel value;
and updating the pixel value of at least one second target pixel point in the new image into the second preset pixel value, wherein the pixel point coordinates of each second target pixel point in the new image are respectively the same as the pixel point coordinates of each first target pixel point mapped in the image to be segmented.
It should be understood that, in the embodiments of the present application, the processor 602 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 601 may include a read-only memory and a random access memory, and provides instructions and data to the processor 602. Part or all of the memory 601 may also include a non-volatile random access memory. For example, the memory 601 may also store device type information.
As can be seen from the above, in the embodiments of the present application, the process of human body segmentation of an image is divided into two steps: first, human body regions are matted out of the image to be segmented based on the human body detection frames obtained by the human body detection model; second, human body segmentation is performed, based on the human body segmentation model, on the matting result (i.e., the human body region image) corresponding to each human body detection frame. In the second step, the input of the human body segmentation model is changed from the original full image (i.e., the image to be segmented) to a single human body region image, so that when the input is scaled to a uniform size, a smaller human body region in the full image is no longer scaled to an even smaller size, which improves the segmentation effect for smaller human body regions. Meanwhile, since the image to be segmented is processed by the human body detection model and the human body segmentation model in sequence, the background pixels in the image to be segmented are filtered twice, which eliminates the mis-segmentation of background pixels common in traditional human body segmentation methods and further improves the segmentation effect.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are implemented by hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation should not be considered as going beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above modules or units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the above embodiments of the present application may also be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the steps of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image segmentation method, comprising:
carrying out human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
performing image matting processing on the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
respectively carrying out human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result of each human body region image;
and fusing the human body segmentation results of the human body region images to obtain the human body segmentation result of the image to be segmented.
2. The image segmentation method of claim 1, wherein the performing human body detection on the image to be segmented through the trained human body detection model to obtain at least one human body detection frame of the image to be segmented comprises:
zooming the image to be segmented to a first preset size;
normalizing the zoomed image to be segmented;
and inputting the image to be segmented after the normalization processing into a trained human body detection model to obtain at least one human body detection frame of the image to be segmented.
3. The image segmentation method of claim 1, wherein the matting the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame comprises:
for each human body detection frame, expanding the size of the human body detection frame based on a preset expansion ratio, wherein the expansion ratio is greater than 1;
and segmenting the human body area image corresponding to the human body detection frame from the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size.
4. The image segmentation method according to claim 3, wherein the segmenting the human body region image corresponding to the human body detection frame from the image to be segmented based on the position of the human body detection frame in the image to be segmented and the expanded size comprises:
determining a candidate detection frame in the image to be segmented based on the position of the human body detection frame in the image to be segmented and the size after expansion;
judging whether the candidate detection frame exceeds the image range of the image to be segmented or not;
if the candidate detection frame does not exceed the image range of the image to be segmented, segmenting an area image corresponding to the candidate detection frame from the image to be segmented as a human body area image corresponding to the human body detection frame;
and if the candidate detection frame exceeds the image range of the image to be segmented, the image to be segmented is supplemented based on the candidate detection frame, and the area image corresponding to the candidate detection frame is segmented from the supplemented image to be segmented to be used as the human body area image corresponding to the human body detection frame.
5. The image segmentation method according to claim 4, wherein the complementing the image to be segmented based on the candidate detection frame includes:
and supplementing the image to be segmented based on the candidate detection frame and a first preset pixel value.
6. The image segmentation method of claim 1, wherein the performing the body segmentation on each of the body region images through the trained body segmentation model to obtain the body segmentation result of each of the body region images comprises:
for each human body area image, zooming the human body area image to a second preset size;
normalizing the zoomed human body region image;
and inputting the human body region image after normalization processing into a trained human body segmentation model to obtain a human body segmentation result of the human body region image.
7. The image segmentation method according to claim 1, wherein the fusing the human body segmentation results of the respective human body region images to obtain the human body segmentation result of the image to be segmented comprises:
creating a new image, wherein the size of the new image is the same as that of the image to be segmented, and all pixel values of the new image are first preset pixel values;
traversing all human body segmentation results of the human body region image, and determining pixel point coordinates of at least one first target pixel point mapped in the image to be segmented, wherein the first target pixel point is as follows: pixel points with pixel values of a second preset pixel value in the human body segmentation results of all the human body region images;
in the new image, updating the pixel value of at least one second target pixel point to the second preset pixel value, wherein the pixel point coordinates of each second target pixel point in the new image are respectively the same as the pixel point coordinates of each first target pixel point mapped in the image to be segmented.
8. An image segmentation apparatus, comprising:
a detection module configured to perform human body detection on an image to be segmented through a trained human body detection model to obtain at least one human body detection frame of the image to be segmented;
a matting module configured to crop the image to be segmented based on each human body detection frame to obtain a human body region image corresponding to each human body detection frame;
a segmentation module configured to perform human body segmentation on each human body region image through a trained human body segmentation model to obtain a human body segmentation result of each human body region image; and
a fusion module configured to fuse the human body segmentation results of the human body region images to obtain a human body segmentation result of the image to be segmented.
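The four modules of the apparatus claim compose into a simple pipeline. The sketch below is an assumed orchestration, with `detect` and `segment` as stand-ins for the trained detection and segmentation models (any callables with the shapes shown will do):

```python
import numpy as np

def segment_image(image, detect, segment, bg=0):
    """End-to-end sketch: detect human bodies, crop each detection
    frame, segment each crop, and fuse the results into one mask.

    `detect(image)` yields (x1, y1, x2, y2) frames; `segment(region)`
    returns a uint8 binary mask of the region's shape.
    """
    fused = np.full(image.shape[:2], bg, dtype=np.uint8)   # fusion canvas
    for (x1, y1, x2, y2) in detect(image):                 # detection module
        region = image[y1:y2, x1:x2]                       # matting module
        mask = segment(region)                             # segmentation module
        np.maximum(fused[y1:y2, x1:x2], mask,
                   out=fused[y1:y2, x1:x2])                # fusion module
    return fused
```

Each module maps one-to-one onto a line of the loop, mirroring the apparatus structure of the claim.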
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202210209570.6A 2022-03-03 2022-03-03 Image segmentation method, image segmentation device and electronic equipment Pending CN114757957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210209570.6A CN114757957A (en) 2022-03-03 2022-03-03 Image segmentation method, image segmentation device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114757957A true CN114757957A (en) 2022-07-15

Family

ID=82325910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210209570.6A Pending CN114757957A (en) 2022-03-03 2022-03-03 Image segmentation method, image segmentation device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114757957A (en)

Similar Documents

Publication Publication Date Title
CN107507216B (en) Method and device for replacing local area in image and storage medium
CN109858487B (en) Weak supervision semantic segmentation method based on watershed algorithm and image category label
CN110705583A (en) Cell detection model training method and device, computer equipment and storage medium
US20220319209A1 (en) Method and apparatus for labeling human body completeness data, and terminal device
CN112348765A (en) Data enhancement method and device, computer readable storage medium and terminal equipment
CN107545223B (en) Image recognition method and electronic equipment
EP4174785A1 (en) Method and apparatus for constructing landform map, electronic device, and readable storage medium
CN110782466A (en) Picture segmentation method, device and system
CN111680690A (en) Character recognition method and device
CN111260564A (en) Image processing method and device and computer storage medium
CN116645592A (en) Crack detection method based on image processing and storage medium
CN113744142B (en) Image restoration method, electronic device and storage medium
CN111967478B (en) Feature map reconstruction method, system, storage medium and terminal based on weight overturn
CN110874170A (en) Image area correction method, image segmentation method and device
CN114118127B (en) Visual scene sign detection and recognition method and device
CN114757957A (en) Image segmentation method, image segmentation device and electronic equipment
CN111178200A (en) Identification method of instrument panel indicator lamp and computing equipment
CN111325810A (en) Color matching method and device and electronic equipment
CN114219757B (en) Intelligent damage assessment method for vehicle based on improved Mask R-CNN
CN113343987B (en) Text detection processing method and device, electronic equipment and storage medium
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN112950652B (en) Robot and hand image segmentation method and device thereof
CN115546105A (en) Tire pattern detection method and device, readable storage medium and terminal equipment
CN114862889A (en) Road edge extraction method and device based on remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination