CN114299084A - Image segmentation method and system - Google Patents

Image segmentation method and system

Info

Publication number
CN114299084A
CN114299084A
Authority
CN
China
Prior art keywords
image
pixel
processed
mask
foreground
Prior art date
Legal status
Pending
Application number
CN202111587133.XA
Other languages
Chinese (zh)
Inventor
沈立超
谌明
盛玉庭
金朝汇
Current Assignee
Hithink Royalflush Information Network Co Ltd
Original Assignee
Hithink Royalflush Information Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Hithink Royalflush Information Network Co Ltd filed Critical Hithink Royalflush Information Network Co Ltd
Priority to CN202111587133.XA
Publication of CN114299084A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

An embodiment of the present specification provides an image segmentation method, including: acquiring an image to be processed, where the image to be processed comprises an RGB channel image and a depth channel image; processing the depth channel image to determine the pixel category of each target pixel in the depth channel image, where the pixel categories comprise at least one of foreground pixels, background pixels, and unknown pixels; determining, based on the RGB channel image and the processed depth channel image, the unknown pixels to be foreground pixels or background pixels through an image segmentation model, and obtaining a mask image for distinguishing the foreground pixels and background pixels of the image to be processed; and segmenting the image to be processed into a foreground image and a background image based on the mask image and the image to be processed.

Description

Image segmentation method and system
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an image segmentation method and system.
Background
Image segmentation is an important subject in the fields of computer image processing and image understanding, and is widely applied in fields such as intelligent navigation, intelligent monitoring, medical image analysis, and video image coding and transmission. By separating the foreground image and the background image of an image or video from the original, image segmentation enables further processing such as background replacement and background blurring, so that the image or video achieves the desired effect.
The application of image segmentation is increasing, and therefore, a more convenient and efficient image segmentation method is needed.
Disclosure of Invention
One embodiment of the present disclosure provides an image segmentation method. The image segmentation method comprises the following steps: acquiring an image to be processed, wherein the image to be processed comprises an RGB channel image and a depth channel image; processing the depth channel image, and determining the pixel category of a target pixel in the depth channel image, wherein the pixel category comprises at least one of a foreground pixel, a background pixel and an unknown pixel; determining unknown pixels as the foreground pixels or the background pixels through an image segmentation model based on the RGB channel image and the processed depth channel image, and obtaining a mask image for distinguishing the foreground pixels and/or the background pixels of the image to be processed; and realizing the segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed.
One embodiment of the present specification provides an image segmentation system. The image segmentation system comprises an image acquisition module, an image processing module, a mask image generation module and an image segmentation module: the image acquisition module is used for acquiring an image to be processed, and the image to be processed comprises an RGB channel image and a depth channel image; the image processing module is used for processing the depth channel image and determining the pixel category of a target pixel in the depth channel image, wherein the pixel category comprises at least one of a foreground pixel, a background pixel and an unknown pixel; the mask image generation module determines unknown pixels to be the foreground pixels or the background pixels through an image segmentation model based on the RGB channel image and the processed depth channel image, and obtains a mask image used for distinguishing the foreground pixels and the background pixels of the image to be processed; and the image segmentation module realizes the segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed.
One of the embodiments of the present specification provides an image segmentation apparatus, including a processor configured to execute an image segmentation method.
One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions, and when the computer reads the computer instructions in the storage medium, the computer executes an image segmentation method.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of an image segmentation system according to some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of an image segmentation method according to some embodiments of the present description;
FIG. 3 is a schematic illustration of steps of an image segmentation method according to some embodiments of the present description;
FIG. 4 is a schematic diagram of another step of an image segmentation method according to some embodiments of the present description;
FIG. 5 is an exemplary block diagram of an image segmentation system in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "apparatus", "unit" and/or "module" as used herein is a way of distinguishing different components, elements, parts, portions or assemblies at different levels. However, these words may be replaced by other expressions that accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" may include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps of operations may be removed from them.
FIG. 1 is a schematic diagram of an application scenario of an image segmentation system according to some embodiments of the present description.
In some embodiments, the application scenario 100 may include a network 110, a storage device 120, an image capture apparatus 130, and a computing device 140. The image to be processed 131 acquired by the image acquisition device 130 is subjected to image segmentation to obtain a processed image 141, and the processed image 141 includes a foreground image 141-1 and a background image 141-2.
In some application scenarios, the application scenario 100 may be related to computer image processing and image understanding, for example, in intelligent navigation, intelligent monitoring, medical image analysis, application software automatic cropping, video image encoding and transmission, and the like.
The network 110 may connect the various components of the system and/or connect the system with external resource components. In some embodiments, the network 110 provides a conduit for data interaction for the storage device 120, the image acquisition apparatus 130, and the computing device 140. The network 110 enables communication between the various components and with other components outside the system to facilitate the exchange of data and/or information. In some embodiments, the network 110 may be any one or more of a wired network or a wireless network. In some embodiments, network 110 may include one or more network access points. For example, the network 110 may include wired or wireless network access points, such as base stations and/or network switching points 110-1, 110-2, …, through which one or more components of the application scenario 100 may connect to the network 110 to exchange data and/or information.
Storage device 120 may be used to store data and/or instructions. Storage device 120 may include one or more storage components, each of which may be a separate device or part of another device. In some embodiments, the storage device 120 may be implemented on a cloud platform. In some embodiments, the storage device 120 may store the to-be-processed image 131 uploaded by the image acquisition apparatus 130 through the network 110, and may also store the processed image 141 generated by the computing device 140.
The image capturing device 130 may be an RGBD camera or other device capable of capturing depth images, such as a laser radar. In some embodiments, the image capturing apparatus 130 may be configured to acquire the to-be-processed image 131, and transmit the to-be-processed image 131 to the storage device 120 for storage, or transmit the to-be-processed image 131 to the computing device 140 for data processing or display.
The computing device 140 may be used for the uploading, reception, processing, output, and/or display of data. In some embodiments, a processing device is configured in the computing device 140. In some embodiments, the computing device 140 receives the image to be processed 131 acquired by the image acquisition apparatus 130 through the network 110, performs image processing, and outputs a processed image 141. In some embodiments, the computing device 140 may be one or any combination of a server, a processing device, a mobile device, a tablet computer, a laptop computer, or other device having data uploading, receiving, processing, outputting, and/or displaying capabilities. In some embodiments, the user of the computing device 140 may be one or more users, may include users who directly use the service, and may also include other related users.
FIG. 2 is an exemplary flow diagram of an image segmentation method according to some embodiments of the present description. As shown in fig. 2, the process 200 includes the following steps.
Step 210, acquiring an image to be processed. In some embodiments step 210 is performed by image acquisition module 510.
The image to be processed is an image of which the foreground and the background need to be segmented.
In some embodiments, the image to be processed may be an image acquired by a camera device that captures depth channel images, for example, an image captured with an RGBD camera. In some embodiments, the image to be processed may be a single-frame image. For example, an RGBD camera captures a data stream S composed of multiple frames, denoted S = {f1, f2, f3, ..., ft, ...}; a decoding unit converts the data stream into individual images, and a single frame ft is extracted from them; this single frame ft is the image to be processed. Here, an RGBD camera is a camera capable of capturing both a color image (RGB channel image) and a depth image (also referred to as a depth channel image) in the usual sense, and each of its frames is a 4-channel image.
In some embodiments, the images to be processed may also come from a storage device, a third party data source (e.g., the internet), or the like. For example, the image to be processed may be a depth image or the like downloaded from a storage device storage and/or image website.
In some embodiments, the image to be processed includes an RGB channel image and a depth channel image.
The RGB channel image comprises channels of the three basic colors (red R, green G, blue B); each color channel stores a gray value representing the brightness of its color, and together the channels represent different colors. In some embodiments, the RGB channel image is denoted frgb; frgb is a full-color original image.
The depth channel image, also known as a range image (depth image), is an image whose pixel values are the distances (depths) from the image capture device to points in the scene; it directly reflects the geometry of the scene's visible surface. In some embodiments of the present description, the depth channel image is denoted fd; fd is a black-and-white image, obtained by reprocessing the disparity map of an image or by modeling the video background to obtain depth.
In some embodiments, the depth channel image generation method includes lidar depth imaging, computer stereo imaging, coordinate measuring machine methods, moire fringe methods, structured light methods, etc., and the depth channel image may be acquired by a stereo camera or a TOF camera.
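By way of illustration only, the following Python sketch splits a 4-channel RGBD frame into its RGB channel image frgb and depth channel image fd. It assumes the frame arrives as a single H x W x 4 array with depth stored in the fourth channel; this is a simplifying assumption, since real RGBD cameras often deliver color and depth as separate streams that must be aligned first.

    import numpy as np

    def split_rgbd_frame(frame: np.ndarray):
        """Split an H x W x 4 RGBD frame into RGB and depth channel images."""
        # First three channels: color original image f_rgb.
        f_rgb = frame[:, :, :3]
        # Fourth channel: depth channel image f_d (assumed layout).
        f_d = frame[:, :, 3]
        return f_rgb, f_d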
Step 220, processing the depth channel image, and determining the pixel type of the target pixel in the depth channel image. In some embodiments, step 220 is performed by image processing module 520.
The target pixel refers to a pixel of an image area in the depth channel image, which needs to be subjected to image processing. In some embodiments, the target pixel may be a portion or all of the pixels of the image. For example, the target pixel may be each pixel of the whole image, or may be a pixel of a partial region in the image.
The foreground image and the background image refer to the portions of an image occupied by the objects (such as people, scenery, or articles) serving as the foreground and the background, respectively. For example, in an image of a camel in the desert, the camel can be regarded as the foreground image, and the desert and the sky can be regarded as the background image.
The pixel classes may be distinguished according to depth information of the image. The pixel classes may include at least one of foreground pixels, background pixels, and unknown pixels; the foreground pixel refers to a pixel belonging to a foreground image, the background pixel refers to a pixel belonging to a background image, and the unknown pixel refers to a pixel which cannot be determined whether the unknown pixel belongs to the foreground pixel or the background pixel temporarily.
In some embodiments, the determination method of the pixel class may include an edge detection-based method, a threshold segmentation method, region growing, region splitting and merging, and the like.
In some embodiments, the processor may determine a pixel class to which the target pixel corresponds based on a pixel value of the target pixel in the depth channel image.
A pixel value represents the average luminance information of a small block of the image, or the average reflection (transmission) density information of that block. For example, a depth channel image with m x n pixels is divided into m x n small squares by row lines numbered 0 to m and column lines numbered 0 to n; each square serves as one pixel, and the gray value within each square is uniform.
In some embodiments, the category of each target pixel may be determined by presetting pixel thresholds corresponding to the different pixel categories. For example, preset pixel thresholds x1 and x2 (with x2 greater than x1) and classify the pixel values of the depth channel image fd: pixels of fd whose value lies in [x1, x2] are regarded as background pixels; pixels whose value is less than x1 are regarded as foreground pixels; and pixels whose value is greater than x2 are regarded as unknown pixels, whose pixel values may be abnormally large. Here x2 is mainly determined by the RGBD camera itself, since a conventional RGBD camera has a working range: if, say, 0.2 m to 2 m is the working range of a certain RGBD camera, distances beyond that range are represented on the depth channel image as an unknown pixel value (e.g., possibly the maximum value).
In some embodiments, the depth channel image fd' with known pixel classes may be processed further. For example, based on the pixel class corresponding to each target pixel, the pixel values of the target pixels in fd' may be quantized to obtain a quantized depth channel image fd'', assigning a specific value to each of the foreground, background, and unknown pixels among the target pixels (for example, setting foreground pixels to 0, background pixels to 1, and unknown pixels to 2), thereby facilitating subsequent image processing.
In some embodiments, the image f to be processed may also be treated directlytDepth channel image f in (1)dQuantization processing is performed for subsequent image processing. For example, according to the depth channel image fdThe range of the value of each pixel is respectively quantized, such as a range parameter x determined according to the parameters of the RBGD camera1And x2The corresponding quantization process is formulated as follows:
Figure BDA0003427999050000071
in formula (1), i and j represent rows and columns of pixels, respectively; f. ofd"(i, j) is the quantized pixel value of row i and column j; f. ofd(i, j) is the pixel value of the ith row and jth column before quantization; quantizing the pixel values includes locating background pixels (i.e., pixel values at [ x ])1,x2]Pixels of interval) is set to 0, and foreground pixels (i.e., pixels having a value less than x) are set to 01Pixel(s) is set to 1, and an unknown pixel (i.e., pixel value greater than x) is set to 12Pixel(s) is set to 2.
In some embodiments of the present description, the pixel values of the depth channel image may be divided into specific numerical values through quantization processing on the pixel values, so as to clarify the pixel categories, and facilitate subsequent image processing, such as foreground image and background image segmentation.
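The quantization of formula (1) is straightforward to express with NumPy. The sketch below is illustrative only; it assumes fd is a NumPy array of depth values and that x1 and x2 are the camera-dependent thresholds described above.

    import numpy as np

    def quantize_depth(f_d: np.ndarray, x1: float, x2: float) -> np.ndarray:
        """Quantize a depth channel image into pixel classes per formula (1)."""
        f_d2 = np.empty(f_d.shape, dtype=np.uint8)
        f_d2[(f_d >= x1) & (f_d <= x2)] = 0  # background pixels
        f_d2[f_d < x1] = 1                   # foreground pixels
        f_d2[f_d > x2] = 2                   # unknown pixels
        return f_d2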
Step 230, based on the RGB channel image and the processed depth channel image, determining the unknown pixel as a foreground pixel or a background pixel through the image segmentation model, and obtaining a mask image for distinguishing the foreground pixel and/or the background pixel of the image to be processed. In some embodiments, step 230 is performed by mask image generation module 530.
The processed depth channel image may be the depth channel image fd' whose pixels have a determined pixel class, or the depth channel image fd'' whose pixel values have been further quantized on the basis of the determined pixel classes.
In some embodiments, the image segmentation model may compute an accurate boundary between the foreground image and the background image, and thereby determine the classification of the unknown pixels, that is, determine whether each unknown pixel is specifically a foreground pixel or a background pixel, and obtain a mask image. In some embodiments, the image segmentation model may be composed of multiple functional layers. For example, the image segmentation model may include a pixel classification layer and a mask image generation layer. In some embodiments, the inputs of the pixel classification layer are the RGB channel image and the depth channel image with determined pixel classes (fd' or fd''), and its output is a depth channel image fd''' in which the class of every formerly unknown pixel has been determined (in fd''', every pixel is classified as either a foreground pixel or a background pixel); the input of the mask image generation layer is the depth channel image fd''' with fully determined pixel classes, and its output is the mask image.
In some embodiments, the image segmentation model may be a trained Neural Network (NN) or Gaussian Mixture Model (GMM). The inputs of the image segmentation model are the RGB channel image and the depth channel image with determined pixel classes (fd' or fd''); its output is a mask image. Illustratively, the depth channel image fd' with determined pixel classes and the RGB channel image frgb are input into the image segmentation model M(fd', frgb) to obtain the output mask image alpha_mask = M(fd', frgb).
In some embodiments, the image segmentation model may be trained on a number of labeled training samples. For example, the labeled training samples may be input into the image segmentation model, a loss function constructed from the labels and the model's outputs, and the parameters of the image segmentation model updated iteratively based on the loss function. Model training is complete when the loss function satisfies a preset condition, yielding the trained image segmentation model. The preset condition may be, for example, that the loss function converges or that the number of iterations reaches a threshold.
In some embodiments, the training samples may include at least the RGB channel images and the depth channel images with determined pixel classes (fd' or fd'') corresponding to a number of historical images. The label may characterize the mask image. Labels may be annotated manually or generated by other means.
Other methods may also be used to process the RGB channel image and the depth channel image with determined pixel classes (fd' or fd'') to obtain a mask image. For example, such other means may include python-OpenCV; one possible sketch follows.
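As one hedged illustration of such a python-OpenCV route, and not the method prescribed by this specification, OpenCV's GrabCut can be seeded with the depth-derived pixel classes and left to resolve only the unknown pixels. The mapping of the quantized classes to GrabCut labels below is our assumption.

    import cv2
    import numpy as np

    def resolve_unknown_pixels(f_rgb: np.ndarray, f_d2: np.ndarray) -> np.ndarray:
        """Resolve unknown pixels with GrabCut, seeded from depth classes.

        f_d2 uses the quantized labels of formula (1): 0 background,
        1 foreground, 2 unknown.
        """
        mask = np.full(f_d2.shape, cv2.GC_PR_BGD, dtype=np.uint8)
        mask[f_d2 == 0] = cv2.GC_BGD     # definite background from depth
        mask[f_d2 == 1] = cv2.GC_FGD     # definite foreground from depth
        mask[f_d2 == 2] = cv2.GC_PR_FGD  # unknown: let GrabCut decide
        bgd_model = np.zeros((1, 65), np.float64)
        fgd_model = np.zeros((1, 65), np.float64)
        cv2.grabCut(f_rgb, mask, None, bgd_model, fgd_model, 5,
                    cv2.GC_INIT_WITH_MASK)
        # Binary mask image: 1 for foreground pixels, 0 for background.
        return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                        1, 0).astype(np.uint8)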
The mask image is an image obtained by performing a masking operation on an image, where the masking operation recalculates the pixel values of the image. After the masking operation, the pixel values of the masked portion of the generated mask image are set to 0 and those of the other portions are set to 1. A portion with pixel value 0 can be used to mask the corresponding region of the image, e.g., the image region corresponding to pixel value 0 will not be displayed, whereas a portion with pixel value 1 is not masked and is retained. In some embodiments, the mask image may be used to distinguish the foreground image from the background image; for example, in the mask image the pixel values of the foreground image are set to 1 and those of the background image to 0. In some embodiments, the size and dimensions of the mask image may be determined by the corresponding image to be processed; for example, the size and dimensions of the mask image may be the same as those of the image to be processed.
For further explanation of the smoothing process of the mask image, reference is made to the description of fig. 3.
And 240, based on the mask image and the image to be processed, realizing the segmentation of the foreground image and the background image of the image to be processed. In some embodiments, step 240 is performed by image segmentation module 540.
In some embodiments, the foreground image and the background image can be separated by operating on the image to be processed with the mask image. For example, if the foreground pixels in the mask image take the value 1 and the background pixels take the value 0, operating on the image to be processed with the regions of the mask image whose value is 1 yields the foreground image corresponding to the foreground pixels, while operating on it with the regions whose value is 0 yields 0 (black regions), i.e., the image regions outside the foreground pixels. Similarly, if the background pixels in the mask image are set to 1, a background image corresponding to the background pixels is obtained. In this way, the separation of the foreground image and the background image of the image to be processed is achieved.
In some embodiments, the foreground image and the background image may be segmented by other methods, such as a threshold-based segmentation method, an edge detection-based segmentation method, a region-based segmentation method, a depth learning-based segmentation method, and the like.
In some embodiments, by incorporating an RGBD camera, no special scene arrangement is required, which increases flexibility in choosing the shooting location. Meanwhile, the depth channel image delineates the foreground well without requiring a large amount of data, and current RGBD cameras can generally achieve the needed accuracy; the number of model parameters can therefore be effectively controlled, a real-time segmentation effect can be achieved on terminal devices, and the segmentation efficiency of the foreground and background images is improved.
FIG. 3 is a schematic diagram of steps of an image segmentation method according to some embodiments of the present description. The step diagram 300 includes the following steps 310 and 320, and the steps 310 and 320 are executed by the mask image generation module 530.
And 310, smoothing the mask image of the image to be processed to obtain a processed mask image.
In some embodiments, a mask image of the image to be processed may be obtained by an image segmentation model. The relevant contents of the image segmentation model are described with reference to the corresponding contents in fig. 2.
In some embodiments, the edge transition of the mask image between two adjacent frames may be discontinuous, which causes jitter of the foreground edge (the boundary area between the foreground image and the background image) in the output video; therefore, the mask image of the image to be processed needs to be smoothed.
Smoothing, also called "blurring", refers to an image processing technique that uses low-frequency-enhancing spatial-domain filtering to reduce problems such as noise or distortion in an image. Smoothing algorithms include mean filtering, median filtering, Gaussian filtering, bilateral filtering, and so on.
In some embodiments, the smoothing process may be implemented by: acquiring an adjacent mask image, wherein the adjacent mask image is a mask image of at least one frame of image adjacent to an image to be processed; and reassigning the pixel value of the target pixel of the mask image of the image to be processed based on the mask image and the adjacent mask image of the image to be processed.
In some embodiments, mask images may be handled in units of frames: the mask image of the current frame corresponding to the image to be processed serves as the current mask image (as shown in fig. 3), and the current mask image may be smoothed based on its adjacent mask images. For example, the pixel values of the current mask image may be reassigned based on the current mask image and the adjacent mask images.
An adjacent mask image is the mask image of at least one frame adjacent to the image to be processed. For example, let the image sequence corresponding to the current video be S = {f1, f2, f3, ..., ft}; extract its depth channel images Sd = {fd1, fd2, fd3, ..., fdt}; the consecutive mask images corresponding to the depth channel images are Salpha = {m1, m2, m3, ..., mt-2, mt-1, mt}. The adjacent mask images are the mask images of at least one frame adjacent to the image to be processed; for example, mt-2, mt-1, and mt may be considered adjacent mask images.
In some embodiments, the pixel values of the current mask image mt may be reassigned by computing m't = a^2 * mtx + a * mty + mt. Here m't is the reassigned value of the mask image mt; mtx, mty, and mt are adjacent mask images, and the number of frames between mtx, mty, and mt may take any value. For example, if the interval is one frame, mtx may be the mask image mt-2 and mty may be the mask image mt-1. The coefficient a is a decimal between 0 and 1: the larger a is, the more continuous consecutive frames appear, but a mask-dragging phenomenon (i.e., ghosting) may occur, so the value of a needs to be adjusted according to the frame rate of the specific device to obtain the best effect (the relation between a and the frame rate is mainly determined experimentally).
In some embodiments, smoothing the mask image may further include normalizing its pixel values. For example, when some pixels of the mask image that are neither background nor foreground take values other than 0 or 1, each such pixel value alpha_mask(i, j) may be normalized to a value between 0 and 1. With alpha'_mask(i, j) denoting the normalized pixel value, the normalization is given by the following formula (2):

    alpha'_mask(i, j) = alpha_mask(i, j) / (max(alpha_mask) - min(alpha_mask))    (2)

Formula (2) states that the normalized pixel value alpha'_mask(i, j) of a mask image equals the original pixel value alpha_mask(i, j) divided by the difference between the maximum pixel value max(alpha_mask) and the minimum pixel value min(alpha_mask) in the mask image. In some embodiments, the value of min(alpha_mask) is 0.
In some embodiments, smoothing the mask image may further include feathering the mask image to make the edges smoother. The method of feathering may include a mean filtering method.
In some embodiments, in the mask image after smoothing, the pixel value of a foreground pixel is 1, the pixel value of a background pixel is 0, and the pixel values at the edges between foreground and background lie between 0 and 1. Illustratively, for each mask image the maximum pixel value is max(alpha_mask) = a^2 + a + 1. The pixel values of edge pixels generally differ across the mask images, while the main part of the foreground image (i.e., most of its area) and the main part of the background image are the same across consecutive mask images; for example, the foreground image of a given mask image shares its main part with the foreground image of the adjacent mask image, and the corresponding main parts of the two background images are likewise the same. It follows that for the foreground part, whose pixel values are all 1 across the consecutive mask images, a foreground pixel becomes a^2 + a + 1 after reassignment, and normalization yields (a^2 + a + 1) / (a^2 + a + 1) = 1. For the background part, whose pixel values are all 0 across the consecutive mask images, a background pixel remains 0 after reassignment and is still 0 after normalization. For the edge part, where the pixel values across the consecutive mask images are 1 in some frames and 0 in others, the reassigned pixel value lies between 0 and a^2 + a + 1, and the value obtained after normalization is a value between 0 and 1.
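A minimal sketch of this smoothing pipeline is given below, assuming NumPy arrays for the masks, an experimentally tuned weight a, and mean-filter feathering via OpenCV; the default values are illustrative assumptions, not values prescribed by this specification.

    import cv2
    import numpy as np

    def smooth_mask(m_t: np.ndarray, m_ty: np.ndarray, m_tx: np.ndarray,
                    a: float = 0.5, feather_ksize: int = 5) -> np.ndarray:
        """Temporally smooth, normalize, and feather a mask image."""
        # Reassignment: m't = a^2 * mtx + a * mty + mt.
        m = a ** 2 * m_tx + a * m_ty + m_t
        # Normalization per formula (2); min(alpha_mask) is typically 0.
        denom = m.max() - m.min()
        if denom > 0:
            m = m / denom
        # Feather the foreground/background edge with a mean filter.
        return cv2.blur(m.astype(np.float32), (feather_ksize, feather_ksize))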
And step 320, based on the processed mask image and the processed image, realizing the segmentation of the foreground image and the background image of the processed image.
In some embodiments, each pixel may be determined to be a foreground pixel or a background pixel based on the foregoing operations; the image to be processed is operated on with the processed mask image, and the corresponding foreground image and background image are determined from the foreground pixels and background pixels, respectively. In some embodiments, segmenting the foreground image and the background image may include setting the values of the foreground pixels or the background pixels to 0 or 1, respectively, and performing pixel-matrix calculations; see the related description of fig. 4 for details.
In some embodiments, smoothing the mask image avoids discontinuous edge transitions between two adjacent frames, such as the foreground-edge jitter that may appear in the output video; smoothing the images thus enables smooth video output.
It should be noted that the above description of the flow is for illustration and description only and does not limit the scope of the application of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description.
FIG. 4 is a schematic diagram of another step of an image segmentation method according to some embodiments of the present description. In some embodiments, the flow 400 may be performed by the computing device 140. The process 400 includes the following steps.
And step 410, acquiring a foreground image of the image to be processed based on the mask image of the image to be processed and the image to be processed.
In some embodiments, the foreground image and the background image of the image to be processed may be determined after the mask image is obtained. As described above, the size of the mask image is the same as that of the image to be processed, and each pixel in the mask image has a corresponding relationship with each pixel in the image to be processed, it can be determined that the corresponding pixel in the image to be processed is a foreground pixel or a background pixel based on the value of each pixel in the mask image, and the image formed by all foreground pixels is the foreground image. For example, according to the foregoing convention, if a certain pixel in the mask image takes a value of 0, that means that in the image to be processed, the pixel corresponding to the pixel taking the value of 0 is a background pixel. Therefore, the pixel matrix corresponding to the image to be processed and the pixel matrix corresponding to the mask image can be multiplied, so that the pixel values of all background pixels in the image to be processed are all set to be 0, and the finally obtained image formed by the pixels with the pixel values not being 0 is the foreground image.
In some embodiments, the image to be processed may be the image comprising the RGB channel image and the depth channel image. Denoting the pixel matrix of the image to be processed by ft and the pixel matrix of the mask image by alpha_mask, the manner of obtaining the foreground image can be expressed by the formula ft * alpha_mask. In some embodiments, the pixel matrix frgb of the RGB channel image may also be multiplied directly by the pixel matrix corresponding to the mask image, setting the values of all background pixels in the RGB channel image to 0 and thereby obtaining the foreground image; in this case the manner of obtaining the foreground image can be expressed by the formula frgb * alpha_mask.
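A minimal NumPy sketch of this extraction, assuming alpha_mask is a 2-D array of 0/1 (or normalized [0, 1]) values and the image is an H x W x C array:

    import numpy as np

    def extract_foreground(f_t: np.ndarray, alpha_mask: np.ndarray) -> np.ndarray:
        """Foreground image: f_t * alpha_mask, zeroing all background pixels."""
        # Broadcast the mask across the color (and depth) channels.
        return f_t * alpha_mask[..., np.newaxis]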
And step 420, acquiring a target background image based on the mask image of the image to be processed and the target image.
The target image refers to a background image designated by the user, that is, an image that the user wants to be a background image. For example, if a user wants to use a space image as a background image, the space image is a target image.
In some embodiments, the target image is an image of the same size as the size of the image to be processed and the mask image of the image to be processed. In some embodiments, the target image may be scaled so that its size is the same as the size of the image to be processed and the mask image of the image to be processed.
The target background image refers to an image that is acquired based on the target image (i.e., a background image specified by the user), which may be a background of the foreground image obtained in step 410. For example, if the foreground image extracted in step 410 is an image of a set of space suits, the target background image is a partial image that can be taken as a background image of space suits and is acquired from a space background image designated by a user.
The target background image may be determined based on the mask image of the image to be processed and the target image. As described above, each pixel can be determined to be a foreground pixel or a background pixel based on the mask image; the values of the foreground pixels in the mask image are then set to 0, and the pixel matrix fbackground of the target image is multiplied by the pixel matrix of the mask image. This correspondingly sets to 0, within the target image, the pixel values of the region corresponding to the foreground image of the image to be processed, and the image formed by all pixels whose value is not 0 is the target background image.
Setting the values of the foreground pixels in the mask image to 0 can be achieved in various ways. For example, the foreground pixels in the mask image can be reassigned directly according to the classification of each pixel in the mask image. As another example, the foreground pixels in the mask image can be set to 0 by subtracting the pixel matrix of the mask image from a virtual matrix, where the virtual matrix has the same size as the pixel matrix of the mask image and every entry equal to 1; since, by the foregoing convention, the foreground pixels in the pixel matrix of the mask image take the value 1, subtracting the pixel matrix of the mask image from the virtual matrix sets the foreground pixels to 0.
Denoting the virtual matrix by 1, the manner of obtaining the target background image can be expressed by the formula fbackground * (1 - alpha_mask).
Step 430, a composite image is obtained based on the foreground image and the target background image of the image to be processed.
The composite image is an image obtained by combining the target background image and the foreground image; for example, the composite image may be an image obtained by synthesizing the spacesuit foreground into the space background.
The composite image is obtained by superimposing the foreground image and the target background image. For example, after the processing of step 420, the position of the foreground image is reserved in the target background image; that is, in the target background image, the pixels corresponding to the foreground image have no value or a value of 0. Adding the value of each pixel of the foreground image to the corresponding pixel in the target background image then yields the composite image.
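Putting steps 410 through 430 together, the following hedged NumPy sketch computes fbackground * (1 - alpha_mask) and adds the foreground term; it assumes the target image has already been scaled to the size of the image to be processed, as described in step 420.

    import numpy as np

    def composite(f_rgb: np.ndarray, f_background: np.ndarray,
                  alpha_mask: np.ndarray) -> np.ndarray:
        """Blend the foreground onto the target background image."""
        a = alpha_mask[..., np.newaxis]        # broadcast over channels
        foreground = f_rgb * a                 # f_rgb * alpha_mask
        target_bg = f_background * (1.0 - a)   # f_background * (1 - alpha_mask)
        # With a smoothed alpha_mask in [0, 1], edge pixels blend gradually.
        return (foreground + target_bg).astype(f_rgb.dtype)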
In some embodiments, the composite image may also be obtained in other manners, for example, the composite image may be obtained by performing layer combination on the foreground image and the target background image as two layers to be synthesized respectively based on a manner such as a composite layer.
By the method, the foreground segmentation and the background replacement can be performed on the image, the special arrangement of a background scene can be avoided, and a good segmentation effect can be obtained.
FIG. 5 is an exemplary block diagram of an image segmentation system in accordance with some embodiments of the present description.
In some embodiments, the image segmentation system 500 includes an image acquisition module 510, an image processing module 520, a mask image generation module 530, and an image segmentation module 540.
In some embodiments, the image acquisition module 510 is configured to acquire a to-be-processed image, which includes an RGB channel image and a depth channel image.
In some embodiments, the image processing module 520 is configured to process the depth channel image to determine a pixel class of a target pixel in the depth channel image, where the pixel class includes at least one of a foreground pixel, a background pixel, and an unknown pixel. In some embodiments, the image processing module 520 is further configured to determine a pixel class corresponding to the target pixel based on the pixel value of the target pixel in the depth channel image.
In some embodiments, the mask image generating module 530 determines, based on the RGB channel image and the processed depth channel image, that the unknown pixel is a foreground pixel or a background pixel through the image segmentation model, and obtains a mask image for distinguishing the foreground pixel and/or the background pixel of the image to be processed.
In some embodiments, the image segmentation module 540 performs segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed.
It should be noted that the above description of the system and its modules is for convenience only and should not limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. In some embodiments, the image acquisition module 510, the image processing module 520, the mask image generation module 530, and the image segmentation module 540 disclosed in fig. 5 may be different modules in a system, or may be a module that implements the functions of two or more of the above modules. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features than are expressly recited in a claim. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (9)

1. An image segmentation method comprising:
acquiring an image to be processed, wherein the image to be processed comprises an RGB channel image and a depth channel image;
processing the depth channel image, and determining the pixel category of a target pixel in the depth channel image, wherein the pixel category comprises at least one of a foreground pixel, a background pixel and an unknown pixel;
determining unknown pixels as the foreground pixels or the background pixels through an image segmentation model based on the RGB channel image and the processed depth channel image, and obtaining a mask image for distinguishing the foreground pixels and/or the background pixels of the image to be processed;
and realizing the segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed.
2. The image segmentation method of claim 1, the processing the depth channel image comprising:
determining the pixel class corresponding to the target pixel based on the pixel value of the target pixel in the depth channel image.
3. The image segmentation method of claim 1, the method further comprising:
and performing quantization processing on the pixel value of the target pixel based on the pixel class corresponding to the target pixel.
4. The image segmentation method according to claim 1, wherein the performing the segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed comprises:
smoothing the mask image of the image to be processed to obtain a processed mask image;
and realizing the segmentation of the foreground image and the background image of the image to be processed based on the processed mask image and the image to be processed.
5. The image segmentation method of claim 4, the smoothing process comprising:
acquiring an adjacent mask image, wherein the adjacent mask image is a mask image of at least one frame of image adjacent to the image to be processed;
reassigning pixel values of the target pixels of the mask image of the image to be processed based on the mask image and the proximity mask image of the image to be processed.
6. The image segmentation method of claim 1, the method further comprising:
acquiring the foreground image of the image to be processed based on the mask image of the image to be processed and the image to be processed;
acquiring a target background image based on the mask image and a target image of the image to be processed;
and obtaining a composite image based on the foreground image and the target background image of the image to be processed.
7. An image segmentation system comprises an image acquisition module, an image processing module, a mask image generation module and an image segmentation module:
the image acquisition module is used for acquiring an image to be processed, and the image to be processed comprises an RGB channel image and a depth channel image;
the image processing module is used for processing the depth channel image and determining the pixel category of a target pixel in the depth channel image, wherein the pixel category comprises at least one of a foreground pixel, a background pixel and an unknown pixel;
the mask image generation module determines unknown pixels to be the foreground pixels or the background pixels through an image segmentation model based on the RGB channel image and the processed depth channel image, and obtains a mask image used for distinguishing the foreground pixels and the background pixels of the image to be processed;
and the image segmentation module realizes the segmentation of the foreground image and the background image of the image to be processed based on the mask image and the image to be processed.
8. An image segmentation apparatus comprising a processor for performing the image segmentation method of any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the image segmentation method according to any one of claims 1 to 6.
CN202111587133.XA 2021-12-23 2021-12-23 Image segmentation method and system Pending CN114299084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111587133.XA CN114299084A (en) 2021-12-23 2021-12-23 Image segmentation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111587133.XA CN114299084A (en) 2021-12-23 2021-12-23 Image segmentation method and system

Publications (1)

Publication Number Publication Date
CN114299084A true CN114299084A (en) 2022-04-08

Family

ID=80968886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111587133.XA Pending CN114299084A (en) 2021-12-23 2021-12-23 Image segmentation method and system

Country Status (1)

Country Link
CN (1) CN114299084A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination