CN112241960A - Matting method and system based on depth information - Google Patents

Matting method and system based on depth information

Info

Publication number
CN112241960A
Authority
CN
China
Prior art keywords
image
depth
color
alpha
matting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011065060.3A
Other languages
Chinese (zh)
Other versions
CN112241960B (en)
Inventor
胡正
张�林
黄源浩
肖振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Orbbec Co Ltd
Original Assignee
Shenzhen Orbbec Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Orbbec Co Ltd
Priority to CN202011065060.3A
Priority claimed from CN202011065060.3A
Publication of CN112241960A
Application granted
Publication of CN112241960B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/155Segmentation; Edge detection involving morphological operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a matting method and system based on depth information, comprising the following steps: S1, acquiring a depth image and a color image containing a target; S2, aligning the depth image with the color image to obtain an aligned depth image; S3, performing threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap; S4, processing the trimap and the color image with a matting algorithm to obtain an Alpha map; and S5, separating the target using the Alpha map and the color image to obtain a target image. The invention enables real-time video matting without requiring the user to replace the 2D camera or to use a backdrop; the depth camera replaces manual interaction so that the trimap is obtained automatically, with high precision and high speed.

Description

Matting method and system based on depth information
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a matting method and system based on depth information.
Background
In theory, extracting a moving foreground from a static background only requires determining a single background image and then subtracting that background from each new image, which yields the foreground image. In most cases, however, no such background image is available.
Existing foreground extraction techniques include background modeling, green-screen-based foreground extraction, pedestrian segmentation, and mask creation based on a ternary map (trimap). Background modeling extracts the moving foreground from a static background by subtracting the background from each new image with a threshold; however, the result depends heavily on the choice of threshold, is very sensitive to it, and is prone to ghosting and hole artifacts. Green-screen-based foreground extraction shares the same drawbacks and additionally requires building a green-screen studio, which consumes considerable manpower and is impractical in most situations. Pedestrian-segmentation-based methods only work for categories present in known datasets and cannot cover arbitrary categories, so they are inherently limited. Trimap-based mask creation requires the user to select the trimap manually, which is time-consuming and inefficient.
The above background is disclosed only to assist understanding of the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of the present application and should not be used to assess the novelty or inventive step of the present application absent clear evidence that it was publicly available before the filing date.
Disclosure of Invention
The present invention is directed to a matting method and system based on depth information, so as to solve at least one of the problems identified in the background art above.
To achieve the above purpose, the technical solution of the embodiments of the present invention is realized as follows:
A matting method based on depth information comprises the following steps:
S1, acquiring a depth image and a color image containing a target;
S2, aligning the depth image with the color image to obtain an aligned depth image;
S3, performing threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap;
S4, processing the trimap and the color image with a matting algorithm to obtain an Alpha map;
and S5, separating the target using the Alpha map and the color image to obtain a target image.
In some embodiments, in step S1, the depth image and the color image are captured by a depth camera and a color camera, respectively, and the two cameras expose and capture their output images at the same time.
In some embodiments, a control and processing module keeps the exposure and capture times of the depth camera and the color camera synchronized; the control and processing module comprises a synchronization information calculation unit which, based on a phase scanning algorithm, calculates the time delay and position error of the target object between the depth image and the color image and iterates the phase parameter until the time delay and position error are minimized, thereby determining the synchronization phase.
In some embodiments, step S3 comprises:
S30, performing threshold binarization on the foreground and the background based on the depth image;
S31, performing a morphological erosion operation on the binarized foreground and background respectively to obtain shrunken foreground and background regions;
and S32, taking the eroded band as the undetermined boundary region to obtain the trimap.
In some embodiments, step S3 further comprises: setting the values of all pixels in the background region and the foreground region of the depth image uniformly, with every pixel of the background region set to a first preset value and every pixel of the foreground region set to a second preset value, wherein the first preset value is not equal to the second preset value.
In some embodiments, before the threshold binarization and erosion-dilation processing of the depth image in step S3, the method further comprises the following step:
filtering, denoising and smoothing the depth image.
In some embodiments, step S4 is based on the linear relationship between each pixel of the image and the corresponding foreground and background:
I=αF+(1-α)B
where I is the image pixel, α is the transparency, F is the foreground color and B is the background color; the matting algorithm returns the Alpha value corresponding to each pixel, and these Alpha values together form the Alpha map.
In some embodiments, in step S5, the Alpha map and the color image are composited to obtain the target image.
As another technical solution of the embodiments of the present invention:
A matting system based on depth information comprises:
an acquisition module, used to acquire a depth image and a color image containing a target;
a control and processing module, comprising: an alignment matching unit, used to align the depth image with the color image to obtain an aligned depth image; a trimap obtaining unit, used to perform threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap; an Alpha map obtaining unit, used to process the trimap and the color image to obtain an Alpha map; and an image separation unit, used to separate the foreground using the Alpha map and the color image to obtain the target image.
In some embodiments, the acquisition module acquires the depth image and the color image using a phase scanning method, and the Alpha map obtaining unit processes the trimap and the color image with a matting algorithm to obtain the Alpha map.
The technical solution of the present invention has the following beneficial effects:
Compared with the prior art, the user can perform real-time video matting without replacing the 2D camera and without using a backdrop; the depth camera replaces manual interaction so that the trimap is obtained automatically, with high precision and high speed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustration of a depth information based matting method according to one embodiment of the invention.
Fig. 2 is a schematic diagram of obtaining a trimap from a depth map in the embodiment of fig. 1.
Fig. 3 is a functional block diagram of a matting system based on depth information according to another embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. The connection may be for fixation or for circuit connection.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience in describing the embodiments of the present invention and to simplify the description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Referring to fig. 1, fig. 1 is a schematic flowchart of a matting method based on depth information according to an embodiment of the present invention, and the method comprises the following steps:
S1, acquiring a depth image and a color image containing a target;
In one embodiment, the depth image and the color image are acquired with a depth camera and a color camera (2D camera), respectively. The working frame rate and phase of the depth camera are set to match those of the color camera, and the relative positions of the two cameras are fixed; for example, they may be fixed together by a clamp, or each fixed to a frame, so that in a dynamic shooting scene the images output by the two cameras are exposed and captured at the same time. In some embodiments, the depth camera may be based on a structured-light, binocular, or TOF (time-of-flight) scheme; the color camera may be a separate device or may be integrated into the depth camera.
In one embodiment, the depth camera includes a projection module and a collection module. The collection module includes a lens unit and a depth image sensor unit; the depth image sensor unit includes a plurality of depth sensor arrays and a depth modulation clock, where the depth modulation clock is used to gate the depth sensor arrays in the depth image sensor unit. The color camera includes a lens and a color image sensor unit; the color image sensor unit includes a plurality of color sensor arrays and a color modulation clock used to gate the color sensor arrays in the color image sensor unit.
In one embodiment, a control and processing module keeps the exposure and capture times of the depth camera and the color camera synchronized, the control and processing module comprising a synchronization information calculation unit. The depth image and the color image are transmitted to the control and processing module, and by processing them the synchronization information calculation unit determines the exposure frequency and/or exposure phase common to the depth camera and the color camera. Specifically, the synchronization information calculation unit calculates, based on a phase scanning algorithm, the time delay and position error of the target object between the depth image and the color image, and iterates the phase parameter; when a certain phase parameter yields the minimum time delay and position error, that phase is regarded as the synchronization phase. The synchronization phase is provided to the depth camera and the color camera as control information (i.e., synchronization information), and the image sensor units in the two cameras adjust the phases of their respective modulation clocks according to this synchronization information.
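The patent does not give code for this search, but the phase-scanning idea can be illustrated with a short sketch. The routines capture_frame_pair and target_offset_error below are hypothetical placeholders for the camera drivers and the delay/position-error measurement, not APIs defined by the patent:

```python
# Illustrative sketch only: a brute-force phase scan of the kind described above.
def find_sync_phase(capture_frame_pair, target_offset_error, num_steps=64):
    best_phase, best_err = 0.0, float("inf")
    for k in range(num_steps):
        phase = k / num_steps                                  # candidate modulation-clock phase (fraction of a cycle)
        depth_frame, color_frame = capture_frame_pair(phase)   # grab one synchronized pair at this phase
        err = target_offset_error(depth_frame, color_frame)    # combined time-delay + position error of the target
        if err < best_err:
            best_phase, best_err = phase, err
    return best_phase   # fed back to both cameras as the synchronization information
```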
S2, aligning the depth image with the color image to obtain an aligned depth image;
It will be appreciated that aligning the depth image with the color image aims to bring the two into spatial registration. Specifically, the control and processing module puts each pixel in the depth image into one-to-one correspondence with a pixel in the color image; that is, the coordinates (X_W, Y_W, Z_W) in the depth image sensor coordinate system are converted into the color camera coordinate system (X_C, Y_C, Z_C). The conversion uses the following equations, and the process requires solving the 12 parameters r11 through tz, so at least 12 known conditions or constraints must be available.
X_C = r11·X_W + r12·Y_W + r13·Z_W + tx
Y_C = r21·X_W + r22·Y_W + r23·Z_W + ty
Z_C = r31·X_W + r32·Y_W + r33·Z_W + tz
For example, the depth camera and the color camera can simultaneously capture 12 images of a human hand held still at different positions; the 12 pairs of corresponding position coordinates are then identified manually and substituted into both sides of the equations, from which the 12 parameters can be solved.
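As an illustration only (not part of the patent text), the 12 parameters can be recovered from such manually identified correspondences with an ordinary least-squares fit; the function and variable names below are assumptions made for this sketch:

```python
# Illustrative least-squares recovery of the 12 parameters r11..tz from N >= 4
# manually identified 3D point correspondences (the patent uses 12 pairs).
import numpy as np

def solve_extrinsics(points_depth, points_color):
    """points_depth, points_color: (N, 3) arrays of corresponding coordinates
    (X_W, Y_W, Z_W) and (X_C, Y_C, Z_C). Returns the 3x4 matrix [R | t]."""
    points_depth = np.asarray(points_depth, dtype=np.float64)
    points_color = np.asarray(points_color, dtype=np.float64)
    n = points_depth.shape[0]
    # Homogeneous source coordinates: each row is (X_W, Y_W, Z_W, 1)
    src_h = np.hstack([points_depth, np.ones((n, 1))])
    # Least-squares solution of src_h @ M ≈ points_color, with M of shape (4, 3)
    M, *_ = np.linalg.lstsq(src_h, points_color, rcond=None)
    return M.T   # rows are (r11, r12, r13, tx), (r21, r22, r23, ty), (r31, r32, r33, tz)
```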
It is understood that in the color image the color of the target object may be close to that of other objects or of the background, which makes segmentation based on the color image alone difficult. The depth image reflects the depth of every object in the scene, so as long as the depth of the target object differs from that of the other objects it can be readily distinguished from them and from the background. However, the contour of the target object obtained from the depth image is usually coarse: the depth information only gives a rough region for the target object, and the contour of that rough region does not coincide with the true contour of the target object, so it can differ considerably from the real contour seen in the color image. Therefore, to achieve a finer segmentation, the following steps are performed after the color image and the depth image have been acquired:
s3, performing threshold value binarization processing and corrosion expansion processing on the aligned depth image to obtain a ternary diagram;
it is understood that the ternary diagram includes: the depth image processing method comprises a foreground area containing a target, a buffer area and a background area, wherein the approximate area of the target object in the depth image can be preliminarily obtained according to depth information contained in the depth image, so that the outline is expanded to a certain degree according to the outline representing the approximate area of the target object in the depth image, and a buffer area is obtained.
Referring to FIG. 2, in one embodiment, obtaining the trimap comprises the following steps:
S30, the control and processing module performs threshold binarization on the foreground and the background based on the depth map to obtain an approximate division into foreground and background;
S31, a morphological erosion operation is performed on the binarized foreground and background respectively to obtain shrunken foreground and background regions (the white and black parts of the trimap);
and S32, the eroded band is taken as the undetermined boundary region (the gray part of the trimap), thereby obtaining the trimap.
Specifically, the values of all pixels in the background region and the foreground region of the depth image are first set uniformly: every pixel of the background region is set to a first preset value and every pixel of the foreground region is set to a second preset value. As an example, the first preset value may be 0 and the second preset value may be 255; in this embodiment the first preset value is not equal to the second preset value, and their specific values are not otherwise limited. The value of a pixel in the buffer region is the shortest distance from that pixel to the background region.
If the first preset value is 0, the second preset value is 255, and every pixel of the trimap takes an integer value in [0, 255], then the background region appears darkest and the foreground region brightest in the trimap. Within the buffer region, pixels closer to the background region have lower values and are displayed darker, while pixels farther from the background region have higher values and are displayed brighter. Unlike the depth image, in this embodiment the pixels of the background region of the trimap are unified to the first preset value and the pixels of the foreground region to the second preset value, so the trimap cleanly isolates the background that is irrelevant to the target object.
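A minimal sketch of steps S30–S32 using OpenCV is given below; the depth range and erosion radius are illustrative assumptions, not values prescribed by the patent:

```python
# Minimal sketch of S30-S32: threshold the aligned depth image into foreground
# and background, erode both, and mark the eroded band as the unknown region.
import cv2
import numpy as np

def trimap_from_depth(depth_aligned, near_mm=500, far_mm=1500, erode_px=15):
    # S30: threshold binarization - pixels inside the assumed depth range are foreground
    fg = ((depth_aligned > near_mm) & (depth_aligned < far_mm)).astype(np.uint8) * 255
    bg = 255 - fg
    # S31: morphological erosion shrinks both regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (erode_px, erode_px))
    fg_shrunk = cv2.erode(fg, kernel)
    bg_shrunk = cv2.erode(bg, kernel)
    # S32: pixels eroded away from either region form the undetermined boundary
    trimap = np.full(depth_aligned.shape, 128, dtype=np.uint8)   # gray = unknown
    trimap[bg_shrunk == 255] = 0      # first preset value: background
    trimap[fg_shrunk == 255] = 255    # second preset value: foreground
    return trimap
```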
In an embodiment, before the threshold binarization and erosion-dilation processing, the depth image may be filtered, denoised and smoothed, for example with a Laplacian-of-Gaussian operator or other methods, to improve the accuracy of the subsequent image processing.
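As a simple illustration of this pre-filtering step, the sketch below uses a median filter followed by a Gaussian blur as stand-ins for the smoothing operators mentioned above; the kernel sizes are assumptions:

```python
import cv2

def denoise_depth(depth_image):
    d = cv2.medianBlur(depth_image, 5)    # remove isolated invalid depth pixels
    d = cv2.GaussianBlur(d, (5, 5), 0)    # smooth the remaining noise
    return d
```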
S4, processing the trimap and the color image with a matting algorithm to obtain an Alpha map. This step is based on the linear relationship between each image pixel I and the corresponding foreground F and background B, namely:
I=αF+(1-α)B,
where I is the image pixel, i.e. the color actually observed in the image, α is the transparency, F is the foreground color and B is the background color; I is known while α, F and B are unknown. The foreground F and background B are blended according to the transparency α: for a pixel that can be determined with certainty to be foreground, α is 1; conversely, for a pixel that is certainly background, α is 0. The matting algorithm returns the probability of each pixel belonging to the foreground or background, i.e. the Alpha value α in the equation, so the Alpha value of every pixel can be obtained through the matting algorithm, and these values together form the Alpha map.
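The compositing model can be illustrated numerically as follows; the sketch shows the forward direction of the equation (blending F and B with α), while matting itself solves the inverse problem of recovering α from I and the trimap:

```python
# Forward direction of I = alpha*F + (1 - alpha)*B.
import numpy as np

def composite(alpha, foreground, background):
    """alpha: (H, W) in [0, 1]; foreground, background: (H, W, 3) float images."""
    a = alpha[..., None]                          # broadcast alpha over the color channels
    return a * foreground + (1.0 - a) * background
```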
In one embodiment, the color image comprises a first foreground region and a first background region, and the trimap comprises a second foreground region, a second background region, and a second buffer region. It is understood that the first foreground region differs from the second foreground region and the first background region differs from the second background region; the difference between the first and second foreground regions, together with the difference between the first and second background regions, forms the second buffer region of the trimap, and the image of the first foreground region, i.e. the Alpha map, is then derived from the trimap and the color image. In other words, the matting process determines, for every pixel in the second buffer region of the trimap, its category (foreground or background) and the transparency associated with that category, where the transparency of the foreground and the transparency of the background sum to 1.
In one embodiment, since three unknowns must be recovered from the color image (I is known, and α, F and B must be solved for), the Alpha map can be obtained with propagation-based algorithms such as the Poisson method or the closed-form method, sampling-based algorithms such as the Bayesian method, or deep-neural-network-based methods. In the embodiments of the present application, a shared-sampling or global method may be used; the shared-sampling method can produce the Alpha map corresponding to the foreground F more quickly and can satisfy image-processing scenarios with strict real-time requirements, for example when a user shoots an image on a mobile terminal and wants the foreground extracted quickly so that the foreground or the background can then be transformed.
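As one concrete, non-authoritative way to realize step S4, the sketch below assumes the open-source pymatting package and its closed-form solver; the patent itself does not name this library, and the shared-sampling or global methods it mentions would be substituted here in practice:

```python
# Assumed third-party dependency: pymatting (not named in the patent).
import numpy as np
from pymatting import estimate_alpha_cf, estimate_foreground_ml

def alpha_from_trimap(color_bgr_u8, trimap_u8):
    image = color_bgr_u8[..., ::-1].astype(np.float64) / 255.0   # BGR -> RGB in [0, 1]
    trimap = trimap_u8.astype(np.float64) / 255.0                # 0 = background, 1 = foreground, ~0.5 = unknown
    alpha = estimate_alpha_cf(image, trimap)                     # closed-form alpha estimate
    foreground = estimate_foreground_ml(image, alpha)            # foreground color estimate
    return alpha, foreground
```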
S5, separating the target using the Alpha map and the color image obtained in step S4 to obtain a target image;
The Alpha map and the color image can be composited by a compositing method to obtain the target image.
Optionally, after the target image is obtained, the color, scale and other attributes of the target region may be adjusted to achieve the effect the user desires. Optionally, after the target image is obtained, the color image corresponding to the first background region may also be derived from the target image and the color image; its color and scale can then be adjusted to obtain a processed image, which replaces the image of the original first background region. For example, in a portrait photograph the target image, i.e. the image of the portrait region, can be obtained by matting; the portrait region (the target image) can then be beautified and stylized, its background replaced with another image, or the background blurred or desaturated, to achieve the desired aesthetic effect.
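For illustration, the background replacement and blurring described above can be sketched as follows using the Alpha map from step S4; parameter values and function names are assumptions:

```python
import cv2
import numpy as np

def replace_background(color, alpha, new_background=None, blur_ksize=31):
    color_f = color.astype(np.float32)
    if new_background is None:
        # No replacement image given: blur the original frame to use as the background
        background_f = cv2.GaussianBlur(color_f, (blur_ksize, blur_ksize), 0)
    else:
        background_f = new_background.astype(np.float32)
    a = alpha.astype(np.float32)[..., None]
    out = a * color_f + (1.0 - a) * background_f   # alpha-weighted blend
    return np.clip(out, 0, 255).astype(np.uint8)
```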
It will be appreciated that the image processing described above may be performed in a control and processing module. In one embodiment the control and processing module may be a processor, for example one implemented with FPGA (Field-Programmable Gate Array), DSP (Digital Signal Processing) or ISP (Image Signal Processing) technology. The processor may be any type of programmable master control chip, or may comprise one or more microprocessors, such as one or more "general purpose" microprocessors, one or more special-purpose microprocessors and/or application-specific integrated circuits (ASICs), or a combination of such processing components; for example, the processor may include one or more instruction-set (e.g., RISC) processors, as well as a graphics processor (GPU), a video processor, an audio processor and/or an associated chipset.
Fig. 3 shows a matting system based on depth information provided according to an embodiment of the present invention. As shown in fig. 3, the system comprises: an acquisition module, used to acquire a depth image and a color image containing a target; and a control and processing module, comprising: an alignment matching unit, used to align the depth image with the color image to obtain an aligned depth image; a trimap obtaining unit, used to perform threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap; an Alpha map obtaining unit, used to process the trimap and the color image to obtain an Alpha map; and an image separation unit, used to separate the foreground using the Alpha map and the color image to obtain a target image. The acquisition module acquires the depth image and the color image using a phase scanning method, and the Alpha map obtaining unit processes the trimap and the color image with a matting algorithm to obtain the Alpha map.
It is understood that each of the above modules may be a single independent module, that several modules may be combined into one, and that the functions of several similar modules may be realized by a single combined module.
It should be noted that the depth-information-based matting system of fig. 3 can be used to execute the depth-information-based matting method of the embodiment shown in fig. 1; for the detailed description of each module, refer to the foregoing method embodiment, which is not repeated here.
The matting method and system based on depth information enable a user, by means of the depth camera and the control and processing module, to perform real-time video matting without replacing the 2D camera and without using a backdrop; the depth camera replaces manual interaction so that the trimap is obtained automatically, with high precision and high speed. The system can be adapted to 2D cameras of different models, frame rates (10-120 fps), resolutions (0.2 megapixels to 4K) and data interfaces (USB, HDMI and Ethernet); the user's existing 2D shooting and recording setup does not need to be changed, and the video stream of the 2D camera is processed directly by the control and processing module and then output in the original video format.
The present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the depth-information-based matting method of the above embodiments. The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. The computer-readable medium storing the computer-executable instructions is a physical storage medium. Computer-readable media carrying computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can include at least two distinct computer-readable media: physical computer-readable storage media and transmission computer-readable media.
The embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement at least the depth information-based matting method described in the foregoing embodiment.
It is to be understood that the foregoing is a more detailed description of the invention with reference to specific embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. Those skilled in the art can make various substitutions and modifications to the described embodiments without departing from the spirit of the invention, and such substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the terms "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," "some examples" and the like mean that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention.
In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and the different embodiments or examples and their features described in this specification can be combined by those skilled in the art provided they do not contradict each other. Although the embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the invention as defined by the appended claims.
Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. One of ordinary skill in the art will readily appreciate that the above-disclosed, presently existing or later to be developed, processes, machines, manufacture, compositions of matter, means, methods, or steps, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims (10)

1. A matting method based on depth information, characterized by comprising the following steps:
S1, acquiring a depth image and a color image containing a target;
S2, aligning the depth image with the color image to obtain an aligned depth image;
S3, performing threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap;
S4, processing the trimap and the color image with a matting algorithm to obtain an Alpha map;
and S5, separating the target using the Alpha map and the color image to obtain a target image.
2. The matting method based on depth information as claimed in claim 1, characterized in that: in step S1, a depth camera and a color camera are used to capture the depth image and the color image respectively, and the two cameras expose and capture their output images at the same time.
3. The matting method based on depth information as claimed in claim 2, characterized in that: a control and processing module keeps the exposure and capture times of the depth camera and the color camera synchronized; the control and processing module comprises a synchronization information calculation unit which, based on a phase scanning algorithm, calculates the time delay and position error of the target object between the depth image and the color image and iterates the phase parameter until the time delay and position error are minimized, thereby determining the synchronization phase.
4. The matting method based on depth information as claimed in claim 1, characterized in that step S3 comprises:
S30, performing threshold binarization on the foreground and the background based on the depth image;
S31, performing a morphological erosion operation on the binarized foreground and background respectively to obtain shrunken foreground and background regions;
and S32, taking the eroded band as the undetermined boundary region to obtain the trimap.
5. The matting method based on depth information as claimed in claim 4, characterized in that step S3 further comprises: setting the values of all pixels in the background region and the foreground region of the depth image uniformly, with every pixel of the background region set to a first preset value and every pixel of the foreground region set to a second preset value, wherein the first preset value is not equal to the second preset value.
6. The matting method based on depth information as claimed in claim 1, characterized in that in step S3, before the threshold binarization and erosion-dilation processing of the depth image, the method further comprises the following step:
filtering, denoising and smoothing the depth image.
7. The matting method based on depth information as claimed in claim 1, characterized in that step S4 is based on the linear relationship between each pixel of the image and the corresponding foreground and background:
I=αF+(1-α)B
where I is the image pixel, α is the transparency, F is the foreground color and B is the background color; the Alpha value corresponding to each pixel is obtained through the matting algorithm, and these Alpha values together form the Alpha map.
8. The matting method based on depth information as claimed in claim 1, characterized in that: in step S5, the Alpha map and the color image are composited by a compositing method to obtain the target image.
9. A matting system based on depth information, characterized by comprising:
an acquisition module, used to acquire a depth image and a color image containing a target;
a control and processing module, comprising: an alignment matching unit, used to align the depth image with the color image to obtain an aligned depth image; a trimap obtaining unit, used to perform threshold binarization and erosion-dilation processing on the aligned depth image to obtain a trimap; an Alpha map obtaining unit, used to process the trimap and the color image to obtain an Alpha map; and an image separation unit, used to separate the foreground using the Alpha map and the color image to obtain the target image.
10. The matting system based on depth information as claimed in claim 9, characterized in that: the acquisition module acquires the depth image and the color image using a phase scanning method; the Alpha map obtaining unit processes the trimap and the color image with a matting algorithm to obtain the Alpha map.
CN202011065060.3A 2020-10-01 Depth information-based matting method and system Active CN112241960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011065060.3A CN112241960B (en) 2020-10-01 Depth information-based matting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011065060.3A CN112241960B (en) 2020-10-01 Depth information-based matting method and system

Publications (2)

Publication Number Publication Date
CN112241960A (en) 2021-01-19
CN112241960B (en) 2024-05-31

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070013813A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Poisson matting for images
US20110038536A1 (en) * 2009-08-14 2011-02-17 Genesis Group Inc. Real-time image and video matting
CN106952276A (en) * 2017-03-20 2017-07-14 成都通甲优博科技有限责任公司 A kind of image matting method and device
CN107481261A (en) * 2017-07-31 2017-12-15 中国科学院长春光学精密机械与物理研究所 A kind of color video based on the tracking of depth prospect scratches drawing method
CN108447068A (en) * 2017-12-22 2018-08-24 杭州美间科技有限公司 Ternary diagram automatic generation method and the foreground extracting method for utilizing the ternary diagram
US20200020108A1 (en) * 2018-07-13 2020-01-16 Adobe Inc. Automatic Trimap Generation and Image Segmentation
CN110189339A (en) * 2019-06-03 2019-08-30 重庆大学 The active profile of depth map auxiliary scratches drawing method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHRIS HENRY et al.: "Automatic trimap generation and artifact reduction in alpha matte using unknown region detection", Expert Systems with Applications 133 (2019), pages 242 - 259 *
TING LU AND SHUTAO LI: "Image Matting with Color and Depth Information", 21st International Conference on Pattern Recognition (ICPR 2012), pages 3787 - 3790 *
任艳宏; 徐丹; 苏鹏宇: "A discussion of natural image matting methods" (自然图像抠图方法讨论, in Chinese), Journal of Yunnan University (Natural Sciences Edition), no. 2, 15 September 2007 (2007-09-15), pages 128 - 132 *
楼珊珊: "Research and application of Kinect-based matting algorithms" (基于Kinect的抠像算法研究与应用, in Chinese), China Excellent Master's Theses Full-text Database, Information Science and Technology Series, pages 27 - 64 *
王欣 et al.: "Automatic matting algorithm with attention mechanism and feature fusion" (注意力机制和特征融合的自动抠图算法, in Chinese), Journal of Computer-Aided Design & Computer Graphics, vol. 32, no. 9, pages 1473 - 1482 *
苏君红: "Research on Contemporary Virtual Reality Art" (当代虚拟现实艺术研究, in Chinese), Hangzhou: Zhejiang Science and Technology Press, page 61 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819848A (en) * 2021-02-04 2021-05-18 Oppo广东移动通信有限公司 Matting method, matting device and electronic equipment
CN112819848B (en) * 2021-02-04 2024-01-05 Oppo广东移动通信有限公司 Matting method, matting device and electronic equipment
CN113034517A (en) * 2021-03-31 2021-06-25 华南理工大学 Full-automatic image matting method and device based on generation countermeasure model, medium and equipment
CN113034517B (en) * 2021-03-31 2023-02-14 华南理工大学 Full-automatic image matting method and device based on generation countermeasure model, medium and equipment
CN113362365A (en) * 2021-06-17 2021-09-07 云从科技集团股份有限公司 Video processing method, system, device and medium
CN114677392A (en) * 2022-05-27 2022-06-28 珠海视熙科技有限公司 Matting method, image pickup apparatus, device, conference system, electronic apparatus, and medium
CN114677394A (en) * 2022-05-27 2022-06-28 珠海视熙科技有限公司 Matting method, matting device, image pickup apparatus, conference system, electronic apparatus, and medium
CN114677393A (en) * 2022-05-27 2022-06-28 珠海视熙科技有限公司 Depth image processing method, depth image processing device, image pickup apparatus, conference system, and medium
CN115880327A (en) * 2022-11-30 2023-03-31 珠海视熙科技有限公司 Matting method, image pickup apparatus, conference system, electronic apparatus, device, and medium
CN115880327B (en) * 2022-11-30 2023-10-31 珠海视熙科技有限公司 Matting method, image pickup device, conference system, electronic device, apparatus, and medium
CN117351118A (en) * 2023-12-04 2024-01-05 江西师范大学 Lightweight fixed background matting method and system combined with depth information
CN117351118B (en) * 2023-12-04 2024-02-23 江西师范大学 Lightweight fixed background matting method and system combined with depth information

Similar Documents

Publication Publication Date Title
US10949978B2 (en) Automatic background replacement for single-image and multi-view captures
CN112367514B (en) Three-dimensional scene construction method, device and system and storage medium
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
KR100953076B1 (en) Multi-view matching method and device using foreground/background separation
EP2811423A1 (en) Method and apparatus for detecting target
KR20180132946A (en) Multi-view scene segmentation and propagation
Wang et al. Automatic natural video matting with depth
CN107798702B (en) Real-time image superposition method and device for augmented reality
CN107622480B (en) Kinect depth image enhancement method
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN112396562A (en) Disparity map enhancement method based on RGB and DVS image fusion in high-dynamic-range scene
CN108596923A (en) Acquisition methods, device and the electronic equipment of three-dimensional data
KR20080051015A (en) Depth estimation apparatus for depth consistency between frames and its method
Serna et al. Data fusion of objects using techniques such as laser scanning, structured light and photogrammetry for cultural heritage applications
CN107886471B (en) Method for removing redundant objects of photo based on super-pixel voting model
CN111932587A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110738731A (en) 3D reconstruction method and system for binocular vision
JP6914734B2 (en) Silhouette extractor, method and program
CN110717962B (en) Dynamic photo generation method, device, photographing equipment and storage medium
CN111447428A (en) Method and device for converting plane image into three-dimensional image, computer readable storage medium and equipment
CN110378995A (en) A method of three-dimensional space modeling is carried out using projection feature
CN112241960B (en) Depth information-based matting method and system
CN112241960A (en) Matting method and system based on depth information
CN109242900B (en) Focal plane positioning method, processing device, focal plane positioning system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 11-13 / F, joint headquarters building, high tech Zone, 63 Xuefu Road, Yuehai street, Nanshan District, Shenzhen, Guangdong 518000

Applicant after: Obi Zhongguang Technology Group Co.,Ltd.

Address before: 11-13 / F, joint headquarters building, high tech Zone, 63 Xuefu Road, Yuehai street, Nanshan District, Shenzhen, Guangdong 518000

Applicant before: SHENZHEN ORBBEC Co.,Ltd.

GR01 Patent grant