CN111899266B - RGBD camera-based image matting method and system - Google Patents


Publication number: CN111899266B (application CN202010699261.2A)
Authority: CN (China)
Prior art keywords: image, foreground, background, characteristic, current frame
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111899266A
Inventors: 莫曜阳, 钱贝贝, 关国雄
Assignee (current and original): Orbbec Inc
Application CN202010699261.2A filed by Orbbec Inc on 2020-07-17; published as CN111899266A, granted as CN111899266B.


Classifications

    All classifications fall under G (Physics) › G06 (Computing; calculating or counting) › G06T (Image data processing or generation, in general):
    • G06T 7/11 — Image analysis; segmentation/edge detection: region-based segmentation
    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/194 — Segmentation; edge detection involving foreground-background segmentation
    • G06T 2207/10024 — Image acquisition modality: color image
    • G06T 2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20221 — Special algorithmic details; image combination: image fusion; image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a matting method based on an RGBD camera, comprising the following steps: S1, obtaining a background image; S2, obtaining a current-frame color image and the corresponding depth image; S3, segmenting the depth image to obtain a foreground soft-segmentation image; S4, extracting features to obtain the feature map of each image; S5, concatenating the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, and outputting a first feature map and a second feature map; S6, constructing a matting model with an encoder-decoder structure, concatenating the first and second feature maps with the feature map of the current-frame color image to obtain an encoded feature map, decoding it with the decoder, and outputting the foreground transparency. The foreground transparency and the current-frame color image are then fused with different background images. By segmenting the depth image using depth values and optimizing the segmented image with a deep network, the invention can matte objects of any category and is widely applicable.

Description

RGBD camera-based image matting method and system
Technical Field
The invention relates to the technical field of digital image processing, in particular to a method and a system for matting based on an RGBD camera.
Background
A foreground object can be obtained by extracting the moving foreground from a stationary background: in theory, only one background image is needed, and the foreground is recovered by subtracting the background image from each new image. In most cases, however, no such background image is available.
Existing foreground extraction techniques include background-modeling methods, green-screen methods, pedestrian-segmentation methods, and Trimap-based mask creation methods. Background modeling extracts the moving foreground from a stationary background by subtracting the background from each new image under a threshold; however, it is very sensitive to the threshold and prone to ghosting and holes. Green-screen foreground extraction shares the same defects and additionally requires building a green-screen studio, which costs considerable labor and is not applicable to general scenarios. Pedestrian segmentation only works for categories in known datasets and cannot cover arbitrary categories, so it has clear limitations. Trimap-based mask creation requires the user to select the Trimap manually, which is time-consuming and inefficient.
The foregoing background is provided only to aid understanding of the inventive concepts and technical solutions of the present application. It does not necessarily constitute prior art of the present application and, absent clear evidence that the above content was disclosed before the filing date of the present application, shall not be used to assess the novelty and inventiveness of the present application.
Disclosure of Invention
The invention aims to provide an RGBD-camera-based matting method and system to solve at least one of the problems described in the background section.
In order to achieve the above object, the technical solution of the embodiment of the present invention is as follows:
A matting method based on an RGBD camera comprises the following steps:
S1, acquiring a background image with a fixed RGBD camera while no foreground object is present in the scene;
S2, acquiring a current-frame color image and the corresponding depth image after a target object enters the scene;
S3, segmenting the current-frame depth image obtained in step S2 to obtain a foreground soft-segmentation image;
S4, extracting features from the current-frame color image, the foreground soft-segmentation image, and the background image to obtain, respectively, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image;
S5, concatenating the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, and outputting a first feature map and a second feature map;
S6, constructing a matting model with an encoder-decoder structure, inputting the first feature map, the second feature map, and the feature map of the current-frame color image into the encoder and concatenating them to obtain an encoded feature map; decoding the encoded feature map with the decoder and outputting the foreground transparency; and fusing the foreground transparency and the current-frame color image with different background images to obtain a composite image.
In some embodiments, in step S3, an effective depth range is preset, and the current-frame depth image is segmented according to the effective depth range to obtain the foreground soft-segmentation image.
In some embodiments, in step S4, feature extraction is performed on the current-frame color image, the foreground soft-segmentation image, and the background image with a feature extraction network comprising three convolution modules, each consisting of a convolution layer, a batch normalization layer, and a ReLU activation layer.
In some embodiments, in step S6, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image are concatenated with the first feature map and the second feature map using a convolution layer, a batch normalization layer, and a ReLU activation layer, yielding an encoded feature map that is passed to the decoder.
In some embodiments, for the foreground portion determined from the obtained depth image, step S6 further comprises the following steps:
S61, selecting a 3×3 structuring matrix based on the foreground soft-segmentation image and performing a logical AND between the structuring matrix and the foreground soft-segmentation image: if both values are 1, the corresponding output pixel is 1; otherwise it is 0;
S62, on the foreground soft-segmentation image obtained in step S61, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the foreground soft-segmentation image: if both values are 0, the corresponding output pixel is 0; otherwise it is 1;
S63, applying Gaussian filtering to blur the edges of the foreground soft-segmentation image obtained in step S62, yielding a second foreground soft-segmentation image T;
S64, traversing each pixel of the foreground transparency map α decoded in step S6 and of the second foreground soft-segmentation image T from step S63, and taking the elementwise maximum, i.e. α′(i,j) = max(α(i,j), T(i,j)), obtaining the transparency α′.
In some embodiments, for the background portion determined from the obtained depth image, step S6 further comprises the following steps:
S65, based on step S3, segmenting the depth image outside the effective depth range to obtain a background soft-segmentation image;
S66, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the background soft-segmentation image: if both values are 1, the corresponding output pixel is 1; otherwise it is 0;
S67, on the background soft-segmentation image obtained in step S66, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the background soft-segmentation image: if both values are 0, the corresponding output pixel is 0; otherwise it is 1;
S68, traversing each pixel of the background soft-segmentation image W obtained in step S67; wherever W(i,j) = 1, the transparency α′ from step S64 is set to 0, yielding the new transparency α″.
In some embodiments, step S1 further comprises:
S10, acquiring a first background depth image corresponding to the background image and segmenting it to obtain a second background depth image;
S11, converting the pixels of the second background depth image into the camera coordinate system or the world coordinate system to obtain a point cloud;
S12, randomly sampling a number of points from the point cloud to fit a plane equation, the plane equation representing the ground.
In some embodiments, step S3 further comprises:
S31, traversing each point of the point cloud obtained in step S12, substituting it into the plane equation obtained in step S12, and computing its distance to the plane; if the distance is smaller than a threshold, the point is judged to be ground, and the corresponding pixel of the foreground soft-segmentation image is set to 0.
Another technical solution of the embodiments of the invention is as follows:
A matting system based on an RGBD camera, comprising:
an RGBD camera, fixed at a preset position and used to acquire a current-frame color image, a current-frame depth image, and a background image;
an image processing module, used to set the effective depth range of the depth image and to segment the depth image to obtain a soft-segmentation image;
a feature extraction module, used to extract features from the current-frame color image, the background image, and the foreground soft-segmentation image to obtain their respective feature maps;
a matting model module with an encoder-decoder structure, used to concatenate the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, outputting a first feature map and a second feature map; to input the feature maps of the current-frame color image, the background image, and the foreground soft-segmentation image, together with the first and second feature maps, into the encoder for concatenation, obtaining an encoded feature map that is passed to the decoder; and to decode the encoded feature map, output the foreground transparency, and fuse the foreground transparency and the current-frame color image with different background images to obtain a composite image.
A further technical solution of the embodiments of the invention is:
a storage medium storing a computer program that, when executed, performs at least the RGBD-camera-based matting method of any of the embodiments described above.
The technical solutions of the invention have the following beneficial effects:
compared with the prior art, the invention segments the depth image using depth values and optimizes the segmented image with a deep network. It can therefore matte and segment objects of any category, optimizes the segmentation result, and is widely applicable; matting can be performed directly against the original background, so no green-screen background needs to be provided and the user need not add labels manually.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a matting method based on an RGBD camera according to an embodiment of the invention.
Fig. 2 is a structural block diagram of an RGBD-camera-based matting system according to another embodiment of the invention.
Fig. 3 is another structural block diagram of an RGBD-camera-based matting system according to another embodiment of the invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions, and the beneficial effects of the embodiments of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention and do not limit its scope.
It will be understood that when an element is referred to as being "mounted" or "disposed" on another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for a fixing function or for a circuit communication function.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are merely for convenience in describing embodiments of the invention and to simplify the description by referring to the figures, rather than to indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus are not to be construed as limiting the invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present invention, the meaning of "plurality" is two or more, unless explicitly defined otherwise.
Referring to fig. 1, fig. 1 is a schematic flowchart of a matting method based on an RGBD camera according to an embodiment of the present invention. The method comprises the following steps:
S1, obtaining a background image.
The background image is acquired with a fixed RGBD camera while no foreground object is present in the scene. Specifically, the RGBD camera is fixed, it is ensured that no foreground object is in the scene, and the camera is started to acquire the background image of the scene.
S2, acquiring the current-frame color image and the corresponding depth image after a target object enters the scene.
Specifically, once the target object has entered the scene, the RGBD camera is started to acquire the color image of the current frame and the depth image corresponding to it.
S3, segmenting the current-frame depth image obtained in step S2 to obtain a foreground soft-segmentation image.
Specifically, an effective depth range is preset, and the current-frame depth image is segmented according to it to obtain the foreground soft-segmentation image. For ease of understanding: a pixel is considered valid when its depth value lies within the effective depth range.
S4, extracting features from the current-frame color image, the foreground soft-segmentation image, and the background image to obtain, respectively, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image.
S5, concatenating the feature map of the current-frame color image with the feature map of the background image and outputting a first feature map; concatenating the feature map of the current-frame color image with the feature map of the foreground soft-segmentation image and outputting a second feature map.
S6, constructing a matting model with an encoder-decoder structure, inputting the first feature map, the second feature map, and the feature map of the current-frame color image into the encoder and concatenating them so as to fuse semantic features of different modalities, obtaining an encoded feature map; decoding the encoded feature map with the decoder and outputting the foreground transparency; and fusing the foreground transparency and the current-frame color image with different background images to obtain composite images.
It will be understood that once the scene is determined, its background image is fixed, so step S1 needs to be executed only once; after the target enters the scene, steps S2-S6 are re-executed each time the RGBD camera acquires new data (a new frame of color image and depth image) to generate a new composite image.
More specifically, in step S3 the effective depth range may be chosen according to the actual requirement. For example, if the user sets the effective shooting range of the depth camera to 1-3 meters, pixels whose depth lies between 1 and 3 meters are considered valid, and the current-frame depth image is segmented with this effective depth range to obtain the foreground soft-segmentation image.
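As an illustration only, the effective-depth-range segmentation of step S3 can be sketched in a few lines of Python; the 1-3 m range and millimeter depth units are assumptions for the example, not values fixed by the method.

```python
import numpy as np

def soft_segment_foreground(depth_mm: np.ndarray,
                            near_mm: int = 1000,
                            far_mm: int = 3000) -> np.ndarray:
    """Foreground soft-segmentation image from a depth map (step S3, sketch).

    Pixels whose depth lies inside the effective depth range are marked 1
    (valid); everything else, including depth holes (value 0), is marked 0.
    """
    valid = (depth_mm >= near_mm) & (depth_mm <= far_mm)
    return valid.astype(np.uint8)
```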
In one embodiment, if the current-frame depth image captured by the depth camera contains the ground, the ground is removed with the following steps, step S1 further comprising:
S10, acquiring a first background depth image corresponding to the background image, setting the effective depth range, segmenting the first background depth image, and keeping the valid pixels to obtain a second background depth image;
S11, converting the pixels of the second background depth image into the camera coordinate system or the world coordinate system to obtain a point cloud containing the three-dimensional coordinates of the pixels;
S12, randomly sampling a number of points from the point cloud to fit a plane equation; if enough points fit the plane equation, the plane equation can represent the ground.
Further, a foreground soft-segmentation image excluding the ground is needed. Based on steps S10-S12, step S3 comprises the following step:
S31, traversing each point of the point cloud obtained in step S12, substituting it into the plane equation obtained in step S12, and computing its distance to the plane; if the distance is smaller than a threshold, the point is considered ground and the corresponding pixel of the foreground soft-segmentation image is set to 0; otherwise the original pixel value is kept unchanged (see the sketch below).
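The random sampling of step S12 together with the consensus test of step S31 amounts to a RANSAC-style plane fit. The sketch below is one possible reading, assuming a per-pixel point cloud of shape (H*W, 3) in meters and an illustrative 2 cm inlier threshold; none of these constants are fixed by the patent.

```python
import numpy as np

def fit_ground_plane(points: np.ndarray, iters: int = 200,
                     thresh: float = 0.02, rng=None):
    """RANSAC-style fit of a plane n·p + d = 0 to a point cloud of shape (N, 3)."""
    rng = rng or np.random.default_rng()
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -np.dot(n, p0)
        dist = np.abs(points @ n + d)        # point-to-plane distances (cf. S31)
        inliers = int((dist < thresh).sum())
        if inliers > best_inliers:           # keep the plane with most support
            best_inliers, best_plane = inliers, (n, d)
    return best_plane

def remove_ground(soft_seg: np.ndarray, points: np.ndarray,
                  plane, thresh: float = 0.02) -> np.ndarray:
    """Step S31: zero soft-segmentation pixels whose 3D point lies on the plane."""
    n, d = plane
    dist = np.abs(points @ n + d).reshape(soft_seg.shape)
    out = soft_seg.copy()
    out[dist < thresh] = 0
    return out
```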
In step S4, three feature extraction networks designed on the basis of a residual network (ResNet) are used to extract features from the current-frame color image, the foreground soft-segmentation image, and the background image. Each feature extraction network comprises three convolution modules, each consisting of a convolution layer, a batch normalization layer, and a ReLU activation layer; the three input images yield, respectively, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image.
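A minimal PyTorch sketch of one such extractor follows. The patent fixes only the three-module conv + batch-norm + ReLU structure; the 3×3 kernels and 32/64/128 channel widths below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_module(in_ch: int, out_ch: int) -> nn.Sequential:
    """One convolution module: convolution + batch normalization + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FeatureExtractor(nn.Module):
    """Three stacked convolution modules, as described for step S4 (sketch)."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            conv_module(in_ch, 32),
            conv_module(32, 64),
            conv_module(64, 128),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

# One extractor per input; the soft-segmentation image is single-channel.
extract_color = FeatureExtractor(in_ch=3)
extract_background = FeatureExtractor(in_ch=3)
extract_soft_seg = FeatureExtractor(in_ch=1)
```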
In step S5, the feature map of the current-frame color image and the feature map of the background image are fused through a concatenation layer to obtain the first feature map; the feature map of the current-frame color image is likewise concatenated with the feature map of the foreground soft-segmentation image to obtain the second feature map. Concatenation here means joining two feature maps along the channel dimension; after multiple convolution operations, the concatenated feature maps contain rich semantics.
In step S6, a matting model with an encoder-decoder structure is built. The feature maps of the current-frame color image, the foreground soft-segmentation image, and the background image obtained in step S4 are input into the encoder together with the first and second feature maps from step S5, and the encoder generates feature maps over multiple channels. Specifically, the feature maps from step S4 are concatenated with the first and second feature maps from step S5 using a convolution layer, a batch normalization layer, and a ReLU activation layer, so as to fuse semantic features of different modalities; the resulting encoded feature map is passed to the decoder. The decoder, composed of several deconvolution layers (or upsampling layers plus convolution layers), decodes the encoded feature map to recover the detail information it preserves. This detail information comprises the transparency matrix α of the foreground image and the color matrix F of the foreground image; it is fused with different background images to generate composite images, completing the high-quality matting and compositing operation.
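A compressed sketch of such an encoder-decoder head in PyTorch; only the channel-wise concatenation, the conv + batch-norm + ReLU encoding, and the deconvolution decoding follow the description, while the depth, channel counts, and sigmoid output range are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MattingModel(nn.Module):
    """Encoder-decoder matting head of step S6 (sketch).

    Input: the three step-S4 feature maps plus the first and second feature
    maps of step S5, concatenated along the channel dimension.
    Output: a single-channel foreground transparency map in [0, 1].
    """
    def __init__(self, in_ch: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(          # deconvolution restores resolution
            nn.ConvTranspose2d(256, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),                      # transparency alpha in [0, 1]
        )

    def forward(self, feats):
        x = torch.cat(feats, dim=1)            # channel-wise concatenation
        return self.decoder(self.encoder(x))
```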
In practice, a foreground region whose color resembles the background may be classified as background; therefore, in some embodiments, the result of step S6 needs to be optimized.
Specifically, based on the depth image obtained in step S2, the following operations are performed on the determined foreground portion:
S61, based on the foreground soft-segmentation image obtained in step S3, a 3×3 structuring matrix with all elements 1 is selected and scanned over each pixel of the foreground soft-segmentation image with stride 1, performing a logical AND between the structuring matrix and the image: if all values are 1, the output pixel is 1; otherwise it is 0. This process is called erosion, and it shrinks the foreground soft-segmentation image by one ring of pixels.
S62, on the foreground soft-segmentation image obtained in step S61, a 3×3 structuring matrix with all elements 1 is selected and a logical AND is performed between the structuring matrix and the image: if all values are 0, the output pixel is 0; otherwise it is 1. This process is called dilation, and it expands the foreground soft-segmentation image obtained in step S61 by one ring of pixels.
The erosion-then-dilation process described above is called an opening operation; it removes noise, detaches objects at thin connections, and smooths the boundaries of large objects without significantly changing their area.
S63, Gaussian filtering is applied to blur the edges of the foreground soft-segmentation image obtained in step S62, yielding a second foreground soft-segmentation image T.
S64, each pixel of the foreground transparency map α decoded in step S6 and of the second foreground soft-segmentation image T from step S63 is traversed, and an elementwise maximum is taken, i.e. α′(i,j) = max(α(i,j), T(i,j)), obtaining the transparency α′ (see the sketch below).
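Steps S61-S64 correspond to a standard morphological opening, a Gaussian edge blur, and an elementwise maximum with the decoded transparency. A sketch using OpenCV, assuming a binary uint8 mask and an illustrative 7×7 blur kernel:

```python
import cv2
import numpy as np

def refine_foreground(alpha: np.ndarray, soft_seg: np.ndarray,
                      blur_ksize: int = 7) -> np.ndarray:
    """Steps S61-S64: opening, edge blur, then elementwise max with alpha.

    alpha:    decoded foreground transparency in [0, 1] (step S6), HxW float.
    soft_seg: binary (0/1) uint8 foreground soft-segmentation image (step S3).
    """
    kernel = np.ones((3, 3), np.uint8)                   # 3x3 all-ones matrix
    opened = cv2.morphologyEx(soft_seg, cv2.MORPH_OPEN, kernel)   # S61 + S62
    t = cv2.GaussianBlur(opened.astype(np.float32),      # S63: blur the edges
                         (blur_ksize, blur_ksize), 0)
    return np.maximum(alpha, t)                          # S64: max(alpha, T)
```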
Based on the depth image obtained in step S2, for the determined background portion, step S6 further comprises the following steps:
S65, based on step S3 and the effective depth range set by the user, the depth image outside the effective depth range is segmented to obtain a background soft-segmentation image W;
S66, a 3×3 structuring matrix with all elements 1 is selected and scanned over each pixel of the background soft-segmentation image with stride 1, performing a logical AND between the structuring matrix and the image: if all values are 1, the output pixel is 1; otherwise it is 0. This erosion process shrinks the background soft-segmentation image by one ring of pixels.
S67, on the background soft-segmentation image obtained in step S66, a 3×3 structuring matrix with all elements 1 is selected and a logical AND is performed between the structuring matrix and the image: if all values are 0, the output pixel is 0; otherwise it is 1.
S68, each pixel of the background soft-segmentation image W obtained in step S67 is traversed; wherever W(i,j) = 1, the transparency α′ from step S64 is set to 0, yielding the new transparency α″, where α″ is the final required transparency.
The transparency obtained in step S68 is fused with a background image provided by the user to obtain the composite image according to the compositing formula
I = α″ · F + (1 − α″) · B
where B is the RGB color-value matrix of the background image, α″ is the transparency obtained in step S68, F is the color-value matrix of the foreground image, and the products are taken per pixel.
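The background suppression of steps S65-S68 and the final compositing can be sketched together; the opening of S66-S67 reuses cv2.morphologyEx, and the images are assumed to be float arrays in [0, 1] of matching size.

```python
import cv2
import numpy as np

def suppress_background_and_composite(alpha1: np.ndarray, bg_seg: np.ndarray,
                                      foreground: np.ndarray,
                                      new_bg: np.ndarray) -> np.ndarray:
    """Steps S65-S68 plus compositing (sketch).

    alpha1:     transparency alpha' from step S64, HxW float in [0, 1].
    bg_seg:     binary background soft-segmentation image W (step S65), uint8.
    foreground: foreground color matrix F, HxWx3 float.
    new_bg:     user-provided background color matrix B, HxWx3 float.
    """
    kernel = np.ones((3, 3), np.uint8)
    w = cv2.morphologyEx(bg_seg, cv2.MORPH_OPEN, kernel)   # S66 + S67
    alpha2 = alpha1.copy()
    alpha2[w == 1] = 0.0                                   # S68: zero where W = 1
    a = alpha2[..., None]                                  # broadcast over channels
    return a * foreground + (1.0 - a) * new_bg             # I = aF + (1 - a)B
```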
As another embodiment of the present invention, a matting system based on an RGBD camera is also provided; referring to fig. 2, the system comprises:
an RGBD camera, fixed at a preset position and used to acquire a current-frame color image, a current-frame depth image, and a background image;
an image processing module, used to set the effective depth range of the depth image and to segment the depth image to obtain a soft-segmentation image;
a feature extraction module, used to extract features from the current-frame color image, the background image, and the foreground soft-segmentation image to obtain their respective feature maps;
a matting model module with an encoder-decoder structure, used to concatenate the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, outputting a first feature map and a second feature map; to input the feature maps of the current-frame color image, the background image, and the foreground soft-segmentation image, together with the first and second feature maps, into the encoder for concatenation so as to fuse semantic features of different modalities, obtaining an encoded feature map that is passed to the decoder; and to decode the encoded feature map, output the foreground transparency, and fuse the current-frame color image with different background images using the foreground transparency to obtain a composite image.
Referring to fig. 3, in some embodiments the system further comprises a ground-removal module: when the current-frame depth image captured by the depth camera contains the ground, the ground is removed by this module to obtain a foreground soft-segmentation image that excludes the ground.
In some embodiments, the system further comprises a background image processing module, used to optimize the output of the matting model module with the encoder-decoder structure and to resolve the case in which a foreground region with a color similar to the background is classified as background during actual processing.
It will be appreciated that each of the above modules may be an independent module, or several modules may be combined into one; likewise, the similar functions of several modules may be implemented by one combined module.
The RGBD-camera-based matting system is configured to execute the RGBD-camera-based matting method of the embodiment shown in fig. 1; for the specific description of each module, refer to the foregoing method embodiment, which is not repeated here.
By segmenting the depth image using depth values and optimizing the segmented image with a deep network, the invention can matte and segment objects of any category, optimizes the segmentation result, and is widely applicable; matting can be performed directly against the original background, so neither a green-screen background nor manual labeling by the user is required.
The embodiments of the invention further provide a storage medium storing a computer program which, when executed, performs at least the RGBD-camera-based matting method described in any of the foregoing embodiments.
The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The storage media described in the embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
It is to be understood that the foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and that the invention is not to be considered as limited to such description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the invention, and these alternatives or modifications should be considered to be within the scope of the invention. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "preferred embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention.
In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope as defined by the appended claims.
Furthermore, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. Those of ordinary skill in the art will readily appreciate that the above-described disclosures, procedures, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims (10)

1. An RGBD-camera-based image matting method, characterized by comprising the following steps:
S1, acquiring a background image with a fixed RGBD camera while no foreground object is present in the scene;
S2, acquiring a current-frame color image of a target object entering the scene and the depth image corresponding to the current-frame color image;
S3, segmenting the depth image corresponding to the current-frame color image obtained in step S2 to obtain a foreground soft-segmentation image;
S4, extracting features from the current-frame color image, the foreground soft-segmentation image, and the background image to obtain, respectively, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image;
S5, concatenating, in the channel dimension, the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, and outputting a first feature map and a second feature map, the first and second feature maps containing rich semantics obtained through multiple convolution operations;
S6, constructing a matting model with an encoder-decoder structure, inputting the first feature map, the second feature map, and the feature map of the current-frame color image into the encoder and concatenating them to obtain an encoded feature map; decoding the encoded feature map with the decoder and outputting the foreground transparency; and fusing the foreground transparency and the current-frame color image with different background images to obtain a composite image.
2. The RGBD-camera-based matting method of claim 1, characterized in that: in step S3, an effective depth range is preset, and the depth image corresponding to the current-frame color image is segmented according to the effective depth range to obtain the foreground soft-segmentation image.
3. The RGBD-camera-based matting method of claim 1, characterized in that: in step S4, feature extraction is performed on the current-frame color image, the foreground soft-segmentation image, and the background image with a feature extraction network comprising three convolution modules, each consisting of a convolution layer, a batch normalization layer, and a ReLU activation layer.
4. The RGBD-camera-based matting method of claim 3, characterized in that: in step S6, the feature map of the current-frame color image, the feature map of the foreground soft-segmentation image, and the feature map of the background image are concatenated with the first feature map and the second feature map using a convolution layer, a batch normalization layer, and a ReLU activation layer, yielding an encoded feature map that is passed to the decoder.
5. The RGBD-camera-based matting method of claim 2, characterized in that, for the foreground portion determined from the obtained depth image, step S6 further comprises the following steps:
S61, selecting a 3×3 structuring matrix based on the foreground soft-segmentation image and performing a logical AND between the structuring matrix and the foreground soft-segmentation image: if both values are 1, the corresponding output pixel is 1; otherwise it is 0;
S62, on the foreground soft-segmentation image obtained in step S61, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the foreground soft-segmentation image: if both values are 0, the corresponding output pixel is 0; otherwise it is 1;
S63, applying Gaussian filtering to blur the edges of the foreground soft-segmentation image obtained in step S62, yielding a second foreground soft-segmentation image T;
S64, traversing each pixel of the foreground transparency map α and of the second foreground soft-segmentation image T from step S63, and taking the elementwise maximum, i.e. α′(i,j) = max(α(i,j), T(i,j)), to obtain the transparency α′.
6. The RGBD-camera-based matting method of claim 5, characterized in that, for the background portion determined from the obtained depth image, step S6 further comprises the following steps:
S65, based on step S3, segmenting the depth image outside the effective depth range to obtain a background soft-segmentation image;
S66, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the background soft-segmentation image: if both values are 1, the corresponding output pixel is 1; otherwise it is 0;
S67, on the background soft-segmentation image obtained in step S66, selecting a 3×3 structuring matrix and performing a logical AND between the structuring matrix and the background soft-segmentation image: if both values are 0, the corresponding output pixel is 0; otherwise it is 1;
S68, traversing each pixel of the background soft-segmentation image W obtained in step S67, and wherever W(i,j) = 1, setting the transparency α′ from step S64 to 0, yielding the new transparency α″.
7. The RGBD-camera-based matting method of claim 1, characterized in that step S1 further comprises:
S10, acquiring a first background depth image corresponding to the background image and segmenting it to obtain a second background depth image;
S11, converting the pixels of the second background depth image into the camera coordinate system or the world coordinate system to obtain a point cloud;
S12, randomly sampling a number of points from the point cloud to fit a plane equation, the plane equation representing the ground.
8. The RGBD-camera-based matting method of claim 7, characterized in that step S3 further comprises:
S31, traversing each point of the point cloud obtained in step S12, substituting it into the plane equation obtained in step S12, and computing its distance to the plane; if the distance is smaller than a threshold, the point is judged to be ground, and the corresponding pixel of the foreground soft-segmentation image is set to 0.
9. An RGBD-camera-based matting system, characterized by comprising:
an RGBD camera, fixed at a preset position and used to acquire a current-frame color image, a current-frame depth image, and a background image;
an image processing module, used to set the effective depth range of the depth image and to segment the depth image to obtain a soft-segmentation image;
a feature extraction module, used to extract features from the current-frame color image, the background image, and the foreground soft-segmentation image to obtain their respective feature maps;
a matting model module with an encoder-decoder structure, used to concatenate, in the channel dimension, the feature map of the current-frame color image with the feature map of the background image and with the feature map of the foreground soft-segmentation image, respectively, and output a first feature map and a second feature map containing rich semantics obtained through multiple convolution operations; and to input the feature maps of the current-frame color image, the background image, and the foreground soft-segmentation image, together with the first and second feature maps, into the encoder for concatenation, obtaining an encoded feature map that is passed to the decoder, which decodes it and outputs the foreground transparency; the foreground transparency and the current-frame color image are fused with different background images to obtain a composite image.
10. A storage medium storing a computer program, characterized in that: the computer program, when executed, performs at least the RGBD-camera-based matting method of any of claims 1-8.
CN202010699261.2A (filed 2020-07-17, priority 2020-07-17) — RGBD camera-based image matting method and system — Active — granted as CN111899266B

Priority Applications (1)

CN202010699261.2A — priority/filing date 2020-07-17 — RGBD camera-based image matting method and system

Publications (2)

CN111899266A — application published 2020-11-06
CN111899266B — grant published 2024-07-12

Family

ID: 73189346

Family Applications (1)

CN202010699261.2A — filed 2020-07-17 — RGBD camera-based image matting method and system — granted as CN111899266B

Country Status (1)

CN — CN111899266B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034648A (en) * 2021-04-30 2021-06-25 北京字节跳动网络技术有限公司 Image processing method, device, equipment and storage medium
CN114581566A (en) * 2022-03-10 2022-06-03 北京字跳网络技术有限公司 Animation special effect generation method, device, equipment and medium
CN114677394B (en) * 2022-05-27 2022-09-30 珠海视熙科技有限公司 Matting method, matting device, image pickup apparatus, conference system, electronic apparatus, and medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102999901A (en) * 2012-10-17 2013-03-27 中国科学院计算技术研究所 Method and system for processing split online video on the basis of depth sensor
CN110136144A (en) * 2019-05-15 2019-08-16 北京华捷艾米科技有限公司 A kind of image partition method, device and terminal device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US10210618B1 (en) * 2013-12-27 2019-02-19 Google Llc Object image masking using depth cameras or three-dimensional (3D) models
CN105590312B (en) * 2014-11-12 2018-05-18 株式会社理光 Foreground image dividing method and device
CN106651867B (en) * 2017-01-04 2020-04-17 努比亚技术有限公司 Method, device and terminal for realizing interactive image segmentation
CN106952276A (en) * 2017-03-20 2017-07-14 成都通甲优博科技有限责任公司 A kind of image matting method and device
CN108171715B (en) * 2017-12-05 2020-08-04 浙江大华技术股份有限公司 Image segmentation method and device
CN111161277B (en) * 2019-12-12 2023-04-18 中山大学 Natural image matting method based on deep learning



Similar Documents

Publication Publication Date Title
CN111899266B (en) RGBD camera-based image matting method and system
US8488896B2 (en) Image processing apparatus and image processing method
Battiato et al. 3D stereoscopic image pairs by depth-map generation
CN111091091A (en) Method, device and equipment for extracting target object re-identification features and storage medium
Battiato et al. Depth map generation by image classification
US7574069B2 (en) Retargeting images for small displays
US6912313B2 (en) Image background replacement method
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
EP2947627B1 (en) Light field image depth estimation
CN106971155B (en) Unmanned vehicle lane scene segmentation method based on height information
CN105825494A (en) Image processing method and mobile terminal
CN111563516B (en) Method, terminal and storage medium for fusion display of pedestrian mask and three-dimensional scene
US20180253877A1 (en) Focus stacking of captured images
US10834374B2 (en) Method, apparatus, and device for synthesizing virtual viewpoint images
CN110163804A (en) Image defogging method, device, computer equipment and storage medium
US10726524B2 (en) Low-resolution tile processing for real-time bokeh
CN111127303A (en) Background blurring method and device, terminal equipment and computer readable storage medium
EP2887314A1 (en) Video frame conversion method and apparatus
CN114677394B (en) Matting method, matting device, image pickup apparatus, conference system, electronic apparatus, and medium
CN106447642B (en) Image double-exposure fusion method and device
CN111757149B (en) Video editing method, device, equipment and storage medium
CN112634183A (en) Image processing method and device
CN113689434A (en) Image semantic segmentation method based on strip pooling
KR102315471B1 (en) Image processing method and device
CN113658197B (en) Image processing method, device, electronic equipment and computer readable storage medium

Legal Events

• PB01 — Publication
• SE01 — Entry into force of request for substantive examination
• CB02 — Change of applicant information: applicant changed from SHENZHEN ORBBEC Co., Ltd. to Obi Zhongguang Technology Group Co., Ltd.; address unchanged: 11-13/F, Joint Headquarters Building, High-tech Zone, 63 Xuefu Road, Yuehai Street, Nanshan District, Shenzhen, Guangdong 518000
• GR01 — Patent grant