CN115222762A - Green screen matting method and device

Green screen matting method and device

Info

Publication number
CN115222762A
CN115222762A (application CN202210785154.0A)
Authority
CN
China
Prior art keywords
image
green
screen
matting
foreground
Prior art date
Legal status: Pending
Application number
CN202210785154.0A
Other languages
Chinese (zh)
Inventor
李兆歆
靳悦
石敏
朱登明
王兆其
Current Assignee: Institute of Computing Technology of CAS
Original Assignee: Institute of Computing Technology of CAS
Priority date: 2022-06-29
Filing date: 2022-06-29
Publication date: 2022-10-21
Application filed by Institute of Computing Technology of CAS
Priority to CN202210785154.0A, published as CN115222762A


Classifications

    • G06T 7/194 Image analysis; segmentation; edge detection involving foreground-background segmentation
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/90 Image analysis; determination of colour characteristics
    • G06V 10/762 Image or video recognition or understanding using pattern recognition or machine learning; clustering, e.g. of similar faces in social networks
    • G06T 2207/20221 Indexing scheme for image analysis or enhancement; image fusion; image merging


Abstract

The invention provides a green screen matting method and device. The method comprises: constructing a green screen matting dataset and a green screen background dataset; randomly selecting a first foreground image, together with its corresponding first Alpha image, from the green screen matting dataset, and randomly selecting a green screen background image from the green screen background dataset; compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image; generating a target green screen background image from the initial green screen image; and inputting the initial green screen image and the target green screen background image into a deep learning model for training, which outputs a second foreground image and its corresponding second Alpha image. The method performs green screen matting automatically and in real time without manual intervention, effectively removes green spill, and achieves a more realistic visual result.

Description

Green screen matting method and device
Technical Field
The invention relates to the technical field of image processing, and in particular to a green screen matting method and device.
Background
Traditional green screen matting methods include chroma keying, color-difference keying, luma keying and triangulation matting, and several professional green screen keying packages have been built on refinements of these methods. Although professional keying software produces good results, its parameters must be tuned by a trained operator, which costs considerable labor and time. In addition, traditional green screen matting pipelines usually treat matting and green spill removal as two independent steps, which is cumbersome. Natural image matting targets images shot in unconstrained scenes and does not need to account for green spill, so it produces unnatural results when applied to green screens; moreover, few existing schemes are both automatic and real-time.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a green screen matting method and device that perform green screen matting automatically and in real time without manual intervention, effectively remove green spill, and achieve a more realistic visual result.
To achieve the above object, one aspect of the invention provides a green screen matting method, comprising:
constructing a green screen matting dataset and a green screen background dataset, wherein the green screen matting dataset contains green spill information;
randomly selecting, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, wherein the first foreground image contains green spill information;
randomly selecting a green screen background image from the green screen background dataset;
compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image;
generating a target green screen background image from the initial green screen image;
inputting the initial green screen image and the target green screen background image into a deep learning model for training, and outputting a second foreground image and a second Alpha image corresponding to it, wherein the second foreground image contains no green spill information.
Optionally, different green screen shooting scenes are simulated with different green pixel values to construct the green screen background dataset.
Optionally, the first foreground image, the first Alpha image and the green screen background image are all processed with geometric and pixel-level augmentation.
Optionally, the initial green screen image is generated as:
C=F×α+B×(1-α)
where F denotes the first foreground image containing green spill information, α denotes the first Alpha image corresponding to the first foreground image, B denotes the green screen background image, and C denotes the initial green screen image.
Optionally, generating a target green screen background image from the initial green screen image comprises:
converting the initial green screen image from the RGB color space to the HSV color space;
binarizing the initial green screen image in the HSV color space to generate a binary green screen image;
combining the initial green screen image with the binary green screen image to obtain an initial green screen background image, wherein the initial green screen background image contains green pixels and black pixels;
clustering the initial green screen background image to obtain the target green screen background image, wherein the target green screen background image contains green pixel information.
Optionally, the binary green screen image is generated as:
mask = 0, if miGreen ≤ C_HSV ≤ maGreen
mask = 1, otherwise
where miGreen and maGreen respectively denote the minimum and maximum thresholds set in the HSV color space, and C_HSV denotes the initial green screen image in the HSV color space;
combining the initial green screen image with the binary green screen image yields the initial green screen background image:
B'=C×(1-mask)
where C denotes the initial green screen image.
Optionally, the deep learning model is a lightweight deep learning model.
Optionally, a loss function is computed for the second Alpha image:
L_α = L_α^l1 + L_α^grad
where L_α^l1 denotes a first loss function and L_α^grad denotes a second loss function; the two are combined to train the second Alpha image. The first loss function measures the difference between the predicted second Alpha image α_i and the ground truth α_i* corresponding to the second Alpha image:
L_α^l1 = ||α_i - α_i*||_1
The second loss function is a gradient loss:
L_α^grad = ||∇α_i - ∇α_i*||_1
A loss function is also computed for the second foreground image:
L_F = ||F_i - F_i*||_1
where F_i denotes the predicted second foreground image and F_i* denotes the ground truth corresponding to the second foreground image.
Optionally, the green screen matting dataset and the green screen background dataset are constructed from foreground objects of four classes: opaque, semi-transparent, transparent and complex-structured.
The invention also provides a green screen matting device that adopts the above green screen matting method and comprises at least:
a dataset construction module for constructing a green screen matting dataset and a green screen background dataset, the green screen matting dataset containing green spill information;
an image synthesis module for randomly selecting, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, the first foreground image containing green spill information; randomly selecting a green screen background image from the green screen background dataset; compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image; and generating a target green screen background image from the initial green screen image;
and an image training module for inputting the initial green screen image and the target green screen background image into a deep learning model for training, and outputting a second foreground image and a second Alpha image corresponding to it, the second foreground image containing no green spill information.
Another aspect of the invention provides a storage medium storing a computer program for executing the above green screen matting method.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, the processor implementing the above green screen matting method when executing the computer program.
According to the above scheme, the invention has the following advantages:
By constructing a diverse green screen matting dataset containing green spill information and a green screen background dataset, the method of the embodiments handles both green screen matting and green spill removal through network training. A first foreground image containing green spill information and its corresponding first Alpha image are randomly selected from the green screen matting dataset and composited with a green screen background image into an initial green screen image; a target green screen background image is then generated automatically from the initial green screen image and used as auxiliary information. Combining the initial green screen image with the target green screen background image yields finer mattes, automates the pipeline, avoids manual parameter tuning, and preserves matting accuracy without human intervention. The initial green screen image and the target green screen background image are then fed into the constructed deep learning model for training, which outputs a spill-free second foreground image and its corresponding second Alpha image, so that green screen matting and green spill removal are solved within a single model; the lightweight network structure raises processing speed to real time while preserving matting accuracy.
Drawings
Fig. 1 is a schematic flow chart of the green screen matting method provided by an embodiment of the invention;
Fig. 2 is a detailed flowchart of step S4 of the green screen matting method of Fig. 1;
Fig. 3 compares the green screen matting method with the natural image matting methods CF, KNN, FBA, MODNet, LFP, BGMv2 and AIM;
Fig. 4 compares the green screen matting method with the professional keying software Keylight and Aximmetry;
Fig. 5 is a block diagram of the green screen matting device of the invention;
Fig. 6 is a schematic structural diagram of an electronic device;
wherein:
400 - green screen matting device;
401 - dataset construction module;
402 - image synthesis module;
403 - image training module;
500 - electronic device;
501 - processor;
502 - memory.
Detailed Description
In order to make the aforementioned features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
As noted above, conventional green screen matting requires a professional to adjust many parameters, its overall processing speed is low, and its steps are cumbersome. The green screen matting method of this embodiment can be applied directly to film and television shooting, virtual studios, live-streaming e-commerce, offline VR exhibitions, and the like.
Specifically, referring to Fig. 1, which shows a schematic flow chart of the green screen matting method provided by this embodiment:
A green screen matting method comprises:
S1. Construct a green screen matting dataset and a green screen background dataset, wherein the green screen matting dataset contains green spill information.
In this embodiment, multiple types of green screen material are shot in real green screen scenes, and foreground images with green spill are added to the ground truth, thereby constructing a green screen matting dataset containing green spill information and a green screen background dataset.
In a specific implementation, the green screen matting dataset contains green spill information and can be built from diverse foreground objects in four classes: opaque, semi-transparent, transparent and complex-structured. For example: opaque objects such as clothes of different colors, people (with hair) in different poses, and tables and chairs; semi-transparent and transparent objects such as gauze, mineral water bottles and glasses; and complex structures such as twisted wires and mesh structures.
For the green screen background dataset, different green screen shooting scenes can be simulated with different green pixel values.
S2. Randomly select, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, wherein the first foreground image contains green spill information; and
randomly select a green screen background image from the green screen background dataset.
In a specific implementation, Keylight is used to randomly select a first foreground image containing green spill information from the green screen matting dataset and to generate the first Alpha image corresponding to it. The first foreground images containing green spill information are the data newly added in this embodiment, which lets the deep learning model learn the characteristics of green spill.
S3. Composite the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image.
In a specific implementation, the initial green screen image is composited from the first foreground image, the first Alpha image and the green screen background image as:
C=F×α+B×(1-α)
where F denotes the first foreground image containing green spill information, α denotes the first Alpha image corresponding to the first foreground image, B denotes the green screen background image, and C denotes the initial green screen image.
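For illustration, the compositing equation maps directly to code. The following is a minimal NumPy sketch; the function name and array conventions are illustrative assumptions, not part of the patent.

```python
import numpy as np

def composite_green_screen(fg, alpha, bg):
    """Blend a foreground onto a green screen background: C = F*alpha + B*(1-alpha).

    fg, bg: float arrays of shape (H, W, 3) with values in [0, 1]
    alpha:  float array of shape (H, W) with values in [0, 1]
    """
    a = alpha[..., None]              # broadcast alpha over the RGB channels
    return fg * a + bg * (1.0 - a)    # the initial green screen image C
```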
To address green spill removal, this embodiment adds images with green spill, i.e. the first foreground images containing green spill information, to the green screen matting dataset; by compositing them with green screen background data and predicting the Alpha image and the foreground image simultaneously, the model learns the characteristics of green spill and removes it effectively, achieving a visually realistic result.
In addition, the first foreground image, the first Alpha image and the green screen background image may be augmented in advance with geometric and pixel-level methods to increase the diversity of the datasets before the initial green screen image is composited. Geometric augmentation includes vertical flipping, horizontal flipping, rotation, translation, scaling and the like; pixel-level augmentation includes adjusting contrast, saturation, hue, brightness and the like. A sketch of such an augmentation pipeline is given below.
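The following torchvision sketch shows one plausible realization of these augmentations. The parameter ranges and the channel-stacking trick used to keep the foreground and its Alpha image geometrically aligned are assumptions; the patent does not specify them.

```python
import torch
import torchvision.transforms as T

# Geometric augmentations: flips, rotation, translation, scaling. The same
# random parameters must hit the foreground and its Alpha image together,
# so the two are stacked along the channel axis before the transform.
geometric = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
])

# Pixel-level augmentations: contrast, saturation, hue, brightness jitter;
# applied to the colour images only, never to the Alpha image.
pixel = T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05)

def augment(fg, alpha, bg):
    """fg, bg: (3, H, W) tensors in [0, 1]; alpha: (1, H, W) tensor."""
    stacked = geometric(torch.cat([fg, alpha], dim=0))  # joint geometric warp
    fg_aug, alpha_aug = stacked[:3], stacked[3:]
    return pixel(fg_aug), alpha_aug, pixel(geometric(bg))
```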
S4. Generate a target green screen background image from the initial green screen image.
In a specific implementation, as shown in Fig. 2, which details step S4, generating the target green screen background image from the initial green screen image specifically comprises:
S41. Convert the initial green screen image from the RGB color space to the HSV color space.
S42. Binarize the initial green screen image in the HSV color space to generate a binary green screen image:
mask = 0, if miGreen ≤ C_HSV ≤ maGreen
mask = 1, otherwise
where miGreen and maGreen respectively denote the minimum and maximum thresholds set in the HSV color space, and C_HSV denotes the initial green screen image in the HSV color space.
S43. Combine the initial green screen image with the binary green screen image to obtain an initial green screen background image, which contains green pixels and black pixels:
B'=C×(1-mask)
where C denotes the initial green screen image.
S44. Cluster the initial green screen background image to obtain the target green screen background image, which contains green pixel information. In this embodiment, the pixels may be clustered into two groups with K-means, and the group containing the green pixel information is selected to form the target green screen background image B''. A sketch of steps S41-S44 is given below.
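The following OpenCV sketch illustrates steps S41-S44. The HSV thresholds, the rule for picking the green cluster, and the way the target background B'' is assembled from it are assumptions; the patent fixes only the sequence of operations.

```python
import cv2
import numpy as np

def make_target_background(c_bgr, mi_green=(35, 43, 46), ma_green=(77, 255, 255)):
    # S41: convert the initial green screen image C to HSV.
    hsv = cv2.cvtColor(c_bgr, cv2.COLOR_BGR2HSV)
    # S42: binary image; the patent's mask is 0 on green pixels, 1 elsewhere.
    green = cv2.inRange(hsv, np.array(mi_green, np.uint8), np.array(ma_green, np.uint8))
    mask = 1 - (green // 255).astype(np.uint8)            # 1 = non-green (foreground)
    # S43: B' = C x (1 - mask) keeps the green pixels and blacks out the rest.
    b_init = c_bgr * (1 - mask)[..., None]
    # S44: K-means the pixels into two clusters (green vs. black), keep green.
    pixels = b_init.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, _, centers = cv2.kmeans(pixels, 2, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    green_center = centers[np.argmax(centers.sum(axis=1))]  # brighter centre = green
    # Fill the frame with the green cluster's colour as B''.
    target = np.empty_like(c_bgr)
    target[:] = green_center.astype(np.uint8)
    return target
```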
In this embodiment, automatically generating the target green screen background image from the initial green screen image effectively improves the fineness of green screen matting and avoids the manual adjustment of traditional methods.
S5. Input the initial green screen image and the target green screen background image into a deep learning model for training, and output a second foreground image and a second Alpha image corresponding to it, wherein the second foreground image contains no green spill information.
In a specific implementation, the initial green screen image C and the target green screen background image B'' are input together into the constructed deep learning model for training. The model is a lightweight deep learning model consisting of an encoder, an ASPP module, skip connections and a decoder. The encoder extracts image features with a lightweight MobileNetV2, obtained by slimming the standard MobileNetV2: in this embodiment, dilation is used in the last block of the standard MobileNetV2 to keep the output stride at 16, and the classifier module originally used for classification is removed. The ASPP (atrous spatial pyramid pooling) module samples the given input in parallel with atrous convolutions of different rates, which is equivalent to capturing image context at multiple scales, i.e. enlarging the receptive field without reducing resolution, and thereby strengthens the network's ability to gather multi-scale context. The skip connections let the decoder receive both high-level semantic features and low-level detail features, by returning the feature maps of every resolution from the forward method of MobileNetV2 for use in the decoder. The decoder fuses the high- and low-level features and simultaneously predicts the Alpha image and the spill-free foreground image. The decoder network has four convolutional layers; each except the last is followed by a BN layer and a ReLU activation. Bilinear upsampling is applied before each convolutional layer, and the result is concatenated with the skip-connection features from the encoder. A minimal sketch of this architecture is given below.
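The following PyTorch sketch illustrates the described encoder / ASPP / skip-connection / decoder layout. The skip tap positions, channel widths, ASPP rates and output activations are assumptions, and the dilation trick that keeps the encoder's output stride at 16 is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2
from torchvision.models.segmentation.deeplabv3 import ASPP

class GreenScreenMattingNet(nn.Module):
    def __init__(self):
        super().__init__()
        feats = mobilenet_v2(weights=None).features
        # 6 input channels: initial green screen image C + target background B''.
        feats[0][0] = nn.Conv2d(6, 32, 3, stride=2, padding=1, bias=False)
        self.encoder = feats
        self.skip_ids = (1, 3, 6)           # taps at strides 2, 4 and 8
        self.aspp = ASPP(in_channels=1280, atrous_rates=[3, 6, 9])  # 256-ch out
        self.dec1 = self._block(256 + 32, 128)   # + skip from encoder block 6
        self.dec2 = self._block(128 + 24, 64)    # + skip from encoder block 3
        self.dec3 = self._block(64 + 16, 32)     # + skip from encoder block 1
        self.dec4 = nn.Conv2d(32, 4, 3, padding=1)  # last layer: no BN/ReLU

    @staticmethod
    def _block(cin, cout):
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                             nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

    def forward(self, c, b_target):
        x = torch.cat([c, b_target], dim=1)
        skips = {}
        for i, layer in enumerate(self.encoder):
            x = layer(x)
            if i in self.skip_ids:
                skips[i] = x
        x = self.aspp(x)
        # Bilinear upsampling before each conv, concatenated with skip features.
        for dec, sid in ((self.dec1, 6), (self.dec2, 3), (self.dec3, 1)):
            x = F.interpolate(x, size=skips[sid].shape[2:], mode="bilinear",
                              align_corners=False)
            x = dec(torch.cat([x, skips[sid]], dim=1))
        x = F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=False)
        out = self.dec4(x)
        alpha = torch.sigmoid(out[:, :1])   # second Alpha image
        fg = torch.sigmoid(out[:, 1:])      # spill-free second foreground image
        return fg, alpha
```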
By adopting a lightweight deep learning model, this embodiment greatly raises processing speed while preserving matting accuracy, meeting the real-time requirement.
In addition, in this embodiment the model is trained by computing a loss function for the second Alpha image and a loss function for the second foreground image and combining the two, so as to evaluate the second Alpha image and the spill-free foreground image, i.e. the second foreground image, thereby realizing real-time automatic green screen matting with green spill removal.
The loss function of the second Alpha image comprises two parts:
L_α = L_α^l1 + L_α^grad
where L_α^l1 denotes the first loss function and L_α^grad denotes the second loss function; the two are combined to train the second Alpha image. The first loss function measures the difference between the predicted second Alpha image α_i and the ground truth α_i* corresponding to it:
L_α^l1 = ||α_i - α_i*||_1
The second loss function is a gradient loss:
L_α^grad = ||∇α_i - ∇α_i*||_1
The loss function of the second foreground image is:
L_F = ||F_i - F_i*||_1
where F_i denotes the predicted second foreground image and F_i* denotes the ground truth corresponding to it. A sketch of the combined loss is given below.
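The combined objective can be sketched as follows. Equal weighting of the three terms is an assumption; the patent does not state the weights.

```python
import torch.nn.functional as F

def image_gradients(x):
    """Finite-difference gradients along width and height."""
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return dx, dy

def matting_loss(alpha_pred, alpha_gt, fg_pred, fg_gt):
    # L1 term on the Alpha image: ||alpha_i - alpha_i*||_1
    l_alpha_l1 = F.l1_loss(alpha_pred, alpha_gt)
    # Gradient term on the Alpha image: ||grad(alpha_i) - grad(alpha_i*)||_1
    dxp, dyp = image_gradients(alpha_pred)
    dxg, dyg = image_gradients(alpha_gt)
    l_alpha_grad = F.l1_loss(dxp, dxg) + F.l1_loss(dyp, dyg)
    # L1 term on the spill-free foreground: ||F_i - F_i*||_1
    l_fg = F.l1_loss(fg_pred, fg_gt)
    return l_alpha_l1 + l_alpha_grad + l_fg
```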
In summary, the green screen matting method of this embodiment is based on deep learning: by constructing a diverse green screen matting dataset containing green spill information and a green screen background dataset, it handles green screen matting and green spill removal through network training. A first foreground image containing green spill information and its corresponding first Alpha image are randomly selected from the green screen matting dataset and composited with a green screen background image into an initial green screen image; a target green screen background image is then generated automatically from the initial green screen image as auxiliary information. Combining the initial green screen image with the target green screen background image yields finer mattes, automates the pipeline, avoids manual parameter tuning, and preserves matting accuracy without human intervention. The initial green screen image and the target green screen background image are fed into the constructed deep learning model, which outputs a spill-free second foreground image and its corresponding second Alpha image, so matting and spill removal are solved within one model, and the lightweight network structure raises processing speed to real time while preserving accuracy. Compared with existing green screen matting methods, the proposed method performs better in challenging scenes such as uneven lighting, runs automatically and in real time without manual intervention, effectively removes green spill for a more realistic visual result, requires no trimap or background as extra input, and is more efficient.
Next, the green screen matting method of this embodiment is compared with the existing natural image matting methods CF, KNN, FBA, MODNet, LFP, BGMv2 and AIM, and with the professional keying software Keylight and Aximmetry, for effect verification.
Specifically, the comparison with the natural image matting methods CF, KNN, FBA, MODNet, LFP, BGMv2 and AIM is shown in Fig. 3. Five real green screen images of people and semi-transparent objects were selected to demonstrate the effectiveness of the proposed method. CF predicts fine boundary details but performs poorly in regions with small holes. KNN performs better than CF but can still fail in some semi-transparent scenes. MODNet cannot matte other object types because it is trained only on a portrait dataset. None of these three methods predicts the foreground well because they ignore one of the key issues in green screen matting: green spill. Although AIM is an automatic matting method, it usually predicts rough edges. BGMv2 performs poorly because of its weak generalization to the input background and its inability to remove green spill. FBA and LFP generally predict Alpha maps better, but their ability to remove green spill is very limited. In contrast, the proposed green screen matting method delivers visually more realistic matting.
The comparison with the professional keying software Keylight and Aximmetry is shown in Fig. 4. Since professional keying software performs well in ordinary green screen scenes, three challenging green screen images were selected for comparison, showing human subjects with different types of hair. As the figure shows, the proposed method recovers more hair detail than the professional software, which performs poorly in some cases because it tunes parameters toward a clean background, leading to coarse edge details. The proposed method realizes green screen matting and green spill removal with deep learning: it first creates a green screen matting dataset containing green spill and a green screen background dataset, then inputs the initial green screen image and the automatically generated target green screen background image into the constructed network model, which simultaneously outputs a second foreground image without green spill information and the corresponding second Alpha image, thereby solving green screen matting and green spill removal at once.
The above embodiment of the invention may be applied to terminal devices supporting the green screen matting method, including personal terminals, host terminals and the like; the embodiment of the invention is not limited in this respect. The terminal may run operating systems such as Windows, Android, iOS and Windows Phone.
Referring to Fig. 5, a green screen matting device 400 is shown. It applies the green screen matting method, can run on personal terminals and host terminal devices, and can implement the methods of Fig. 1 and Fig. 2; the device provided by this embodiment of the application can realize every process of the green screen matting method.
A green screen matting device 400 adopting the above green screen matting method comprises at least:
a dataset construction module 401 for constructing a green screen matting dataset and a green screen background dataset, the green screen matting dataset containing green spill information;
an image synthesis module 402 for randomly selecting, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, the first foreground image containing green spill information; randomly selecting a green screen background image from the green screen background dataset; compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image; and generating a target green screen background image from the initial green screen image;
and an image training module 403 for inputting the initial green screen image and the target green screen background image into a deep learning model for training, and outputting a second foreground image and a second Alpha image corresponding to it, the second foreground image containing no green spill information.
The green screen matting device of this embodiment performs green screen matting automatically and in real time without manual intervention, effectively removes green spill, achieves a visually more realistic result, and is more efficient.
It should be understood that the descriptions of the green screen matting method apply equally to the green screen matting device 400 of this embodiment and, to avoid repetition, are not detailed again.
Furthermore, it should be understood that the division of functional modules in the green screen matting device 400 above is only illustrative; in practical applications, the functions may be distributed among different functional modules as needed, i.e. the device 400 may be divided into modules different from those illustrated to perform all or part of the functions described above.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in Fig. 6, an embodiment of the application further provides an electronic device 500 comprising a processor 501, a memory 502, and a program or instructions stored in the memory 502 and executable on the processor 501; when executed by the processor 501, the program or instructions implement the steps of the above green screen matting method and achieve the same technical effects.
It should be noted that the electronic devices in the embodiments of the application may include both mobile and non-mobile electronic devices.
An embodiment of the application also provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the above green screen matting method with the same technical effects.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus in the embodiments of the application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions substantially simultaneously or in reverse order; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the application, or the part contributing to the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and including instructions for enabling a terminal (such as a mobile phone, a computer, a server or a network device) to execute the methods of the embodiments of the application.
While the embodiments have been described with reference to the accompanying drawings, the invention is not limited to the precise embodiments above, which are illustrative rather than restrictive; those skilled in the art may make various changes without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A green screen matting method, characterized by comprising:
constructing a green screen matting dataset and a green screen background dataset, wherein the green screen matting dataset contains green spill information;
randomly selecting, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, wherein the first foreground image contains green spill information;
randomly selecting a green screen background image from the green screen background dataset;
compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image;
generating a target green screen background image from the initial green screen image;
inputting the initial green screen image and the target green screen background image into a deep learning model for training, and outputting a second foreground image and a second Alpha image corresponding to it, wherein the second foreground image contains no green spill information.
2. The method of claim 1, wherein different green screen shooting scenes are simulated with different green pixel values to construct the green screen background dataset.
3. The method of claim 1, further comprising:
applying geometric and pixel-level augmentation to the first foreground image, the first Alpha image and the green screen background image.
4. The method of claim 1, wherein the initial green screen image is generated as:
C=F×α+B×(1-α)
where F denotes the first foreground image containing green spill information, α denotes the first Alpha image corresponding to the first foreground image, B denotes the green screen background image, and C denotes the initial green screen image.
5. The method of claim 1 or 4, wherein generating a target green screen background image from the initial green screen image comprises:
converting the initial green screen image from the RGB color space to the HSV color space;
binarizing the initial green screen image in the HSV color space to generate a binary green screen image;
combining the initial green screen image with the binary green screen image to obtain an initial green screen background image, wherein the initial green screen background image contains green pixels and black pixels;
clustering the initial green screen background image to obtain the target green screen background image, wherein the target green screen background image contains green pixel information.
6. The method of claim 5, wherein the binary green screen image is generated as:
mask = 0, if miGreen ≤ C_HSV ≤ maGreen
mask = 1, otherwise
where miGreen and maGreen respectively denote the minimum and maximum thresholds set in the HSV color space, and C_HSV denotes the initial green screen image in the HSV color space;
and combining the initial green screen image with the binary green screen image yields the initial green screen background image:
B'=C×(1-mask)
where C denotes the initial green screen image.
7. The method of claim 1, wherein the deep learning model is a lightweight deep learning model.
8. The method of claim 1, further comprising:
computing a loss function for the second Alpha image:
L_α = L_α^l1 + L_α^grad
where L_α^l1 denotes a first loss function and L_α^grad denotes a second loss function, the two being combined to train the second Alpha image; the first loss function measures the difference between the predicted second Alpha image α_i and the ground truth α_i* corresponding to the second Alpha image:
L_α^l1 = ||α_i - α_i*||_1
the second loss function is a gradient loss:
L_α^grad = ||∇α_i - ∇α_i*||_1
and computing a loss function for the second foreground image:
L_F = ||F_i - F_i*||_1
where F_i denotes the predicted second foreground image and F_i* denotes the ground truth corresponding to the second foreground image.
9. The method of claim 1, wherein the green screen matting dataset and the green screen background dataset are constructed from foreground objects of four classes: opaque, semi-transparent, transparent and complex-structured.
10. A green screen matting device adopting the green screen matting method of any one of claims 1 to 9, the device comprising at least:
a dataset construction module for constructing a green screen matting dataset and a green screen background dataset, the green screen matting dataset containing green spill information;
an image synthesis module for randomly selecting, from the green screen matting dataset, a first foreground image and a first Alpha image corresponding to it, the first foreground image containing green spill information; randomly selecting a green screen background image from the green screen background dataset; compositing the first foreground image, the first Alpha image and the green screen background image to generate an initial green screen image; and generating a target green screen background image from the initial green screen image;
and an image training module for inputting the initial green screen image and the target green screen background image into a deep learning model for training, and outputting a second foreground image and a second Alpha image corresponding to it, the second foreground image containing no green spill information.
Patent history

Application CN202210785154.0A: Green screen matting method and device
Priority date: 2022-06-29; filing date: 2022-06-29
Publication: CN115222762A, 2022-10-21 (pending)
Family ID: 83610039
Country: CN (China)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination