WO2023159746A1

WO2023159746A1 - Image matting method and apparatus based on image segmentation, computer device, and medium

Info

Publication number: WO2023159746A1
Application number: PCT/CN2022/089507
Authority: WO
Inventors: 郑喜民; 张祎頔; 舒畅; 陈又新
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-02-23
Filing date: 2022-04-27
Publication date: 2023-08-31
Also published as: CN114529574A

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence, and relate to an image matting method based on image segmentation. The method comprises: inputting an obtained training image set into a pre-constructed initial image matting model, wherein the image matting model comprises an image segmentation layer and an image matting layer; segmenting images in the training image set by means of the image segmentation layer to obtain a set of preliminarily segmented images; inputting the set of preliminarily segmented images into the image matting layer to obtain finely segmented images; determining a target loss function on the basis of the finely segmented images, performing iterative updating on the initial image matting model according to the target loss function, and outputting a trained image matting model; and inputting a target image into the image matting model to obtain a matting result. The present application further provides an image matting apparatus based on image segmentation, a computer device, and a medium. In addition, the present application further relates to a blockchain technology, and the target image can be stored in a blockchain. The present application can improve the precision and accuracy of image matting.

Description

Image matting method, device, computer equipment and medium based on image segmentation

This application claims the priority of the Chinese patent application submitted to the China Patent Office on February 23, 2022, with the application number 202210168421.X, and the title of the invention is "image matting method, device, computer equipment and media based on image segmentation", The entire contents of which are incorporated by reference in this application.

technical field

The present application relates to the technical field of artificial intelligence, and in particular to an image segmentation-based image matting method, device, computer equipment and media.

Background technique

Image matting means that for a given image, the network can separate its foreground area and background area. It is an important topic in the field of computer vision and is widely used in video conferencing, image editing, and post-production scenarios. . At present, image matting technology usually uses additional input, such as trimap, background image, etc., to generate a mask through additional input, and use the mask to extract the matting object.

However, the inventors found that the generation of additional input often requires manual intervention, such as the generation of prior masks such as trimap; sometimes the acquisition of additional input is not always feasible, such as the acquisition of a complete background image, which makes image matting time-consuming At the same time, it causes low efficiency of image matting and inaccurate matting results.

Contents of the invention

The purpose of the embodiments of the present application is to propose an image segmentation-based image matting method, device, computer equipment, and storage medium to solve the problems of time-consuming and laborious image matting, low image matting efficiency, and inaccurate matting results in related technologies. technical problem.

In order to solve the above technical problems, the embodiment of the present application provides an image matting method based on image segmentation, which adopts the following technical solution:

Obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;

Segmenting images in the training image set by the image segmentation layer to obtain a preliminary segmented image set;

Inputting the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

Determining a target loss function based on the refined segmented image, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;

A target image is acquired, and the target image is input into an image matting model to obtain a matting result.

In order to solve the above technical problems, the embodiment of the present application also provides an image matting device based on image segmentation, which adopts the following technical solutions:

The acquisition module is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer;

A preliminary segmentation module, configured to segment images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;

A refined segmentation module, configured to input the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

A training module, configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;

The image matting module is used to obtain a target image, input the target image into the image matting model, and obtain a matting result.

In order to solve the above technical problems, the embodiment of the present application also provides a computer device, which adopts the following technical solutions:

The computer device includes a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the steps of the image matting method based on image segmentation are implemented as follows:

In order to solve the above technical problems, the embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:

Computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the image matting method based on image segmentation are implemented as follows:

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

The application obtains the training image set, and inputs the training image set into the pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer; the images in the training image set are segmented through the image segmentation layer , to obtain a preliminary segmented image set; input the preliminary segmented image set to the image matting layer to obtain a refined segmented image; determine the target loss function based on the refined segmented image, and iteratively update the initial image matting model according to the target loss function, and output The image matting model that has been trained; obtain the target image, input the target image into the image matting model, and obtain the matting result; this application uses the image segmentation layer in the trained image matting model to initially segment the image, and then The output image matting layer of the segmentation result is further refined and segmented, which can realize image matting without any additional input, and completely avoids manual intervention, achieves complete automatic matting, improves the efficiency of image matting, and at the same time, through the image The matting model can achieve more accurate matting results, further improving the precision and accuracy of image matting.

Description of drawings

In order to illustrate the solution in this application more clearly, a brief introduction will be given below to the accompanying drawings that need to be used in the description of the embodiments of the application. Obviously, the accompanying drawings in the following description are some embodiments of the application. Ordinary technicians can also obtain other drawings based on these drawings on the premise of not paying creative work.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

Fig. 2 is the flowchart of an embodiment of the image matting method based on image segmentation according to the present application;

FIG. 3 is a flow chart of a specific implementation of step S203 in FIG. 2;

Fig. 4 is the flow chart of another embodiment of the image matting method based on image segmentation according to the present application;

Fig. 5 is a schematic structural diagram of an embodiment of an image matting device based on image segmentation according to the present application;

Fig. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed ways

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application; the terms used herein in the description of the application are only to describe specific embodiments The purpose is not to limit the present application; the terms "comprising" and "having" and any variations thereof in the specification and claims of the present application and the description of the above drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings.

This application provides an image matting method based on image segmentation, which involves artificial intelligence and can be applied to the system architecture 100 shown in Figure 1. The system architecture 100 can include

terminal devices

101, 102, 103, a network 104 and a server 105. The network 104 is used as a medium for providing communication links between the

terminal devices

101 , 102 , 103 and the server 105 . Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.

Users can use

terminal devices

101 , 102 , 103 to interact with server 105 via network 104 to receive or send messages and the like. Various communication client applications can be installed on the

terminal devices

101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.

Terminal devices

101, 102, 103 can be various electronic devices with display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, dynamic Video experts compress standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture experts compress standard audio layer 4) player, laptop portable computer and desktop computer, etc.

The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the

terminal devices

101 , 102 , 103 .

It should be noted that the image segmentation-based image matting method provided in the embodiment of the present application is generally executed by a server/terminal device, and correspondingly, the image segmentation-based image matting device is generally set in the server/terminal device.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.

Continuing to refer to FIG. 2 , a flow chart of an embodiment of an image matting method based on image segmentation according to the present application is shown, including the following steps:

Step S201, acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer.

In this embodiment, the pre-built initial image matting model includes an image segmentation layer and an image matting layer, wherein the image segmentation layer can adopt Double DIP (Deep-Image-Priors, depth image prior) network, Double DIP network Use two DIP networks to divide the input image into a foreground layer and a background layer; the backbone of the image matting layer uses the U-Net network for encoding and decoding, and an auxiliary output layer is added to the base layer of the decoding part for deep supervision and progressive attention The Progressive Attention Refinement Module (PAR), uses the intermediate layer output of the decoder to perform layer-by-layer refinement to obtain the final accurate mask to obtain an accurate segmented image.

In this embodiment, the training image set can be obtained from a public data set, for example, the Alphamatting data set, the Alphamatting data set contains 27 training images and 8 test images, and these images all have standard results of foreground and background after matting Then, the foreground images of these images are combined with 500 indoor scene images and 500 outdoor scene images respectively, and the combined images are rotated at three different angles, and the obtained images are used as training image sets and test images It can also be generated according to the obtained original pictures, specifically, obtain the original pictures (for example, portrait pictures, product pictures, environment pictures, animal pictures, vehicle pictures, etc.), calculate the signal-to-noise ratio corresponding to each original picture, and The original picture is filtered according to the signal-to-noise ratio, and the salient foreground in the filtered original picture is marked, so as to generate a training data set based on the marked original picture.

Step S202, segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set.

In this embodiment, the image segmentation layer divides the training image in the training image set into the foreground layer and the background layer, and mixes the foreground layer and the background layer through a mask to obtain a reconstructed image, which is the preliminary segmented image .

Specifically, the training images in the training image set are input into the image segmentation layer, a DIP network is assigned to each layer, each DIP network inputs a random noise z _i , and the foreground layer y is obtained by calculating y _i =DIP(z _i ) ₁ and the background layer y ₂ , the two layers of the foreground layer and the background layer are fused through the mask m to obtain the reconstructed image I', the formula is as follows:

I'＝my ₁ +(1-m)y ₂

It should be understood that the image segmentation layer is pre-trained.

Step S203, inputting the preliminary segmented image set into the image matting layer to obtain a refined segmented image.

In this embodiment, the image matting layer at least includes an encoder, a decoder, a progressive attention refinement layer (Progressive Attention Refinement, PAR) and an auxiliary output layer (Output), and the preliminary segmented image is input into the image matting layer for refinement. Segmentation to obtain a refined segmented image.

In some optional implementations, the above steps of inputting the preliminary segmented image set to the image matting layer to obtain the refined segmented image include:

Step S301, inputting the preliminary segmented image set to the encoder for feature extraction to obtain encoded features.

Wherein, the encoder includes a plurality of convolutional neural network layers (Convolution neural network, CNN) and a downsampling layer, and the downsampling layer can be a maximum pooling layer (max-pooling). The spatial features are extracted through the CNN layer to generate a feature map, and the max-pooling layer downsamples the feature map to retain strong features.

Optionally, the encoder includes 5 convolutional layers and 4 downsampling layers, and the 5 convolutional layers are respectively the first coded convolutional layer, the second coded convolutional layer, the third coded convolutional layer, and the fourth coded convolutional layer. Convolutional layer and the fifth coded convolutional layer, between the first coded convolutional layer and the second coded convolutional layer, between the second coded convolutional layer and the third coded convolutional layer, between the third coded convolutional layer and Between the fourth coded convolutional layer and between the fourth coded convolutional layer and the fifth coded convolutional layer, there is a downsampling layer, respectively the first downsampling layer, the second downsampling layer, and the third downsampling layer. sampling layer and a fourth downsampling layer.

The preliminary segmented images in the preliminary segmented image set pass through the first encoding convolution layer, the first down-sampling layer, the second encoding convolution layer, the second down-sampling layer, the third encoding convolution layer, the third down-sampling layer, the second The four-coded convolutional layer, the fourth down-sampling layer, and the fifth coded convolutional layer perform feature extraction to obtain coded features.

It should be understood that the convolution kernel and the convolution step size of the encoder convolution layer can be set according to actual conditions.

In step S302, the coded features are decoded by a decoder, and the decoded features are output.

In this embodiment, the decoder is composed of a plurality of decoding modules, and each decoding module includes a plurality of up-sampling layers (Up-sampling layer) and a CNN layer. After each decoding, the size of the feature map is increased by the corresponding multiple, and after multiple decodings, the feature map with the same size as the original input preliminary segmented image is obtained, that is, the decoding feature. Furthermore, the decoded features after each decoding are concatenated with the corresponding encoded features of the same size in the encoding stage to fuse low-level and high-level features.

Optionally, the decoder includes 5 convolutional layers and 4 upsampling layers, and the 5 convolutional layers are respectively the first decoding convolutional layer, the second decoding convolutional layer, the third decoding convolutional layer, and the fourth decoding The convolutional layer and the fifth decoding convolutional layer, between the first decoding convolutional layer and the second decoding convolutional layer, between the second decoding convolutional layer and the third decoding convolutional layer, between the third decoding convolutional layer and Between the fourth decoding convolutional layer and between the fourth decoding convolutional layer and the fifth decoding convolutional layer, there is an upsampling layer, which are respectively the first upsampling layer, the second upsampling layer, and the third upsampling layer. sampling layer and a fourth upsampling layer.

Input the encoded features into the decoder, and pass through the first decoding convolutional layer, the first upsampling layer, the second decoding convolutional layer, the second upsampling layer, the third decoding convolutional layer, the third upsampling layer, the fourth The decoding convolutional layer, the fourth upsampling layer and the fifth decoding convolutional layer are decoded to obtain the decoded features.

It should be understood that the convolution kernel and the convolution step size of the convolution layer of the decoder can also be set according to actual conditions.

Step S303, input the decoded feature into the auxiliary output layer to obtain the output feature.

In this embodiment, each convolutional layer of the decoder is connected with an auxiliary output layer, which is used to perform convolution pooling operation on the output features, and more feature information of the image has been preserved.

Step S304, perform attention calculation on the output features through the progressive attention refinement layer, obtain attention features, and output a refined segmented image according to the attention features.

In this embodiment, the progressive attention layer is respectively connected to the auxiliary output layer of the previous layer, the current decoding auxiliary output layer connected to the current decoding convolution layer, and the auxiliary output layer corresponding to the progressive attention layer (current layer auxiliary output layer) , specifically, the input of the progressive attention layer is the output of the auxiliary output layer of the previous layer and the auxiliary output layer of the current decoding. After the attention operation is performed, the output is output through the auxiliary output layer of the current layer.

It should be noted that the first decoding convolutional layer does not have a corresponding progressive attention layer connected to it. If the first decoding convolutional layer is the current layer, the auxiliary output layer connected to it is not only the current decoding output layer, but also equivalent to the current layer output layer.

The progressive attention layer includes at least an encoding layer, an attention convolution layer, a first fusion layer, a softmax layer, a second fusion layer, a connection layer, and a decoding layer, and the output features are sequentially passed through the encoding layer, attention convolution layer, first The fusion layer, the softmax layer, the second fusion layer, the connection layer and the decoding layer perform corresponding calculations, and output more accurate refined segmentation images.

In step S204, the target loss function is determined based on the refined and segmented image, the initial image matting model is iteratively updated according to the target loss function, and the trained image matting model is output.

In this embodiment, the training image set is input into the initial image matting model for training. After a round of training, the target loss function of the initial image matting model is calculated to obtain the loss function value, and the model parameters are adjusted according to the loss function value , continue iterative training, and the model is trained to a certain extent. At this time, the performance of the model reaches the optimal state, and the value of the loss function cannot continue to decrease, that is, convergence. The method of judging the convergence only needs to calculate the loss function value in the previous two rounds of iterations. If the loss function value is still changing, continue to select the training image and input it to the image matting model after adjusting the model parameters to improve the image matting model. Continue iterative training; if the loss function value does not change significantly, the model can be considered converged. After the model converges, the final image matting model is output.

Step S205, acquiring a target image, inputting the target image into the image matting model, and obtaining a matting result.

In this embodiment, the acquired target image is input into the trained image matting model to perform a matting operation, and a corresponding matting result can be obtained.

It should be emphasized that, in order to further ensure the privacy and security of the target image, the above target image can also be stored in a block chain node.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain (Blockchain), essentially a decentralized database, is a series of data blocks associated with each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

In this application, the image is initially segmented through the image segmentation layer in the trained image matting model, and then the preliminary segmentation result is output to the image matting layer for further refinement and segmentation, which can realize image matting without any additional input, and It completely avoids manual intervention, achieves complete automatic matting, and improves the efficiency of image matting. At the same time, more accurate matting results can be achieved through the image matting model, further improving the accuracy and accuracy of matting.

In some optional implementations of this embodiment, before the above step of segmenting the images in the training image set through the image segmentation layer to obtain the preliminary segmented image set, it also includes:

Pre-train the image segmentation layer to obtain the pre-trained image segmentation layer, including:

Input the obtained original image set into the image segmentation layer, and output the reconstructed image;

The first loss function is obtained according to the reconstructed image, the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained image segmentation layer is output.

In this embodiment, the acquisition channel of the original image set may be the same as that of the above training data set, or it may be different, and it is selected according to actual needs.

The original image in the original image set is input into the image segmentation layer, which is divided into the foreground layer y ₁ and the background layer y ₂ , and the foreground layer y ₁ and the background layer y ₂ are fused through the initial mask m ₀ to obtain the reconstructed image I'. Wherein, the initial mask m ₀ can be preset or randomly generated, and the random generation is obtained according to input random noise.

In some optional implementations, the first loss function is obtained according to the reconstructed image, the image segmentation layer is iteratively updated based on the first loss function, and the step of outputting the pre-trained image segmentation layer includes:

The reconstruction loss is calculated from the reconstructed image and the original image in the original image set;

determining a first loss function according to the reconstruction loss;

Adjusting segmentation parameters of the image segmentation layer based on the first loss function;

When the iteration end condition is met, output the pre-trained image segmentation layer according to the segmentation parameters.

Among them, the first loss function is calculated by the following formula:

Loss _DDIP ＝Loss _Reconst +β·Loss _Excl +γ·Loss _Reg

Among them, Loss _Reconst is the reconstruction loss, Loss _Reconst =||I-I'||, I is the original image; Loss _Excl is a mutually exclusive loss, which minimizes the correlation between the gradients of y ₁ and y ₂ ; Loss _Reg ＝(Σ _x |m(x)-0.5|) ^-1 , Loss _Reg is a regularization loss, which is mainly used to constrain the fusion mask (mask), and is used to binarize the foreground initial mask m ₀ , β, γ are preset weighting parameters.

Calculate the first loss function, adjust the segmentation parameters of the image segmentation layer based on the first loss function, and continue iterative training. The image segmentation layer is trained to a certain extent. At this time, the performance of the image segmentation layer reaches the preset state, and the preset state can be It is convergence, or the loss function value of the first loss function reaches the preset threshold, which means that the iteration end condition is satisfied, the image segmentation layer pre-training is completed, and the image segmentation layer is output according to the segmentation parameters at the end of the iteration.

In this embodiment, by pre-training the image segmentation layer, the initial mask m ₀ of the image segmentation layer is adjusted. After the pre-training is completed, the mask m of the pre-trained image segmentation layer is obtained, and the mask m remains Learning adjustments will be made to obtain a more accurate foreground mask m for subsequent image matting layers.

In some optional implementations of this embodiment, the above step of performing attention calculation on the output features through the progressive attention refinement layer, and obtaining the attention features includes:

Step S401, upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain the upsampled output features, and calculating an uncertain region mask based on the upsampled output features.

Specifically, the output feature α _l-1 output by the previous auxiliary output layer is upsampled to the same size as the output feature α _l of the current auxiliary output layer, and then the following transformation formula is used:

Where f _α→m (x, y) is the transformation formula for obtaining the mask m of the uncertain region from the α mask at point (x, y) of the image, and α _l is the α mask of the auxiliary output layer l of the current layer. pass

The uncertainty region mask m _l-1 output by the previous auxiliary output layer and the uncertainty region mask m _l output by the current auxiliary output layer can be obtained.

Step S402, perform feature extraction on the mask of the uncertain region to obtain the feature of the uncertain region, and perform attention calculation on the feature of the uncertain region to obtain the attention score.

The uncertain region mask m _l-1 output by the previous auxiliary output layer and the uncertain region mask m _l output by the auxiliary output layer of the current layer are extracted through the encoding layer composed of CNN respectively, and the obtained

and

Then use the following formula to calculate and obtain the attention score, which acts as an optimization trend and corrects the uncertain region characteristics output by the auxiliary output layer of the current layer

The formula for calculating the attention score is as follows:

in,

is a 1x1 convolution operation,

is the uncertainty region feature of layer l-1,

is the uncertainty region feature of layer l, and X is the calculated optimization trend attention score.

Step S403, correcting the feature of the uncertain region according to the attention score, and obtaining the corrected feature of the uncertain region as the attention feature.

The following formula is used to correct the characteristics of the uncertain region:

in,

is a 1x1 convolution operation,

is the feature of the uncertain region of the corrected layer l.

In this embodiment, attention features are obtained by correcting uncertain region features through attention calculations, which can ensure more accurate subsequent refinement and matting.

In some optional implementation manners, the above step of outputting the refined segmented image according to the attention feature includes:

Feature extraction is performed on the output features of the auxiliary output layer of the current layer to obtain the extracted features, and the attention features are spliced with the extracted features to obtain the spliced features;

Decode the concatenated features and output a refined segmented image.

Specifically, the output α _l of the auxiliary output layer of the current layer is extracted through the coding layer composed of CNN, and the extracted feature F _α is combined with the obtained corrected attention feature of the auxiliary output layer of the current layer, as follows:

in,

is the attention feature, F _α is the feature extracted by the mask of the current auxiliary output layer α _l through the encoding layer, and F _α ' is the modified feature of the mask of the current auxiliary output layer α, that is, the stitching feature.

The spliced feature F _α ' is decoded by a decoding layer composed of CNN, and a refined segmented image is obtained.

In some optional implementations of this embodiment, the target loss function is determined based on the refined segmented image, the initial image matting model is iteratively updated according to the target loss function, and the step of outputting the trained image matting model includes:

Obtaining a second loss function according to the refined segmentation image;

determining a target loss function based on the first loss function and the second loss function;

Adjust the model parameters of the initial image matting model according to;

When the iteration end condition is satisfied, a trained image matting model is generated according to the model parameters.

Specifically, the calculation formula of the second loss function is as follows:

Among them, ω _l is a hyperparameter; α _gt represents the output truth value of the auxiliary output layer of the current layer; Loss _l represents the loss of the uncertain region of the auxiliary output layer of the current layer, including L1 loss, combination loss and Laplacian loss, the formula for:

Loss _l (α _gt ·m _l ,α _l ·m _l )＝Loss _L1 (α _gt ·m _l ,α _l ·m _l )+Loss _comp (α _gt ·m _l ,α _l ·m _l )+Loss _lap (α _gt m _l ,α _l m _l )

The target loss function is determined based on the first loss function and the second loss function, and the calculation formula of the target loss function is as follows:

Loss＝δLoss _DDIP +ε·Loss _Matting

Among them, δ and ε are preset weighting parameters.

In this embodiment, the model parameters are adjusted according to the value of the loss function, and the iterative training is continued until the model is trained to a certain extent. At this time, the performance of the model reaches the optimal state, and the value of the loss function cannot continue to decrease, that is, it converges.

Satisfying the iteration end condition is the model convergence. After the model converges, the final image matting model is output according to the final adjusted model parameters.

In this embodiment, by training the pre-built image matting model, the matting accuracy and accuracy of the image matting model can be improved.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Among them, artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. .

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

The application can be used in numerous general purpose or special purpose computer system environments or configurations. Examples: personal computers, server computers, handheld or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, including A distributed computing environment for any of the above systems or devices, etc. This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware through computer-readable instructions, and the computer-readable instructions can be stored in a computer-readable storage medium. , when the program is executed, it may include the procedures of the embodiments of the above-mentioned methods. Wherein, the aforementioned storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

It should be understood that although the various steps in the flow charts of the drawings are shown sequentially according to the arrows, these steps are not necessarily executed sequentially in the order indicated by the arrows. Unless otherwise specified herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts of the accompanying drawings may include multiple sub-steps or multiple stages, and these sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and the order of execution is also It is not necessarily performed sequentially, but may be performed alternately or alternately with at least a part of other steps or sub-steps or stages of other steps.

Further referring to FIG. 5 , as an implementation of the above-mentioned method shown in FIG. 2 , the present application provides an embodiment of an image matting device based on image segmentation, which corresponds to the method embodiment shown in FIG. 2 , the device can be specifically applied to various electronic devices.

As shown in FIG. 5 , the image matting device 500 based on image segmentation in this embodiment includes: an acquisition module 501 , a preliminary segmentation module 502 , a refined segmentation module 503 , a training module 504 and a matting module 505 . in:

The obtaining module 501 is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer.

The preliminary segmentation module 502 is used to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;

The refinement and segmentation module 503 is used to input the preliminary segmentation image set to the image matting layer to obtain a refinement segmentation image;

The training module 504 is configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output the trained image matting model;

The matting module 505 is used to acquire a target image, and input the target image into the image matting model to obtain a matting result.

The above-mentioned image segmentation-based image matting device performs preliminary segmentation on the image through the image segmentation layer in the trained image matting model, and then outputs the preliminary segmentation result to the image matting layer for further refinement and segmentation, which can realize without any additional Input is used for image matting, and human intervention is completely avoided to achieve complete automatic matting, which improves the efficiency of image matting. At the same time, through the image matting model, more accurate matting results can be achieved, improving the accuracy and Accuracy.

In some optional implementations of this embodiment, the preliminary segmentation module 502 includes a reconstruction submodule and a pre-training submodule, wherein the reconstruction submodule is used to input the obtained original image set into the image segmentation layer, and output the reconstructed submodule. Constructing an image; the pre-training submodule is used to obtain a first loss function according to the reconstructed image, iteratively update the image segmentation layer based on the first loss function, and output the pre-trained image segmentation layer.

In this embodiment, the initial mask of the image segmentation layer is adjusted by pre-training the image segmentation layer. After the pre-training is completed, the mask of the pre-trained image segmentation layer is obtained to ensure that a more accurate foreground mask is obtained in the subsequent training process. Membrane, used for subsequent image matting layers for matting.

In this embodiment, the refinement and segmentation module 503 includes a feature extraction submodule, a decoding submodule, an output submodule, and an attention submodule, wherein:

The feature extraction submodule is used to input the preliminary segmented image set to the encoder for feature extraction to obtain encoded features;

The decoding submodule is used to decode the encoded features through the decoder, and output the decoded features;

The output sub-module is used to input the decoding features into the auxiliary output layer to obtain output features;

The attention sub-module is used to perform attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and output the refined segmentation image according to the attention features.

In this embodiment, attention features are obtained through attention calculation, and more accurate refined and segmented images can be output.

In this embodiment, the attention submodule includes an upsampling unit, an attention calculation unit, and a correction unit, wherein:

The upsampling unit is used to upsample the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain the upsampled output features, and calculate the uncertain region mask according to the upsampled output features membrane;

The attention calculation unit is used to perform feature extraction on the uncertain area mask to obtain uncertain area features, and perform attention calculation on the uncertain area features to obtain an attention score;

The correction unit is used to modify the feature of the uncertain region according to the attention score, and obtain the corrected feature of the uncertain region as the attention feature.

In some optional implementations, the attention submodule also includes a splicing unit and a decoding unit, wherein:

The splicing unit is used to extract the features of the output features of the current layer auxiliary output layer to obtain the extracted features, and splice the attention features and the extracted features to obtain the spliced features;

The decoding unit is used to decode the mosaic feature, and output the thinned and segmented image.

In this embodiment, the training module 504 includes a loss function calculation submodule, an adjustment submodule, and a generation submodule, wherein:

The loss function calculation submodule is used to obtain a second loss function according to the thinned and segmented image;

The loss function calculation submodule is also used to determine a target loss function based on the first loss function and the second loss function;

The adjustment sub-module is used to adjust the model parameters of the initial image matting model according to the description;

The generation sub-module is used to generate a trained image matting model according to the model parameters when the iteration end condition is met.

In this embodiment, the pre-training submodule includes a calculation unit, an adjustment unit, and an output unit, wherein:

The calculation unit is used to calculate the reconstruction loss according to the reconstructed image and the original images in the original image set;

The computing unit is further configured to determine the first loss function according to the reconstruction loss;

An adjustment unit is configured to adjust segmentation parameters of the image segmentation layer based on the first loss function;

The output unit is used to output the pre-trained image segmentation layer according to the segmentation parameters when the iteration end condition is met.

In this embodiment, by adjusting the parameters according to the first loss function determined by the reconstruction loss, the efficiency of pre-training can be improved, and at the same time, more accurate masks can be ensured for subsequent training.

In order to solve the above technical problems, the embodiment of the present application further provides computer equipment. Please refer to FIG. 6 for details. FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.

The computer device 6 includes a memory 61 , a processor 62 , and a network interface 63 connected to each other through a system bus. It should be noted that only the computer device 6 is shown with components 61-63, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, dedicated Integrated circuit (Application Specific Integrated Circuit, ASIC), programmable gate array (Field-Programmable Gate Array, FPGA), digital processor (Digital Signal Processor, DSP), embedded devices, etc.

The computer equipment may be computing equipment such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The computer device can perform human-computer interaction with the user through keyboard, mouse, remote controller, touch panel or voice control device.

The memory 61 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6 , such as a hard disk or memory of the computer device 6 . In other embodiments, the memory 61 can also be an external storage device of the computer device 6, such as a plug-in hard disk equipped on the computer device 6, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Of course, the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed in the computer device 6, such as computer-readable instructions of the image matting method based on image segmentation. In addition, the memory 61 can also be used to temporarily store various types of data that have been output or will be output.

The processor 62 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. This processor 62 is generally used to control the general operation of said computer device 6 . In this embodiment, the processor 62 is configured to run computer-readable instructions stored in the memory 61 or process data, such as computer-readable instructions for running the image segmentation-based image matting method.

The network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.

In this embodiment, when the processor executes the computer-readable instructions stored in the memory, the steps of the image matting method based on image segmentation in the above embodiment are realized, and the image is initially segmented through the image segmentation layer in the trained image matting model. , and then output the preliminary segmentation results to the image matting layer for further refinement and segmentation, which can realize image matting without any additional input, and completely avoid manual intervention, achieve complete automatic matting, and improve the efficiency of image matting. At the same time, more accurate matting results can be achieved through the image matting model, and the accuracy and accuracy of matting can be improved.

The present application also provides another implementation manner, which is to provide a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the steps of the image matting method based on image segmentation as described above , the image is initially segmented through the image segmentation layer in the trained image matting model, and then the preliminary segmentation result is output to the image matting layer for further refinement and segmentation, which can realize image matting without any additional input, and is completely It avoids manual intervention, achieves complete automatic matting, and improves the efficiency of image matting. At the same time, through the image matting model, more accurate matting results can be achieved, and the accuracy and accuracy of matting can be improved.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is better implementation. Based on such an understanding, the technical solution of the present application can be embodied in the form of a software product in essence or the part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, disk, CD) contains several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Apparently, the embodiments described above are only some of the embodiments of the present application, but not all of them. The drawings show preferred embodiments of the present application, but do not limit the patent scope of the present application. The present application can be implemented in many different forms, on the contrary, the purpose of providing these embodiments is to make the understanding of the disclosure content of the present application more thorough and comprehensive. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or perform equivalent replacements for some of the technical features . All equivalent structures made using the contents of the description and drawings of this application, directly or indirectly used in other related technical fields, are also within the scope of protection of this application.

Claims

A method for image matting based on image segmentation, comprising the steps of:

Obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;

Segmenting images in the training image set by the image segmentation layer to obtain a preliminary segmented image set;

Inputting the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

Determining a target loss function based on the refined segmented image, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;

A target image is acquired, and the target image is input into an image matting model to obtain a matting result.
The image matting method based on image segmentation according to claim 1, wherein, before the step of segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, it also includes:

Input the obtained original image set into the image segmentation layer, and output the reconstructed image;

A first loss function is obtained according to the reconstructed image, and the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained image segmentation layer is output.
The image matting method based on image segmentation according to claim 2, wherein the image matting layer at least includes an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, and the preliminary segmentation The image set is input to the image matting layer, and the steps of obtaining the refined and segmented images include:

Inputting the preliminary segmented image set to the encoder for feature extraction to obtain encoded features;

Decoding the encoded features by the decoder, and outputting the decoded features;

inputting the decoded features into the auxiliary output layer to obtain output features;

Attention calculation is performed on the output features through the progressive attention refinement layer to obtain attention features, and the refined segmentation image is output according to the attention features.
The image matting method based on image segmentation according to claim 3, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer, and obtaining attention features comprises:

Upsampling the output feature output by the previous layer of auxiliary output layer to the same size as the output feature of the current layer auxiliary output layer to obtain an upsampling output feature, and calculating an uncertain region mask according to the upsampling output feature;

performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain an attention score;

The uncertain region features are corrected according to the attention scores, and the corrected uncertain region features are obtained as attention features.
The image matting method based on image segmentation according to claim 4, wherein the step of outputting the refined segmented image according to the attention feature comprises:

Carrying out feature extraction on the output features of the auxiliary output layer of the current layer to obtain the extracted features, and splicing the attention features and the extracted features to obtain the spliced features;

The spliced features are decoded, and the thinned and segmented images are output.
The image matting method based on image segmentation according to any one of claims 2 to 5, wherein the target loss function is determined based on the refined segmented image, and the initial image matting is performed according to the target loss function The graph model is iteratively updated, and the steps of outputting the trained image matting model include:

obtaining a second loss function according to the refined segmented image;

determining a target loss function based on the first loss function and the second loss function;

Adjusting the model parameters of the initial image matting model according to the description;

When the iteration end condition is satisfied, a trained image matting model is generated according to the model parameters.
The image matting method based on image segmentation according to claim 2, wherein the first loss function is obtained according to the reconstructed image, the image segmentation layer is iteratively updated based on the first loss function, and the output The steps of the image segmentation layer completed by pre-training include:

calculating a reconstruction loss according to the reconstructed image and the original images in the original image set;

determining the first loss function based on the reconstruction loss;

adjusting segmentation parameters of the image segmentation layer based on the first loss function;

When the iteration end condition is met, the pre-trained image segmentation layer is output according to the segmentation parameters.
An image matting device based on image segmentation, comprising:

The acquisition module is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer;

A preliminary segmentation module, configured to segment images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;

A refined segmentation module, configured to input the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

A training module, configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;

The image matting module is used to obtain a target image, input the target image into the image matting model, and obtain a matting result.
A computer device comprising a memory and a processor, the memory storing computer-readable instructions running on the processor, the processor executing the computer-readable instructions to implement image-based segmentation as described below The steps of the image matting method:

Obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;

Segmenting images in the training image set by the image segmentation layer to obtain a preliminary segmented image set;

Inputting the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

Determining a target loss function based on the refined segmented image, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;

A target image is acquired, and the target image is input into an image matting model to obtain a matting result.
The computer device according to claim 9, wherein, before the step of segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, it also includes:

Input the obtained original image set into the image segmentation layer, and output the reconstructed image;

A first loss function is obtained according to the reconstructed image, and the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained image segmentation layer is output.
The computer device according to claim 10, wherein said image matting layer comprises at least an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, said inputting said preliminary segmented image set into said The image matting layer, the steps of obtaining the refined segmented image include:

Inputting the preliminary segmented image set to the encoder for feature extraction to obtain encoded features;

Decoding the encoded features by the decoder, and outputting the decoded features;

inputting the decoded features into the auxiliary output layer to obtain output features;

Attention calculation is performed on the output features through the progressive attention refinement layer to obtain attention features, and the refined segmentation image is output according to the attention features.
The computer device according to claim 11, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer, and obtaining attention features comprises:

Upsampling the output feature output by the previous layer of auxiliary output layer to the same size as the output feature of the current layer auxiliary output layer to obtain an upsampling output feature, and calculating an uncertain region mask according to the upsampling output feature;

performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain an attention score;

The uncertain region features are corrected according to the attention scores, and the corrected uncertain region features are obtained as attention features.
The computer device according to claim 12, wherein said step of outputting said refined segmented image according to said attention feature comprises:

Carry out feature extraction to the output feature of current layer auxiliary output layer, obtain extraction feature, described attention feature and described extraction feature are spliced, obtain splicing feature;

The spliced features are decoded, and the thinned and segmented images are output.
The computer device according to any one of claims 10 to 13, wherein the target loss function is determined based on the refined segmented image, and the initial image matting model is iteratively updated according to the target loss function, The steps of outputting the trained image matting model include:

obtaining a second loss function according to the refined segmented image;

determining a target loss function based on the first loss function and the second loss function;

Adjusting the model parameters of the initial image matting model according to the description;

When the iteration end condition is satisfied, a trained image matting model is generated according to the model parameters.
The computer device according to claim 10, wherein the first loss function is obtained according to the reconstructed image, the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained completed The steps of image segmentation layer include:

calculating a reconstruction loss according to the reconstructed image and the original images in the original image set;

determining the first loss function based on the reconstruction loss;

adjusting segmentation parameters of the image segmentation layer based on the first loss function;

When the iteration end condition is met, the pre-trained image segmentation layer is output according to the segmentation parameters.
A computer-readable storage medium, the computer-readable storage medium is stored with computer-readable instructions, and when the computer-readable instructions are executed by a processor, the steps of the image matting method based on image segmentation are realized as follows:

Obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;

Segmenting images in the training image set by the image segmentation layer to obtain a preliminary segmented image set;

Inputting the preliminary segmented image set to the image matting layer to obtain a refined segmented image;

Determining a target loss function based on the refined segmented image, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;

A target image is acquired, and the target image is input into an image matting model to obtain a matting result.
The computer-readable storage medium according to claim 16, wherein, before the step of segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, further comprising:

Input the obtained original image set into the image segmentation layer, and output the reconstructed image;

A first loss function is obtained according to the reconstructed image, and the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained image segmentation layer is output.
The computer-readable storage medium according to claim 17, wherein the image matting layer includes at least an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, and the input of the preliminary segmented image set is To the image matting layer, the step of obtaining the refined segmented image comprises:

Inputting the preliminary segmented image set to the encoder for feature extraction to obtain encoded features;

Decoding the encoded features by the decoder, and outputting the decoded features;

inputting the decoded features into the auxiliary output layer to obtain output features;

Attention calculation is performed on the output features through the progressive attention refinement layer to obtain attention features, and the refined segmentation image is output according to the attention features.
The computer-readable storage medium according to claim 18, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer, and obtaining attention features comprises:

Upsampling the output feature output by the previous layer of auxiliary output layer to the same size as the output feature of the current layer auxiliary output layer to obtain an upsampling output feature, and calculating an uncertain region mask according to the upsampling output feature;

performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain an attention score;

The uncertain region features are corrected according to the attention scores, and the corrected uncertain region features are obtained as attention features.
The computer-readable storage medium according to claim 19, wherein the step of outputting the refined segmentation image according to the attention feature comprises:

Carrying out feature extraction on the output features of the auxiliary output layer of the current layer to obtain the extracted features, and splicing the attention features and the extracted features to obtain the spliced features;

The spliced features are decoded, and the thinned and segmented images are output.