WO2023159746A1 - Image matting method and apparatus based on image segmentation, computer device, and medium - Google Patents

Image matting method and apparatus based on image segmentation, computer device, and medium

Info

Publication number
WO2023159746A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
layer
matting
features
output
Prior art date
Application number
PCT/CN2022/089507
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
张祎頔
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2023159746A1 publication Critical patent/WO2023159746A1/en

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
              • G06T 7/194 - Segmentation; Edge detection involving foreground-background segmentation
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
              • G06N 3/08 - Learning methods

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an image matting method, apparatus, computer device, and medium based on image segmentation.
  • Image matting means that, for a given image, a network can separate its foreground region from its background region. It is an important topic in computer vision and is widely used in video conferencing, image editing, and post-production scenarios.
  • Current image matting techniques usually rely on an additional input, such as a trimap or a background image, to generate a mask, and then use the mask to extract the matting target.
  • The purpose of the embodiments of the present application is to propose an image matting method, apparatus, computer device, and storage medium based on image segmentation, to solve the technical problems in the related art that image matting is time-consuming and labor-intensive, matting efficiency is low, and matting results are inaccurate.
  • To that end, an embodiment of the present application provides an image matting method based on image segmentation, which adopts the following technical solution: a training image set is acquired and input into a pre-built initial image matting model, the initial model comprising an image segmentation layer and an image matting layer; the images in the training image set are segmented by the image segmentation layer to obtain a preliminary segmented image set; the preliminary segmented image set is input into the image matting layer to obtain refined segmented images; a target loss function is determined based on the refined segmented images, the initial image matting model is iteratively updated according to the target loss function, and a trained image matting model is output; finally, a target image is acquired and input into the trained image matting model to obtain a matting result.
  • An embodiment of the present application also provides an image matting apparatus based on image segmentation, which adopts the following technical solution:
  • an acquisition module, configured to acquire a training image set and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer;
  • a preliminary segmentation module, configured to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
  • a refined segmentation module, configured to input the preliminary segmented image set into the image matting layer to obtain refined segmented images;
  • a training module, configured to determine a target loss function based on the refined segmented images, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;
  • an image matting module, configured to acquire a target image and input the target image into the trained image matting model to obtain a matting result.
  • An embodiment of the present application also provides a computer device, which adopts the following technical solution:
  • the computer device includes a memory and a processor, computer-readable instructions being stored in the memory; when the processor executes the computer-readable instructions, the processor implements the steps of the image matting method based on image segmentation described above, from acquiring the training image set through to acquiring a target image, inputting it into the trained image matting model, and obtaining a matting result.
  • An embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:
  • computer-readable instructions are stored on the computer-readable storage medium; when the computer-readable instructions are executed by a processor, the steps of the image matting method based on image segmentation described above are implemented, from acquiring the training image set through to acquiring a target image, inputting it into the trained image matting model, and obtaining a matting result.
  • The present application acquires a training image set and inputs it into a pre-built initial image matting model, where the model includes an image segmentation layer and an image matting layer; the images in the training image set are segmented by the image segmentation layer to obtain a preliminary segmented image set; the preliminary segmented image set is input into the image matting layer to obtain refined segmented images; a target loss function is determined based on the refined segmented images, the initial model is iteratively updated according to the target loss function, and a trained image matting model is output; a target image is then acquired and input into the trained model to obtain a matting result. Because the image segmentation layer of the trained model first segments the image coarsely and the image matting layer then refines that preliminary segmentation, image matting requires no additional input and no manual intervention, achieving fully automatic matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, further improving the precision and accuracy of image matting.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of the image matting method based on image segmentation according to the present application;
  • FIG. 3 is a flowchart of a specific implementation of step S203 in FIG. 2;
  • FIG. 4 is a flowchart of another embodiment of the image matting method based on image segmentation according to the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of an image matting apparatus based on image segmentation according to the present application;
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104, and a server 105.
  • The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • The terminal devices 101, 102, 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
  • The server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101, 102, 103.
  • It should be noted that the image matting method based on image segmentation provided in the embodiments of the present application is generally executed by the server or terminal device; correspondingly, the image matting apparatus based on image segmentation is generally disposed in the server or terminal device.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; depending on implementation needs, there may be any number of terminal devices, networks, and servers.
  • Referring to FIG. 2, a flowchart of an embodiment of an image matting method based on image segmentation according to the present application is shown, including the following steps:
  • Step S201: acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer.
  • In this embodiment, the pre-built initial image matting model includes an image segmentation layer and an image matting layer. The image segmentation layer can adopt a Double-DIP (Deep Image Prior) network, which uses two DIP networks to decompose the input image into a foreground layer and a background layer. The backbone of the image matting layer uses a U-Net network for encoding and decoding; an auxiliary output layer is added after each base layer of the decoding part for deep supervision, and a progressive attention refinement module uses the intermediate outputs of the decoder to perform layer-by-layer refinement, producing the final accurate mask and thus an accurately segmented image.
  • The training image set can be obtained from a public data set, for example the Alphamatting data set, which contains 27 training images and 8 test images, all with ground-truth foreground/background matting results. The foreground images of these images are composited with 500 indoor scene images and 500 outdoor scene images respectively, the composited images are rotated at three different angles, and the resulting images are used as the training image set and test images. The training image set can also be generated from collected original pictures (for example, portrait, product, environment, animal, or vehicle pictures): the signal-to-noise ratio of each original picture is calculated, the original pictures are filtered according to the signal-to-noise ratio, the salient foreground in the filtered pictures is annotated, and a training data set is generated from the annotated pictures. A sketch of the compositing pipeline follows below.
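  • As a concrete illustration of the compositing pipeline described above, the following is a minimal sketch, not the applicant's code: the directory layout, file formats, and the specific rotation angles are assumptions (the text only says the composites are rotated at three different angles).

      from pathlib import Path
      from PIL import Image

      ROTATION_ANGLES = [90, 180, 270]  # assumption: the text only says "three different angles"

      def composite(fg_rgba: Image.Image, bg_rgb: Image.Image) -> Image.Image:
          """Paste a foreground (with alpha channel) onto a background resized to match."""
          bg = bg_rgb.convert("RGB").resize(fg_rgba.size)
          out = bg.copy()
          out.paste(fg_rgba, (0, 0), mask=fg_rgba.split()[-1])  # alpha channel as paste mask
          return out

      def build_training_set(fg_dir: str, bg_dir: str, out_dir: str) -> None:
          """Composite each matted foreground with each scene image, then rotate."""
          out = Path(out_dir)
          out.mkdir(parents=True, exist_ok=True)
          foregrounds = sorted(Path(fg_dir).glob("*.png"))  # foregrounds with ground-truth alpha
          backgrounds = sorted(Path(bg_dir).glob("*.jpg"))  # 500 indoor + 500 outdoor scenes
          for i, fg_path in enumerate(foregrounds):
              fg = Image.open(fg_path).convert("RGBA")
              for j, bg_path in enumerate(backgrounds):
                  img = composite(fg, Image.open(bg_path))
                  img.save(out / f"{i:03d}_{j:04d}_000.jpg")
                  for angle in ROTATION_ANGLES:  # three rotated copies per composite
                      img.rotate(angle, expand=True).save(out / f"{i:03d}_{j:04d}_{angle:03d}.jpg")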
  • Step S202: segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set.
  • In this embodiment, the image segmentation layer decomposes each training image in the training image set into a foreground layer and a background layer, and blends the foreground layer and the background layer through a mask to obtain a reconstructed image, which is the preliminary segmented image.
  • In some embodiments, the image segmentation layer is pre-trained.
  • Step S203: inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images.
  • In this embodiment, the image matting layer includes at least an encoder, a decoder, a progressive attention refinement (PAR) layer, and an auxiliary output layer; the preliminary segmented images are input into the image matting layer for refined segmentation to obtain the refined segmented images.
  • In some embodiments, the above step of inputting the preliminary segmented image set into the image matting layer to obtain the refined segmented images includes:
  • Step S301: inputting the preliminary segmented image set into the encoder for feature extraction to obtain encoded features.
  • In this embodiment, the encoder includes a plurality of convolutional neural network (CNN) layers and downsampling layers, where a downsampling layer can be a max-pooling layer.
  • Specifically, the encoder includes five convolutional layers and four downsampling layers. The five convolutional layers are the first to fifth encoding convolutional layers, and a downsampling layer (the first to fourth downsampling layers, respectively) is placed between each pair of adjacent encoding convolutional layers.
  • The preliminary segmented images in the preliminary segmented image set pass through the first encoding convolutional layer, the first downsampling layer, the second encoding convolutional layer, the second downsampling layer, the third encoding convolutional layer, the third downsampling layer, the fourth encoding convolutional layer, the fourth downsampling layer, and the fifth encoding convolutional layer for feature extraction, yielding the encoded features.
  • The convolution kernel and convolution stride of the encoder's convolutional layers can be set according to actual conditions.
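  • A minimal PyTorch sketch of such an encoder follows. The channel widths and the 3x3 kernels with stride 1 are illustrative assumptions; as noted above, the disclosure leaves the kernel and stride to the implementer.

      import torch
      import torch.nn as nn

      def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
          # one "encoding convolutional layer"; kernel size and stride are assumptions
          return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                               nn.ReLU(inplace=True))

      class Encoder(nn.Module):
          """Five encoding convolutional layers with a max-pooling (downsampling)
          layer between each adjacent pair, as described above."""
          def __init__(self, in_ch: int = 3, widths: tuple = (64, 128, 256, 512, 1024)):
              super().__init__()
              blocks, prev = [], in_ch
              for w in widths:
                  blocks.append(conv_block(prev, w))
                  prev = w
              self.blocks = nn.ModuleList(blocks)
              self.pool = nn.MaxPool2d(2)  # each downsampling layer halves the spatial size

          def forward(self, x: torch.Tensor):
              skips = []  # same-size features kept for the decoder's skip connections
              for i, block in enumerate(self.blocks):
                  x = block(x)
                  if i < len(self.blocks) - 1:  # a downsampling layer follows the first four
                      skips.append(x)
                      x = self.pool(x)
              return x, skips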
  • Step S302: decoding the encoded features through the decoder and outputting decoded features.
  • In this embodiment, the decoder is composed of a plurality of decoding modules, each of which includes an upsampling layer and a CNN layer.
  • At each upsampling layer, the size of the feature map is increased by the corresponding multiple; after multiple decoding steps, a feature map with the same size as the original input preliminary segmented image is obtained, namely the decoded features.
  • The decoded features after each decoding step are concatenated with the corresponding same-size encoded features from the encoding stage so as to fuse low-level and high-level features.
  • Specifically, the decoder includes five convolutional layers and four upsampling layers. The five convolutional layers are the first to fifth decoding convolutional layers, and an upsampling layer (the first to fourth upsampling layers, respectively) is placed between each pair of adjacent decoding convolutional layers.
  • The convolution kernel and convolution stride of the decoder's convolutional layers can likewise be set according to actual conditions.
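  • Continuing the sketch, a matching decoder with five decoding convolutional layers, four upsampling layers, and the concatenation-based skip connections described above; the channel widths and the bilinear upsampling mode are again assumptions, and conv_block is the helper from the encoder sketch.

      import torch.nn.functional as F

      class Decoder(nn.Module):
          """Five decoding convolutional layers with a 2x upsampling layer between
          each adjacent pair; after each upsampling the feature map is concatenated
          with the same-size encoder feature to fuse low- and high-level features."""
          def __init__(self, widths: tuple = (1024, 512, 256, 128, 64)):
              super().__init__()
              self.first = conv_block(widths[0], widths[0])  # first decoding convolutional layer
              self.blocks = nn.ModuleList(
                  conv_block(widths[i] + widths[i + 1], widths[i + 1])  # cat(up, skip) channels
                  for i in range(len(widths) - 1)
              )

          def forward(self, x: torch.Tensor, skips: list):
              feats = [self.first(x)]
              for block, skip in zip(self.blocks, reversed(skips)):
                  up = F.interpolate(feats[-1], scale_factor=2, mode="bilinear",
                                     align_corners=False)            # upsampling layer
                  feats.append(block(torch.cat([up, skip], dim=1)))  # fuse same-size features
              return feats  # one intermediate output per decoding layer, for deep supervision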
  • Step S303: inputting the decoded features into the auxiliary output layer to obtain output features.
  • In this embodiment, each convolutional layer of the decoder is connected to an auxiliary output layer, which performs a convolution-and-pooling operation on the decoded features so that more feature information of the image is preserved.
  • Step S304: performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and outputting a refined segmented image according to the attention features.
  • In this embodiment, each progressive attention layer is connected to three auxiliary output layers: the auxiliary output layer of the previous layer, the current decoding auxiliary output layer connected to the current decoding convolutional layer, and the auxiliary output layer corresponding to the progressive attention layer itself (the current-layer auxiliary output layer). Specifically, the inputs of the progressive attention layer are the outputs of the previous layer's auxiliary output layer and of the current decoding auxiliary output layer; after the attention operation is performed, the result is emitted through the current-layer auxiliary output layer.
  • It should be noted that the first decoding convolutional layer has no corresponding progressive attention layer connected to it; if the first decoding convolutional layer is the current layer, the auxiliary output layer connected to it serves both as the current decoding output layer and as the current-layer output layer.
  • The progressive attention layer includes at least an encoding layer, an attention convolutional layer, a first fusion layer, a softmax layer, a second fusion layer, a connection layer, and a decoding layer; the output features pass sequentially through the encoding layer, the attention convolutional layer, the first fusion layer, the softmax layer, the second fusion layer, the connection layer, and the decoding layer for the corresponding calculations, and a more accurate refined segmented image is output.
  • Step S204: determining the target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting the trained image matting model.
  • In this embodiment, the training image set is input into the initial image matting model for training.
  • During training, the target loss function of the initial image matting model is calculated to obtain a loss function value, the model parameters are adjusted according to the loss function value, and iterative training continues until the model is sufficiently trained.
  • At that point the performance of the model reaches its optimal state and the loss function value can no longer decrease, i.e., the model has converged.
  • To judge convergence, it suffices to compare the loss function values of the previous two iterations: if the loss value is still changing, training images continue to be selected and input into the image matting model with adjusted parameters for further iterative training; if the loss value no longer changes significantly, the model can be considered converged, and the final image matting model is output. A schematic training loop follows below.
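  • The loop below sketches this training procedure with the two-iteration convergence test; the optimizer, learning rate, and tolerance are assumptions, and target_loss_fn stands for the target loss function defined later in this disclosure.

      def train(model, loader, target_loss_fn, max_epochs: int = 500, tol: float = 1e-4):
          opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice is an assumption
          history = []  # per-epoch loss values for the convergence test
          for epoch in range(max_epochs):
              total = 0.0
              for images, alpha_gt in loader:
                  loss = target_loss_fn(model(images), alpha_gt)
                  opt.zero_grad()
                  loss.backward()  # adjust the model parameters according to the loss value
                  opt.step()
                  total += loss.item()
              history.append(total / len(loader))
              # compare the loss values of the previous two iterations; if the value no
              # longer changes significantly, consider the model converged
              if len(history) >= 2 and abs(history[-1] - history[-2]) < tol:
                  break
          return model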
  • Step S205: acquiring a target image, inputting the target image into the trained image matting model, and obtaining a matting result.
  • In this embodiment, the acquired target image is input into the trained image matting model to perform the matting operation, and the corresponding matting result is obtained.
  • It should be emphasized that the above target image can also be stored in a blockchain node.
  • Blockchain, essentially a decentralized database, is a chain of data blocks associated with each other using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • In this embodiment, the image is first coarsely segmented by the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation. Image matting can thus be realized without any additional input and with no manual intervention, achieving fully automatic matting and improving matting efficiency.
  • At the same time, the image matting model yields more precise matting results, further improving the precision and accuracy of matting.
  • In some embodiments, pre-training the image segmentation layer to obtain the pre-trained image segmentation layer includes: inputting an acquired original image set into the image segmentation layer and outputting reconstructed images; obtaining a first loss function according to the reconstructed images; iteratively updating the image segmentation layer based on the first loss function; and outputting the pre-trained image segmentation layer.
  • The acquisition channel of the original image set may be the same as that of the above training data set or different from it, selected according to actual needs.
  • Specifically, each original image I in the original image set is input into the image segmentation layer, which decomposes it into a foreground layer y1 and a background layer y2; the foreground layer y1 and the background layer y2 are then fused through an initial mask m0 to obtain the reconstructed image I'.
  • The initial mask m0 can be preset or randomly generated; random generation derives it from input random noise. The fusion step is sketched below.
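  • The fusion step can be written compactly; a minimal sketch, assuming y1, y2, and m0 are tensors of matching spatial size, with m0 taking values in [0, 1]:

      def reconstruct(y1: torch.Tensor, y2: torch.Tensor, m0: torch.Tensor) -> torch.Tensor:
          """Blend foreground layer y1 and background layer y2 with the initial mask m0
          in the standard alpha-compositing form: I' = m0 * y1 + (1 - m0) * y2."""
          return m0 * y1 + (1.0 - m0) * y2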
  • In some embodiments, the step of obtaining the first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer includes:
  • the reconstruction loss is calculated from the reconstructed image and the original image in the original image set
  • The first loss function is calculated by the following formula:
  • Loss_DDIP = Loss_Reconst + α·Loss_Excl + β·Loss_Reg
  • where Loss_Reconst is the reconstruction loss between the reconstructed image I' and the original image I; Loss_Excl is a mutual-exclusion loss, which minimizes the correlation between the gradients of y1 and y2; Loss_Reg is a regularization loss, mainly used to constrain the fusion mask and to binarize the initial foreground mask m0; and α, β are preset weighting parameters.
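  • A sketch of this first loss function follows. The exact exclusion and regularization terms are not reproduced in this text, so the forms below are assumptions following the Double-DIP formulation: a gradient-correlation penalty for the exclusion loss, and a reciprocal distance-from-0.5 term that pushes the mask toward binary values. It reuses reconstruct from the sketch above and torch.nn.functional as F.

      def ddip_loss(I, y1, y2, m0, alpha: float = 0.5, beta: float = 0.5):
          """Loss_DDIP = Loss_Reconst + alpha * Loss_Excl + beta * Loss_Reg (sketch)."""
          # reconstruction loss between the reconstructed and original images
          loss_reconst = F.mse_loss(reconstruct(y1, y2, m0), I)
          # exclusion loss: minimize the correlation between the gradients of y1 and y2
          gx1, gy1 = y1[..., :, 1:] - y1[..., :, :-1], y1[..., 1:, :] - y1[..., :-1, :]
          gx2, gy2 = y2[..., :, 1:] - y2[..., :, :-1], y2[..., 1:, :] - y2[..., :-1, :]
          loss_excl = (gx1.abs() * gx2.abs()).mean() + (gy1.abs() * gy2.abs()).mean()
          # regularization loss: drive the fusion mask m0 toward binary values (assumed form)
          loss_reg = 1.0 / ((m0 - 0.5).abs().sum() + 1e-6)
          return loss_reconst + alpha * loss_excl + beta * loss_reg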
  • The image segmentation layer is trained until its performance reaches a preset state; the preset state can be convergence, or the loss value of the first loss function reaching a preset threshold. Either condition means the iteration end condition is satisfied: pre-training of the image segmentation layer is complete, and the image segmentation layer is output according to the segmentation parameters at the end of the iteration.
  • During pre-training, the initial mask m0 of the image segmentation layer is adjusted.
  • After pre-training is completed, the mask m of the pre-trained image segmentation layer is obtained; this mask continues to be adjusted during subsequent learning so that a more accurate foreground mask m is obtained for the subsequent image matting layer.
  • In some embodiments, the above step of performing attention calculation on the output features through the progressive attention refinement layer to obtain the attention features includes:
  • Step S401: upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain upsampled output features, and calculating an uncertain-region mask based on the upsampled output features.
  • Specifically, the output feature α_{l-1} output by the previous auxiliary output layer is upsampled to the same size as the output feature α_l of the current auxiliary output layer, and a transformation formula f_{α→m} is then applied, where:
  • f_{α→m}(x, y) is the transformation that derives the uncertain-region mask m from the α mask at image point (x, y);
  • α_l is the α mask of the current layer's auxiliary output layer l.
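  • The transformation formula f_{α→m} itself is not reproduced in this text. A common choice, and the assumption in this sketch, is to mark as uncertain every pixel whose α value lies strictly between 0 and 1:

      def alpha_to_uncertainty_mask(alpha: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
          """f_{alpha -> m}: 1 where alpha is neither clearly background (near 0) nor
          clearly foreground (near 1), i.e. the uncertain region; 0 elsewhere.
          The threshold eps is an assumption."""
          return ((alpha > eps) & (alpha < 1.0 - eps)).float()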
  • Step S402: performing feature extraction on the uncertain-region mask to obtain uncertain-region features, and performing attention calculation on the uncertain-region features to obtain an attention score.
  • Specifically, the uncertain-region mask m_{l-1} output by the previous auxiliary output layer and the uncertain-region mask m_l output by the current layer's auxiliary output layer are each passed through an encoding layer composed of CNNs for feature extraction; the resulting features are then used to calculate the attention score, which acts as an optimization signal and corrects the uncertain-region features output by the current layer's auxiliary output layer.
  • Step S403: correcting the uncertain-region features according to the attention score, and taking the corrected uncertain-region features as the attention features.
  • Correcting the uncertain-region features through attention calculation to obtain the attention features ensures that the subsequent refinement and matting are more accurate.
  • In some embodiments, the above step of outputting the refined segmented image according to the attention features includes:
  • performing feature extraction on the output features of the current layer's auxiliary output layer to obtain extracted features, and splicing the attention features with the extracted features to obtain spliced features;
  • Specifically, the output α_l of the current layer's auxiliary output layer is passed through an encoding layer composed of CNNs, and the extracted feature F_α is combined with the corrected attention feature of the current layer's auxiliary output layer, where:
  • F_α is the feature extracted from the mask α_l of the current auxiliary output layer by the encoding layer;
  • F_α' is the corrected feature of the current auxiliary output layer's mask α, that is, the spliced feature.
  • The spliced feature F_α' is decoded by a decoding layer composed of CNNs, and the refined segmented image is obtained.
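  • Putting steps S401 to S403 and the splice-and-decode step together, one PAR refinement step might look as follows. The attention-score and splicing formulas are not reproduced in this text, so the softmax-weighted correction below is an assumption that merely mirrors the encoding, attention-convolution, fusion, softmax, connection, and decoding sub-layers named above; conv_block and alpha_to_uncertainty_mask come from the earlier sketches.

      class ProgressiveAttentionRefinement(nn.Module):
          """One PAR step over two auxiliary outputs: the previous layer's alpha map
          alpha_prev and the current layer's alpha map alpha_cur (both 1-channel)."""
          def __init__(self, ch: int = 32):
              super().__init__()
              self.encode_m = conv_block(1, ch)  # encoding layer for uncertainty masks
              self.encode_a = conv_block(1, ch)  # encoding layer for the current alpha mask
              self.attn = nn.Conv2d(ch, 1, kernel_size=1)        # attention convolutional layer
              self.decode = nn.Conv2d(ch * 2, 1, kernel_size=1)  # connection + decoding layer

          def forward(self, alpha_prev: torch.Tensor, alpha_cur: torch.Tensor) -> torch.Tensor:
              # S401: upsample the previous auxiliary output to the current size, then
              # derive uncertain-region masks from both alpha maps
              alpha_prev = F.interpolate(alpha_prev, size=alpha_cur.shape[-2:],
                                         mode="bilinear", align_corners=False)
              m_prev = alpha_to_uncertainty_mask(alpha_prev)
              m_cur = alpha_to_uncertainty_mask(alpha_cur)
              # S402: extract uncertain-region features and compute an attention score
              f_prev, f_cur = self.encode_m(m_prev), self.encode_m(m_cur)
              logits = self.attn(f_prev + f_cur)        # first fusion + attention convolution
              b, _, h, w = logits.shape
              score = torch.softmax(logits.flatten(2), dim=-1).view(b, 1, h, w)  # softmax layer
              # S403: correct the uncertain-region features with the attention score
              f_corrected = f_cur * score               # second fusion
              # splice with features extracted from the current alpha mask, then decode
              spliced = torch.cat([self.encode_a(alpha_cur), f_corrected], dim=1)  # connection
              return torch.sigmoid(self.decode(spliced))  # refined alpha for this layer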
  • In some embodiments, the step of determining the target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting the trained image matting model includes: obtaining a second loss function according to the refined segmented images; determining the target loss function based on the first loss function and the second loss function; adjusting the model parameters of the initial image matting model according to the target loss function; and, when the iteration end condition is met, generating the trained image matting model according to the model parameters.
  • In the second loss function, each auxiliary output layer's loss is weighted by a hyperparameter γ_l; α_gt represents the ground-truth output of the current layer's auxiliary output layer; and Loss_l represents the uncertain-region loss of the current layer's auxiliary output layer, comprising an L1 loss, a composition loss, and a Laplacian loss:
  • Loss_l(α_gt⊙m_l, α_l⊙m_l) = Loss_L1(α_gt⊙m_l, α_l⊙m_l) + Loss_comp(α_gt⊙m_l, α_l⊙m_l) + Loss_lap(α_gt⊙m_l, α_l⊙m_l)
  • The target loss function is determined based on the first loss function and the second loss function as their weighted combination:
  • Loss_target = λ1·Loss_DDIP + λ2·Loss_second, where λ1 and λ2 are preset weighting parameters.
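  • The sketch below assembles the second loss and the combined target loss. The composition and Laplacian terms are standard matting losses whose exact forms are not reproduced in this text, so the single-level Laplacian and the simplified composition term are assumptions; gamma_l and the combination weights correspond to the preset hyperparameters mentioned above.

      def layer_loss(alpha_gt: torch.Tensor, alpha_l: torch.Tensor, m_l: torch.Tensor):
          """Uncertain-region loss of one auxiliary output layer:
          Loss_l = Loss_L1 + Loss_comp + Loss_lap, restricted to the mask m_l."""
          a_gt, a_pr = alpha_gt * m_l, alpha_l * m_l
          loss_l1 = (a_gt - a_pr).abs().mean()
          # simplified composition term on the masked alpha maps (the full form would
          # composite the predicted alpha with foreground/background colors)
          loss_comp = F.mse_loss(a_pr, a_gt)
          # one Laplacian-pyramid level: difference between a map and its blurred self
          lap = lambda t: t - F.interpolate(F.avg_pool2d(t, 2), scale_factor=2,
                                            mode="bilinear", align_corners=False)
          loss_lap = (lap(a_gt) - lap(a_pr)).abs().mean()
          return loss_l1 + loss_comp + loss_lap

      def target_loss(first_loss, layer_losses, gammas, lam1: float = 1.0, lam2: float = 1.0):
          """Target loss: a weighted combination of the first (segmentation) loss and
          the second (matting) loss, where the second loss weights each auxiliary
          output layer's Loss_l by its hyperparameter gamma_l."""
          second_loss = sum(g * l for g, l in zip(gammas, layer_losses))
          return lam1 * first_loss + lam2 * second_loss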
  • The model parameters are adjusted according to the loss function value and iterative training continues until the model is sufficiently trained; at that point the performance of the model reaches its optimal state and the loss function value can no longer decrease, i.e., the model converges.
  • After the model converges, the final image matting model is output according to the finally adjusted model parameters.
  • In this way, the precision and accuracy of the image matting model can be improved.
  • The embodiments of the present application may acquire and process the relevant data based on artificial intelligence (AI) technology.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • The application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the aforementioned storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
  • With further reference to FIG. 5, as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of an image matting apparatus based on image segmentation, which corresponds to the method embodiment shown in FIG. 2; the apparatus can be applied to various electronic devices.
  • The image matting apparatus 500 based on image segmentation in this embodiment includes: an acquisition module 501, a preliminary segmentation module 502, a refined segmentation module 503, a training module 504, and a matting module 505, wherein:
  • the obtaining module 501 is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer.
  • the preliminary segmentation module 502 is used to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
  • the refined segmentation module 503 is used to input the preliminary segmented image set into the image matting layer to obtain refined segmented images;
  • the training module 504 is configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output the trained image matting model;
  • the matting module 505 is used to acquire a target image, and input the target image into the image matting model to obtain a matting result.
  • The above image matting apparatus based on image segmentation performs preliminary segmentation of the image through the image segmentation layer of the trained image matting model and then passes the preliminary segmentation result to the image matting layer for further refined segmentation; image matting can thus be realized without any additional input and with no manual intervention, achieving fully automatic matting and improving matting efficiency.
  • At the same time, the image matting model yields more precise matting results, improving the precision and accuracy of matting.
  • In some embodiments, the preliminary segmentation module 502 includes a reconstruction submodule and a pre-training submodule, wherein the reconstruction submodule is used to input the acquired original image set into the image segmentation layer and output reconstructed images, and the pre-training submodule is used to obtain the first loss function according to the reconstructed images, iteratively update the image segmentation layer based on the first loss function, and output the pre-trained image segmentation layer.
  • The initial mask of the image segmentation layer is adjusted by pre-training the image segmentation layer; after pre-training is completed, the mask of the pre-trained image segmentation layer is obtained, ensuring that a more accurate foreground mask is available during subsequent training for the image matting layer to use in matting.
  • the refinement and segmentation module 503 includes a feature extraction submodule, a decoding submodule, an output submodule, and an attention submodule, wherein:
  • the feature extraction submodule is used to input the preliminary segmented image set to the encoder for feature extraction to obtain encoded features
  • the decoding submodule is used to decode the encoded features through the decoder, and output the decoded features
  • the output sub-module is used to input the decoding features into the auxiliary output layer to obtain output features
  • the attention sub-module is used to perform attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and output the refined segmentation image according to the attention features.
  • attention features are obtained through attention calculation, and more accurate refined and segmented images can be output.
  • the attention submodule includes an upsampling unit, an attention calculation unit, and a correction unit, wherein:
  • the upsampling unit is used to upsample the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain upsampled output features, and to calculate the uncertain-region mask according to the upsampled output features;
  • the attention calculation unit is used to perform feature extraction on the uncertain area mask to obtain uncertain area features, and perform attention calculation on the uncertain area features to obtain an attention score;
  • the correction unit is used to modify the feature of the uncertain region according to the attention score, and obtain the corrected feature of the uncertain region as the attention feature.
  • attention features are obtained by correcting uncertain region features through attention calculations, which can ensure more accurate subsequent refinement and matting.
  • the attention submodule also includes a splicing unit and a decoding unit, wherein:
  • the splicing unit is used to perform feature extraction on the output features of the current layer's auxiliary output layer to obtain extracted features, and to splice the attention features with the extracted features to obtain spliced features;
  • the decoding unit is used to decode the spliced features and output the refined segmented image.
  • the training module 504 includes a loss function calculation submodule, an adjustment submodule, and a generation submodule, wherein:
  • the loss function calculation submodule is used to obtain a second loss function according to the thinned and segmented image
  • the loss function calculation submodule is also used to determine a target loss function based on the first loss function and the second loss function;
  • the adjustment submodule is used to adjust the model parameters of the initial image matting model according to the target loss function;
  • the generation sub-module is used to generate a trained image matting model according to the model parameters when the iteration end condition is met.
  • the matting accuracy and accuracy of the image matting model can be improved.
  • the pre-training submodule includes a calculation unit, an adjustment unit, and an output unit, wherein:
  • the calculation unit is used to calculate the reconstruction loss according to the reconstructed image and the original images in the original image set;
  • the computing unit is further configured to determine the first loss function according to the reconstruction loss
  • An adjustment unit is configured to adjust segmentation parameters of the image segmentation layer based on the first loss function
  • the output unit is used to output the pre-trained image segmentation layer according to the segmentation parameters when the iteration end condition is met.
  • the efficiency of pre-training can be improved, and at the same time, more accurate masks can be ensured for subsequent training.
  • FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 6 includes a memory 61, a processor 62, and a network interface 63 connected to each other through a system bus. It should be noted that only components 61-63 of the computer device 6 are shown, but it should be understood that not all of the shown components are required; more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
  • the computer equipment may be computing equipment such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through keyboard, mouse, remote controller, touch panel or voice control device.
  • the memory 61 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • The memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or memory of the computer device 6.
  • the memory 61 can also be an external storage device of the computer device 6, such as a plug-in hard disk equipped on the computer device 6, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 61 is generally used to store the operating system and various application software installed in the computer device 6, such as computer-readable instructions of the image matting method based on image segmentation.
  • the memory 61 can also be used to temporarily store various types of data that have been output or will be output.
  • The processor 62 may in some embodiments be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or to process data, for example to run the computer-readable instructions of the image matting method based on image segmentation.
  • the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • In this embodiment, when the processor executes the computer-readable instructions stored in the memory, the steps of the image matting method based on image segmentation of the above embodiment are realized: the image is first coarsely segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation, so that image matting is realized without any additional input and with no manual intervention, achieving fully automatic matting and improving matting efficiency.
  • At the same time, more precise matting results can be achieved through the image matting model, improving the precision and accuracy of matting.
  • The present application also provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile.
  • The computer-readable storage medium stores computer-readable instructions executable by at least one processor, so that the at least one processor executes the steps of the image matting method based on image segmentation described above: the image is first coarsely segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation; image matting is thus realized without any additional input and with no manual intervention, achieving fully automatic matting and improving matting efficiency; at the same time, more precise matting results can be achieved through the image matting model, improving the precision and accuracy of matting.
  • Through the description of the above embodiments, it is clear that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and contains several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) execute the methods described in the various embodiments of the present application.

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence, and relate to an image matting method based on image segmentation. The method comprises: inputting an obtained training image set into a pre-constructed initial image matting model, wherein the image matting model comprises an image segmentation layer and an image matting layer; segmenting images in the training image set by means of the image segmentation layer to obtain a set of preliminarily segmented images; inputting the set of preliminarily segmented images into the image matting layer to obtain finely segmented images; determining a target loss function on the basis of the finely segmented images, performing iterative updating on the initial image matting model according to the target loss function, and outputting a trained image matting model; and inputting a target image into the image matting model to obtain a matting result. The present application further provides an image matting apparatus based on image segmentation, a computer device, and a medium. In addition, the present application further relates to a blockchain technology, and the target image can be stored in a blockchain. The present application can improve the precision and accuracy of image matting.

Description

Image matting method, apparatus, computer device and medium based on image segmentation
This application claims priority to the Chinese patent application filed with the China Patent Office on February 23, 2022, with application number 202210168421.X and entitled "Image matting method, apparatus, computer device and medium based on image segmentation", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to an image matting method, apparatus, computer device, and medium based on image segmentation.
Background
Image matting means that, for a given image, a network can separate its foreground region from its background region. It is an important topic in computer vision and is widely used in video conferencing, image editing, and post-production scenarios. At present, image matting techniques usually rely on additional inputs, such as a trimap or a background image, to generate a mask, and then use the mask to extract the matting target.
However, the inventors found that generating the additional input often requires manual intervention, for example the generation of a prior mask such as a trimap, and that obtaining the additional input is not always feasible, for example obtaining a complete background image. This makes image matting time-consuming and labor-intensive, and leads to low matting efficiency and inaccurate matting results.
Summary of the Invention
The purpose of the embodiments of the present application is to propose an image matting method, apparatus, computer device, and storage medium based on image segmentation, to solve the technical problems in the related art that image matting is time-consuming and labor-intensive, matting efficiency is low, and matting results are inaccurate.
To solve the above technical problems, an embodiment of the present application provides an image matting method based on image segmentation, which adopts the following technical solution:
acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;
segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;
acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
To solve the above technical problems, an embodiment of the present application also provides an image matting apparatus based on image segmentation, which adopts the following technical solution:
an acquisition module, configured to acquire a training image set and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer;
a preliminary segmentation module, configured to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
a refined segmentation module, configured to input the preliminary segmented image set into the image matting layer to obtain refined segmented images;
a training module, configured to determine a target loss function based on the refined segmented images, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;
a matting module, configured to acquire a target image and input the target image into the trained image matting model to obtain a matting result.
To solve the above technical problems, an embodiment of the present application also provides a computer device, which adopts the following technical solution:
the computer device includes a memory and a processor, computer-readable instructions being stored in the memory; when the processor executes the computer-readable instructions, the following steps of the image matting method based on image segmentation are implemented:
acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;
segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;
acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
To solve the above technical problems, an embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:
computer-readable instructions are stored on the computer-readable storage medium; when the computer-readable instructions are executed by a processor, the following steps of the image matting method based on image segmentation are implemented:
acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model includes an image segmentation layer and an image matting layer;
segmenting the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model;
acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
The present application acquires a training image set and inputs it into a pre-built initial image matting model, where the model includes an image segmentation layer and an image matting layer; segments the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set; inputs the preliminary segmented image set into the image matting layer to obtain refined segmented images; determines a target loss function based on the refined segmented images, iteratively updates the initial image matting model according to the target loss function, and outputs a trained image matting model; and acquires a target image and inputs it into the trained model to obtain a matting result. Because the image segmentation layer of the trained model first segments the image coarsely and the image matting layer then refines the preliminary segmentation, image matting requires no additional input and no manual intervention, achieving fully automatic matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, further improving the precision and accuracy of image matting.
Description of Drawings
In order to illustrate the solutions in the present application more clearly, the accompanying drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the image segmentation-based image matting method according to the present application;
FIG. 3 is a flowchart of a specific implementation of step S203 in FIG. 2;
FIG. 4 is a flowchart of another embodiment of the image segmentation-based image matting method according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the image segmentation-based image matting apparatus according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are intended only to describe specific embodiments, not to limit the application. The terms "including" and "having" and any variations thereof in the specification, claims, and drawing descriptions are intended to cover non-exclusive inclusion. Terms such as "first" and "second" in the specification, claims, or drawings are used to distinguish different objects rather than to describe a particular order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.
The present application provides an image matting method based on image segmentation, which involves artificial intelligence and can be applied to the system architecture 100 shown in FIG. 1. The system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, and 103 and the server 105, and may include various connection types such as wired links, wireless communication links, or fiber-optic cables.
Users may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be various electronic devices with a display screen that support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server providing various services, for example, a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the image segmentation-based image matting method provided in the embodiments of the present application is generally executed by the server/terminal device; correspondingly, the image segmentation-based image matting apparatus is generally disposed in the server/terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs.
Continuing to refer to FIG. 2, a flowchart of an embodiment of the image segmentation-based image matting method according to the present application is shown, including the following steps:
Step S201: acquire a training image set, and input the training image set into a pre-built initial image matting model, where the initial image matting model includes an image segmentation layer and an image matting layer.
In this embodiment, the pre-built initial image matting model includes an image segmentation layer and an image matting layer. The image segmentation layer may adopt a Double DIP (Deep Image Prior) network, which uses two DIP networks to separate the input image into a foreground layer and a background layer. The backbone of the image matting layer uses a U-Net network for encoding and decoding; auxiliary output layers are added after the decoder layers for deep supervision, and a Progressive Attention Refinement (PAR) module is added, which uses the intermediate outputs of the decoder for layer-by-layer refinement to obtain the final precise matte and thus an accurate segmented image.
In this embodiment, the training image set may be obtained from a public dataset, for example, the Alphamatting dataset, which contains 27 training images and 8 test images, each with ground-truth foreground and background results after matting. The foreground images of these images are then combined with 500 indoor scene images and 500 outdoor scene images, and the combined images are rotated at three different angles; the resulting images serve as the training image set and the test image set. Alternatively, the training set may be generated from acquired original images: specifically, original images are acquired (for example, portraits, product images, environment images, animal images, or vehicle images), the signal-to-noise ratio of each original image is calculated, the original images are filtered according to the signal-to-noise ratio, and the salient foregrounds in the filtered images are annotated, so that a training dataset is generated from the annotated images.
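The compositing-and-rotation pipeline described above can be sketched as follows; the file-path arguments, the RGBA foreground format, and the three rotation angles chosen here are illustrative assumptions, since the source does not specify them:

```python
from PIL import Image

def composite_with_rotations(fg_path, bg_path, angles=(90, 180, 270)):
    """Paste a foreground (with alpha) onto a background and rotate.

    Returns the straight composite plus one copy per rotation angle.
    """
    fg = Image.open(fg_path).convert("RGBA")
    bg = Image.open(bg_path).convert("RGBA").resize(fg.size)

    # Alpha-composite the foreground over the background.
    base = Image.alpha_composite(bg, fg)

    # Augment with rotations at three different angles, as in the text.
    return [base] + [base.rotate(a, expand=True) for a in angles]
```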
Step S202: segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set.
In this embodiment, the image segmentation layer separates each training image into a foreground layer and a background layer, and blends the two layers through a mask to obtain a reconstructed image; the reconstructed image is the preliminary segmented image.
Specifically, the training images in the training image set are input into the image segmentation layer, one DIP network is assigned to each layer, and each DIP network takes a random noise z_i as input; the foreground layer y_1 and the background layer y_2 are computed as y_i = DIP(z_i), and the two layers are fused through the mask m to obtain the reconstructed image I', according to the following formula:
I' = m·y_1 + (1 - m)·y_2
It should be understood that the image segmentation layer is pre-trained.
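A minimal sketch of this fusion step, assuming `dip_fg`, `dip_bg`, and `mask_net` are the two DIP networks and a mask-producing network (these module names, and the sigmoid used to keep m in [0, 1], are assumptions):

```python
import torch

def reconstruct(dip_fg, dip_bg, mask_net, z1, z2, zm):
    """Fuse two DIP outputs through a mask: I' = m*y1 + (1-m)*y2."""
    y1 = dip_fg(z1)                   # foreground layer from noise z1
    y2 = dip_bg(z2)                   # background layer from noise z2
    m = torch.sigmoid(mask_net(zm))   # blending mask in [0, 1]
    return m * y1 + (1 - m) * y2      # reconstructed image I'
```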
Step S203: input the preliminary segmented image set into the image matting layer to obtain a refined segmented image.
In this embodiment, the image matting layer includes at least an encoder, a decoder, a progressive attention refinement (PAR) layer, and auxiliary output layers; the preliminary segmented images are input into the image matting layer for refined segmentation to obtain the refined segmented image.
In some optional implementations, the step of inputting the preliminary segmented image set into the image matting layer to obtain the refined segmented image includes:
Step S301: input the preliminary segmented image set into the encoder for feature extraction to obtain encoded features.
The encoder includes a plurality of convolutional neural network (CNN) layers and downsampling layers, where a downsampling layer may be a max-pooling layer. Spatial features are extracted through the CNN layers to generate feature maps, and the max-pooling layers downsample the feature maps to retain the strongest features.
Optionally, the encoder includes five convolutional layers and four downsampling layers. The five convolutional layers are a first to a fifth encoding convolutional layer, and one downsampling layer (the first to the fourth downsampling layer, respectively) is arranged between each pair of adjacent encoding convolutional layers.
The preliminary segmented images in the preliminary segmented image set pass in sequence through the first encoding convolutional layer, the first downsampling layer, the second encoding convolutional layer, the second downsampling layer, the third encoding convolutional layer, the third downsampling layer, the fourth encoding convolutional layer, the fourth downsampling layer, and the fifth encoding convolutional layer for feature extraction, yielding the encoded features.
It should be understood that the convolution kernels and strides of the encoder convolutional layers can be set according to actual conditions.
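A minimal PyTorch sketch of such an encoder; the 3×3 kernels, ReLU activations, and channel widths are illustrative assumptions, since the patent leaves kernel sizes and strides configurable:

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Five conv layers interleaved with four max-pooling layers."""
    def __init__(self, in_ch=3, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.convs = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.convs.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            prev = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skips = []               # keep per-scale features for the decoder
        for i, conv in enumerate(self.convs):
            x = conv(x)
            skips.append(x)
            if i < len(self.convs) - 1:  # pool between conv layers only
                x = self.pool(x)
        return x, skips
```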
Step S302: decode the encoded features through the decoder and output decoded features.
In this embodiment, the decoder consists of a plurality of decoding modules, each including upsampling layers and CNN layers. After each decoding step, the size of the feature map increases by the corresponding factor; after multiple decoding steps, a feature map with the same size as the original input preliminary segmented image is obtained, i.e., the decoded features. In addition, the decoded features after each decoding step are concatenated with the encoded features of the same size from the encoding stage to fuse low-level and high-level features.
Optionally, the decoder includes five convolutional layers and four upsampling layers. The five convolutional layers are a first to a fifth decoding convolutional layer, and one upsampling layer (the first to the fourth upsampling layer, respectively) is arranged between each pair of adjacent decoding convolutional layers.
The encoded features are input into the decoder and pass in sequence through the first decoding convolutional layer, the first upsampling layer, the second decoding convolutional layer, the second upsampling layer, the third decoding convolutional layer, the third upsampling layer, the fourth decoding convolutional layer, the fourth upsampling layer, and the fifth decoding convolutional layer, yielding the decoded features.
It should be understood that the convolution kernels and strides of the decoder convolutional layers can also be set according to actual conditions.
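A matching decoder sketch with the skip connections described above, pairing with the `Encoder` sketch; the bilinear upsampling mode and channel widths are again assumptions:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Five conv layers interleaved with four upsampling layers;
    each stage is concatenated with the same-size encoder feature."""
    def __init__(self, widths=(1024, 512, 256, 128, 64)):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.convs = nn.ModuleList()
        prev = widths[0]
        for w in widths[1:]:
            # input is the upsampled feature plus a skip of width w
            self.convs.append(nn.Sequential(
                nn.Conv2d(prev + w, w, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            prev = w

    def forward(self, x, skips):
        feats = [x]                      # decoder outputs, coarse to fine
        for conv, skip in zip(self.convs, reversed(skips[:-1])):
            x = self.up(x)
            x = conv(torch.cat([x, skip], dim=1))  # fuse low/high-level
            feats.append(x)
        return feats
```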
Step S303: input the decoded features into the auxiliary output layers to obtain output features.
In this embodiment, an auxiliary output layer is connected after each convolutional layer of the decoder and performs convolution and pooling operations on the output features, so as to retain more feature information of the image.
Step S304: perform attention computation on the output features through the progressive attention refinement layer to obtain attention features, and output the refined segmented image according to the attention features.
In this embodiment, each progressive attention layer is connected to the previous-layer auxiliary output layer, to the current decoding auxiliary output layer connected to the current decoding convolutional layer, and to the auxiliary output layer corresponding to the progressive attention layer itself (the current-layer auxiliary output layer). Specifically, the inputs of the progressive attention layer are the outputs of the previous-layer auxiliary output layer and of the current decoding auxiliary output layer; after the attention operation, the result is output through the current-layer auxiliary output layer.
It should be noted that no progressive attention layer is connected to the first decoding convolutional layer; when the first decoding convolutional layer is the current layer, the auxiliary output layer connected to it serves both as the current decoding output layer and as the current-layer output layer.
The progressive attention layer includes at least an encoding layer, an attention convolutional layer, a first fusion layer, a softmax layer, a second fusion layer, a concatenation layer, and a decoding layer. The output features pass in sequence through these layers for the corresponding computations, and a more precise refined segmented image is output.
Step S204: determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output the trained image matting model.
In this embodiment, the training image set is input into the initial image matting model for training. After each training round, the target loss function of the initial image matting model is calculated to obtain a loss value, the model parameters are adjusted according to the loss value, and iterative training continues. When the model has been trained to a certain extent, its performance reaches the optimal state and the loss value can no longer decrease, i.e., the model converges. Convergence can be judged simply by comparing the loss values of two consecutive iterations: if the loss value is still changing, training images continue to be selected and input into the model with adjusted parameters for further iterative training; if the loss value no longer changes significantly, the model can be considered converged. After convergence, the final image matting model is output.
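A minimal sketch of this train-until-converged loop; the optimizer, learning rate, and convergence tolerance are illustrative assumptions, and `target_loss` stands for the target loss function defined by the embodiments:

```python
import torch

def train(model, loader, target_loss, lr=1e-4, tol=1e-4, max_epochs=100):
    """Iterate until the loss between two rounds stops changing."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for epoch in range(max_epochs):
        total = 0.0
        for images in loader:
            loss = target_loss(model, images)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        # Converged when the loss no longer changes significantly.
        if abs(prev - total) < tol:
            break
        prev = total
    return model
```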
Step S205: acquire a target image, input the target image into the trained image matting model, and obtain a matting result.
In this embodiment, the acquired target image is input into the trained image matting model for the matting operation, and the corresponding matting result is obtained.
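Inference then reduces to a single forward pass; in this sketch the assumption is that the trained model returns an alpha matte, which is multiplied with the input image to extract the foreground:

```python
import torch

@torch.no_grad()
def matting(model, image):
    """image: (1, 3, H, W) tensor in [0, 1]; returns foreground RGBA."""
    model.eval()
    alpha = model(image)            # predicted matte, (1, 1, H, W)
    foreground = image * alpha      # matted foreground layer
    return torch.cat([foreground, alpha], dim=1)
```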
It should be emphasized that, to further ensure the privacy and security of the target image, the target image may also be stored in a node of a blockchain.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
In the present application, an image is preliminarily segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation. Image matting can thus be performed without any additional input and entirely without manual intervention, achieving fully automated matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, further improving the precision and accuracy of image matting.
In some optional implementations of this embodiment, before the step of segmenting the images in the training image set through the image segmentation layer to obtain the preliminary segmented image set, the method further includes:
pre-training the image segmentation layer to obtain a pre-trained image segmentation layer, which specifically includes:
inputting an acquired original image set into the image segmentation layer and outputting reconstructed images;
obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer.
In this embodiment, the original image set may be obtained through the same channel as the above training dataset or through a different one, selected according to actual needs.
The original images in the original image set are input into the image segmentation layer and separated into a foreground layer y_1 and a background layer y_2, and the two layers are fused through an initial mask m_0 to obtain the reconstructed image I'. The initial mask m_0 may be preset or randomly generated; random generation is obtained from input random noise.
In some optional implementations, the step of obtaining the first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer includes:
calculating a reconstruction loss from the reconstructed images and the original images in the original image set;
determining the first loss function according to the reconstruction loss;
adjusting segmentation parameters of the image segmentation layer based on the first loss function;
when an iteration end condition is met, outputting the pre-trained image segmentation layer according to the segmentation parameters.
The first loss function is calculated by the following formula:
Loss_DDIP = Loss_Reconst + β·Loss_Excl + γ·Loss_Reg
where Loss_Reconst = ||I - I'|| is the reconstruction loss, with I being the original image; Loss_Excl is an exclusion loss that minimizes the correlation between the gradients of y_1 and y_2; Loss_Reg = (Σ_x |m(x) - 0.5|)^(-1) is a regularization loss that mainly constrains the fusion mask and is used to binarize the initial foreground mask m_0; and β and γ are preset weighting parameters.
The first loss function is calculated, the segmentation parameters of the image segmentation layer are adjusted based on it, and iterative training continues. When the image segmentation layer has been trained to a certain extent, its performance reaches a preset state: either convergence, or the value of the first loss function reaching a preset threshold. Either case means the iteration end condition is met and pre-training of the image segmentation layer is complete; the image segmentation layer is then output according to the segmentation parameters at the end of the iteration.
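A sketch of this pre-training loss; the L1 form of the reconstruction norm and the gradient-product form of the exclusion loss are assumptions, since the patent names the terms without fixing these details:

```python
def grad_xy(img):
    """Finite-difference image gradients along x and y (torch tensors)."""
    gx = img[..., :, 1:] - img[..., :, :-1]
    gy = img[..., 1:, :] - img[..., :-1, :]
    return gx, gy

def loss_ddip(I, I_rec, y1, y2, m, beta=0.1, gamma=0.1):
    # Reconstruction loss: ||I - I'||
    loss_reconst = (I - I_rec).abs().mean()
    # Exclusion loss: minimize correlation between gradients of y1, y2.
    gx1, gy1 = grad_xy(y1)
    gx2, gy2 = grad_xy(y2)
    loss_excl = (gx1 * gx2).abs().mean() + (gy1 * gy2).abs().mean()
    # Regularization loss: (sum_x |m(x) - 0.5|)^-1, pushes m toward 0/1.
    loss_reg = 1.0 / (m - 0.5).abs().sum().clamp(min=1e-8)
    return loss_reconst + beta * loss_excl + gamma * loss_reg
```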
In this embodiment, the initial mask m_0 of the image segmentation layer is adjusted by pre-training the image segmentation layer. After pre-training, the mask m of the pre-trained image segmentation layer is obtained; m continues to be learned and adjusted in the subsequent training process so that a more precise foreground mask m is obtained for the subsequent image matting layer.
In some optional implementations of this embodiment, the step of performing attention computation on the output features through the progressive attention refinement layer to obtain the attention features includes:
Step S401: upsample the output features of the previous-layer auxiliary output layer to the same size as the output features of the current-layer auxiliary output layer to obtain upsampled output features, and calculate an uncertain-region mask from the upsampled output features.
Specifically, the output feature α_{l-1} of the previous-layer auxiliary output layer is upsampled to the same size as the output feature α_l of the current-layer auxiliary output layer, and the transformation f_{α→m} is then applied, where f_{α→m}(x, y) is the transformation that derives the uncertain-region mask m from the α matte at image point (x, y), and α_l is the α matte of the current-layer auxiliary output layer l. In this way, the uncertain-region mask m_{l-1} output by the previous-layer auxiliary output layer and the uncertain-region mask m_l output by the current-layer auxiliary output layer are obtained.
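The source renders f_{α→m} only as an embedded image; one common choice, shown here purely as an assumption, is to mark as uncertain every pixel whose α value is neither fully foreground nor fully background:

```python
def alpha_to_uncertainty_mask(alpha, eps=1e-2):
    """One plausible f_{alpha->m}: 1 where alpha is fractional, else 0.

    alpha is a torch tensor in [0, 1].
    """
    return ((alpha > eps) & (alpha < 1.0 - eps)).float()
```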
Step S402: perform feature extraction on the uncertain-region masks to obtain uncertain-region features, and perform attention computation on the uncertain-region features to obtain an attention score.
The uncertain-region mask m_{l-1} output by the previous-layer auxiliary output layer and the uncertain-region mask m_l output by the current-layer auxiliary output layer each pass through an encoding layer composed of CNNs for feature extraction, yielding the uncertain-region features F^m_{l-1} and F^m_l. The attention score X is then computed from f_{1×1}(F^m_{l-1}) and f_{1×1}(F^m_l), where f_{1×1}(·) denotes a 1×1 convolution operation, F^m_{l-1} is the uncertain-region feature of layer l-1, and F^m_l is the uncertain-region feature of layer l. X, the computed optimization-trend attention score, acts as an optimization trend to correct the uncertain-region feature output by the current-layer auxiliary output layer.
Step S403: correct the uncertain-region feature according to the attention score, and take the corrected uncertain-region feature as the attention feature.
The uncertain-region feature is corrected using the attention score X and a 1×1 convolution operation f_{1×1}(·), yielding the corrected uncertain-region feature of layer l, which serves as the attention feature.
In this embodiment, the attention features are obtained by correcting the uncertain-region features through attention computation, which ensures that the subsequent refined matting is more precise.
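The attention-score and correction formulas of steps S402 and S403 likewise appear only as embedded images in the source; the sketch below therefore assumes one standard form, a softmax-normalized affinity between the two 1×1-projected mask features that reweights the current-layer feature, and every detail of this form is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class TrendAttention(nn.Module):
    """Assumed form of S402-S403: score X from F^m_{l-1} and F^m_l,
    then a 1x1-conv correction of F^m_l."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch, 1)    # f_1x1 on previous-layer feature
        self.k = nn.Conv2d(ch, ch, 1)    # f_1x1 on current-layer feature
        self.out = nn.Conv2d(ch, ch, 1)  # f_1x1 correction

    def forward(self, f_prev, f_cur):
        b, c, h, w = f_cur.shape
        q = self.q(f_prev).flatten(2)             # (b, c, h*w)
        k = self.k(f_cur).flatten(2)              # (b, c, h*w)
        x = F.softmax(q.transpose(1, 2) @ k, -1)  # attention score X
        v = f_cur.flatten(2)                      # values, current layer
        corrected = (v @ x.transpose(1, 2)).view(b, c, h, w)
        return self.out(corrected)                # corrected F^m_l
```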
In some optional implementations, the step of outputting the refined segmented image according to the attention features includes:
performing feature extraction on the output features of the current-layer auxiliary output layer to obtain extracted features, and concatenating the attention features with the extracted features to obtain concatenated features;
decoding the concatenated features and outputting the refined segmented image.
Specifically, the output α_l of the current-layer auxiliary output layer passes through an encoding layer composed of CNNs for feature extraction, and the extracted feature F_α is concatenated with the corrected current-layer attention feature according to
F_α' = Concat(F_α, F_att)
where F_att denotes the attention feature, F_α is the feature extracted by the encoding layer from the α matte of the current-layer auxiliary output layer, and F_α' is the corrected current-layer α-matte feature, i.e., the concatenated feature.
The concatenated feature F_α' is decoded by a decoding layer composed of CNNs to obtain the refined segmented image.
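A short sketch of this concatenate-and-decode step, pairing with the `TrendAttention` sketch above; the channel width, single-channel α input, and two-layer decoder are assumptions:

```python
import torch
import torch.nn as nn

class RefineHead(nn.Module):
    """Encode alpha_l, concatenate with the attention feature, decode."""
    def __init__(self, ch=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.decode = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                    nn.ReLU(inplace=True),
                                    nn.Conv2d(ch, 1, 3, padding=1),
                                    nn.Sigmoid())

    def forward(self, alpha_l, f_att):
        f_alpha = self.encode(alpha_l)              # F_alpha
        f_cat = torch.cat([f_alpha, f_att], dim=1)  # F_alpha'
        return self.decode(f_cat)                   # refined alpha
```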
In some optional implementations of this embodiment, the step of determining the target loss function based on the refined segmented image, iteratively updating the initial image matting model according to the target loss function, and outputting the trained image matting model includes:
obtaining a second loss function according to the refined segmented image;
determining the target loss function based on the first loss function and the second loss function;
adjusting model parameters of the initial image matting model according to the target loss function;
when the iteration end condition is met, generating the trained image matting model according to the model parameters.
Specifically, the second loss function is calculated by the following formula:
Loss_Matting = Σ_l ω_l·Loss_l(α_gt·m_l, α_l·m_l)
where ω_l is a hyperparameter, α_gt denotes the ground-truth output of the current-layer auxiliary output layer, and Loss_l denotes the loss over the uncertain region of the current-layer auxiliary output layer, comprising an L1 loss, a composition loss, and a Laplacian loss:
Loss_l(α_gt·m_l, α_l·m_l) = Loss_L1(α_gt·m_l, α_l·m_l) + Loss_comp(α_gt·m_l, α_l·m_l) + Loss_lap(α_gt·m_l, α_l·m_l)
The target loss function is determined based on the first loss function and the second loss function, and is calculated by the following formula:
Loss = δ·Loss_DDIP + ε·Loss_Matting
where δ and ε are preset weighting parameters.
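A sketch of the combined objective, using the per-layer weighted sum given above and treating `loss_l` (the L1 + composition + Laplacian term) as a supplied function; all weights here are illustrative assumptions:

```python
def loss_matting(alphas, alpha_gt, masks, omegas, loss_l):
    """Deep-supervision matting loss: sum_l omega_l * Loss_l over the
    uncertain region m_l of each auxiliary output layer."""
    return sum(w * loss_l(alpha_gt * m, a * m)
               for w, a, m in zip(omegas, alphas, masks))

def target_loss(loss_ddip_value, loss_matting_value, delta=1.0, eps=1.0):
    """Loss = delta * Loss_DDIP + eps * Loss_Matting."""
    return delta * loss_ddip_value + eps * loss_matting_value
```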
In this embodiment, the model parameters are adjusted according to the loss value and iterative training continues until the model has been trained to a certain extent; at that point, the performance of the model reaches the optimal state and the loss value can no longer decrease, i.e., the model converges.
Meeting the iteration end condition means the model has converged; after convergence, the final image matting model is output according to the finally adjusted model parameters.
In this embodiment, training the pre-built image matting model improves the matting precision and accuracy of the image matting model.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communication network; in such environments, program modules may be located in both local and remote computer storage media, including storage devices.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing related hardware. The computer-readable instructions may be stored in a computer-readable storage medium, and when the program is executed, it may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Referring further to FIG. 5, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of an image matting apparatus based on image segmentation. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the image segmentation-based image matting apparatus 500 of this embodiment includes: an acquisition module 501, a preliminary segmentation module 502, a refined segmentation module 503, a training module 504, and a matting module 505, where:
the acquisition module 501 is configured to acquire a training image set and input the training image set into a pre-built initial image matting model, where the initial image matting model includes an image segmentation layer and an image matting layer;
the preliminary segmentation module 502 is configured to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
the refined segmentation module 503 is configured to input the preliminary segmented image set into the image matting layer to obtain a refined segmented image;
the training module 504 is configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;
the matting module 505 is configured to acquire a target image and input the target image into the trained image matting model to obtain a matting result.
With the above image segmentation-based image matting apparatus, an image is preliminarily segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation. Image matting can thus be performed without any additional input and entirely without manual intervention, achieving fully automated matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, improving matting precision and accuracy.
In some optional implementations of this embodiment, the preliminary segmentation module 502 includes a reconstruction submodule and a pre-training submodule. The reconstruction submodule is configured to input an acquired original image set into the image segmentation layer and output reconstructed images; the pre-training submodule is configured to obtain a first loss function according to the reconstructed images, iteratively update the image segmentation layer based on the first loss function, and output the pre-trained image segmentation layer.
In this embodiment, the initial mask of the image segmentation layer is adjusted by pre-training the image segmentation layer. After pre-training, the mask of the pre-trained image segmentation layer is obtained, ensuring that a more precise foreground mask is obtained during subsequent training for the subsequent image matting layer to perform matting.
In this embodiment, the refined segmentation module 503 includes a feature extraction submodule, a decoding submodule, an output submodule, and an attention submodule, where:
the feature extraction submodule is configured to input the preliminary segmented image set into the encoder for feature extraction to obtain encoded features;
the decoding submodule is configured to decode the encoded features through the decoder and output decoded features;
the output submodule is configured to input the decoded features into the auxiliary output layers to obtain output features;
the attention submodule is configured to perform attention computation on the output features through the progressive attention refinement layer to obtain attention features, and output the refined segmented image according to the attention features.
In this embodiment, the attention features obtained through attention computation enable a more precise refined segmented image to be output.
In this embodiment, the attention submodule includes an upsampling unit, an attention computation unit, and a correction unit, where:
the upsampling unit is configured to upsample the output features of the previous-layer auxiliary output layer to the same size as the output features of the current-layer auxiliary output layer to obtain upsampled output features, and calculate an uncertain-region mask from the upsampled output features;
the attention computation unit is configured to perform feature extraction on the uncertain-region mask to obtain uncertain-region features, and perform attention computation on the uncertain-region features to obtain an attention score;
the correction unit is configured to correct the uncertain-region features according to the attention score, and take the corrected uncertain-region features as the attention features.
In this embodiment, the attention features are obtained by correcting the uncertain-region features through attention computation, which ensures that the subsequent refined matting is more precise.
In some optional implementations, the attention submodule further includes a concatenation unit and a decoding unit, where:
the concatenation unit is configured to perform feature extraction on the output features of the current-layer auxiliary output layer to obtain extracted features, and concatenate the attention features with the extracted features to obtain concatenated features;
the decoding unit is configured to decode the concatenated features and output the refined segmented image.
In this embodiment, the training module 504 includes a loss function calculation submodule, an adjustment submodule, and a generation submodule, where:
the loss function calculation submodule is configured to obtain a second loss function according to the refined segmented image;
the loss function calculation submodule is further configured to determine the target loss function based on the first loss function and the second loss function;
the adjustment submodule is configured to adjust the model parameters of the initial image matting model according to the target loss function;
the generation submodule is configured to generate the trained image matting model according to the model parameters when the iteration end condition is met.
In this embodiment, training the pre-built image matting model improves the matting precision and accuracy of the image matting model.
In this embodiment, the pre-training submodule includes a calculation unit, an adjustment unit, and an output unit, where:
the calculation unit is configured to calculate a reconstruction loss from the reconstructed images and the original images in the original image set;
the calculation unit is further configured to determine the first loss function according to the reconstruction loss;
the adjustment unit is configured to adjust the segmentation parameters of the image segmentation layer based on the first loss function;
the output unit is configured to output the pre-trained image segmentation layer according to the segmentation parameters when the iteration end condition is met.
In this embodiment, adjusting the parameters according to the first loss function determined from the reconstruction loss improves pre-training efficiency while ensuring that a more precise mask is obtained in subsequent training.
To solve the above technical problem, an embodiment of the present application further provides a computer device. Referring specifically to FIG. 6, FIG. 6 is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 communicatively connected to each other through a system bus. It should be noted that only a computer device 6 with components 61-63 is shown in the figure, but it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice-controlled device, or the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or internal memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 6. Of course, the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed on the computer device 6, such as the computer-readable instructions of the image segmentation-based image matting method. In addition, the memory 61 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or process data, for example, to run the computer-readable instructions of the image segmentation-based image matting method.
The network interface 63 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 6 and other electronic devices.
In this embodiment, when the processor executes the computer-readable instructions stored in the memory, the steps of the image segmentation-based image matting method of the above embodiments are implemented: an image is preliminarily segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation. Image matting can thus be performed without any additional input and entirely without manual intervention, achieving fully automated matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, improving matting precision and accuracy.
The present application further provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions executable by at least one processor, so that the at least one processor performs the steps of the image segmentation-based image matting method described above: an image is preliminarily segmented through the image segmentation layer of the trained image matting model, and the preliminary segmentation result is then passed to the image matting layer for further refined segmentation. Image matting can thus be performed without any additional input and entirely without manual intervention, achieving fully automated matting and improving matting efficiency; at the same time, the image matting model yields more precise matting results, improving matting precision and accuracy.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present application.
Apparently, the embodiments described above are only some of the embodiments of the present application rather than all of them. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be understood thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements for some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. An image matting method based on image segmentation, comprising the following steps:
    acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model comprises an image segmentation layer and an image matting layer;
    segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
    inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
    determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model; and
    acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
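By way of non-limiting illustration, the data flow recited in claim 1 can be sketched in Python with PyTorch (a framework the application does not name). The `MattingModel` class, the toy convolutional blocks, and all layer sizes below are hypothetical stand-ins for the claimed segmentation and matting layers, not the claimed implementation.

```python
import torch
import torch.nn as nn

class MattingModel(nn.Module):
    """Hypothetical two-stage sketch: a segmentation layer produces a coarse
    foreground map, and a matting layer refines it into the final matte."""
    def __init__(self):
        super().__init__()
        # Toy stand-ins for the claimed image segmentation / matting layers.
        self.segmentation_layer = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.matting_layer = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        coarse = self.segmentation_layer(image)          # preliminary segmentation
        refined = self.matting_layer(torch.cat([image, coarse], dim=1))
        return refined                                   # refined segmentation

model = MattingModel()
target_image = torch.rand(1, 3, 64, 64)                  # any RGB image tensor
matting_result = model(target_image)                     # no trimap or other input
print(matting_result.shape)                              # torch.Size([1, 1, 64, 64])
```

The property mirrored here is the one the application emphasizes: inference consumes only the target image, with no trimap or background image as auxiliary input.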
  2. The image matting method based on image segmentation according to claim 1, wherein before the step of segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, the method further comprises:
    inputting an acquired original image set into the image segmentation layer, and outputting reconstructed images; and
    obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer.
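A minimal sketch of this pretraining forward pass, assuming (as the claim leaves open) that the segmentation layer behaves like an autoencoder during this phase and that the first loss function takes a mean-squared reconstruction form; the toy encoder-decoder is a hypothetical stand-in.

```python
import torch
import torch.nn as nn

# Hypothetical encoder-decoder stand-in for the image segmentation layer.
segmentation_layer = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)

original_images = torch.rand(8, 3, 64, 64)               # acquired original image set
reconstructed = segmentation_layer(original_images)      # reconstructed images
first_loss = nn.functional.mse_loss(reconstructed, original_images)  # assumed loss form
```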
  3. The image matting method based on image segmentation according to claim 2, wherein the image matting layer comprises at least an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, and the step of inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images comprises:
    inputting the preliminary segmented image set into the encoder for feature extraction to obtain encoded features;
    decoding the encoded features through the decoder, and outputting decoded features;
    inputting the decoded features into the auxiliary output layer to obtain output features; and
    performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and outputting the refined segmented images according to the attention features.
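The four-stage flow inside the matting layer might look as follows; the toy blocks stand in for the real encoder, decoder, auxiliary output head, and progressive attention refinement (PAR) layer, and only the order of operations is traced, not the claimed network.

```python
import torch
import torch.nn as nn

class MattingLayer(nn.Module):
    """Sketch of the claimed encoder -> decoder -> auxiliary output -> PAR flow."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.ConvTranspose2d(16, 16, 4, stride=2, padding=1), nn.ReLU())
        self.aux_output = nn.Conv2d(16, 1, 1)            # auxiliary output head
        self.refine = nn.Conv2d(1, 1, 3, padding=1)      # stand-in for the PAR layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        encoded = self.encoder(x)                        # feature extraction
        decoded = self.decoder(encoded)                  # decoding
        output_features = self.aux_output(decoded)       # output features
        attention = torch.sigmoid(self.refine(output_features))  # attention features
        return attention * output_features               # refined segmentation

layer = MattingLayer()
refined = layer(torch.rand(1, 3, 64, 64))                # preliminary segmented input
```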
  4. The image matting method based on image segmentation according to claim 3, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features comprises:
    upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain upsampled output features, and calculating an uncertain region mask according to the upsampled output features;
    performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain attention scores; and
    correcting the uncertain region features according to the attention scores, and taking the corrected uncertain region features as the attention features.
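One possible reading of this attention computation, sketched under assumptions the claim does not fix: the uncertain-region mask is taken to be largest where the upsampled previous prediction is close to 0.5, and the `feat_conv` and `score_conv` heads are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def progressive_attention(prev_out, cur_out, feat_conv, score_conv):
    """One PAR step under assumed formulas; not the claimed computation."""
    up = F.interpolate(prev_out, size=cur_out.shape[-2:], mode="bilinear",
                       align_corners=False)              # upsample to current size
    # Assumed uncertain-region mask: 1 where the prediction is ambiguous (near 0.5).
    mask = 1.0 - torch.abs(up - 0.5) * 2.0
    features = feat_conv(cur_out * mask)                 # uncertain-region features
    scores = torch.sigmoid(score_conv(features))         # attention scores
    return features * scores                             # corrected features

feat_conv = nn.Conv2d(1, 8, 3, padding=1)                # hypothetical heads
score_conv = nn.Conv2d(8, 8, 3, padding=1)
prev_out = torch.rand(1, 1, 32, 32)                      # previous aux output
cur_out = torch.rand(1, 1, 64, 64)                       # current aux output
attention_features = progressive_attention(prev_out, cur_out, feat_conv, score_conv)
```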
  5. The image matting method based on image segmentation according to claim 4, wherein the step of outputting the refined segmented images according to the attention features comprises:
    performing feature extraction on the output features of the current auxiliary output layer to obtain extracted features, and concatenating the attention features with the extracted features to obtain concatenated features; and
    decoding the concatenated features, and outputting the refined segmented images.
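Continuing the same sketch, the fusion and decoding step would concatenate the attention features with features extracted from the current auxiliary output along the channel dimension; the `extract_conv` and `decode_conv` heads are again hypothetical.

```python
import torch
import torch.nn as nn

extract_conv = nn.Conv2d(1, 8, 3, padding=1)             # feature-extraction head
decode_conv = nn.Conv2d(16, 1, 3, padding=1)             # decoding head

cur_out = torch.rand(1, 1, 64, 64)                       # current aux output
attention_features = torch.rand(1, 8, 64, 64)            # from the PAR step above
extracted = extract_conv(cur_out)                        # extracted features
fused = torch.cat([attention_features, extracted], dim=1)  # channel-wise concat
refined_image = torch.sigmoid(decode_conv(fused))        # refined segmented image
```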
  6. The image matting method based on image segmentation according to any one of claims 2 to 5, wherein the step of determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model comprises:
    obtaining a second loss function according to the refined segmented images;
    determining the target loss function based on the first loss function and the second loss function;
    adjusting model parameters of the initial image matting model according to the target loss function; and
    when an iteration end condition is satisfied, generating the trained image matting model according to the model parameters.
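The claim leaves the combination of the two losses unspecified; a common choice, shown purely as an assumption, is a weighted sum. The model and both losses below are stand-ins so the update loop runs end to end.

```python
import torch
import torch.nn as nn

def target_loss(first_loss, second_loss, w=0.5):
    # Assumed combination: weighted sum of the first and second losses.
    return w * first_loss + (1.0 - w) * second_loss

model = nn.Conv2d(3, 1, 3, padding=1)                    # stand-in matting model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.rand(4, 3, 64, 64)
alpha_gt = torch.rand(4, 1, 64, 64)                      # ground-truth mattes

for step in range(200):                                  # until the end condition
    pred = torch.sigmoid(model(images))
    first = nn.functional.mse_loss(pred, alpha_gt)       # stand-in first loss
    second = nn.functional.l1_loss(pred, alpha_gt)       # stand-in second loss
    loss = target_loss(first, second)
    optimizer.zero_grad()
    loss.backward()                                      # adjust model parameters
    optimizer.step()
```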
  7. The image matting method based on image segmentation according to claim 2, wherein the step of obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer comprises:
    calculating a reconstruction loss according to the reconstructed images and the original images in the original image set;
    determining the first loss function according to the reconstruction loss;
    adjusting segmentation parameters of the image segmentation layer based on the first loss function; and
    when an iteration end condition is satisfied, outputting the pre-trained image segmentation layer according to the segmentation parameters.
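A sketch of the full pretraining loop, assuming the iteration end condition is either a maximum step count or the reconstruction loss falling below a threshold (the claim names the condition but not its form):

```python
import torch
import torch.nn as nn

segmentation_layer = nn.Sequential(                      # toy stand-in, as above
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(segmentation_layer.parameters(), lr=1e-4)
originals = torch.rand(8, 3, 64, 64)                     # original image set

for step in range(1000):
    reconstructed = segmentation_layer(originals)
    reconstruction_loss = nn.functional.mse_loss(reconstructed, originals)
    first_loss = reconstruction_loss                     # first loss from reconstruction loss
    optimizer.zero_grad()
    first_loss.backward()                                # adjust segmentation parameters
    optimizer.step()
    if first_loss.item() < 1e-3:                         # assumed end condition
        break

pretrained_weights = segmentation_layer.state_dict()     # pre-trained segmentation layer
```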
  8. An image matting apparatus based on image segmentation, comprising:
    an acquisition module, configured to acquire a training image set and input the training image set into a pre-built initial image matting model, wherein the image matting model comprises an image segmentation layer and an image matting layer;
    a preliminary segmentation module, configured to segment images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
    a refined segmentation module, configured to input the preliminary segmented image set into the image matting layer to obtain refined segmented images;
    a training module, configured to determine a target loss function based on the refined segmented images, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model; and
    a matting module, configured to acquire a target image and input the target image into the trained image matting model to obtain a matting result.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions that run on the processor, and the processor, when executing the computer-readable instructions, implements the steps of the following image matting method based on image segmentation:
    acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model comprises an image segmentation layer and an image matting layer;
    segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
    inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
    determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model; and
    acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
  10. The computer device according to claim 9, wherein before the step of segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, the method further comprises:
    inputting an acquired original image set into the image segmentation layer, and outputting reconstructed images; and
    obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer.
  11. The computer device according to claim 10, wherein the image matting layer comprises at least an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, and the step of inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images comprises:
    inputting the preliminary segmented image set into the encoder for feature extraction to obtain encoded features;
    decoding the encoded features through the decoder, and outputting decoded features;
    inputting the decoded features into the auxiliary output layer to obtain output features; and
    performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and outputting the refined segmented images according to the attention features.
  12. The computer device according to claim 11, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features comprises:
    upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain upsampled output features, and calculating an uncertain region mask according to the upsampled output features;
    performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain attention scores; and
    correcting the uncertain region features according to the attention scores, and taking the corrected uncertain region features as the attention features.
  13. The computer device according to claim 12, wherein the step of outputting the refined segmented images according to the attention features comprises:
    performing feature extraction on the output features of the current auxiliary output layer to obtain extracted features, and concatenating the attention features with the extracted features to obtain concatenated features; and
    decoding the concatenated features, and outputting the refined segmented images.
  14. The computer device according to any one of claims 10 to 13, wherein the step of determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model comprises:
    obtaining a second loss function according to the refined segmented images;
    determining the target loss function based on the first loss function and the second loss function;
    adjusting model parameters of the initial image matting model according to the target loss function; and
    when an iteration end condition is satisfied, generating the trained image matting model according to the model parameters.
  15. The computer device according to claim 10, wherein the step of obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer comprises:
    calculating a reconstruction loss according to the reconstructed images and the original images in the original image set;
    determining the first loss function according to the reconstruction loss;
    adjusting segmentation parameters of the image segmentation layer based on the first loss function; and
    when an iteration end condition is satisfied, outputting the pre-trained image segmentation layer according to the segmentation parameters.
  16. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the steps of the following image matting method based on image segmentation:
    acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the initial image matting model comprises an image segmentation layer and an image matting layer;
    segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
    inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images;
    determining a target loss function based on the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model; and
    acquiring a target image, and inputting the target image into the trained image matting model to obtain a matting result.
  17. The computer-readable storage medium according to claim 16, wherein before the step of segmenting images in the training image set through the image segmentation layer to obtain a preliminary segmented image set, the method further comprises:
    inputting an acquired original image set into the image segmentation layer, and outputting reconstructed images; and
    obtaining a first loss function according to the reconstructed images, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer.
  18. The computer-readable storage medium according to claim 17, wherein the image matting layer comprises at least an encoder, a decoder, a progressive attention refinement layer, and an auxiliary output layer, and the step of inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images comprises:
    inputting the preliminary segmented image set into the encoder for feature extraction to obtain encoded features;
    decoding the encoded features through the decoder, and outputting decoded features;
    inputting the decoded features into the auxiliary output layer to obtain output features; and
    performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and outputting the refined segmented images according to the attention features.
  19. The computer-readable storage medium according to claim 18, wherein the step of performing attention calculation on the output features through the progressive attention refinement layer to obtain attention features comprises:
    upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain upsampled output features, and calculating an uncertain region mask according to the upsampled output features;
    performing feature extraction on the uncertain region mask to obtain uncertain region features, and performing attention calculation on the uncertain region features to obtain attention scores; and
    correcting the uncertain region features according to the attention scores, and taking the corrected uncertain region features as the attention features.
  20. The computer-readable storage medium according to claim 19, wherein the step of outputting the refined segmented images according to the attention features comprises:
    performing feature extraction on the output features of the current auxiliary output layer to obtain extracted features, and concatenating the attention features with the extracted features to obtain concatenated features; and
    decoding the concatenated features, and outputting the refined segmented images.
PCT/CN2022/089507 2022-02-23 2022-04-27 Image matting method and apparatus based on image segmentation, computer device, and medium WO2023159746A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210168421.XA CN114529574A (en) 2022-02-23 2022-02-23 Image matting method and device based on image segmentation, computer equipment and medium
CN202210168421.X 2022-02-23

Publications (1)

Publication Number Publication Date
WO2023159746A1 true WO2023159746A1 (en) 2023-08-31

Family

ID=81623939

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089507 WO2023159746A1 (en) 2022-02-23 2022-04-27 Image matting method and apparatus based on image segmentation, computer device, and medium

Country Status (2)

Country Link
CN (1) CN114529574A (en)
WO (1) WO2023159746A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576076A (en) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 Bare soil detection method and device and electronic equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN116167922B (en) * 2023-04-24 2023-07-18 广州趣丸网络科技有限公司 Matting method and device, storage medium and computer equipment
CN116524577A (en) * 2023-07-05 2023-08-01 电子科技大学 Portrait matting method based on progressive refinement algorithm

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961303A (en) * 2018-07-23 2018-12-07 北京旷视科技有限公司 A kind of image processing method, device, electronic equipment and computer-readable medium
CN111815649A (en) * 2020-06-30 2020-10-23 清华大学深圳国际研究生院 Image matting method and computer readable storage medium
US20200357142A1 (en) * 2019-05-09 2020-11-12 Disney Enterprises, Inc. Learning-based sampling for image matting
CN112446380A (en) * 2019-09-02 2021-03-05 华为技术有限公司 Image processing method and device
CN112529913A (en) * 2020-12-14 2021-03-19 北京达佳互联信息技术有限公司 Image segmentation model training method, image processing method and device
WO2021164429A1 (en) * 2020-02-21 2021-08-26 京东方科技集团股份有限公司 Image processing method, image processing apparatus, and device
CN113379786A (en) * 2021-06-30 2021-09-10 深圳市斯博科技有限公司 Image matting method and device, computer equipment and storage medium
WO2022001464A1 (en) * 2020-06-30 2022-01-06 稿定(厦门)科技有限公司 Automatic matting method and system
CN114038006A (en) * 2021-08-09 2022-02-11 奥比中光科技集团股份有限公司 Matting network training method and matting method

Also Published As

Publication number Publication date
CN114529574A (en) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2023159746A1 (en) Image matting method and apparatus based on image segmentation, computer device, and medium
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN107293296B (en) Voice recognition result correction method, device, equipment and storage medium
WO2021155713A1 (en) Weight grafting model fusion-based facial recognition method, and related device
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
CN111915480B (en) Method, apparatus, device and computer readable medium for generating feature extraction network
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN113012712A (en) Face video synthesis method and device based on generation countermeasure network
CN114863229A (en) Image classification method and training method and device of image classification model
CN113780326A (en) Image processing method and device, storage medium and electronic equipment
JP2023001926A (en) Method and apparatus of fusing image, method and apparatus of training image fusion model, electronic device, storage medium and computer program
CN117095019A (en) Image segmentation method and related device
CN115565177B (en) Character recognition model training, character recognition method, device, equipment and medium
CN116975347A (en) Image generation model training method and related device
US20230237713A1 (en) Method, device, and computer program product for generating virtual image
WO2023173536A1 (en) Chemical formula identification method and apparatus, computer device, and storage medium
CN112990046B (en) Differential information acquisition method, related device and computer program product
WO2022178975A1 (en) Noise field-based image noise reduction method and apparatus, device, and storage medium
CN113592074B (en) Training method, generating method and device and electronic equipment
WO2021169356A1 (en) Voice file repairing method and apparatus, computer device, and storage medium
CN114926322A (en) Image generation method and device, electronic equipment and storage medium
CN114040129A (en) Video generation method, device, equipment and storage medium
CN114463466A (en) Smart card surface pattern customization method and device, electronic equipment and medium
CN113361535A (en) Image segmentation model training method, image segmentation method and related device
CN114758130A (en) Image processing and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928019

Country of ref document: EP

Kind code of ref document: A1