WO2023159746A1 - Image matting method and apparatus based on image segmentation, computer device, and medium - Google Patents

Image matting method and apparatus based on image segmentation, computer device, and medium Download PDF

Info

Publication number
WO2023159746A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
layer
matting
features
output
Prior art date
Application number
PCT/CN2022/089507
Other languages
English (en)
Chinese (zh)
Inventor
郑喜民
张祎頔
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023159746A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an image segmentation-based image matting method, device, computer equipment, and medium.
  • Image matting means that, for a given image, a network can separate its foreground area from its background area. It is an important topic in the field of computer vision and is widely used in video conferencing, image editing, and post-production scenarios.
  • Existing image matting technology usually relies on additional input, such as a trimap or a background image, to generate a mask, and then uses the mask to extract the matting target.
  • The purpose of the embodiments of the present application is to propose an image segmentation-based image matting method, device, computer equipment, and storage medium to solve the technical problems in related technologies that image matting is time-consuming and laborious, matting efficiency is low, and matting results are inaccurate.
  • the embodiment of the present application provides an image matting method based on image segmentation, which adopts the following technical solution:
  • a target image is acquired, and the target image is input into an image matting model to obtain a matting result.
  • the embodiment of the present application also provides an image matting device based on image segmentation, which adopts the following technical solutions:
  • the acquisition module is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer;
  • a preliminary segmentation module configured to segment images in the training image set through the image segmentation layer to obtain a preliminary segmented image set
  • a refined segmentation module configured to input the preliminary segmented image set to the image matting layer to obtain a refined segmented image
  • a training module configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output a trained image matting model;
  • the image matting module is used to obtain a target image, input the target image into the image matting model, and obtain a matting result.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • the computer device includes a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the steps of the image matting method based on image segmentation are implemented as follows:
  • a target image is acquired, and the target image is input into an image matting model to obtain a matting result.
  • the embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:
  • Computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the image matting method based on image segmentation are implemented as follows:
  • a target image is acquired, and the target image is input into an image matting model to obtain a matting result.
  • The application obtains a training image set and inputs it into a pre-built initial image matting model, where the image matting model includes an image segmentation layer and an image matting layer; the images in the training image set are segmented through the image segmentation layer to obtain a preliminary segmented image set; the preliminary segmented image set is input to the image matting layer to obtain a refined segmented image; a target loss function is determined based on the refined segmented image, the initial image matting model is iteratively updated according to the target loss function, and the trained image matting model is output; a target image is acquired and input into the image matting model to obtain a matting result. This application uses the image segmentation layer in the trained image matting model to preliminarily segment the image and then outputs the preliminary segmentation result to the image matting layer for further refined segmentation, which realizes image matting without any additional input and completely avoids manual intervention, achieving fully automatic matting and improving matting efficiency; at the same time, the image matting model produces more accurate matting results, improving the accuracy and precision of matting.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is the flowchart of an embodiment of the image matting method based on image segmentation according to the present application
  • FIG. 3 is a flow chart of a specific implementation of step S203 in FIG. 2;
  • Fig. 4 is the flow chart of another embodiment of the image matting method based on image segmentation according to the present application.
  • Fig. 5 is a schematic structural diagram of an embodiment of an image matting device based on image segmentation according to the present application
  • Fig. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 can include terminal devices 101, 102, 103, a network 104 and a server 105.
  • the network 104 is used as a medium for providing communication links between the terminal devices 101 , 102 , 103 and the server 105 .
  • Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • Terminal devices 101, 102, 103 can be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101 , 102 , 103 .
  • the image segmentation-based image matting method provided in the embodiment of the present application is generally executed by a server/terminal device, and correspondingly, the image segmentation-based image matting device is generally set in the server/terminal device.
  • The numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative; depending on implementation needs, there can be any number of terminal devices, networks, and servers.
  • Referring to FIG. 2, a flow chart of an embodiment of an image matting method based on image segmentation according to the present application is shown, including the following steps:
  • Step S201 acquiring a training image set, and inputting the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer.
  • The pre-built initial image matting model includes an image segmentation layer and an image matting layer. The image segmentation layer can adopt a Double DIP (Deep-Image-Prior) network, which uses two DIP networks to divide the input image into a foreground layer and a background layer. The backbone of the image matting layer uses a U-Net network for encoding and decoding; an auxiliary output layer is added to each layer of the decoding part for deep supervision, and a progressive attention refinement module uses the intermediate-layer outputs of the decoder to perform layer-by-layer refinement, obtaining the final accurate mask and hence an accurately segmented image.
  • The training image set can be obtained from a public data set, for example the Alphamatting data set, which contains 27 training images and 8 test images, all with standard foreground and background results after matting. The foreground images of these images are then combined with 500 indoor scene images and 500 outdoor scene images respectively, the combined images are rotated at three different angles, and the resulting images are used as the training image set and test images. The training image set can also be generated from acquired original pictures (for example, portrait pictures, product pictures, environment pictures, animal pictures, vehicle pictures, etc.): the signal-to-noise ratio of each original picture is calculated, the original pictures are filtered according to the signal-to-noise ratio, the salient foreground in the filtered original pictures is marked, and a training data set is generated based on the marked original pictures.
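  • The compositing-and-rotation procedure can be sketched in Python as follows; the directory layout, file formats, and the specific rotation angles are assumptions, since the application does not fix them, and the sketch is illustrative rather than the application's own code.
      from pathlib import Path
      from PIL import Image

      ANGLES = (0, 90, 180)  # three rotation angles (assumed values)

      def composite(fg_rgba: Image.Image, bg_rgb: Image.Image) -> Image.Image:
          """Paste an RGBA foreground (alpha = ground-truth matte) onto a background."""
          bg = bg_rgb.convert("RGB").resize(fg_rgba.size)
          out = bg.copy()
          out.paste(fg_rgba, mask=fg_rgba.split()[-1])  # use the alpha channel as paste mask
          return out

      def build_training_set(fg_dir: str, bg_dir: str, out_dir: str) -> None:
          out = Path(out_dir)
          out.mkdir(parents=True, exist_ok=True)
          backgrounds = sorted(Path(bg_dir).glob("*.jpg"))
          for fg_path in sorted(Path(fg_dir).glob("*.png")):
              fg = Image.open(fg_path).convert("RGBA")
              for bg_path in backgrounds:
                  bg = Image.open(bg_path)
                  for angle in ANGLES:
                      # rotate the combined image and its matte by the same angle
                      combined = composite(fg, bg).rotate(angle, expand=False)
                      alpha = fg.split()[-1].rotate(angle, expand=False)
                      combined.save(out / f"{fg_path.stem}_{bg_path.stem}_{angle}.jpg")
                      alpha.save(out / f"{fg_path.stem}_{bg_path.stem}_{angle}_alpha.png")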
  • Step S202 segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set.
  • The image segmentation layer divides each training image in the training image set into a foreground layer and a background layer, and mixes the foreground layer and the background layer through a mask to obtain a reconstructed image, which is the preliminary segmented image.
  • The image segmentation layer is pre-trained.
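  • A minimal sketch of this blending step, assuming the layers and mask are stored as tensors in [0, 1] (the function name is illustrative): the reconstructed image is m0 * y1 + (1 - m0) * y2.
      import torch

      def reconstruct(y1: torch.Tensor, y2: torch.Tensor, m0: torch.Tensor) -> torch.Tensor:
          """Blend the foreground layer y1 and the background layer y2 with the mask m0."""
          m0 = m0.clamp(0.0, 1.0)           # keep the (soft) mask in [0, 1]
          return m0 * y1 + (1.0 - m0) * y2  # reconstructed image I'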
  • Step S203 inputting the preliminary segmented image set into the image matting layer to obtain a refined segmented image.
  • The image matting layer includes at least an encoder, a decoder, a progressive attention refinement (PAR) layer, and an auxiliary output layer; the preliminary segmented image is input into the image matting layer for refined segmentation to obtain a refined segmented image.
  • the above steps of inputting the preliminary segmented image set to the image matting layer to obtain the refined segmented image include:
  • Step S301 inputting the preliminary segmented image set to the encoder for feature extraction to obtain encoded features.
  • The encoder includes a plurality of convolutional neural network (CNN) layers and down-sampling layers, and the down-sampling layer can be a max-pooling layer.
  • In one embodiment, the encoder includes 5 convolutional layers and 4 down-sampling layers. The 5 convolutional layers are the first to fifth coded convolutional layers, and a down-sampling layer is arranged between each pair of adjacent coded convolutional layers (between the first and second, the second and third, the third and fourth, and the fourth and fifth coded convolutional layers), namely the first, second, third, and fourth down-sampling layers.
  • The preliminary segmented images in the preliminary segmented image set pass through the first coded convolutional layer, the first down-sampling layer, the second coded convolutional layer, the second down-sampling layer, the third coded convolutional layer, the third down-sampling layer, the fourth coded convolutional layer, the fourth down-sampling layer, and the fifth coded convolutional layer for feature extraction to obtain the coded features.
  • the convolution kernel and the convolution step size of the encoder convolution layer can be set according to actual conditions.
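  • As an illustration of the encoder layout just described, the following Python (PyTorch) sketch uses five convolutional blocks with a max-pooling layer between consecutive blocks; the channel widths, kernel size, and stride are assumptions, since the application leaves them configurable.
      import torch
      import torch.nn as nn

      class MattingEncoder(nn.Module):
          def __init__(self, in_ch: int = 3, widths=(64, 128, 256, 512, 512)):
              super().__init__()
              self.blocks = nn.ModuleList()
              prev = in_ch
              for w in widths:  # five coded convolutional layers
                  self.blocks.append(nn.Sequential(
                      nn.Conv2d(prev, w, kernel_size=3, padding=1),
                      nn.ReLU(inplace=True)))
                  prev = w
              self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

          def forward(self, x: torch.Tensor):
              skips = []  # kept for the decoder's skip connections
              for i, block in enumerate(self.blocks):
                  x = block(x)
                  skips.append(x)
                  if i < len(self.blocks) - 1:  # four down-sampling layers between five blocks
                      x = self.pool(x)
              return x, skips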
  • step S302 the coded features are decoded by a decoder, and the decoded features are output.
  • The decoder is composed of a plurality of decoding modules, and each decoding module includes up-sampling layers and a CNN layer.
  • In each decoding module the size of the feature map is increased by the corresponding factor, and after multiple decoding steps a feature map with the same size as the original input preliminary segmented image is obtained, that is, the decoded feature.
  • the decoded features after each decoding are concatenated with the corresponding encoded features of the same size in the encoding stage to fuse low-level and high-level features.
  • In one embodiment, the decoder includes 5 convolutional layers and 4 up-sampling layers. The 5 convolutional layers are the first to fifth decoding convolutional layers, and an up-sampling layer is arranged between each pair of adjacent decoding convolutional layers (between the first and second, the second and third, the third and fourth, and the fourth and fifth decoding convolutional layers), namely the first, second, third, and fourth up-sampling layers.
  • the convolution kernel and the convolution step size of the convolution layer of the decoder can also be set according to actual conditions.
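  • A matching decoder sketch is shown below: five decoding convolutional layers with four up-sampling steps, each decoded feature concatenated with the same-size encoder feature, and an auxiliary output head attached to every decoding layer for deep supervision. The channel widths mirror the encoder sketch above and are assumptions, as is the use of a 1-channel sigmoid head for each auxiliary output.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class MattingDecoder(nn.Module):
          def __init__(self, widths=(512, 512, 256, 128, 64)):
              super().__init__()
              self.blocks = nn.ModuleList()
              self.aux_heads = nn.ModuleList()  # auxiliary output layers (deep supervision)
              for i, w in enumerate(widths):
                  # after the first block, the input is the upsampled feature concatenated
                  # with the same-size encoder feature (widths chosen to mirror the encoder)
                  in_ch = w if i == 0 else widths[i - 1] + w
                  self.blocks.append(nn.Sequential(
                      nn.Conv2d(in_ch, w, kernel_size=3, padding=1),
                      nn.ReLU(inplace=True)))
                  self.aux_heads.append(nn.Conv2d(w, 1, kernel_size=1))  # 1-channel alpha head

          def forward(self, bottom: torch.Tensor, skips):
              aux_outputs = []
              x = self.blocks[0](bottom)
              aux_outputs.append(torch.sigmoid(self.aux_heads[0](x)))
              for i, block in enumerate(self.blocks[1:], start=1):
                  x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
                  skip = skips[-(i + 1)]  # encoder feature of matching size
                  x = block(torch.cat([x, skip], dim=1))
                  aux_outputs.append(torch.sigmoid(self.aux_heads[i](x)))
              return x, aux_outputs  # coarse-to-fine alpha predictions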
  • Step S303 input the decoded feature into the auxiliary output layer to obtain the output feature.
  • Each convolutional layer of the decoder is connected to an auxiliary output layer, which performs a convolution-and-pooling operation on the decoded features so that more feature information of the image is preserved.
  • Step S304 perform attention calculation on the output features through the progressive attention refinement layer, obtain attention features, and output a refined segmented image according to the attention features.
  • The progressive attention layer is connected to the auxiliary output layer of the previous layer, to the current decoding auxiliary output layer connected to the current decoding convolutional layer, and to its own corresponding auxiliary output layer (the current-layer auxiliary output layer). Specifically, the input of the progressive attention layer is the output of the previous layer's auxiliary output layer and of the current decoding auxiliary output layer; after the attention operation is performed, the result is output through the current-layer auxiliary output layer.
  • The first decoding convolutional layer does not have a corresponding progressive attention layer connected to it; if the first decoding convolutional layer is the current layer, the auxiliary output layer connected to it serves both as the current decoding output layer and as the current-layer output layer.
  • The progressive attention layer includes at least an encoding layer, an attention convolutional layer, a first fusion layer, a softmax layer, a second fusion layer, a connection layer, and a decoding layer; the output features pass through the encoding layer, the attention convolutional layer, the first fusion layer, the softmax layer, the second fusion layer, the connection layer, and the decoding layer in sequence for the corresponding calculations, and a more accurate refined segmented image is output.
  • step S204 the target loss function is determined based on the refined and segmented image, the initial image matting model is iteratively updated according to the target loss function, and the trained image matting model is output.
  • the training image set is input into the initial image matting model for training.
  • The target loss function of the initial image matting model is calculated to obtain a loss function value, the model parameters are adjusted according to the loss function value, and iterative training continues until the model is trained to a certain extent.
  • At this point the performance of the model reaches the optimal state and the value of the loss function cannot continue to decrease, that is, the model converges.
  • To judge convergence, it is only necessary to compare the loss function values of the previous two rounds of iteration. If the loss function value is still changing, training images continue to be selected and input into the image matting model with the adjusted model parameters, and iterative training continues; if the loss function value no longer changes significantly, the model can be considered converged, and after convergence the final image matting model is output.
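  • The convergence test described above can be sketched as the following training loop, which stops once the loss value no longer changes noticeably between consecutive rounds; the optimizer choice, learning rate, tolerance, and the target_loss callable are placeholders and assumptions, not values fixed by the application.
      import torch

      def train(model, loader, target_loss, lr=1e-4, max_epochs=200, tol=1e-4):
          optimizer = torch.optim.Adam(model.parameters(), lr=lr)
          prev_loss = None
          for epoch in range(max_epochs):
              epoch_loss = 0.0
              for images, labels in loader:
                  optimizer.zero_grad()
                  loss = target_loss(model(images), labels)
                  loss.backward()
                  optimizer.step()  # adjust model parameters according to the loss value
                  epoch_loss += loss.item()
              epoch_loss /= max(len(loader), 1)
              if prev_loss is not None and abs(prev_loss - epoch_loss) < tol:
                  break  # loss no longer changing noticeably: treat the model as converged
              prev_loss = epoch_loss
          return model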
  • Step S205 acquiring a target image, inputting the target image into the image matting model, and obtaining a matting result.
  • the acquired target image is input into the trained image matting model to perform a matting operation, and a corresponding matting result can be obtained.
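  • A usage sketch for this inference step is given below; the file names, the preprocessing, and the assumption that the trained model object was saved with torch.save are illustrative only.
      import torch
      import numpy as np
      from PIL import Image

      # Assumes the trained image matting model object was saved with torch.save(model, ...)
      model = torch.load("image_matting_model.pt", map_location="cpu")
      model.eval()

      img = Image.open("target.jpg").convert("RGB")
      x = torch.from_numpy(np.asarray(img)).float().permute(2, 0, 1).unsqueeze(0) / 255.0
      with torch.no_grad():
          matting_result = model(x)  # predicted matte / matting result for the target image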
  • In some embodiments, the above target image can also be stored in a blockchain node.
  • Blockchain, essentially a decentralized database, is a chain of data blocks associated with each other using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • In this embodiment, the image is preliminarily segmented through the image segmentation layer in the trained image matting model, and the preliminary segmentation result is then output to the image matting layer for further refined segmentation. This realizes image matting without any additional input and completely avoids manual intervention, achieving fully automatic matting and improving the efficiency of image matting.
  • At the same time, more accurate matting results can be achieved through the image matting model, further improving the accuracy and precision of matting.
  • In some embodiments, pre-training the image segmentation layer to obtain the pre-trained image segmentation layer includes:
  • the first loss function is obtained according to the reconstructed image, the image segmentation layer is iteratively updated based on the first loss function, and the pre-trained image segmentation layer is output.
  • the acquisition channel of the original image set may be the same as that of the above training data set, or it may be different, and it is selected according to actual needs.
  • The original images in the original image set are input into the image segmentation layer, which divides them into the foreground layer y 1 and the background layer y 2 ; the foreground layer y 1 and the background layer y 2 are fused through the initial mask m 0 to obtain the reconstructed image I'.
  • the initial mask m 0 can be preset or randomly generated, and the random generation is obtained according to input random noise.
  • In some embodiments, the step of obtaining the first loss function according to the reconstructed image, iteratively updating the image segmentation layer based on the first loss function, and outputting the pre-trained image segmentation layer includes:
  • the reconstruction loss is calculated from the reconstructed image and the original image in the original image set
  • The first loss function is calculated by the following formula:
  • Loss DDIP = Loss Reconst + α·Loss Excl + β·Loss Reg
  • where Loss Reconst is the reconstruction loss calculated from the reconstructed image and the original image I in the original image set; Loss Excl is a mutually exclusive (exclusion) loss, which minimizes the correlation between the gradients of y 1 and y 2 ; Loss Reg is a regularization loss, mainly used to constrain the fusion mask and binarize the foreground initial mask m 0 ; α and β are preset weighting parameters.
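  • A hedged sketch of this first loss function follows. The reconstruction term is taken as an L1 distance, the exclusion term penalizes correlated image gradients of y 1 and y 2, and the regularization term pushes the mask m 0 toward binary values; these concrete forms are assumptions consistent with the description above, not formulas quoted from the application.
      import torch
      import torch.nn.functional as F

      def image_gradients(x):
          gx = x[..., :, 1:] - x[..., :, :-1]   # horizontal gradients
          gy = x[..., 1:, :] - x[..., :-1, :]   # vertical gradients
          return gx, gy

      def ddip_loss(recon, original, y1, y2, m0, alpha=0.1, beta=0.1):
          loss_reconst = F.l1_loss(recon, original)
          g1x, g1y = image_gradients(y1)
          g2x, g2y = image_gradients(y2)
          # exclusion: discourage y1 and y2 from having strong gradients at the same pixels
          loss_excl = (g1x.abs() * g2x.abs()).mean() + (g1y.abs() * g2y.abs()).mean()
          # regularization: small only when the mask is close to 0 or 1 everywhere
          loss_reg = torch.minimum(m0, 1.0 - m0).mean()
          return loss_reconst + alpha * loss_excl + beta * loss_reg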
  • The image segmentation layer is trained to a certain extent, at which point its performance reaches a preset state. The preset state can be convergence, or the loss function value of the first loss function reaching a preset threshold; this means the iteration end condition is satisfied, pre-training of the image segmentation layer is completed, and the image segmentation layer is output according to the segmentation parameters at the end of the iteration.
  • During pre-training, the initial mask m 0 of the image segmentation layer is adjusted.
  • After pre-training is completed, the mask m of the pre-trained image segmentation layer is obtained; the mask m continues to be adjusted through learning so that a more accurate foreground mask is obtained for matting by the subsequent image matting layer.
  • the above step of performing attention calculation on the output features through the progressive attention refinement layer, and obtaining the attention features includes:
  • Step S401 upsampling the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain the upsampled output features, and calculating an uncertain region mask based on the upsampled output features.
  • the output feature ⁇ l-1 output by the previous auxiliary output layer is upsampled to the same size as the output feature ⁇ l of the current auxiliary output layer, and then the following transformation formula is used:
  • f ⁇ m (x, y) is the transformation formula for obtaining the mask m of the uncertain region from the ⁇ mask at point (x, y) of the image
  • ⁇ l is the ⁇ mask of the auxiliary output layer l of the current layer.
  • Step S402 perform feature extraction on the mask of the uncertain region to obtain the feature of the uncertain region, and perform attention calculation on the feature of the uncertain region to obtain the attention score.
  • The uncertain region mask m l-1 output by the previous auxiliary output layer and the uncertain region mask m l output by the current-layer auxiliary output layer are each passed through an encoding layer composed of CNNs for feature extraction, and the resulting uncertain-region features are then used in the following formula to calculate the attention score, which acts as an optimization signal to correct the uncertain-region features output by the current-layer auxiliary output layer.
  • the formula for calculating the attention score is as follows:
  • Step S403 correcting the feature of the uncertain region according to the attention score, and obtaining the corrected feature of the uncertain region as the attention feature.
  • attention features are obtained by correcting uncertain region features through attention calculations, which can ensure more accurate subsequent refinement and matting.
  • the above step of outputting the refined segmented image according to the attention feature includes:
  • Feature extraction is performed on the output features of the auxiliary output layer of the current layer to obtain the extracted features, and the attention features are spliced with the extracted features to obtain the spliced features;
  • the output ⁇ l of the auxiliary output layer of the current layer is extracted through the coding layer composed of CNN, and the extracted feature F ⁇ is combined with the obtained corrected attention feature of the auxiliary output layer of the current layer, as follows:
  • F ⁇ is the feature extracted by the mask of the current auxiliary output layer ⁇ l through the encoding layer
  • F ⁇ ' is the modified feature of the mask of the current auxiliary output layer ⁇ , that is, the stitching feature.
  • the spliced feature F ⁇ ' is decoded by a decoding layer composed of CNN, and a refined segmented image is obtained.
  • the target loss function is determined based on the refined segmented image
  • the initial image matting model is iteratively updated according to the target loss function
  • the step of outputting the trained image matting model includes:
  • a trained image matting model is generated according to the model parameters.
  • ⁇ l is a hyperparameter
  • ⁇ gt represents the output truth value of the auxiliary output layer of the current layer
  • Loss l represents the loss of the uncertain region of the current-layer auxiliary output layer, including an L1 loss, a composition loss, and a Laplacian loss, calculated by the following formula:
  • Loss l ( ⁇ gt ⁇ m l , ⁇ l ⁇ m l ) Loss L1 ( ⁇ gt ⁇ m l , ⁇ l ⁇ m l )+Loss comp ( ⁇ gt ⁇ m l , ⁇ l ⁇ m l )+Loss lap ( ⁇ gt m l , ⁇ l m l )
  • The target loss function is determined as a weighted combination of the first loss function and the second loss function, where γ and δ are preset weighting parameters.
  • The model parameters are adjusted according to the value of the loss function and iterative training continues until the model is trained to a certain extent; at this point the performance of the model reaches the optimal state and the value of the loss function cannot continue to decrease, that is, the model converges.
  • After the model converges, the final image matting model is output according to the finally adjusted model parameters.
  • In this way, the matting accuracy and precision of the image matting model can be improved.
  • the embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • The application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the aforementioned storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
  • the present application provides an embodiment of an image matting device based on image segmentation, which corresponds to the method embodiment shown in FIG. 2 , the device can be specifically applied to various electronic devices.
  • The image matting device 500 based on image segmentation in this embodiment includes: an acquisition module 501, a preliminary segmentation module 502, a refined segmentation module 503, a training module 504, and a matting module 505. Specifically:
  • the obtaining module 501 is used to obtain a training image set, and input the training image set into a pre-built initial image matting model, wherein the image matting model includes an image segmentation layer and an image matting layer.
  • the preliminary segmentation module 502 is used to segment the images in the training image set through the image segmentation layer to obtain a preliminary segmented image set;
  • the refinement and segmentation module 503 is used to input the preliminary segmentation image set to the image matting layer to obtain a refinement segmentation image
  • the training module 504 is configured to determine a target loss function based on the refined segmented image, iteratively update the initial image matting model according to the target loss function, and output the trained image matting model;
  • the matting module 505 is used to acquire a target image, and input the target image into the image matting model to obtain a matting result.
  • The above image segmentation-based image matting device performs preliminary segmentation on the image through the image segmentation layer in the trained image matting model and then outputs the preliminary segmentation result to the image matting layer for further refined segmentation. This realizes image matting without any additional input and completely avoids human intervention, achieving fully automatic matting and improving the efficiency of image matting.
  • At the same time, through the image matting model, more accurate matting results can be achieved, improving the accuracy and precision of matting.
  • In some embodiments, the preliminary segmentation module 502 includes a reconstruction submodule and a pre-training submodule. The reconstruction submodule is used to input the obtained original image set into the image segmentation layer and output the reconstructed image; the pre-training submodule is used to obtain a first loss function according to the reconstructed image, iteratively update the image segmentation layer based on the first loss function, and output the pre-trained image segmentation layer.
  • The initial mask of the image segmentation layer is adjusted by pre-training the image segmentation layer. After pre-training is completed, the mask of the pre-trained image segmentation layer is obtained, ensuring that a more accurate foreground mask is obtained during subsequent training for use by the image matting layer when matting.
  • the refinement and segmentation module 503 includes a feature extraction submodule, a decoding submodule, an output submodule, and an attention submodule, wherein:
  • the feature extraction submodule is used to input the preliminary segmented image set to the encoder for feature extraction to obtain encoded features
  • the decoding submodule is used to decode the encoded features through the decoder, and output the decoded features
  • the output sub-module is used to input the decoding features into the auxiliary output layer to obtain output features
  • the attention sub-module is used to perform attention calculation on the output features through the progressive attention refinement layer to obtain attention features, and output the refined segmentation image according to the attention features.
  • attention features are obtained through attention calculation, and more accurate refined and segmented images can be output.
  • the attention submodule includes an upsampling unit, an attention calculation unit, and a correction unit, wherein:
  • the upsampling unit is used to upsample the output features output by the previous auxiliary output layer to the same size as the output features of the current auxiliary output layer to obtain the upsampled output features, and calculate the uncertain region mask according to the upsampled output features membrane;
  • the attention calculation unit is used to perform feature extraction on the uncertain area mask to obtain uncertain area features, and perform attention calculation on the uncertain area features to obtain an attention score;
  • the correction unit is used to modify the feature of the uncertain region according to the attention score, and obtain the corrected feature of the uncertain region as the attention feature.
  • attention features are obtained by correcting uncertain region features through attention calculations, which can ensure more accurate subsequent refinement and matting.
  • the attention submodule also includes a splicing unit and a decoding unit, wherein:
  • the splicing unit is used to extract the features of the output features of the current layer auxiliary output layer to obtain the extracted features, and splice the attention features and the extracted features to obtain the spliced features;
  • the decoding unit is used to decode the mosaic feature, and output the thinned and segmented image.
  • the training module 504 includes a loss function calculation submodule, an adjustment submodule, and a generation submodule, wherein:
  • the loss function calculation submodule is used to obtain a second loss function according to the thinned and segmented image
  • the loss function calculation submodule is also used to determine a target loss function based on the first loss function and the second loss function;
  • the adjustment sub-module is used to adjust the model parameters of the initial image matting model according to the target loss function;
  • the generation sub-module is used to generate a trained image matting model according to the model parameters when the iteration end condition is met.
  • In this way, the matting accuracy and precision of the image matting model can be improved.
  • the pre-training submodule includes a calculation unit, an adjustment unit, and an output unit, wherein:
  • the calculation unit is used to calculate the reconstruction loss according to the reconstructed image and the original images in the original image set;
  • the computing unit is further configured to determine the first loss function according to the reconstruction loss
  • An adjustment unit is configured to adjust segmentation parameters of the image segmentation layer based on the first loss function
  • the output unit is used to output the pre-trained image segmentation layer according to the segmentation parameters when the iteration end condition is met.
  • the efficiency of pre-training can be improved, and at the same time, more accurate masks can be ensured for subsequent training.
  • FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 6 includes a memory 61, a processor 62, and a network interface 63 connected to each other through a system bus. It should be noted that only the computer device 6 with components 61-63 is shown, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
  • the computer equipment may be computing equipment such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through keyboard, mouse, remote controller, touch panel or voice control device.
  • the memory 61 includes at least one type of readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • the memory 61 may be an internal storage unit of the computer device 6 , such as a hard disk or memory of the computer device 6 .
  • the memory 61 can also be an external storage device of the computer device 6, such as a plug-in hard disk equipped on the computer device 6, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 61 is generally used to store the operating system and various application software installed in the computer device 6, such as computer-readable instructions of the image matting method based on image segmentation.
  • the memory 61 can also be used to temporarily store various types of data that have been output or will be output.
  • The processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip in some embodiments. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run computer-readable instructions stored in the memory 61 or to process data, for example computer-readable instructions for running the image segmentation-based image matting method.
  • the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • When the processor executes the computer-readable instructions stored in the memory, the steps of the image matting method based on image segmentation in the above embodiment are realized: the image is preliminarily segmented through the image segmentation layer in the trained image matting model, and the preliminary segmentation result is then output to the image matting layer for further refined segmentation, which realizes image matting without any additional input, completely avoids manual intervention, achieves fully automatic matting, and improves the efficiency of image matting.
  • At the same time, more accurate matting results can be achieved through the image matting model, and the accuracy and precision of matting can be improved.
  • the present application also provides another implementation manner, which is to provide a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the image matting method based on image segmentation described above: the image is preliminarily segmented through the image segmentation layer in the trained image matting model, and the preliminary segmentation result is then output to the image matting layer for further refined segmentation, which realizes image matting without any additional input, completely avoids manual intervention, achieves fully automatic matting, and improves the efficiency of image matting; at the same time, through the image matting model, more accurate matting results can be achieved, and the accuracy and precision of matting can be improved.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and contains several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence and provide an image matting method based on image segmentation. The method comprises: inputting an acquired training image set into a pre-built initial image matting model, the image matting model comprising an image segmentation layer and an image matting layer; segmenting images in the training image set by means of the image segmentation layer to obtain a preliminary segmented image set; inputting the preliminary segmented image set into the image matting layer to obtain refined segmented images; determining a target loss function on the basis of the refined segmented images, iteratively updating the initial image matting model according to the target loss function, and outputting a trained image matting model; and inputting a target image into the image matting model to obtain a matting result. The present application further relates to an image matting apparatus based on image segmentation, a computer device, and a medium. In addition, the present application relates to blockchain technology, and the target image can be stored in a blockchain. The present application can improve the accuracy and precision of image matting.
PCT/CN2022/089507 2022-02-23 2022-04-27 Procédé et appareil de montage sur sous-carte d'image basés sur une segmentation d'image, dispositif informatique et support WO2023159746A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210168421.X 2022-02-23
CN202210168421.XA CN114529574A (zh) 2022-02-23 2022-02-23 基于图像分割的图像抠图方法、装置、计算机设备及介质

Publications (1)

Publication Number Publication Date
WO2023159746A1 true WO2023159746A1 (fr) 2023-08-31

Family

ID=81623939

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089507 WO2023159746A1 (fr) 2022-02-23 2022-04-27 Procédé et appareil de montage sur sous-carte d'image basés sur une segmentation d'image, dispositif informatique et support

Country Status (2)

Country Link
CN (1) CN114529574A (fr)
WO (1) WO2023159746A1 (fr)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782460B (zh) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 图像分割模型的生成方法及图像的分割方法、计算机设备
CN116167922B (zh) * 2023-04-24 2023-07-18 广州趣丸网络科技有限公司 一种抠图方法、装置、存储介质及计算机设备
CN116524577A (zh) * 2023-07-05 2023-08-01 电子科技大学 一种基于渐进细化算法的人像抠图方法


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961303A (zh) * 2018-07-23 2018-12-07 北京旷视科技有限公司 一种图像处理方法、装置、电子设备和计算机可读介质
US20200357142A1 (en) * 2019-05-09 2020-11-12 Disney Enterprises, Inc. Learning-based sampling for image matting
CN112446380A (zh) * 2019-09-02 2021-03-05 华为技术有限公司 图像处理方法和装置
WO2021164429A1 (fr) * 2020-02-21 2021-08-26 京东方科技集团股份有限公司 Procédé de traitement d'images, appareil de traitement d'images, et dispositif
CN111815649A (zh) * 2020-06-30 2020-10-23 清华大学深圳国际研究生院 一种人像抠图方法及计算机可读存储介质
WO2022001464A1 (fr) * 2020-06-30 2022-01-06 稿定(厦门)科技有限公司 Procédé et système de matage automatique
CN112529913A (zh) * 2020-12-14 2021-03-19 北京达佳互联信息技术有限公司 图像分割模型训练方法、图像处理方法及装置
CN113379786A (zh) * 2021-06-30 2021-09-10 深圳市斯博科技有限公司 图像抠图方法、装置、计算机设备及存储介质
CN114038006A (zh) * 2021-08-09 2022-02-11 奥比中光科技集团股份有限公司 一种抠图网络训练方法及抠图方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576076A (zh) * 2023-12-14 2024-02-20 湖州宇泛智能科技有限公司 一种裸土检测方法、装置和电子设备

Also Published As

Publication number Publication date
CN114529574A (zh) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2023159746A1 (fr) Procédé et appareil de montage sur sous-carte d'image basés sur une segmentation d'image, dispositif informatique et support
CN107293296B (zh) 语音识别结果纠正方法、装置、设备及存储介质
WO2021155713A1 (fr) Procédé de reconnaissance faciale à base de fusion de modèle de greffage de poids, et dispositif y relatif
WO2023035531A1 (fr) Procédé de reconstruction à super-résolution pour image de texte et dispositif associé
WO2022105125A1 (fr) Procédé et appareil de segmentation d'image, dispositif informatique et support de stockage
CN111915480B (zh) 生成特征提取网络的方法、装置、设备和计算机可读介质
CN114792355B (zh) 虚拟形象生成方法、装置、电子设备和存储介质
CN113012712A (zh) 一种基于生成对抗网络的人脸视频合成方法及装置
CN113379627A (zh) 图像增强模型的训练方法和对图像进行增强的方法
CN114863229A (zh) 图像分类方法和图像分类模型的训练方法、装置
US20230237713A1 (en) Method, device, and computer program product for generating virtual image
CN116975347A (zh) 图像生成模型训练方法及相关装置
CN114758130B (zh) 图像处理及模型训练方法、装置、设备和存储介质
WO2023173536A1 (fr) Procédé et appareil d'identification de formule chimique, dispositif informatique et support de stockage
CN112990046B (zh) 差异信息获取方法、相关装置及计算机程序产品
WO2022178975A1 (fr) Procédé et appareil de réduction de bruit d'image basés sur un champ de bruit, dispositif et support de stockage
CN113592074B (zh) 一种训练方法、生成方法及装置、电子设备
WO2021169356A1 (fr) Procédé et appareil de réparation de fichier vocal, dispositif informatique et support d'enregistrement
CN114926322A (zh) 图像生成方法、装置、电子设备和存储介质
CN114040129A (zh) 视频生成方法、装置、设备及存储介质
CN114463466A (zh) 智能卡卡面图案定制方法、装置、电子设备及介质
CN113781491A (zh) 图像分割模型的训练、图像分割方法及装置
CN112966150A (zh) 一种视频内容抽取的方法、装置、计算机设备及存储介质
CN117095019B (zh) 一种图像分割方法及相关装置
CN112396613B (zh) 图像分割方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928019

Country of ref document: EP

Kind code of ref document: A1