CN113538456B - Image soft segmentation and background replacement system based on GAN network - Google Patents

Publication number: CN113538456B
Authority: CN (China)
Legal status: Active
Application number: CN202110692455.4A
Other languages: Chinese (zh)
Other versions: CN113538456A (en)
Inventors: 张冠华, 陈烁, 蒋林华, 曾新华, 庞成鑫, 宋梁
Current assignee: Fudan University
Original assignee: Fudan University
Application filed by Fudan University
Priority to CN202110692455.4A
Publication of CN113538456A
Application granted
Publication of CN113538456B

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/11 Region-based segmentation
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/08 Learning methods > G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T3/00 Geometric image transformations in the plane of the image > G06T3/04 Context-preserving transformations, e.g. by using an importance map > G06T3/053 Detail-in-context presentations
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10004 Still image; Photographic image
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details > G06T2207/20081 Training; Learning
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details > G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image soft segmentation and background replacement system based on a GAN network. The system consists of two parts: image soft segmentation and background replacement. The image soft segmentation part predicts the foreground and alpha values of an original image and comprises five modules: an input module, a global combination module, a residual network module, a pyramid scene parsing module and a lightweight interactive branch module. The background replacement part replaces the background and generates a high-resolution background replacement image; it comprises a generator model and a discriminator model. The beneficial effects of the invention are that it reduces the heavy workload of preparing auxiliary images during image soft segmentation and, once a high-precision segmentation has been obtained, replaces the background by combining segmentation with image generation.

Description

Image soft segmentation and background replacement system based on GAN network
Technical Field
The invention relates to an image soft segmentation and background replacement system based on a GAN network, and to the technical fields of deep learning, computer vision, and supervised and unsupervised learning.
Background
The flood of available data has transformed the field of deep learning: computers' ability to process images has improved greatly and now yields high-quality results. Pictures can be acquired in more and more ways, with smartphone photography the most common. Although vast numbers of pictures are available, each picture is unique, and it is difficult to combine objects from different pictures, i.e., to replace the background. Separating foreground from background has long been a classic problem. Applications that combine image soft segmentation with image generation are rare, which makes GAN-based image soft segmentation and background replacement a worthwhile research topic.
The prior art mainly follows three routes: first, image segmentation and synthesis with software tools, where hard segmentation is performed by fixed functions the tools provide; second, sampling- or propagation-based methods, which solve for the unknown region using color statistics of the known foreground and background; third, manual annotation of a trimap, whose labels are combined with deep learning for model training and alpha prediction.
The first route requires heavy human involvement, and manual processing with software has several problems: first, when the background color is similar to that of the target object, manual mislabeling is likely; second, separating the hair boundaries of people, animals and the like is difficult and gives poor results; third, manually processing large volumes of data is too inefficient. Manual software processing is therefore suitable for segmentation and synthesis in only a few scenes.
The second route is a sampling/propagation-based statistical approach that requires a trimap as auxiliary input. Sampling-based methods build color statistics of the known foreground and background by sampling and then solve for the alpha matte in the "unknown" region. Propagation-based methods propagate the alpha matte from the foreground and background regions into the unknown region to solve the compositing equation. Both approaches pay the cost of producing a trimap, and their results are unpredictable and of mediocre quality.
The third route is deep-learning-based: soft segmentation is performed by combining a trimap with a deep neural network, and compositing models are trained over various backgrounds on a dataset with ground-truth masks. This greatly improves the precision of soft segmentation and compositing, but the trimap remains a cost, and the model depends too heavily on manual labeling, so its robustness is weak.
In summary, the prior art has the following disadvantages: first, its precision cannot meet the requirements of ultra-high resolution; second, human involvement is high and producing auxiliary trimaps is costly; third, the models depend heavily on manual labels and are not robust; fourth, work that combines image soft segmentation with compositing mostly focuses on the quality and cost of the soft segmentation.
Disclosure of Invention
Current image processing needs high-quality systems, while prior-art background replacement is time-consuming, costly in manual labeling and low in precision. To address these problems, the invention provides a GAN-based image soft segmentation and background replacement system. It reduces the heavy workload of preparing auxiliary images during soft segmentation and, once a high-resolution segmentation has been obtained, combines segmentation with generation to achieve one-click high-resolution background replacement.
The GAN-based image soft segmentation and background replacement system comprises an image soft segmentation part, which predicts the foreground and alpha values of the image, and a background replacement part, which generates a high-precision composite image.
(I) Image soft segmentation
The image soft segmentation part comprises five modules: an input module, a global combination module, a residual network module, a pyramid scene parsing module and a lightweight interactive branch module, where:
the input module obtains an original image I, a background image B and a target soft segmentation image S through data preprocessing; S is obtained by eroding, dilating and Gaussian-blurring the subject object extracted from I;
the global combination module first encodes I, B and S separately into 512 × 256 feature maps; then, taking I as the base, it combines the base with B and with S to form two 512-channel feature maps, from each of which a 64-channel feature map is extracted by convolution, BatchNorm and ReLU; finally, it concatenates the base with the two 64-channel feature maps into 384 channels and extracts a 256-channel feature map by convolution, BatchNorm and ReLU, which serves as the input to the residual network module;
the residual network module comprises a main residual module followed by two lightweight branch residual modules. The main residual module uses the ResNet-101 architecture, with the last two stages replaced by fully convolutional layers using atrous convolution; its output is shared by both branches. The two lightweight branch residual modules perform foreground prediction and alpha prediction respectively, and the residual network module outputs deep feature maps;
the pyramid scene parsing (PSP) module addresses the loss of internal data structure and the lack of spatial consistency caused by pooling and convolution. After the residual network produces a deep feature map, a four-level pyramid is used, with pooling kernels of 1 × 1, 2 × 2, 3 × 3 and 6 × 6; after pooling, each level is reduced in dimension by a 1 × 1 convolution and upsampled by bilinear interpolation back to the size of the feature map output by the residual network; the resulting feature maps, together with the feature map before pooling, are then concatenated to complete multi-scale feature fusion. Finally, the foreground prediction branch uses ReLU to obtain a foreground prediction feature map, and the alpha prediction branch uses Tanh to obtain an alpha prediction map;
the lightweight interactive branch module is attached before the PSP module to receive optional additional guidance and to support generalization to extreme cases. The user may operate on the original image: clicking inside the target object generates an internal guide, and clicking on the positive or negative diagonal of the target object generates an external guide. A two-dimensional Gaussian is placed around each internal and external point to form internal and external guidance heat maps, which are encoded into feature maps and combined with the output feature map of the residual network. The user may choose whether to perform the interaction.
(II) Background replacement
The background replacement part comprises a generator network and a discriminator network that together form an unsupervised GAN framework. The generator and discriminator models are fine-tuned within this framework: the generator continually learns to better approximate the distribution of real data, the discriminator's ability to tell real from synthesized images continually improves, and training ultimately reaches a Nash equilibrium. After training, the system produces high-resolution background replacement images, where:
the generator network uses the foreground image and alpha prediction from the image soft segmentation part to composite the foreground onto a new background. It comprises a guiding (teacher) model and a guided (student) model. The guiding model is trained by supervised learning on a synthetic dataset of foregrounds F with labeled alpha mattes α, composited onto backgrounds B from the COCO dataset; gamma correction and Gaussian blur are applied to the background B to prevent overfitting, so that the system does not rely excessively on the difference between I and B. The result is G_teacher, the guiding model. G_teacher then serves as a "pseudo ground truth": the guided model is trained in a self-supervised manner on a real dataset, compared against this pseudo ground truth, yielding G_student, the guided model. The guiding and guided models share the same loss function, with the first loss given less weight. An Adam optimizer is used to keep the network from falling into a local minimum and to find a better nearby minimum for real data;
the discriminator network uses adversarial training with a multi-scale discriminator to learn from unlabeled real-scene data, judging whether the new composite image, formed by pasting the foreground onto the background, is a real or a synthesized sample. The multi-scale discriminator operates at three scales: the original image, 1/2 of the original and 1/4 of the original. Each scale uses 3 linear discriminators, each a fully convolutional network composed of several groups of convolution, BatchNorm and Leaky ReLU.
Compared with the prior art, the technical solution of the invention has the following advantages:
First, it removes the prior art's dependence on a trimap, reducing human involvement and the cost of manual annotation.
Second, it provides a global combination module that effectively combines all the different cues, markedly improving the soft segmentation of objects.
Third, it combines atrous convolution with PSP scene parsing to obtain a larger receptive field and global information, fusing features of different scales to produce sharper soft segmentation. The multi-scale discriminator also makes it easier to judge global consistency and to discriminate local details.
Fourth, it provides a lightweight interactive branch through which a person can guide the model, improving the generalization of the system.
Fifth, it uses a GAN to combine image soft segmentation with image generation for background replacement. The unsupervised game between generator and discriminator optimizes the model and ultimately produces background replacement images that differ little from real photographs.
Drawings
FIG. 1. Atrous convolution network.
FIG. 2. Pyramid scene parsing (PSP) module.
FIG. 3. Image soft segmentation network.
FIG. 4. GAN network flow diagram.
Detailed Description
The system comprises two parts: alpha prediction of the image and background replacement. The first part, image soft segmentation, comprises five modules: an input module, a global combination module, a residual network module, a pyramid scene parsing module and a lightweight interactive branch module. The second part, image compositing, comprises a generator model and a discriminator model.
(I) Alpha prediction of the image. Each pixel of an image involves 7 unknowns: the foreground F (R, G, B), the background B (R, G, B) and the foreground alpha matte α, so the image equation can be written as:
$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i$$
where i indexes pixels.
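As a quick numerical illustration of the image equation (not code from the patent), the NumPy sketch below composites a foreground over a background per pixel. The function name `composite` and the toy values are hypothetical. It also makes clear why matting is under-determined: each pixel of I gives 3 observed values but involves 7 unknowns.

```python
import numpy as np


def composite(alpha: np.ndarray, fg: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Apply I = alpha * F + (1 - alpha) * B at every pixel.

    alpha: (H, W) matte in [0, 1]; fg, bg: (H, W, 3) RGB images in [0, 1].
    """
    a = alpha[..., None]  # broadcast the single alpha channel over R, G, B
    return a * fg + (1.0 - a) * bg


# A 1x2 image: one fully opaque foreground pixel, one half-transparent pixel.
alpha = np.array([[1.0, 0.5]])
fg = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])  # red foreground
bg = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # blue background
img = composite(alpha, fg, bg)
# img[0, 0] is pure red; img[0, 1] is an even mix of red and blue.
```

Background replacement is this same equation evaluated with a new B once F and α have been predicted.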
To obtain high-quality soft segmentation, the system must predict an accurate foreground and alpha matte. The first part comprises the following modules:
1. Input module (data preprocessing). Manually labeling trimaps is expensive; to overcome this drawback, the system instead adds a background image that does not contain the target object. The system requires a still image plus an image of the background alone. Capture is simple and can be done with any camera set to lock exposure and focus, such as a smartphone camera. Assuming that camera motion is small, a homography is applied to align the background with the input image. Finally, an initial soft segmentation of the subject object is obtained by erosion, dilation and Gaussian blur.
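The alignment step can be sketched as an inverse warp. This is a minimal illustration under stated assumptions: the 3 × 3 homography `H` (mapping background-image pixels to input-image pixels) is taken as already known, since the patent does not say how it is estimated, and nearest-neighbour sampling stands in for proper interpolation. `warp_homography` is a hypothetical helper.

```python
import numpy as np


def warp_homography(src: np.ndarray, H: np.ndarray, out_shape: tuple) -> np.ndarray:
    """Warp src into the input image's frame using a 3x3 homography H (src -> dst).

    Inverse mapping: every output pixel is projected back into src and the
    nearest source pixel is copied. Pixels that land outside src stay 0.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)  # (x, y, 1)
    mapped = np.linalg.inv(H) @ dst
    sx = np.rint(mapped[0] / mapped[2]).astype(int)  # dehomogenize, round to nearest
    sy = np.rint(mapped[1] / mapped[2]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    flat = np.zeros((h * w,) + src.shape[2:], dtype=src.dtype)
    flat[valid] = src[sy[valid], sx[valid]]
    return flat.reshape((h, w) + src.shape[2:])


# A pure translation by one pixel to the right, as a stand-in for small camera motion.
src = np.arange(9, dtype=float).reshape(3, 3)
H = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
aligned = warp_homography(src, H, (3, 3))
```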
In summary, data preprocessing yields three inputs: the original image (I), the background image (B) and the target soft segmentation image (S).
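The erosion, dilation and Gaussian blur that produce S can be sketched in plain NumPy as below. The starting binary mask of the subject, the square structuring element, and the radii and sigma are all illustrative assumptions; the patent does not give these values.

```python
import numpy as np


def _shifted_stack(img: np.ndarray, r: int) -> np.ndarray:
    """All (2r+1)^2 shifted copies of img (edge-padded), one per window offset."""
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(2 * r + 1) for dx in range(2 * r + 1)])


def erode(img: np.ndarray, r: int) -> np.ndarray:
    return _shifted_stack(img, r).min(axis=0)  # min over a (2r+1)x(2r+1) square


def dilate(img: np.ndarray, r: int) -> np.ndarray:
    return _shifted_stack(img, r).max(axis=0)  # max over a (2r+1)x(2r+1) square


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with a normalized kernel of radius 3*sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def soft_segmentation(mask: np.ndarray, erode_r=1, dilate_r=3, sigma=1.0) -> np.ndarray:
    """Hypothetical helper: binary subject mask -> soft map S in [0, 1]."""
    s = erode(mask.astype(float), erode_r)  # strip thin spurious regions
    s = dilate(s, dilate_r)                 # grow past the boundary (hair, edges)
    return gaussian_blur(s, sigma)          # soften the transition


mask = np.zeros((20, 20))
mask[5:15, 5:15] = 1.0
S = soft_segmentation(mask)
```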
2. Global combination module. The system uses a new global combination network to combine all cue features effectively. For example, when the target object's color is similar to the background, the network should rely more on the segmentation cue in that region than on pixel differences, which avoids the internal holes and blurring artifacts that can otherwise appear in the soft segmentation. It is implemented as follows:
I, B and S are each encoded into 512 × 256 feature maps. Taking the original image as the base, B and S are each combined with it to form two 512-channel feature maps, from each of which a 64-channel feature map is extracted by convolution, BatchNorm and ReLU. The base and the two 64-channel maps are concatenated into 384 channels, which convolution, BatchNorm and ReLU reduce to a 256-channel feature map that serves as the input to the residual network module. The global combination design helps the system generalize across different datasets and domains.
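The channel bookkeeping of the global combination module can be checked with a toy NumPy version. Assumptions: the encoders are replaced by random features with 256 channels each (256 + 256 = 512 and 256 + 64 + 64 = 384 are consistent with the channel counts in the text), each convolution is taken to be 1 × 1, BatchNorm uses batch statistics without a learned scale and shift, and a small spatial size replaces the real resolution. `conv1x1_bn_relu` and `global_combination` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1_bn_relu(x: np.ndarray, out_ch: int) -> np.ndarray:
    """1x1 convolution (random weights) + BatchNorm (batch stats, no affine) + ReLU.

    x has shape (N, C, H, W); a 1x1 convolution is a matmul over the channel axis.
    """
    w = rng.standard_normal((out_ch, x.shape[1])) / np.sqrt(x.shape[1])
    y = np.einsum("oc,nchw->nohw", w, x)
    mean = y.mean(axis=(0, 2, 3), keepdims=True)
    var = y.var(axis=(0, 2, 3), keepdims=True)
    y = (y - mean) / np.sqrt(var + 1e-5)
    return np.maximum(y, 0.0)


def global_combination(f_i, f_b, f_s):
    """Combine encoded I (base), B and S feature maps into one 256-channel map."""
    f_ib = conv1x1_bn_relu(np.concatenate([f_i, f_b], axis=1), 64)  # 512 -> 64
    f_is = conv1x1_bn_relu(np.concatenate([f_i, f_s], axis=1), 64)  # 512 -> 64
    fused = np.concatenate([f_i, f_ib, f_is], axis=1)               # 256 + 64 + 64 = 384
    return conv1x1_bn_relu(fused, 256)                              # 384 -> 256


f_i, f_b, f_s = (rng.standard_normal((2, 256, 8, 8)) for _ in range(3))
combined = global_combination(f_i, f_b, f_s)
```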
3. Residual network module. Following ResNet, the system uses a residual network in the main module. The backbone uses the ResNet-101 architecture with the fully connected layer and max-pooling layer removed, and introduces atrous convolution in the last two stages to allow pixel-level prediction at an acceptable output resolution. Atrous convolution samples its input sparsely, which yields a larger receptive field, sharper instance soft segmentation boundaries, and better interaction with the subsequent aggregation module.
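To make the receptive-field claim concrete, here is a minimal single-channel atrous (dilated) convolution in NumPy. It is an illustrative sketch, not the backbone's implementation: a 3 × 3 kernel with dilation rate d still has 9 weights but spans d·(3 − 1) + 1 pixels per side, and the output keeps the input resolution.

```python
import numpy as np


def atrous_conv2d(x: np.ndarray, k: np.ndarray, dilation: int) -> np.ndarray:
    """'Same'-size 2D atrous convolution (cross-correlation, as in CNNs), zero-padded."""
    kh, kw = k.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    p = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            # tap (i, j) reads the input offset by a multiple of the dilation rate
            out += k[i, j] * p[i * dilation:i * dilation + h, j * dilation:j * dilation + w]
    return out


def effective_kernel(k_size: int, dilation: int) -> int:
    """Span in pixels covered by one dilated kernel along each axis."""
    return dilation * (k_size - 1) + 1


# An impulse shows which input pixels a dilated 3x3 kernel touches.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
response = atrous_conv2d(impulse, np.ones((3, 3)), dilation=2)
```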
The output of the main residual network is shared, and two lightweight branch residual networks follow it, for foreground prediction and alpha prediction respectively. The foreground prediction branch passes through further residual blocks, is aggregated by the pyramid scene parsing module, and is followed by a group of convolution, bilinear-interpolation upsampling, BatchNorm and ReLU to produce the final foreground heat map. The alpha prediction branch passes through a residual block and the pyramid scene parsing module, followed by a group of convolution, bilinear-interpolation upsampling, BatchNorm and Tanh to produce the final alpha prediction. Tanh is used to bound the output because each pixel's alpha matte value must lie between 0 and 1; since Tanh's range is (−1, 1), its output has to be rescaled or clipped into [0, 1].
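Because Tanh outputs values in (−1, 1), some mapping into [0, 1] is needed before the result can be used as an alpha matte. The sketch below shows two common options, an affine rescaling and clipping; which one the system uses is not stated, so both are assumptions, and `alpha_from_tanh` is a hypothetical name.

```python
import numpy as np


def alpha_from_tanh(logits: np.ndarray, mode: str = "rescale") -> np.ndarray:
    """Turn pre-activation alpha logits into a matte in [0, 1]."""
    t = np.tanh(logits)  # range (-1, 1)
    if mode == "rescale":
        return (t + 1.0) / 2.0      # affine map of (-1, 1) onto (0, 1)
    return np.clip(t, 0.0, 1.0)     # clipping: negative responses become alpha = 0


alpha_rescaled = alpha_from_tanh(np.array([-50.0, 0.0, 50.0]))
alpha_clipped = alpha_from_tanh(np.array([-1.0, 0.0, 2.0]), mode="clip")
```

Rescaling keeps gradients for negative logits, while clipping maps every negative response to exactly transparent.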
4. Pyramid scene parsing (PSP) module. The system uses the widely adopted PSP model to handle relationships within the scene and to aggregate global context. Although the global combination module fuses representation information at a shallow level to some extent, the loss of internal data structure and of spatial consistency caused by pooling and convolution still needs to be addressed with PSP. It is implemented as follows:
After the branch residual network extracts the deep feature map, a spatial pooling pyramid is built to fuse feature maps of different scales. The pooling kernels are 1 × 1, 2 × 2, 3 × 3 and 6 × 6, and the pooling modules at different scales respond to different regions of the activation map. After pooling, each level is reduced in dimension by a 1 × 1 convolution, upsampled by bilinear interpolation, and restored to the output size of the branch network. The resulting maps are concatenated with the feature map before pooling to complete multi-scale feature fusion, and a final group of convolutions is applied. PSP has strong contextual inference ability: it extracts features at several levels (pixel level, superpixel level and global) and combines these ranges, which greatly helps soft segmentation.
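A compact NumPy sketch of the pyramid pooling step follows. It assumes the 1 × 1, 2 × 2, 3 × 3 and 6 × 6 sizes are output bin grids (as in PSPNet) rather than literal kernel sizes, uses random 1 × 1 convolution weights, and omits the per-level BatchNorm/ReLU and the final convolution group; all function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x: np.ndarray, bins: int) -> np.ndarray:
    """Average-pool a (C, H, W) map onto a (C, bins, bins) grid."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        y0, y1 = (i * h) // bins, -(-((i + 1) * h) // bins)  # floor / ceil bounds
        for j in range(bins):
            x0, x1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = x[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out


def bilinear_resize(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Bilinear resize of (C, h0, w0) to (C, h, w) using half-pixel centres."""
    c, h0, w0 = x.shape
    ys = np.clip((np.arange(h) + 0.5) * h0 / h - 0.5, 0, h0 - 1)
    xs = np.clip((np.arange(w) + 0.5) * w0 / w - 0.5, 0, w0 - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h0 - 1), np.minimum(x0 + 1, w0 - 1)
    wy, wx = (ys - y0)[None, :, None], (xs - x0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy


def pyramid_pooling(feat: np.ndarray, reduce_ch: int, rng) -> np.ndarray:
    """Pool at 4 scales, reduce channels, upsample back, concatenate with the input."""
    c, h, w = feat.shape
    branches = [feat]  # the pre-pooling map joins the concatenation
    for bins in (1, 2, 3, 6):
        pooled = adaptive_avg_pool(feat, bins)
        wt = rng.standard_normal((reduce_ch, c)) / np.sqrt(c)  # 1x1 conv weights
        reduced = np.einsum("oc,chw->ohw", wt, pooled)
        branches.append(bilinear_resize(reduced, h, w))
    return np.concatenate(branches, axis=0)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 12, 12))
fused = pyramid_pooling(feat, reduce_ch=2, rng=rng)  # 8 + 4 * 2 = 16 channels
```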
5. Lightweight interactive branch module. To support generalization to extreme cases, the system attaches a lightweight branch before the PSP module to receive optional additional guidance. The user can operate on the original image: clicking inside the target object generates internal guidance, and clicking on the positive or negative diagonal of the target object generates external guidance. A two-dimensional Gaussian is placed around each internal and external point to produce two heat maps, which the system encodes into feature maps and combines into the two branches of the residual network. The interaction is simple but improves the model's adaptability to extreme cases, and the user may choose whether to perform it.
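The guidance heat maps can be sketched as follows. Click coordinates, sigma, and merging several clicks by taking the per-pixel maximum are illustrative assumptions; the patent only states that a two-dimensional Gaussian is placed near each point. The encoder that turns the maps into features is not shown, and the function names are hypothetical.

```python
import numpy as np


def guidance_heatmap(shape, points, sigma):
    """Place a 2D Gaussian at each (row, col) click; keep the per-pixel maximum."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w))
    for r, c in points:
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat


def interaction_maps(shape, inner_clicks, outer_clicks, sigma=10.0):
    """Stack internal and external guidance maps into a (2, H, W) array."""
    inner = guidance_heatmap(shape, inner_clicks, sigma)
    outer = guidance_heatmap(shape, outer_clicks, sigma)
    return np.stack([inner, outer])


# One click inside the object, two on a diagonal outside it.
maps = interaction_maps((64, 64), [(32, 32)], [(4, 4), (60, 60)], sigma=5.0)
```

With no clicks both maps are all zeros, which corresponds to the user choosing not to interact.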
(II) Background replacement (image compositing). To synthesize background replacement images comparable to real photographs, the system trains its model with an unsupervised GAN.
1. Generator network. Modules 1 to 5 together can be regarded as the work of the generator model. The image soft segmentation completed in the first part yields a foreground image and an alpha prediction, and the foreground is pasted onto a new background to synthesize the image.
The generator uses a "guiding and guided" (teacher-student) model. The guiding model is trained by supervised learning on a synthetic dataset of foregrounds (F) with annotated alpha mattes (α), composited onto backgrounds (B) from the COCO dataset. To keep the system from relying too heavily on the difference between I and B, gamma correction and Gaussian blur are applied to the background B to prevent overfitting; this yields G_teacher, the guiding model. Its loss function is as follows:
[Equation image BDA0003126659490000061: loss_1, the supervised loss of G_teacher; not reproduced in the text]
G_teacher then acts as a supervisor, serving as a "pseudo ground truth". Under its guidance, the guided model G_student is trained in a self-supervised manner on a real dataset. Its loss function is as follows:
$$\mathrm{loss}_2 = \Big(D_{\mathrm{disc}}\big(\alpha F + (1 - \alpha) B\big) - 1\Big)^2$$
The generator is trained to minimize loss_1 and loss_2 together, with the first loss given less weight: the weight λ on the first loss starts at 0.02 and is halved every five iterations. The network uses the Adam optimizer to avoid getting trapped in local minima and to find a better nearby minimum for the real data. The generator loss is as follows:
[Equation image BDA0003126659490000062: the generator loss combining loss_1 and loss_2; not reproduced in the text]
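Since the combined generator loss appears only as an image, the sketch below assumes the common form loss_2 + λ·loss_1, with λ following the stated schedule (0.02, halved every five iterations). loss_2 is written in least-squares form to match the formula for loss_2; `d_score` stands for the discriminator's output on the composite αF + (1 − α)B, and loss_1 is passed in precomputed because its formula is not reproduced. All names are hypothetical, and the Adam update itself is not shown.

```python
import numpy as np


def lambda_schedule(iteration: int, lam0: float = 0.02, halve_every: int = 5) -> float:
    """Weight on loss_1: starts at lam0 and is halved every `halve_every` iterations."""
    return lam0 * 0.5 ** (iteration // halve_every)


def adversarial_loss(d_score: np.ndarray) -> float:
    """loss_2 = (D_disc(alpha*F + (1-alpha)*B) - 1)^2, averaged over D's outputs."""
    return float(np.mean((d_score - 1.0) ** 2))


def generator_loss(loss1: float, d_score: np.ndarray, iteration: int) -> float:
    # Assumed combination; the patent's exact formula is only available as an image.
    return adversarial_loss(d_score) + lambda_schedule(iteration) * loss1


# Early in training the supervised term still contributes; later it fades out.
early = generator_loss(loss1=2.0, d_score=np.zeros(3), iteration=0)
late = generator_loss(loss1=2.0, d_score=np.zeros(3), iteration=50)
```

Decaying λ lets the teacher's pseudo ground truth stabilize early training while the adversarial term increasingly drives the generator toward realistic composites.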
2. Discriminator network. The discriminator must judge whether an image is a real or a synthesized sample, and its parameters are fine-tuned by backpropagation. To improve background replacement in real scenes, the system uses a multi-scale discriminator based on pix2pixHD. Each scale of the discriminator contains 3 linear discriminators, each a fully convolutional network composed of several groups of convolution, BatchNorm and Leaky ReLU. The three scales are the original image, 1/2 of the original and 1/4 of the original. As with PSP, coarser scales have larger receptive fields, making global consistency easier to judge, while finer scales have smaller receptive fields, making details such as color and texture easier to judge.
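The multi-scale input pyramid and a least-squares discriminator loss can be sketched as below. Simplifications: 2 × 2 average pooling produces the 1/2 and 1/4 scales (pix2pixHD's own downsampling differs in detail), one toy discriminator is used per scale rather than the three described, and the least-squares form is assumed to mirror loss_2. Function names are hypothetical.

```python
import numpy as np


def downsample2(img: np.ndarray) -> np.ndarray:
    """Halve H and W of a (C, H, W) image by 2x2 average pooling (H, W even)."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def image_pyramid(img: np.ndarray, n_scales: int = 3) -> list:
    """Original, 1/2 and 1/4 resolution copies, one per discriminator scale."""
    scales = [img]
    for _ in range(n_scales - 1):
        scales.append(downsample2(scales[-1]))
    return scales


def multiscale_d_loss(discriminators, real, fake) -> float:
    """Least-squares discriminator loss averaged over scales (real -> 1, fake -> 0)."""
    total = 0.0
    for d, r, f in zip(discriminators, image_pyramid(real), image_pyramid(fake)):
        total += np.mean((d(r) - 1.0) ** 2) + np.mean(d(f) ** 2)
    return float(total / len(discriminators))


# Toy per-scale "discriminator": the channel mean, used as a patch score map.
toy_ds = [lambda x: x.mean(axis=0)] * 3
real, fake = np.ones((3, 16, 16)), np.zeros((3, 16, 16))
perfect = multiscale_d_loss(toy_ds, real, fake)  # scores real as 1, fake as 0
fooled = multiscale_d_loss(toy_ds, fake, real)   # scores swapped
```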
3. Finally, the generator and discriminator models are fine-tuned within the unsupervised GAN framework: the generator continually learns to better approximate the distribution of real data, the discriminator's discriminative ability continually improves, and training ultimately reaches a Nash equilibrium. After training, the system produces high-resolution background replacement images.
The invention comprises 7 modules in two parts. The global combination module combines different representations and improves segmentation quality; the residual network, combined with atrous convolution and PSP, performs pixel-level prediction and fuses multi-scale features; the lightweight interactive module can guide model training and improve adaptability; and a GAN-based system combining image soft segmentation with image generation is proposed and applied to background replacement. These are the key points for which protection is sought.
All technical solutions formed by equivalent transformation or equivalent replacement fall within the protection scope of the invention and are not described in detail here.

Claims (2)

1. An image soft segmentation and background replacement system based on a GAN network, characterized by comprising an image soft segmentation part and a background replacement part; the image soft segmentation part predicts the foreground and alpha value of the image and performs the soft segmentation operation; the background replacement part generates a high-resolution composite image; wherein:
(I) Image soft segmentation
The image soft segmentation part comprises five modules: an input module, a global combination module, a residual network module, a pyramid scene parsing module and a lightweight interactive branch module; wherein:
the input module receives an original image I, a background image B and a target soft segmentation image S; S is an initial soft segmentation of the subject object obtained through erosion, dilation and Gaussian blur;
the global combination module first encodes I, B and S separately into 512 × 256 feature maps; then, taking I as the base, it combines the base with B and with S to form two 512-channel feature maps, from each of which a 64-channel feature map is extracted by convolution, BatchNorm and ReLU; finally, it concatenates the base with the two 64-channel feature maps into 384 channels and extracts a 256-channel feature map by convolution, BatchNorm and ReLU, which serves as the input to the residual network module;
the residual network module comprises a main residual module followed by two lightweight branch residual modules; the backbone uses the ResNet-101 architecture with the fully connected layer and max-pooling layer removed, and introduces atrous convolution in the last two stages to allow pixel-level prediction at an acceptable output resolution; the output of the main residual module is shared and provides a deeper feature map, and the two lightweight branch residual networks following it perform foreground prediction and alpha prediction respectively, yielding two feature maps on which the pyramid scene parsing (PSP) module subsequently performs multi-scale feature fusion;
the pyramid scene parsing (PSP) module takes the deep feature map obtained through the main and branch residual networks with atrous convolution and applies a four-level pyramid with pooling kernels of 1 × 1, 2 × 2, 3 × 3 and 6 × 6; after pooling, each level is reduced in dimension by a 1 × 1 convolution and upsampled by bilinear interpolation back to the size of the feature map output by the residual network; the resulting feature maps, together with the feature map before pooling, are concatenated to complete multi-scale feature fusion; finally, the foreground prediction branch uses ReLU to obtain a foreground prediction feature map, and the alpha prediction branch uses Tanh to obtain an alpha prediction map;
the lightweight interactive branch module is attached before the PSP module to receive optional additional guidance and to support generalization to extreme cases; the user may operate on the original image, clicking inside the target object to generate an internal guide and clicking on the positive or negative diagonal of the target object to generate an external guide; a two-dimensional Gaussian is placed around each internal and external point to produce internal and external guidance heat maps, which are encoded into feature maps and combined with the output feature map of the residual network; the user may choose whether to perform the interaction;
(2) Background replacement
The background replacement part comprises a generator model and a discriminator model, which together form an unsupervised GAN framework within which both models are fine-tuned; the generator network continuously learns the distribution of the real data while the discrimination ability of the discriminator network continuously improves, until Nash equilibrium is finally reached, after which the trained system can produce high-quality background replacement images; wherein:
the generator network takes the foreground image and the alpha prediction obtained by the image soft segmentation part and composites the foreground onto a new background to synthesize an image; the generator network uses a "guiding and guided" (teacher-student) model; the training set of the guiding model is a synthetic data set comprising a number of foregrounds F, their labeled alpha mattes α, and backgrounds B taken from the COCO dataset; during generator training, rectification and Gaussian blur are applied to the background B to prevent overfitting and excessive bias of the system, and the model learns from the input image I and the background B, so that supervised learning yields G_teacher as the guiding model; with G_teacher serving as a "pseudo ground truth", the model is then trained on real scenes by comparison against this pseudo ground truth, and self-supervised training of the guided model on a real data set yields G_student as the guided model; the guiding model and the guided model share the same loss function, with the first loss given a smaller weight; an Adam optimizer is used to keep the network from falling into a local minimum and to find a nearby minimum that better suits the real data;
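The compositing step that both the synthetic training set and the generator rely on is the standard alpha blend I = αF + (1 − α)B. A minimal per-pixel sketch on nested lists, assuming RGB tuples with values in [0, 1] and an alpha map already in [0, 1]; how the Tanh output of the alpha branch is mapped to this range is not specified in the claim and is left out.

```python
def composite(fg, alpha, bg):
    """Blend fg over bg per pixel and channel: I = alpha * F + (1 - alpha) * B.

    fg, bg: 2D grids of RGB tuples; alpha: 2D grid of floats in [0, 1].
    """
    return [
        [
            tuple(a * f + (1.0 - a) * b for f, b in zip(fp, bp))
            for fp, a, bp in zip(frow, arow, brow)
        ]
        for frow, arow, brow in zip(fg, alpha, bg)
    ]


if __name__ == "__main__":
    red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    # Opaque, half-transparent and fully transparent foreground pixels.
    print(composite([[red, red, red]], [[1.0, 0.5, 0.0]], [[blue, blue, blue]]))
```

The same function produces the image that is passed to the discriminator in loss2 of claim 2.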
the discriminator network uses adversarial training based on a multi-scale discriminator to train on unlabeled real-scene data, discriminating whether the new composite image formed by pasting the foreground result onto the background is a real sample or a synthesized sample; the multi-scale discriminator operates at three scales: the original image, 1/2 of the original and 1/4 of the original; each scale of the multi-scale discriminator uses 3 linear discriminators, each comprising a fully convolutional network built from several groups of convolution, BatchNorm and Leaky ReLU layers.
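The three discriminator scales (original, 1/2, 1/4) can be produced by repeated downsampling. The sketch below assumes 2 × 2 average pooling for each halving and stands in for the per-scale discriminator networks with arbitrary callables; neither choice comes from the claim.

```python
def downsample2x(img):
    """Halve a 2D map with 2x2 average pooling (an odd trailing row/column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def image_pyramid(img, num_scales=3):
    """Return [original, 1/2, 1/4, ...] resolution copies of img."""
    scales = [img]
    for _ in range(num_scales - 1):
        scales.append(downsample2x(scales[-1]))
    return scales


def multiscale_scores(img, discriminators):
    """Apply one discriminator callable per scale and return the per-scale scores."""
    pyramid = image_pyramid(img, num_scales=len(discriminators))
    return [d(x) for d, x in zip(discriminators, pyramid)]
```

In a real implementation each callable would be a fully convolutional network of convolution, BatchNorm and Leaky ReLU layers, and the per-scale scores would feed the adversarial loss.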
2. The image soft segmentation and background replacement system of claim 1, wherein, in the generator,
the loss function of the guiding model G_teacher is as follows:
[Formula image: loss1 of G_teacher, not reproduced in the source text]
the loss function of the guided model G_student is as follows:
$$\mathrm{loss}_2 = \left(D_{\mathrm{disc}}\big(\alpha F + (1-\alpha)B\big) - 1\right)^2$$
the generator loss used to train the network minimizes the sum of loss1 and loss2; λ is initialized to 0.02 and halved every five iterations; the network uses an Adam optimizer, which keeps it from falling into a local minimum while finding a nearby minimum better suited to the real data; the generator loss is as follows:
[Formula image: total generator loss, not reproduced in the source text]
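Since the formula images for loss1 and the total generator loss are not reproduced, the sketch below covers only what the text states: loss2 as a least-squares adversarial term on a single discriminator score, and the λ schedule (initial 0.02, halved every five iterations). How λ enters the total is an assumption here, following the statement in claim 1 that the first loss receives the smaller weight; `generator_loss` is a hypothetical helper.

```python
def loss2(d_score):
    """Least-squares generator term: (D_disc(alpha*F + (1 - alpha)*B) - 1)^2."""
    return (d_score - 1.0) ** 2


def lambda_at(iteration, lam0=0.02, halve_every=5):
    """lambda starts at 0.02 and is halved every `halve_every` iterations."""
    return lam0 * 0.5 ** (iteration // halve_every)


def generator_loss(loss1, d_score, iteration):
    """Hypothetical total: lambda weights loss1, loss2 is added unweighted (assumption)."""
    return lambda_at(iteration) * loss1 + loss2(d_score)
```

Halving λ lets the pseudo-supervision from G_teacher dominate early training and then fade, so the adversarial term increasingly drives G_student toward the real data.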
CN202110692455.4A 2021-06-22 2021-06-22 Image soft segmentation and background replacement system based on GAN network Active CN113538456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110692455.4A CN113538456B (en) 2021-06-22 2021-06-22 Image soft segmentation and background replacement system based on GAN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110692455.4A CN113538456B (en) 2021-06-22 2021-06-22 Image soft segmentation and background replacement system based on GAN network

Publications (2)

Publication Number Publication Date
CN113538456A CN113538456A (en) 2021-10-22
CN113538456B CN113538456B (en) 2022-03-18

Family

ID=78125625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110692455.4A Active CN113538456B (en) 2021-06-22 2021-06-22 Image soft segmentation and background replacement system based on GAN network

Country Status (1)

Country Link
CN (1) CN113538456B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114387208A (en) * 2021-12-02 2022-04-22 Fudan University Pyramid structure unsupervised registration system and method based on context driving

Citations (6)

Publication number Priority date Publication date Assignee Title
CN107730528A (en) * 2017-10-28 2018-02-23 Tianjin University Interactive image segmentation and fusion method based on the GrabCut algorithm
CN110136163A (en) * 2019-04-29 2019-08-16 Institute of Automation, Chinese Academy of Sciences Automatic matting of motion-blurred hands and its application to human soft segmentation and background replacement
CN110188760A (en) * 2019-04-01 2019-08-30 Shanghai Weisha Network Technology Co., Ltd. Image processing model training method, image processing method and electronic device
CN110232696A (en) * 2019-06-20 2019-09-13 Tencent Technology (Shenzhen) Co., Ltd. Image region segmentation method, and model training method and device
CN110334779A (en) * 2019-07-16 2019-10-15 Dalian Maritime University Multi-focus image fusion method based on PSPNet detail extraction
CN112365514A (en) * 2020-12-09 2021-02-12 University of Science and Technology Liaoning Semantic segmentation method based on improved PSPNet

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR102135478B1 (en) * 2018-12-04 2020-07-17 NHN Corporation Method and system for virtually dyeing hair

Non-Patent Citations (2)

Title
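"An Automatic Glioma Segmentation System Using a Multilevel Attention Pyramid Scene Parsing Network"; Zhenyu Zhang et al.; Current Medical Imaging; 2021-06-01; Vol. 17, No. 6; pp. 751-761 *
"Research and Implementation of Separating the Person Foreground from the Background"; Li Xiaofang; China Excellent Master's Theses Full-text Database, Information Science and Technology; 2020-07-15; main text pp. 18-50 *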

Also Published As

Publication number Publication date
CN113538456A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
Wang et al. Esrgan: Enhanced super-resolution generative adversarial networks
CN109410239B (en) Text image super-resolution reconstruction method based on condition generation countermeasure network
Deng et al. Deep coupled feedback network for joint exposure fusion and image super-resolution
CN110443842B (en) Depth map prediction method based on visual angle fusion
CN110176027B (en) Video target tracking method, device, equipment and storage medium
Xiao et al. Example‐Based Colourization Via Dense Encoding Pyramids
Zhao et al. Pyramid global context network for image dehazing
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN116071243B (en) Infrared image super-resolution reconstruction method based on edge enhancement
Cai et al. TDPN: Texture and detail-preserving network for single image super-resolution
CN116758130A (en) Monocular depth prediction method based on multipath feature extraction and multi-scale feature fusion
Ma et al. SD-GAN: Saliency-discriminated GAN for remote sensing image superresolution
Zhu et al. Multi-stream fusion network with generalized smooth L1 loss for single image dehazing
Li et al. MANET: Multi-scale aggregated network for light field depth estimation
Deng et al. Omnidirectional image super-resolution via latitude adaptive network
CN114627269A (en) Virtual reality security protection monitoring platform based on degree of depth learning target detection
Mo et al. Attention-guided collaborative counting
Zhang et al. Remote sensing image generation based on attention mechanism and vae-msgan for roi extraction
CN113538456B (en) Image soft segmentation and background replacement system based on GAN network
CN117689592A (en) Underwater image enhancement method based on cascade self-adaptive network
Mengbei et al. Overview of research on image super-resolution reconstruction
Wang et al. Face super-resolution via hierarchical multi-scale residual fusion network
Liu et al. Dsma: Reference-based image super-resolution method based on dual-view supervised learning and multi-attention mechanism
Zhou et al. Multi-scale and attention residual network for single image dehazing
Liu et al. A single frame and multi-frame joint network for 360-degree panorama video super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant