CN110222837A

CN110222837A - ArcGAN, a CycleGAN-based network structure for image training, and method

Info

Publication number: CN110222837A
Application number: CN201910350757.6A
Authority: CN
Inventors: 蒋涵; 陶文源; 孙倩
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-09-10

Abstract

The present invention discloses ArcGAN, a CycleGAN-based network structure for image training, and a corresponding method. ArcGAN consists of a generator and dual discriminators; the dual discriminators comprise a coarse discriminator and a fine discriminator. The encoder comprises one input layer and three down-sampling convolution layers, each down-sampling convolution layer being followed by two flat convolution layers with the same structure as the input layer. The converter comprises five dense convolution blocks without pooling layers; each block contains five dense convolution layers with bottleneck layers, and a compression layer is placed between blocks. The decoder comprises three up-sampling deconvolution layers and one output layer, each up-sampling deconvolution layer being followed by two flat convolution layers with the same structure as the input layer. Each down-sampling layer in the encoder is joined to the corresponding up-sampling layer in the decoder by a copy-style skip connection. The coarse discriminator, which processes high-level visual information, consists of six down-sampling layers and one output layer. The losses computed by the fine discriminator and the coarse discriminator are combined, and together with the generator they carry out adversarial consistency training.

Description

ArcGAN, a CycleGAN-based network structure for image training, and method

Technical field

The invention relates mainly to deep learning and image processing, and in particular to ArcGAN, a CycleGAN-based network structure for image training, and to a method for automatically coloring architectural line drawings.

Background technique

Since the mid-1990s, a large body of research has explored how to automatically convert images into synthetic artworks of a particular style. The pioneering work of Gatys et al. demonstrated the power of convolutional neural networks (CNNs) to create artistic images by separating and recombining image content and style [2]. The process of using a CNN to render a content image in different styles is called neural style transfer (NST). Since then, NST has become a popular topic in academia and industry, attracting growing attention, and various methods have been proposed to improve or extend the original NST algorithm.

Deep neural networks are currently the most widely used, most effective, and most efficient approach. Recent research on generative adversarial networks (GANs) [3,4] has achieved great success in a wide range of image synthesis applications, including blind motion deblurring [5,6], high-resolution image synthesis [7,8], photo-realistic super-resolution [9], and image inpainting [10]. The GAN training strategy defines a game between two competing networks: the generator tries to fool a discriminator that is trained at the same time, and the discriminator classifies an image as real or synthetic. GANs are known for producing samples of good perceptual quality; however, as described in [11], the plain formulation of GANs suffers from many problems, such as mode collapse and vanishing gradients. Arjovsky et al. [12] discussed the training difficulties caused by the vanilla GAN loss function and proposed using an approximation of the Earth-Mover (Wasserstein-1) distance as the critic. Gulrajani et al. [13] further improved its stability with a gradient penalty, so that more architectures can be trained with almost no hyperparameter tuning. The basic GAN framework can also be extended with side information. One strategy is to supply class labels to both the generator and the discriminator in order to produce class-conditional samples, i.e. CGAN [14]. Such side information can significantly improve the quality of generated samples [15]. However, these deep network frameworks are not general-purpose.

Bibliography:

[1] SUN, Q., LIN, J., FU, C.-W., KAIJIMA, S., AND HE, Y. 2013. A multi-touch interface for fast architectural sketching and massing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 247–256.

[2] L. A. Gatys, A. S. Ecker, and M. Bethge, "A neural algorithm of artistic style," ArXiv e-prints, Aug. 2015.

[3] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.

[4] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).

[5] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. 2017. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. arXiv preprint arXiv:1711.07064 (2017).

[6] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. 2017. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2.

[7] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).

[8] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. 2017. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. arXiv preprint arXiv:1711.11585 (2017).

[9] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. 2016. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint (2016).

[10] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2536–2544.

[11] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Advances in Neural Information Processing Systems. 2234–2242.

[12] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017).

[13] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems. 5769–5779.

[14] Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).

[15] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. 2016. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems. 4790–4798.

Summary of the invention

The purpose of the invention is to overcome the shortcomings of the prior art by providing ArcGAN, a CycleGAN-based network structure for image training, and a method. Style transfer for buildings faces three problems: existing tools for generating architectural drawings cannot stylize them, the training set is limited, and existing deep network structures do not produce ideal results when applied to the data set of the invention. The invention addresses style transfer from architectural line drawings to colored architectural drawings of different styles. Style transfer, that is, automatic coloring of architectural line drawings, is achieved by building a deep neural network structure and training it on a suitable data set. The problem of scarce data is resolved, the results are good, and the method is easy to use.

The purpose of the invention is achieved through the following technical solution:

ArcGAN is a CycleGAN-based network structure for image training. The CycleGAN network architecture consists of two convolutional neural networks (CNNs): a generator, trained to produce output that fools the discriminator, and a discriminator, which classifies whether an image comes from the real target or is synthesized. The generator comprises an encoder, a converter, and a decoder; in CycleGAN the encoder is one input layer and two down-sampling convolution layers, the converter is nine residual convolution blocks, and the decoder is two up-sampling deconvolution layers and one output layer. The network structure ArcGAN consists of a generator and dual discriminators; the dual discriminators comprise a coarse discriminator and a fine discriminator.

The encoder in the generator of ArcGAN comprises one input layer and three down-sampling convolution layers, each down-sampling convolution layer being followed by two flat convolution layers with the same structure as the input layer. The converter comprises five dense convolution blocks without pooling layers; each block contains five dense convolution layers with bottleneck layers, and a compression layer is placed between blocks. The decoder comprises three up-sampling deconvolution layers and one output layer, each up-sampling deconvolution layer being followed by two flat convolution layers with the same structure as the input layer. Each down-sampling layer in the encoder is joined to the corresponding up-sampling layer in the decoder by a copy-style skip connection.

The coarse discriminator, which processes high-level visual information, consists of six down-sampling layers and one output layer. The fine discriminator consists of four down-sampling layers and one final layer. The losses computed by the fine discriminator and the coarse discriminator are combined, and together with the generator they carry out adversarial consistency training.

A method for automatically coloring architectural line drawings with a deep neural network, based on the network structure ArcGAN, comprises the following steps:

(1) prepare the training set;

(2) convert the training set to the required format;

(3) train the network;

(4) slim down the model;

(5) prepare the test set;

(6) call the network to color the test set and obtain the corresponding results.

Compared with the prior art, the technical solution of the invention brings the following beneficial effects:

1. The invention achieves good results in the automatic coloring of black-and-white architectural line drawings. Adding flat convolution layers after each encoder and decoder layer of the original CycleGAN, and joining corresponding encoder and decoder modules with copy-style skip connections, alleviates the vanishing-gradient problem, strengthens feature propagation, and encourages feature reuse.

2. The dense convolution blocks with bottleneck and compression layers (DenseNet-BC) that the invention introduces in the converter greatly reduce the number of parameters and improve feature utilization. Dense convolution blocks also achieve better feature reuse, make the model more compact, provide implicit deep supervision, and give the network high accuracy.

3. The coarse discriminator that the invention adds to the discriminator network can process high-level visual information and better preserves the high-level information of building lines and building shape. After training, ArcGAN outperforms CycleGAN in geometry recognition, line preservation, and coloring ability. It reliably identifies the windows, walls, and roofs of a building and distinguishes them with corresponding colors, for example blue windows, gray walls, and red roofs. The colors are varied and flexible, the lines are soft and clear, and the visual effect is excellent.

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall ArcGAN architecture;

Fig. 2 is a schematic diagram of the generator network structure of ArcGAN;

Fig. 3 is a schematic diagram of the dual discriminator network structure of ArcGAN;

Fig. 4 is a detailed diagram of a dense network block;

Figs. 5-1 to 5-9 compare the stylization results for different buildings.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.

The CycleGAN network architecture consists of two convolutional neural networks (CNNs). One is the generator G, which is trained to produce output that fools the discriminator. The other is the discriminator D, which classifies whether an image comes from the real target or is synthesized. To accommodate the particularities of building models, the network structure ArcGAN of the invention consists mainly of a generator and dual discriminators; the dual discriminators comprise a coarse discriminator (CD) and a fine discriminator (FD). See Fig. 1.

1. Generator

The generator is mainly responsible for producing, from an input image, an image that the discriminator judges to be real. It consists of three parts: an encoder, a converter, and a decoder. In CycleGAN the encoder is one input layer and two down-sampling convolution layers, the converter is nine residual convolution blocks, and the decoder is two up-sampling deconvolution layers and one output layer. To alleviate the vanishing-gradient problem, strengthen feature propagation, and encourage feature reuse, the invention deepens the encoder and the decoder of the original CycleGAN generator by one layer each, adds two flat convolutions to every layer, and joins the encoder and decoder with copy-style skip connections. To greatly reduce the number of parameters and improve feature utilization, the invention introduces dense convolution blocks with bottleneck and compression layers (DenseNet-BC) in the converter in place of the original residual convolution blocks. Dense convolution blocks achieve better feature reuse, make the model more compact, provide implicit deep supervision, and give the network high accuracy. However, to keep the feature-map size constant throughout the converter, the invention removes the pooling layers between dense convolution blocks.

The encoder of the ArcGAN generator is therefore one input layer and three down-sampling convolution layers, each down-sampling convolution layer being followed by two flat convolution layers with the same structure as the input layer. The converter is five dense convolution blocks without pooling layers; each block contains five dense convolution layers with bottleneck layers, and there is a compression layer between blocks. The decoder is three up-sampling deconvolution layers and one output layer, each up-sampling deconvolution layer being followed by two flat convolution layers with the same structure as the input layer. Each down-sampling layer in the encoder is joined to the corresponding up-sampling layer in the decoder by a copy-style skip connection. See Fig. 2.

2. Discriminator

The discriminator is mainly responsible for judging whether an input image is a synthesized image from the generator or a real image from the training set, and consists of a series of consecutive down-sampling convolution layers. In CycleGAN the discriminator consists of four down-sampling convolution layers and one output layer. To better preserve the high-level information of building lines and building shape, the invention adds to the discriminator network a coarser discriminator for processing high-level visual information, consisting of six down-sampling layers and one output layer. The invention combines the losses computed by the two discriminators, which together with the generator carry out adversarial consistency training.

Specifically, the invention provides ArcGAN, a CycleGAN-based network structure for image training, and a method; the model is implemented with the TensorFlow deep learning framework. The invention treats the conversion of a line-drawing building model into a colored building model as a mapping function that maps the line-drawing manifold P to the color manifold C. The mapping function is learned from training data $\{p_i\}_{i=1}^{n} \subset P$ and $\{c_j\}_{j=1}^{m} \subset C$, where n and m are the numbers of line images and color images in the training set, respectively. Unlike other GAN frameworks, the two discriminators D are trained to distinguish images in the color manifold from other images, and they provide an adversarial loss for the generator G that pushes G toward its goal. Let L be the loss function and $G^*$ and $D^*$ the weights of the networks. The objective of the invention is to solve the min-max problem: $(G^*, D^*) = \arg\min_G \max_D L(G, D)$

In the network structure ArcGAN, the generator network G maps input line images to color images. The colored architectural style is produced once the model has been trained. G begins with three flat convolution layers; each flat convolution uses a 7 × 7 kernel with stride 1, followed by an instance normalization (IN) and a rectified linear unit (ReLU), and leaves the feature-map size unchanged. Three down-sampling convolution blocks follow. After each down-convolution, two flat convolution layers further compress and encode the image spatially, extracting useful local features for the conversion stage that follows. The down-sampling convolution uses a 3 × 3 kernel with stride 2. This embodiment calls each such stage an encoding block.

Next, five dense network blocks (DenseNet-BC) build the content and manifold features. Between blocks there is a compression layer consisting of a 1 × 1 convolution (Conv) with a compression factor of 0.5. Each dense network block in turn contains five layers: an instance normalization, a rectified linear unit, and a 3 × 3 convolution form one layer, and the growth rate of each layer is k = 32. Before these, an instance normalization, a rectified linear unit, and a 1 × 1 convolution form the bottleneck layer of the dense network, which reduces the number of input feature maps and so improves computational efficiency. In the experiments of this embodiment, each 1 × 1 convolution produces 4k feature maps. The structure containing five IN-ReLU-Conv(1 × 1)-IN-ReLU-Conv(3 × 3) layers is therefore called a dense network block (DB); see Fig. 4. This embodiment uses five DB-IN-ReLU-Conv(1 × 1) structures as the converter of the generator.
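The channel arithmetic implied by the growth rate and compression factor above can be traced with a small sketch. It assumes that every dense layer appends k feature maps to its input, that the compression convolution keeps the floor of 0.5 times the incoming channels, and that the converter receives 256 channels from the encoder; the 256 figure and all function names are illustrative assumptions, not values stated in the description.

```python
# Sketch only: traces feature-map counts through the converter.
# Assumptions (not from the patent text): 256 input channels, floor rounding
# in the compression layer, and the helper names used here.

GROWTH_RATE = 32          # k, from the description
LAYERS_PER_BLOCK = 5      # dense layers per dense network block (DB)
COMPRESSION = 0.5         # compression factor of the 1x1 convolution
BOTTLENECK_WIDTH = 4 * GROWTH_RATE  # each 1x1 bottleneck emits 4k maps


def dense_block_channels(in_channels, layers=LAYERS_PER_BLOCK, k=GROWTH_RATE):
    """Each dense layer concatenates k new feature maps onto its input."""
    return in_channels + layers * k


def compress(channels, factor=COMPRESSION):
    """The 1x1 compression convolution keeps floor(factor * channels) maps."""
    return int(channels * factor)


def converter_channels(in_channels, blocks=5):
    """Return (stage, channels) pairs for five DB-IN-ReLU-Conv(1x1) stages."""
    trace = []
    channels = in_channels
    for _ in range(blocks):
        channels = dense_block_channels(channels)
        trace.append(("dense_block", channels))
        channels = compress(channels)
        trace.append(("compression", channels))
    return trace


if __name__ == "__main__":
    for stage, channels in converter_channels(256):
        print(stage, channels)
```

Under these assumptions the channel count stays bounded across the five blocks, which is the parameter saving that the bottleneck and compression layers are meant to provide.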

Finally, three up-sampling convolution blocks reconstruct and output the colored style image. Each up-convolution is followed by two flat convolution layers; the up-convolution is a deconvolution layer with a 3 × 3 kernel and stride 2, the flat convolutions are the same as those in the encoding blocks, and the last layer is a final convolution layer with a 3 × 3 kernel. This embodiment calls each such stage a decoding block. For better feature reuse, the invention joins the output features of each encoding block with the output features of the corresponding decoding block through a copy-style skip connection, and uses the result as the input to the next decoding block. See Fig. 2.
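As a rough check of the encoder and decoder geometry described above, the following sketch traces the spatial size of a 256 × 256 input through the generator. It assumes "same" padding, so stride-1 flat convolutions keep the size, each stride-2 down-convolution halves it, and each stride-2 deconvolution doubles it; the padding scheme and the function names are assumptions, since the description does not state them.

```python
import math

# Sketch only: spatial sizes through the ArcGAN generator under an assumed
# "same" padding scheme. Function names are hypothetical.


def conv_same(size, stride):
    """Output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)


def deconv_same(size, stride):
    """Output size of a transposed convolution with 'same' padding."""
    return size * stride


def generator_sizes(size=256, blocks=3):
    """Return (stage, spatial size) pairs from input stage to output."""
    trace = [("input stage", size)]          # flat convolutions, stride 1
    for i in range(blocks):
        size = conv_same(size, 2)            # 3x3 down-convolution, stride 2
        trace.append((f"encoding block {i + 1}", size))
    trace.append(("converter", size))        # dense blocks, no pooling
    for i in range(blocks):
        size = deconv_same(size, 2)          # 3x3 deconvolution, stride 2
        trace.append((f"decoding block {i + 1}", size))
    return trace


if __name__ == "__main__":
    for stage, size in generator_sizes():
        print(stage, size)
```

With these assumptions the decoder returns to the input resolution, and every decoder stage has an encoder-side feature map of the same spatial size available for concatenation.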

The discriminators consist mainly of down-sampling convolution layers, which extract features and perform binary classification. The fine discriminator consists of four down-sampling layers and a final layer; each down-sampling convolution has a 4 × 4 kernel and stride 2, and the final layer has a 4 × 4 kernel and stride 1. Its output is a 16 × 16 matrix, which is compared with the real label by taking the difference. The coarse discriminator consists of six down-sampling layers and a final layer, where the down-sampling layers and the final layer are the same as in the fine discriminator. Its output is a 4 × 4 matrix, which extracts higher-level feature information and adjusts the overall layout of the image. See Fig. 3.
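The 16 × 16 and 4 × 4 output sizes follow from the number of stride-2 layers applied to a 256 × 256 input. A minimal sketch, assuming "same" padding so that the stride-1 final layer leaves the size unchanged (the padding scheme and the function name are assumptions):

```python
import math

# Sketch only: output matrix size of a discriminator built from stride-2
# down-sampling layers plus a stride-1 final layer, assuming 'same' padding.


def discriminator_output_size(input_size, down_layers, stride=2):
    size = input_size
    for _ in range(down_layers):
        size = math.ceil(size / stride)   # each down-sampling layer halves it
    return size                           # final stride-1 layer keeps the size


if __name__ == "__main__":
    print(discriminator_output_size(256, 4))  # fine discriminator: 16
    print(discriminator_output_size(256, 6))  # coarse discriminator: 4
```

Each entry of the coarse output therefore summarizes a larger region of the input than an entry of the fine output, which matches its role of judging overall layout.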

To prevent mode collapse, the invention retains the cycle-consistency feature of CycleGAN, which amounts to training two mutually mapping models G and F, so as to obtain the cycle-consistency loss $L_{cyc}(G, F)$ and ensure stable network training. The cycle-consistency loss is defined as follows: $L_{cyc}(G, F) = E_{p \sim S_{data}(p)}[\|F(G(p)) - p\|_1] + E_{c \sim S_{data}(c)}[\|G(F(c)) - c\|_1]$, where $E(\cdot)$ denotes the expected value over a distribution, $S_{data}(p)$ represents the distribution of uncolored line drawings, and $S_{data}(c)$ represents the distribution of colored stylized images. Cycle consistency requires that each line drawing p in the line-drawing set P, after being converted by generator model G into a stylized image G(p) and then converted by generator model F into a line drawing F(G(p)), remain consistent with the original input p, forming a cycle. Likewise, a stylized image c converted by F and then by G, giving G(F(c)), should remain consistent with the original c. (The invention is concerned only with the conversion from line drawings to stylized images; the reverse process is not required and is used only to compute the loss function and maintain cycle consistency.) In the loss function this consistency is computed with the L1 norm $\|\cdot\|_1$.
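The cycle-consistency requirement can be illustrated with a small sketch that measures the L1 distance between each image and its round trip through both mappings, averaged over a batch as a stand-in for the expectation. Images are represented as flat lists of floats, and the toy mappings G and F are placeholders rather than the patent's networks; all names are hypothetical.

```python
# Sketch only: cycle-consistency loss on toy data. G maps line drawings to
# stylized images and F maps back; here both are simple placeholder functions.


def l1_norm(a, b):
    """Sum of absolute differences between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cycle_consistency_loss(G, F, line_batch, color_batch):
    """Batch average of ||F(G(p)) - p||_1 plus batch average of ||G(F(c)) - c||_1."""
    forward = sum(l1_norm(F(G(p)), p) for p in line_batch) / len(line_batch)
    backward = sum(l1_norm(G(F(c)), c) for c in color_batch) / len(color_batch)
    return forward + backward


def invert(image):
    """Toy mapping: pixel inversion, which is its own inverse."""
    return [1.0 - v for v in image]


def identity(image):
    """Toy mapping that does not undo the inversion."""
    return list(image)


if __name__ == "__main__":
    lines = [[0.0, 1.0], [0.25, 0.5]]
    colors = [[0.5, 0.5]]
    print(cycle_consistency_loss(invert, invert, lines, colors))    # perfect cycle
    print(cycle_consistency_loss(invert, identity, lines, colors))  # broken cycle
```

When F exactly undoes G the loss vanishes; any mismatch between an image and its round trip is penalized in proportion to the absolute pixel difference.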

The specific implementation steps of the invention are as follows:

1. Use 3ds Max to build simple three-dimensional models and render black-and-white line drawings and color-block images of 256 × 256 pixels, which serve as network training set P and training set C, respectively.

2. Store training set P and training set C separately in TFRecords files. This is the standard data access format of TensorFlow and saves resources.

3. Feed training set P and training set C into the network and train the network, specifically:

1) Train the discriminators: a real image c from training set C is fed into each of the two discriminators, which are initialized with default parameters. Each discriminator extracts image features by convolution and outputs its gap from the real label 0.9. The average of the two gaps is taken as the discriminator loss D(c). The goal is to make this loss as low as possible, and the discriminator parameters are updated from this value by backpropagation.

2) Train the generator: an image from training set P is fed into the generator. The generator extracts image features by convolution in the encoding stage, transforms the features in the conversion stage, reconstructs the feature image by deconvolution, and finally outputs a generated image.

3) The generated image from 2) is fed to the dual discriminators. Having already been trained once, each discriminator convolves this input, extracts image features, and outputs its gap from the real label 0.9; the discriminators aim to make this gap as large as possible. The two gaps are averaged and passed to the generator as the generator loss D(G(p)), and the generator updates its parameters by backpropagating this value.

4) Repeat 1) to 3) until the discriminators can no longer tell whether an input image is real or fake and the loss function tends to converge, reaching a Nash equilibrium; then stop training. This process generally takes 40,000 to 60,000 rounds of iteration, where one pass over all the data in the training set counts as one round.

5) The adversarial consistency loss is defined as: $L(G, D) = L_{adv}(G, D) + L_{cyc}(G, D)$

where $L_{cyc}(G, D)$ is the cycle-consistency loss described above. In $L_{adv}(G, D)$, $E(\cdot)$ denotes the expected value over a distribution, $S_{data}(c)$ represents the distribution of colored stylized images, and $S_{data}(p)$ represents the distribution of uncolored line drawings. G tries to generate images G(p) that look like c, while the purpose of D is to distinguish the generated G(p) from the real stylized image c, that is, D(G(p)) versus D(c).

4. Export the trained model to a file in .pb format. Only the generator parameters are exported, because the discriminators are not needed at test time.

5. Prepare the test set: using a multi-touch interactive tool for fast architectural modeling [1], build a three-dimensional building model as input and render a 256 × 256 line-drawn architectural image as the test set.

6. Read in the test set, call the trained network model, and generate the result images. For details, see Figs. 5-1 to 5-9.
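The way the two discriminator outputs are compared with the real label 0.9 and then averaged, as in training step 3 above, can be sketched as follows. The description only speaks of a "gap" from the label; measuring it as a mean squared difference is an assumption made here for illustration, as are the function names.

```python
# Sketch only: combining the fine and coarse discriminator outputs.
# Assumption: the "gap" from the real label is a mean squared difference.

REAL_LABEL = 0.9  # smoothed real label, from the description


def gap_to_label(output, label=REAL_LABEL):
    """Mean squared gap between every entry of an output matrix and the label."""
    values = [v for row in output for v in row]
    return sum((v - label) ** 2 for v in values) / len(values)


def dual_discriminator_gap(fine_output, coarse_output, label=REAL_LABEL):
    """Average of the fine (16x16) and coarse (4x4) discriminator gaps."""
    return 0.5 * (gap_to_label(fine_output, label)
                  + gap_to_label(coarse_output, label))


if __name__ == "__main__":
    fine = [[0.9] * 16 for _ in range(16)]    # fine discriminator fully convinced
    coarse = [[0.4] * 4 for _ in range(4)]    # coarse discriminator unconvinced
    print(dual_discriminator_gap(fine, coarse))
```

Averaging gives both discriminators equal weight, so an image that passes the fine, texture-level check but fails the coarse, layout-level check still yields a non-zero value.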

The invention is not limited to the embodiments described above. The description of the specific embodiments is intended to describe and explain the technical solution of the invention; the embodiments above are illustrative only and not restrictive. Without departing from the concept of the invention and the scope protected by the claims, those skilled in the art may, inspired by the invention, make many specific variations, all of which fall within the scope of protection of the invention.

Claims

1. ArcGAN, a CycleGAN-based network structure for image training, wherein the CycleGAN network architecture consists of two convolutional neural networks (CNNs): a generator, trained to produce output that fools the discriminator, and a discriminator, which classifies whether an image comes from the real target or is synthesized; the generator comprises an encoder, a converter, and a decoder, the encoder being one input layer and two down-sampling convolution layers, the converter being nine residual convolution blocks, and the decoder being two up-sampling deconvolution layers and one output layer; characterized in that the network structure ArcGAN consists of a generator and dual discriminators, the dual discriminators comprising a coarse discriminator and a fine discriminator;

the encoder in the generator of the network structure ArcGAN comprises one input layer and three down-sampling convolution layers, each down-sampling convolution layer being followed by two flat convolution layers with the same structure as the input layer; the converter comprises five dense convolution blocks without pooling layers, each block containing five dense convolution layers with bottleneck layers, with a compression layer placed between blocks; the decoder comprises three up-sampling deconvolution layers and one output layer, each up-sampling deconvolution layer being followed by two flat convolution layers with the same structure as the input layer; each down-sampling layer in the encoder is joined to the corresponding up-sampling layer in the decoder by a copy-style skip connection;

the coarse discriminator, which processes high-level visual information, consists of six down-sampling layers and one output layer; the fine discriminator consists of four down-sampling layers and one final layer; the losses computed by the fine discriminator and the coarse discriminator are combined, and together with the generator they carry out adversarial consistency training.