CN112950460A

CN112950460A - Technology for image style migration

Info

Publication number: CN112950460A
Application number: CN202110323511.7A
Authority: CN
Inventors: 梅琪琪; 王洪博; 祝忠明
Original assignee: Chengdu University of Technology
Current assignee: Chengdu University of Technology
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2021-06-11

Abstract

The invention discloses a technology for image style migration, comprising an experimental environment and an algorithm research part. The experimental platform runs the Linux CentOS 7.8 operating system with an NVIDIA Tesla V100 graphics card. The algorithm research covers preprocessing a self-built dataset and training the model. The operation flow of the whole system is as follows: (1) install the Ubuntu 16.0 operating system and configure Anaconda3, Python 3.7, CUDA 10.0 and cuDNN 7.5; (2) feed the digitally vectorized dataset into the optimized CycleGAN algorithm for training; (3) comparatively evaluate the output images. The invention mainly aims to improve the accuracy of image style conversion and to promote the development of artificial intelligence.

Description

Technology for image style migration

Technical Field

The invention belongs to the field of artificial-intelligence-based image processing and relates to an image style migration technology.

Background

Image style migration applies certain characteristics or the style of one image to another image, converting it into a designated style. Traditional non-parametric style migration methods are mainly based on physical-model rendering and texture synthesis. Although these methods achieve reasonable results, they can only extract low-level image features, not high-level abstract features, so the synthesized result is unsatisfactory for images with complex colors and textures. In recent years, deep learning has developed rapidly with the wave of artificial intelligence and has shown remarkable results in processing massive data; in some fields its ability to learn from and process data even exceeds human performance. Against this background, Gatys et al., while studying texture synthesis with convolutional neural networks, discovered that the statistics of the feature maps in a convolutional neural network reflect the style of an image, while the feature maps themselves, as deep representations of the input image, reflect its content. Through iterative optimization, a randomly initialized image can then be adjusted until its style resembles a famous painting while its content matches an ordinary photograph, which introduced the concept of the style transfer network. Johnson et al. subsequently extended this network and proposed the concept of a transformation network.
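The feature-map statistic that Gatys et al. used as a style representation is the Gram matrix: the inner products between every pair of channel activations in a layer. The minimal pure-Python sketch below computes it and the per-layer style loss from their paper. It assumes toy inputs given as a list of channels, each a flattened spatial map; a real implementation would compute this on CNN (e.g. VGG) activations with a tensor library.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Gram matrix of one layer's feature maps.

    `features` has one row per channel (N rows), each row being that channel's
    feature map flattened over the M spatial positions.
    G[i][j] = sum_k F[i][k] * F[j][k] measures how strongly channels i and j
    co-activate; this is the style statistic used by Gatys et al.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def style_loss(generated: Matrix, style: Matrix) -> float:
    """Per-layer style loss: 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2."""
    n, m = len(generated), len(generated[0])
    g, a = gram_matrix(generated), gram_matrix(style)
    diff = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return diff / (4 * n * n * m * m)


if __name__ == "__main__":
    feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 channels, 3 positions
    print(gram_matrix(feats))
    print(style_loss(feats, feats))  # identical statistics give zero loss
```

Because the Gram matrix sums over spatial positions, it discards where features occur and keeps only which features co-occur, which is why it captures style rather than content.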
Although image stylization already achieves good results, several problems remain unsolved. The first is speed: even with the most advanced transformation-network schemes, training a model usually takes dozens of minutes. The second is that the style representation in the migrated results is often not pronounced enough, and the results are rough and hard to reconcile with practical requirements. Both problems leave clear room for improvement, so a fast and accurate algorithm is urgently needed to solve them.

Disclosure of Invention

The invention provides an image style migration system based on an optimized generative adversarial network (GAN) algorithm, intended to solve the problems of low speed, difficult training and low accuracy in existing algorithms. The specific scheme is as follows:

In a first aspect, the present application provides a new image style migration method, including:

The datasets are the public monet2photo and summer2winter_yosemite datasets, provided by the Department of Electrical Engineering and Computer Sciences (EECS), University of California, Berkeley.

Build the environment, including Anaconda, the PyCharm development environment, OpenCV, TensorFlow, Bottleneck, NumPy, olefile, xlwt, zict, atomicwrites and other packages.

Construct a Darknet and CUDA parallel computing framework to receive and process data, and perform image style migration with the optimized CycleGAN algorithm provided by this patent.

Train the model: configure the environment the system requires before training, and start training only after verifying that the installed versions are correct. The main step is to feed the dataset into the optimized CycleGAN for training.
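The pre-training version check can be sketched as a small script that compares detected versions against required major.minor prefixes. The required values below mirror the operation flow in the abstract (Python 3.7, CUDA 10.0, cuDNN 7.5); note that the description elsewhere cites CUDA 10.1, so adjust `REQUIRED` to the actual setup. The function names are hypothetical, and the CUDA and cuDNN strings are placeholders that a real script would read from the installed toolkit.

```python
import sys
from typing import Dict, List, Tuple

# Versions stated in the patent's operation flow (abstract / claim 2).
REQUIRED: Dict[str, str] = {"python": "3.7", "cuda": "10.0", "cudnn": "7.5"}


def parse(version: str) -> Tuple[int, ...]:
    """'3.7.12' -> (3, 7, 12)."""
    return tuple(int(part) for part in version.split("."))


def matches(installed: str, required: str) -> bool:
    """True if `installed` starts with the required version components."""
    req = parse(required)
    return parse(installed)[: len(req)] == req


def check_environment(installed: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means training may start."""
    problems = []
    for name, required in REQUIRED.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not found (need {required})")
        elif not matches(found, required):
            problems.append(f"{name}: found {found}, need {required}")
    return problems


if __name__ == "__main__":
    # Placeholder CUDA/cuDNN values; on a real machine read them from the toolkit.
    detected = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "cuda": "10.0",
        "cudnn": "7.5",
    }
    for line in check_environment(detected) or ["environment OK"]:
        print(line)
```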

Evaluate the output results.

In a second aspect, the present application provides an image style migration system, including:

Experimental environment: the operating system is Linux CentOS 7.8, and the graphics card is an NVIDIA Tesla V100 GPU.

Algorithm research: images from the monet2photo and summer2winter_yosemite datasets undergo image style migration through the optimized CycleGAN algorithm. An Inception-v3 module is added to the original CycleGAN algorithm to address the weak representation ability that the original algorithm suffers from because of its residual network.
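Two ideas behind Inception-v3 modules are parallel convolution branches whose outputs are concatenated along the channel axis, and factorizing a large n × n convolution into a 1 × n followed by an n × 1 convolution. The pure-Python sketch below only counts parameters and output channels to illustrate both; the patent does not give the module's channel widths or placement in the generator, so every layer configuration here is hypothetical.

```python
from typing import List, Tuple

# A convolution described as (kernel_h, kernel_w, in_channels, out_channels).
Conv = Tuple[int, int, int, int]


def conv_params(layer: Conv) -> int:
    """Weights plus one bias per output channel."""
    kh, kw, cin, cout = layer
    return kh * kw * cin * cout + cout


def block_summary(branches: List[List[Conv]]) -> Tuple[int, int]:
    """(parameter count, output channels) of parallel branches whose outputs
    are concatenated along the channel axis, as in an Inception module."""
    params = sum(conv_params(layer) for branch in branches for layer in branch)
    out_channels = sum(branch[-1][3] for branch in branches)
    return params, out_channels


# Hypothetical channel widths; the patent does not specify them.
C = 64
full_7x7 = [[(7, 7, C, C)]]
factorized = [[(1, 7, C, C), (7, 1, C, C)]]  # Inception-v3 style 1x7 -> 7x1

inception_like = [
    [(1, 1, 256, 64)],                                   # 1x1 branch
    [(1, 1, 256, 48), (3, 3, 48, 64)],                   # 1x1 -> 3x3 branch
    [(1, 1, 256, 64), (1, 7, 64, 64), (7, 1, 64, 64)],   # factorized 7x7 branch
]

if __name__ == "__main__":
    print(block_summary(full_7x7))        # (200768, 64)
    print(block_summary(factorized))      # (57472, 64)
    print(block_summary(inception_like))  # output channels: 64 + 64 + 64 = 192
```

The factorized pair covers the same 7 × 7 receptive field with roughly 29% of the parameters, which is what lets an Inception-style block widen the network without a proportional cost.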

The operation flow of the whole system is as follows: (1) install the Ubuntu 16.0 operating system and configure Anaconda3, Python 3.7, CUDA 10.0 and cuDNN 7.5; (2) feed the digitally vectorized dataset into the optimized CycleGAN algorithm for training; (3) comparatively evaluate the output images.
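The patent does not define "digital vectorization". A minimal sketch, assuming grayscale images, nearest-neighbour resizing to the 256 × 256 size used for training, and scaling of pixel values from 0..255 to [-1, 1] (the range commonly used so inputs match a tanh generator output), might look as follows. The function names are hypothetical; a real pipeline would use OpenCV or TensorFlow image operations on RGB data.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255, one list per row

IMAGE_SIZE = 256  # unified input/output size from the description


def resize_nearest(img: Image, size: int = IMAGE_SIZE) -> Image:
    """Nearest-neighbour resize of an h x w image to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def vectorize(img: Image) -> List[float]:
    """Flatten row by row and scale 0..255 to [-1, 1]."""
    return [p / 127.5 - 1.0 for row in img for p in row]


if __name__ == "__main__":
    tiny = [[0, 255], [255, 0]]
    vec = vectorize(resize_nearest(tiny))
    print(len(vec))  # 256 * 256 = 65536 values
```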

Drawings

To illustrate the embodiments of the present application or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the overall framework of the image style migration system provided in an example of the present application; FIG. 2 shows the overall design of the software system; FIG. 3 is a diagram of the Inception-v3 module; FIG. 4 shows the structure of the optimized CycleGAN network; FIG. 5 is the loss-function curve; and FIG. 6 is the accuracy curve.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present application.

As shown in FIG. 1, the application runs on the Linux CentOS 7.8 operating system.

CPU: the machine has 8 GB of memory, and the CPU has a powerful arithmetic logic unit (ALU) that completes arithmetic in very few clock cycles with up to 64-bit double precision; double-precision floating-point addition and multiplication take 1-3 clock cycles. The CPU clock frequency is high, reaching 1.5323 GHz (1 GHz = 10^9 Hz). This data-processing capability improves the working efficiency of the computer: beyond performing arithmetic, the CPU executes the instruction tasks issued by the user, mapping the user's control instructions onto CPU operations.
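The cycle figures above translate into wall-clock latency as cycles divided by clock frequency. At the stated 1.5323 GHz one cycle lasts about 0.65 ns, so a 1-3 cycle double-precision add or multiply takes roughly 0.65-1.96 ns. A short sketch of that arithmetic (function names are illustrative only):

```python
CLOCK_HZ = 1.5323e9  # 1.5323 GHz, as stated in the description


def cycle_time_ns(clock_hz: float = CLOCK_HZ) -> float:
    """Duration of one clock cycle in nanoseconds (1 / f)."""
    return 1e9 / clock_hz


def latency_ns(cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Wall-clock time of an operation that takes `cycles` clock cycles."""
    return cycles * cycle_time_ns(clock_hz)


if __name__ == "__main__":
    for cycles in (1, 3):
        print(f"{cycles} cycle(s): {latency_ns(cycles):.3f} ns")
```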

The NVIDIA Tesla V100 is one of the most advanced data-center GPUs on the market for accelerating artificial intelligence, high-performance computing and graphics. The Tesla V100 accelerator is based on the Volta GV100 GPU, the first processor to exceed 100 TFLOPS of deep-learning performance. GV100 combines CUDA cores and Tensor Cores, delivering the performance of an AI (artificial intelligence) supercomputer in a single GPU. With Tesla V100-accelerated systems, AI models that previously needed weeks of computing resources can be trained in a few days, and this substantial reduction in training time allows AI to tackle a variety of new problems.

Programming language: one of Python's design goals is highly readable code. Its design favors punctuation and English words commonly used in other languages, which keeps code neat and clean. Unlike statically typed languages such as C and Pascal, it does not require repetitive declaration statements, and its syntax has few special cases or surprises. These properties make it well suited to scripting and rapid application development on most platforms.

CUDA (Compute Unified Device Architecture) is a computing platform introduced by the graphics-card vendor NVIDIA. It is a general-purpose parallel computing architecture that enables GPUs to solve complex computational problems. This patent uses CUDA 10.1 with cuDNN 7.5.

The overall design of the software system is shown in FIG. 2 and mainly comprises the following steps:

Step 1: Build the environment, including Anaconda, the PyCharm development environment, OpenCV, TensorFlow, Bottleneck, NumPy, olefile, xlwt, zict, atomicwrites and other packages.

Step 2: Feed the digitally vectorized images into the optimized CycleGAN for training. Input and output images are unified to 256 × 256 pixels. The batch size is 2, training runs for 300 epochs, and a checkpoint is saved every 5 epochs. The learning rate is initialized to 0.0002 and is gradually decreased after epoch 150. Gradient descent during training is optimized with the Adam algorithm. FIG. 3 is a diagram of the Inception-v3 module, and FIG. 4 is a diagram of the optimized CycleGAN network.
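The training schedule of Step 2 can be written down as a small configuration plus two helper functions. The patent only says the learning rate is "gradually decreased" after epoch 150; the sketch assumes a linear decay toward zero at epoch 300, mirroring the linear decay commonly used in CycleGAN implementations. Function names are hypothetical.

```python
IMAGE_SIZE = 256      # input/output images are 256 x 256 pixels
BATCH_SIZE = 2        # images per training step
BASE_LR = 0.0002      # initial learning rate
TOTAL_EPOCHS = 300    # training rounds
DECAY_START = 150     # the learning rate starts decreasing after this epoch
CHECKPOINT_EVERY = 5  # save a checkpoint every 5 epochs


def learning_rate(epoch: int) -> float:
    """Learning rate for a 0-based epoch index.

    Constant for the first DECAY_START epochs, then (assumption) decays
    linearly so that it would reach zero at TOTAL_EPOCHS.
    """
    if epoch < DECAY_START:
        return BASE_LR
    remaining = TOTAL_EPOCHS - epoch
    return BASE_LR * remaining / (TOTAL_EPOCHS - DECAY_START)


def is_checkpoint_epoch(epoch: int) -> bool:
    """True after every CHECKPOINT_EVERY-th completed epoch (0-based index)."""
    return (epoch + 1) % CHECKPOINT_EVERY == 0


if __name__ == "__main__":
    for e in (0, 149, 225, 299):
        print(e, learning_rate(e))
    print(sum(is_checkpoint_epoch(e) for e in range(TOTAL_EPOCHS)), "checkpoints")
```

Over 300 epochs this saves 60 checkpoints, and the rate halves to 0.0001 midway through the decay phase (epoch 225).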

Step 4: Test whether the accuracy of the trained model meets the expected requirement, adjust the algorithm's parameters according to the experimental results, and verify and compare. The results are evaluated with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
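Both evaluation metrics can be sketched in pure Python for small grayscale images. PSNR follows the standard definition 10 · log10(MAX_I² / MSE). The SSIM function is a simplification: it evaluates the luminance, contrast and structure terms once over the whole image, whereas the standard SSIM averages them over local (usually Gaussian-weighted) windows. The constants K1 = 0.01, K2 = 0.03 and C3 = C2 / 2 are the usual defaults and are assumptions here.

```python
import math
from typing import List

Image = List[List[float]]  # m x n grayscale image, pixel values in [0, max_i]


def psnr(i: Image, k: Image, max_i: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between reference I and approximation K."""
    m, n = len(i), len(i[0])
    mse = sum((i[r][c] - k[r][c]) ** 2 for r in range(m) for c in range(n)) / (m * n)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_i ** 2 / mse)


def ssim_global(x: Image, y: Image, max_i: float = 255.0) -> float:
    """SSIM = l * c * s, computed over the whole image as a single window."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1, c2 = (0.01 * max_i) ** 2, (0.03 * max_i) ** 2
    c3 = c2 / 2
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    lum = (2 * mx * my + c1) / (mx * mx + my * my + c1)  # luminance l
    con = (2 * sx * sy + c2) / (vx + vy + c2)            # contrast c
    stru = (cov + c3) / (sx * sy + c3)                   # structure s
    return lum * con * stru
```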

FIG. 5 shows the loss-function curve, and FIG. 6 shows the accuracy curve.

Claims

1. A technology for image style migration, characterized in that the experimental platform comprises a CPU, a GPU, a programming language and CUDA.

The NVIDIA Tesla V100 is one of the most advanced data-center GPUs on the market for accelerating artificial intelligence, high-performance computing and graphics. The Tesla V100 accelerator is based on the Volta GV100 GPU, the first processor to exceed 100 TFLOPS of deep-learning performance. GV100 combines CUDA cores and Tensor Cores to deliver the performance of an AI (artificial intelligence) supercomputer in a single GPU. With Tesla V100-accelerated systems, AI models that previously needed weeks of computing resources can be trained in a few days, and this substantial reduction in training time allows AI to tackle a variety of new problems.

2. The system according to claim 1, wherein images of one style can be converted into images of another style. The selected algorithm adds an Inception-v3 module to CycleGAN; the network model is a fully convolutional neural network and is trained and evaluated on the datasets. The operation flow is: (1) install the Ubuntu 16.0 operating system and configure Anaconda3, Python 3.7, CUDA 10.0 and cuDNN 7.5; (2) feed the digitally vectorized dataset into the optimized CycleGAN algorithm for training; (3) comparatively evaluate the output images. The ReLU function is used as the activation function in this process, as shown in equation 1. The loss function L_cyc is the cycle-consistency loss, and the Adam optimizer is used for optimization, as shown in equation 2. The evaluation metric peak signal-to-noise ratio (PSNR) is shown in equation 3, where I is a noise-free image of m × n pixels, K is its noisy approximation, and MAX_I is the maximum possible pixel value of the image. The structural similarity (SSIM) is shown in equation 4; it compares images X and Y in terms of luminance (l), contrast (c) and structure (s). SSIM is at most 1 and for natural images typically lies between 0 and 1; the closer it is to 1, the better the result.

$$\mathrm{SSIM}(X, Y) = l(X, Y) \cdot c(X, Y) \cdot s(X, Y) \qquad \text{(formula 4)}$$
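Claim 2 refers to equations 1 and 2 (the ReLU activation, the cycle-consistency loss L_cyc and the Adam optimizer) without reproducing them. The sketch below shows their standard textbook forms on plain Python lists; it is not the patent's code. The L1 distance is averaged per element, as common implementations do, and β1 = 0.5 is the value used in the public CycleGAN code, which is an assumption here. The generator callables `g` and `f` are placeholders for the two trained mapping networks.

```python
import math
from typing import Callable, List

Vector = List[float]


def relu(x: float) -> float:
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector,
                           g: Callable[[Vector], Vector],
                           f: Callable[[Vector], Vector]) -> float:
    """L_cyc = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1.

    g maps domain X to Y (e.g. Monet -> photo); f maps Y back to X.
    """
    return l1(f(g(x)), x) + l1(g(f(y)), y)


class Adam:
    """Standard Adam update for a flat list of scalar parameters."""

    def __init__(self, lr: float = 0.0002, beta1: float = 0.5,
                 beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Vector = []
        self.v: Vector = []
        self.t = 0

    def step(self, params: Vector, grads: Vector) -> Vector:
        if not self.m:  # lazily initialise the moment estimates
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, grad) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * grad
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * grad * grad
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

Because of bias correction, the first Adam step moves each parameter by about `lr` in the direction opposite its gradient, regardless of the gradient's magnitude.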