CN111915591B - High-quality image extrapolation system based on spiral generation network - Google Patents

High-quality image extrapolation system based on spiral generation network

Info

Publication number
CN111915591B
CN111915591B (application CN202010768731.6A)
Authority
CN
China
Prior art keywords
image
spiral
extrapolation
slice
generation network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010768731.6A
Other languages
Chinese (zh)
Other versions
CN111915591A (en)
Inventor
郭冬升
郑海永
赵浩如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202010768731.6A priority Critical patent/CN111915591B/en
Publication of CN111915591A publication Critical patent/CN111915591A/en
Application granted granted Critical
Publication of CN111915591B publication Critical patent/CN111915591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • G06T7/49Analysis of texture based on structural texture description, e.g. using primitives or placement rules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention relates to the technical field of image extrapolation, and particularly discloses a high-quality image extrapolation system based on a spiral generation network, wherein the spiral generation network comprises a hypothetical graph generation network, a slice generation network, a spiral discriminator and an extrapolation discriminator; the slice generation network comprises a slice operator, a slice generator and an extrapolation operator. The invention provides a novel spiral generation network that, for the first time, treats the image extrapolation problem as a spiral growth process: the input sub-image gradually grows outward along the spiral curve under the action of the hypothetical graph generation network and the slice generation network, extrapolating one of the four sides (up, down, left, right) at a time, until the input sub-image is extrapolated into a complete image.

Description

High-quality image extrapolation system based on spiral generation network
Technical Field
The invention relates to the technical field of image extrapolation, in particular to a high-quality image extrapolation system based on a spiral generation network.
Background
Given a sub-image (e.g., a partial image of a human face), how does one draw the complete image? In fact, although the surrounding area cannot be seen, a human can roughly imagine the appearance of the whole image from prior knowledge and then gradually draw it from the inside outward according to the imagined content. In computer vision, this problem is called image extrapolation. The purpose of image extrapolation is to fill the invisible area around the input sub-image with content, e.g. to infer complete objects from partial images of objects (such as faces or cars), or to extend the content boundary of a scene image. The image extrapolation problem has two major challenges:
1. the extrapolated image needs to have semantic rationality, and the extrapolated part needs to have specific accurate semantic content;
2. the extrapolated area must be consistent in structure and texture with the original input sub-image.
Even for humans, image extrapolation is challenging; in computer vision, the problem has mainly been tackled on the basis of the rapid development of generative networks in recent years. Current image extrapolation methods based on generative adversarial networks first generate the entire complete image and then paste the original input sub-image onto it at the corresponding position, which makes the final result visually jarring at the boundary between the input sub-image area and the extrapolated area. In addition, because of the long-range contextual relationship between the extrapolated area and the input sub-image, directly applying image completion methods often produces blurring and semantic inconsistency.
Disclosure of Invention
The invention provides a high-quality image extrapolation system based on a spiral generation network, which addresses the following technical problem: existing image extrapolation methods based on generative adversarial networks cannot ensure that the extrapolated image is semantically reasonable, that the extrapolated part carries specific and accurate semantic content, and that the extrapolated area is consistent in structure and texture with the original input sub-image.
In order to solve the above technical problems, the present invention provides a high quality image extrapolation system based on a spiral generation network, the spiral generation network comprising a hypothetical graph generation network, a slice generation network, a spiral discriminator, and an extrapolation discriminator;
the hypothetical graph generation network comprises a hypothetical graph generator and a hypothetical graph discriminator, and generates the hypothetical graph of the input sub-image through adversarial training between the hypothetical graph generator and the hypothetical graph discriminator under a hypothetical graph target loss function;
the slice generation network comprises a slice operator, a slice generator and an extrapolation operator; the slice operator is used for cutting a hypothetical graph slice out of the hypothetical graph and cutting a nearest neighbor slice out of the extrapolated image of the current spiral point; the slice generator is configured to generate an extrapolated slice from the nearest neighbor slice, the hypothetical graph slice and the input sub-image; the extrapolation operator is used for stitching the extrapolated slice back onto the extrapolated image of the current spiral point to obtain the extrapolated image of the next spiral point, thereby completing one spiral extrapolation step;
the spiral discriminator and the extrapolation discriminator are used, under the adversarial constraint of a spiral loss objective function, to output a target image after several spiral extrapolation steps.
Further, the hypothetical graph target loss function is a linear combination of a first adversarial loss function, a hue-color loss function and a perceptual loss function; the expression of the hue-color loss function is:
[Equation: hue-color loss function]
where Ỹ is a reduced image of the real image Y, Î is the hypothetical graph, h × w is the size of the real image Y, ξ and γ are both constants, and γ < 1.
Further, the expression of the hypothetical graph target loss function is:
[Equation: hypothetical graph target loss function, a weighted sum of the first adversarial loss, the hue-color loss and the perceptual loss]
where the λ weights are used to balance the three types of losses;
the expression of the first adversarial loss function is:
[Equation: first adversarial loss function]
where G_I and D_I respectively represent the hypothetical graph generator and the hypothetical graph discriminator, and X represents the input sub-image;
the expression of the perceptual loss function is:
[Equation: perceptual loss function]
where N_u is the number of elements of the feature matrix in the u-th activation layer, and σ_u is the activation feature matrix of the u-th layer of the pre-trained VGG-19 model.
Further, the hypothetical graph generator adopts the generator structure of CycleGAN, and the hypothetical graph discriminator adopts the discriminator structure of Pix2Pix.
Further, the input sub-image is superimposed with an extrapolation mask layer M and uniformly distributed noise Z and then input into the hypothetical graph generation network to obtain the hypothetical graph.
Further, the spiral loss objective function is a linear combination of a second adversarial loss function, an L1 loss function, a style loss function and the hue-color loss function;
the expression of the spiral loss objective function is:
[Equation: spiral loss objective function, a weighted sum of the second adversarial loss, the L1 loss, the style loss and the hue-color loss]
where λadv, λL1, λstyle and λhue are weights used to balance these four types of losses;
the expression of the second adversarial loss function is:
[Equation: second adversarial loss function]
where F represents the spiral generation function, DS represents the spiral discriminator, DE represents the extrapolation discriminator, Ŷ represents the target image generated by the spiral process, Y_E represents the extrapolated area of the real image Y, and Ŷ_E represents the extrapolated area of Ŷ, i.e. the part of Ŷ selected by an extrapolation mask layer of the same size as Y.
The expression of the L1 loss function is:
[Equation: L1 loss function]
The expression of the style loss function is:
[Equation: style loss function]
where the Gv × Gv Gram matrix is constructed from the activation map σv.
Further, the slice generator comprises an encoder, an adaptive instance normalization module, a spatial adaptive normalization module and a decoder; the hypothetical graph slice is input to the encoder, the style of the input sub-image is fused into its latent space by the adaptive instance normalization module, and the semantic information of the nearest neighbor slice is then introduced in the decoder by the spatial adaptive normalization module to generate the final extrapolated slice.
The invention provides a high-quality image extrapolation system based on a spiral generation network, which has the beneficial effects that:
1. A hypothetical graph generation network is proposed to mimic human imagination: it generates a hypothetical graph used to guide the slice generation network to produce richer image details, and the hypothetical graph target loss function keeps the colors of the hypothetical graph closer to those of the input sub-image;
2. A slice generation network is provided which, using a slice operator, a slice generator and an extrapolation operator, gradually extrapolates the input sub-image in the four directions (up, down, left and right) to render the image under the semantic guidance of the hypothetical graph; the resulting target image has good semantic consistency and high-quality details, and remains essentially consistent with the original input sub-image in structure and texture;
3. A novel spiral generation network is proposed which, for the first time, treats the image extrapolation problem as a spiral growth process: the input sub-image gradually grows outward along the spiral curve under the action of the hypothetical graph generation network and the slice generation network, extrapolating one of the four sides (up, down, left, right) at a time until the complete image is obtained; owing to the network structures of the hypothetical graph generation network, the slice generation network, the spiral discriminator and the extrapolation discriminator, and the corresponding target loss function design, the spirally generated extrapolation area is realistic, consistent with the original sub-image in structure and texture, and in line with human imagination;
4. Across various image extrapolation settings and different datasets, the spiral generation network achieves the best metrics and visual effect on the image extrapolation problem.
Drawings
FIG. 1 is a schematic diagram of a spiral generation network that performs spiral extrapolation of an input subimage provided by an embodiment of the invention;
fig. 2 is an exemplary diagram of an effect of the spiral generation network provided in the embodiment of the present invention after performing spiral extrapolation on an input sub-image;
FIG. 3 is a schematic structural diagram of a spiral generating network provided by an embodiment of the present invention;
FIG. 4 is a flowchart illustrating operation of a slice generation network when a spiral generation network performs a spiral extrapolation of one turn according to an embodiment of the present invention;
FIG. 5 shows qualitative assessment results of different cases provided by embodiments of the present invention;
FIG. 6 is a diagram illustrating an example of the results of an unknown boundary distance case on a CelebA-HQ dataset provided by an embodiment of the present invention;
FIG. 7 is an exemplary graph of qualitative assessment results of a spiral necessity ablation experiment provided by an embodiment of the present invention;
FIG. 8 is an exemplary plot of qualitative evaluation results of the ternary-input ablation experiment on the slice generation network provided by embodiments of the present invention;
FIG. 9 is an exemplary illustration of qualitative assessment results of ablation experiments for different slice sizes of a slice generation network provided by embodiments of the present invention;
FIG. 10 is a comparison of different losses for the hypothetical graph generation network on the Flowers and Stanford Cars datasets provided by embodiments of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The embodiments and drawings are given solely for the purpose of illustration and are not to be construed as limitations of the invention, since many variations thereof are possible without departing from the spirit and scope of the invention.
Existing image extrapolation methods based on generative adversarial networks cannot ensure that the extrapolated image is semantically reasonable, that the extrapolated part carries specific and accurate semantic content, and that the extrapolated area is consistent in structure and texture with the original input sub-image. To solve this technical problem, this embodiment proposes a high-quality image extrapolation system based on a spiral generation network and experimentally verifies its image extrapolation effect; the spiral generation network and the experiments are described below in turn.
(I) Spiral generation network
Inspired by the fact that humans can imagine the invisible areas outside an image, and aiming at the problem of high-quality image extrapolation, this embodiment proposes a novel spiral generation network that extrapolates the input sub-image along a spiral curve in the four directions around the image until the target size is reached. The spiral generation network is composed of two parts: a hypothetical graph generation network and a slice generation network. The two sub-networks decompose the image extrapolation problem into two independent sub-tasks: semantic information inference (by the hypothetical graph generation network) and contextual detail generation (by the slice generation network). The slice generation network is designed to exploit the correlation between the generated extrapolated content and the extrapolation direction, and extrapolated slice generation is performed separately for each direction.
As shown in fig. 1, this embodiment treats the image extrapolation problem as a spiral growth process, with the input sub-image growing progressively outward along the spiral curve direction, one side at a time, until the full image is extrapolated. In essence, the spiral generation network is a progressive partial-to-whole image generation method that gradually extrapolates an input sub-image in the four directions (up, down, left and right) to render the image. In this way, the embodiment decomposes the large invisible area around the input sub-image into several small directional slices and generates each slice separately, which results in a final extrapolated image with better semantic consistency and high-quality detail, as shown in fig. 2.
As shown in fig. 3, the present embodiment regards image extrapolation as a spiral growth process of the input sub-image: the input sub-image is extrapolated step by step along a spiral curve into the complete image. Given an input sub-image X of size h × w × c and an extrapolation distance m = (ml, mt, mr, mb), where ml, mt, mr and mb are the extrapolation distances of the left, top, right and bottom sides respectively, the purpose of image extrapolation is to output a high-quality extrapolated image (target image) of size h′ × w′ × c, where h′ = h + mt + mb and w′ = w + ml + mr, and X is a sub-image of the target image.
A series of points P = {p1, p2, …, pN} is defined on the spiral curve, and the input sub-image is extrapolated in a certain direction at each of these points until, after N extrapolations, the target image is obtained. Each point p on the spiral curve is characterized both by the current number of turns on the spiral curve and by the extrapolation direction. For convenience, the extrapolation size τ is the same at every point p in this embodiment.
During the spiral extrapolation of any image, given the extrapolation distance m and the extrapolation size τ, the total number N of points on the spiral curve and the total number T of spiral turns can be calculated. The extrapolation function Gp(·) represents the extrapolation process at point p: for the k-th point pk, the sub-image at pk is extrapolated by the function Gpk(·) into the sub-image at pk+1, where pk+1 is the next point on the spiral curve. From pk to pk+1, the size of the sub-image grows accordingly. Finally, the input sub-image X is extrapolated into the target image through the N points on the T turns of the spiral curve. This process can be expressed as:
[Equation: the target image is obtained by composing the N per-point extrapolation functions Gp1, …, GpN applied to X]
At the same time, the image sizes h and w become h′ and w′ respectively, and the extrapolation distance m becomes (0, 0, 0, 0).
It is worth noting that the number of spiral turns in the four directions is not necessarily equal, since the input sub-image is not necessarily exactly in the center of the target image, so that the extrapolation in the four directions may stop at different times.
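To make this bookkeeping concrete, the following is a minimal sketch (an illustrative assumption, not the patent's reference implementation) that enumerates the spiral points: it walks clockwise over the four sides, emits one (direction, step) entry per extrapolation, clips the last step of a side to the remaining distance, and simply skips a direction once its extrapolation distance is exhausted, so the number of turns per direction need not be equal:

    # Hypothetical helper: enumerate spiral extrapolation steps.
    # m = (left, top, right, bottom) extrapolation distances, tau = slice size.
    def spiral_points(m, tau):
        remaining = {"left": m[0], "top": m[1], "right": m[2], "bottom": m[3]}
        order = ["top", "right", "bottom", "left"]  # clockwise traversal of one turn
        points = []
        while any(v > 0 for v in remaining.values()):
            for d in order:
                if remaining[d] > 0:
                    step = min(tau, remaining[d])   # clip the final slice of a side
                    points.append((d, step))
                    remaining[d] -= step
        return points

    # Example: extrapolating a 128x128 sub-image to 256x256 (64 px on each side)
    # with tau = 32 gives 8 spiral points in T = 2 turns.
    print(spiral_points((64, 64, 64, 64), tau=32))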
(1) Hypothetical graph generation network
The present embodiment proposes a hypothetical graph generation network to "imagine" a rough complete image from a given input sub-image. This hypothetical graph is used to guide the slice generation network. The strategy imitates the human imagination applied to the image extrapolation problem, and this embodiment borrows the idea to solve image extrapolation in computer vision.
The hypothetical graph generation network is essentially a conditional adversarial generative network G_I with an encoder-decoder structure. Given an input sub-image X, it is superimposed with an extrapolation mask layer M and uniformly distributed noise Z and fed into the network to obtain the hypothetical graph. All input and output images are scaled to the same small size (e.g., 128 × 128) in order to fully exploit the advantage of adversarial generative networks in low-resolution image generation.
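A minimal sketch of how this conditioning input might be assembled in PyTorch (the channel layout, mask convention and resizing are illustrative assumptions, not taken from the patent):

    import torch
    import torch.nn.functional as F

    def build_hypothesis_input(x, m, size=128):
        # x: input sub-image tensor [B, 3, h, w]; m = (left, top, right, bottom).
        left, top, right, bottom = m
        # Zero-pad the sub-image to the full target extent.
        x_pad = F.pad(x, (left, right, top, bottom), value=0.0)
        b, _, h, w = x_pad.shape
        # Extrapolation mask M: 1 on the unknown area, 0 on the known sub-image (assumed convention).
        mask = torch.ones(b, 1, h, w)
        mask[:, :, top:top + x.shape[2], left:left + x.shape[3]] = 0.0
        # Uniformly distributed noise Z over the whole canvas.
        noise = torch.rand(b, 1, h, w)
        inp = torch.cat([x_pad, mask, noise], dim=1)
        # All generator inputs/outputs are kept at a small working resolution (e.g. 128x128).
        return F.interpolate(inp, size=(size, size), mode="bilinear", align_corners=False)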
In addition to the adversarial loss, the present embodiment introduces further loss functions into the objective function of the network to obtain better results. In particular, a hue-color loss function is proposed to eliminate the bright spots and dark tones that appear in the output. Furthermore, experiments show that the hue-color loss also helps stabilize the network training process.
Hue-color loss function. Hue is the basic element of color: when people talk about various colors, they mostly describe their hue. It is therefore helpful to keep the hue consistent during image extrapolation. However, according to the HSL/HSV representation of the RGB color cube, the same hue can correspond to completely different colors, so constraining hue alone is not sufficient. To maintain both hue consistency and color coordination, the present embodiment expresses the hue-color loss as:
[Equation: hue-color loss function]
where Ỹ is a reduced image of the real image Y, ξ is a very small constant (typically ≤ 0.01) added to avoid zeros in the calculation, and γ < 1 is used to enlarge the difference. In the experiments, this embodiment empirically sets ξ to 0.001 and γ to 0.4.
Compared with a color loss or a reconstruction loss, the hue-color loss of this embodiment is more concerned with true "color" and neglects the gray-related parts, which is well suited to emphasizing the realism of the extrapolated image and keeping the input sub-image consistent with the extrapolated area in the extrapolation task. In practice, the present embodiment finds that without this loss the generated image tends to have a dark tone and tends to produce bright spots in target images with vivid colors (such as flowers), whereas the hue-color loss solves both problems well. Furthermore, the present embodiment finds that this loss helps stabilize the training process.
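The exact hue-color expression appears only as an equation placeholder above; the sketch below is merely an illustrative stand-in capturing the stated intent (penalize hue drift while neglecting near-gray regions, with a small ξ to avoid zeros and γ < 1 to amplify differences), using a standard RGB-to-hue conversion rather than the patent's own formula:

    import torch

    def rgb_to_hue(img, eps=1e-3):
        # img: [B, 3, H, W] in [0, 1]; returns hue in [0, 1) and chroma (max - min).
        r, g, b = img[:, 0], img[:, 1], img[:, 2]
        cmax, _ = img.max(dim=1)
        cmin, _ = img.min(dim=1)
        delta = cmax - cmin
        hue = torch.zeros_like(cmax)
        hue = torch.where(cmax == r, ((g - b) / (delta + eps)) % 6.0, hue)
        hue = torch.where(cmax == g, (b - r) / (delta + eps) + 2.0, hue)
        hue = torch.where(cmax == b, (r - g) / (delta + eps) + 4.0, hue)
        return hue / 6.0, delta

    def hue_color_loss_sketch(pred, target, eps=1e-3, gamma=0.4):
        # Weight by the target's chroma so near-gray pixels (ill-defined hue) contribute little.
        hue_p, _ = rgb_to_hue(pred, eps)
        hue_t, chroma_t = rgb_to_hue(target, eps)
        diff = torch.abs(hue_p - hue_t)
        diff = torch.min(diff, 1.0 - diff)          # circular hue distance
        return (((diff + eps) ** gamma) * chroma_t).mean()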
Hypothetical graph target loss function. The target loss function of the hypothetical graph is a linear combination of the first adversarial loss function, the hue-color loss function and the perceptual loss function:
[Equation: hypothetical graph target loss function, a weighted sum of the first adversarial loss, the hue-color loss and the perceptual loss]
where the λ weights balance the three types of losses. In the experiments that follow they are set to 0.1, 10 and 1, respectively; in other embodiments, other weight values may be set, such as 0.2, 10 and 1.
First adversarial loss function. The adversarial loss of the hypothetical graph generation network is expressed as:
[Equation: first adversarial loss function]
where G_I and D_I respectively represent the hypothetical graph generator and the hypothetical graph discriminator, and X represents the input sub-image.
During training, the hypothetical graph generator G_I tries to reduce the loss that the discriminator D_I assigns to generated images, while the hypothetical graph discriminator D_I tries to increase it; this adversarial competition drives the hypothetical graph generator G_I to generate more realistic images.
Perceptual loss function. The embodiment uses a perceptual loss to penalize the difference between the hypothetical graph and the real image; its expression is:
[Equation: perceptual loss function]
where N_u is the number of elements of the feature matrix in the u-th activation layer, and σ_u is the activation feature matrix of the u-th layer of the pre-trained VGG-19 model.
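The patent gives the exact layer selection and norm only as an image; a common VGG-19 perceptual-loss sketch consistent with the description (the chosen layers and the L1 norm are assumptions) is:

    import torch
    import torch.nn as nn
    import torchvision

    class PerceptualLoss(nn.Module):
        # L1 distance between VGG-19 activations of the hypothetical graph and the
        # (reduced) real image, averaged over the number of elements per layer.
        # The selected layer indices are an illustrative assumption.
        def __init__(self, layer_ids=(3, 8, 17, 26)):  # relu1_2, relu2_2, relu3_4, relu4_4
            super().__init__()
            vgg = torchvision.models.vgg19(weights="DEFAULT").features.eval()
            for p in vgg.parameters():
                p.requires_grad_(False)
            self.vgg = vgg
            self.layer_ids = set(layer_ids)

        def forward(self, pred, target):
            loss, x, y = 0.0, pred, target
            for i, layer in enumerate(self.vgg):
                x, y = layer(x), layer(y)
                if i in self.layer_ids:
                    loss = loss + torch.abs(x - y).mean()  # mean = divide by N_u
                if i >= max(self.layer_ids):
                    break
            return loss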
(2) Slice generation network
The embodiment designs a novel slice generation network to implement the extrapolation function Gp(·) at each point p on the spiral curve. As shown in fig. 3, the slice generation network consists of a slice operator ψp, a slice generator and an extrapolation operator φp; in addition, the present embodiment also adds a spiral discriminator DS and an extrapolation discriminator DE (neither is shown in fig. 3).
For the k-th point pk on the spiral curve, the extrapolation function of the slice generation network takes the extrapolated image of the current spiral point and the hypothetical graph as input and outputs the extrapolated image of the next spiral point. The hypothetical graph, produced by the hypothetical graph generation network described above, must first be enlarged to the same size as the real image.
Slice operator. The slice operator ψp cuts a target slice out of an image at the current spiral point; the size of the cut slice equals the extrapolation size at that point. For point pk, the slice generation network uses two slice operators to cut the nearest neighbor slice from the extrapolated image of the current spiral point and the hypothetical graph slice from the enlarged hypothetical graph, respectively.
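A minimal sketch of the two cropping operations (the array layout, direction handling and the centered alignment of the sub-image inside the hypothetical graph are illustrative assumptions):

    import torch

    def cut_slices(current, hypothesis_up, direction, tau):
        # current: extrapolated image at the current spiral point, [B, C, H, W].
        # hypothesis_up: hypothetical graph enlarged to the full target size; here
        # it is assumed, for simplicity, that `current` sits centered inside it.
        b, c, h, w = current.shape
        # Nearest-neighbor slice: the border strip of the current image on the side
        # being extrapolated (it will be adjacent to the newly generated slice).
        if direction == "top":
            nn_slice = current[:, :, :tau, :]
        elif direction == "bottom":
            nn_slice = current[:, :, -tau:, :]
        elif direction == "left":
            nn_slice = current[:, :, :, :tau]
        else:  # "right"
            nn_slice = current[:, :, :, -tau:]
        # Hypothetical-graph slice: the band just outside the current image, i.e.
        # the region the new extrapolated slice will occupy.
        H, W = hypothesis_up.shape[-2:]
        top = (H - h) // 2
        left = (W - w) // 2
        if direction == "top":
            hyp_slice = hypothesis_up[:, :, top - tau:top, left:left + w]
        elif direction == "bottom":
            hyp_slice = hypothesis_up[:, :, top + h:top + h + tau, left:left + w]
        elif direction == "left":
            hyp_slice = hypothesis_up[:, :, top:top + h, left - tau:left]
        else:  # "right"
            hyp_slice = hypothesis_up[:, :, top:top + h, left + w:left + w + tau]
        return nn_slice, hyp_slice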
Slice generator. To make better use of the hypothetical graph slice, the input sub-image X and the nearest neighbor slice, the present embodiment designs a new generation structure, "encoder - adaptive instance normalization module - spatial adaptive normalization module - decoder". A generator with this structure fully exploits the information of the two slices and of the input sub-image to maintain the semantic accuracy and visual consistency of the extrapolated image. As shown in fig. 4, in the slice generator the hypothetical graph slice is input to the encoder, the style of the input sub-image X is fused into its latent space by the adaptive instance normalization module, and the semantic information of the nearest neighbor slice is then introduced in the decoder by the spatial adaptive normalization module, finally yielding the extrapolated slice. This process can be expressed as:
[Equation: extrapolated slice = slice generator(hypothetical graph slice, input sub-image, nearest neighbor slice)]
it is worth mentioning that the first-mentioned type of the coating,
Figure GDA0003507052850000105
slice nearest neighbor in (1)
Figure GDA0003507052850000106
Input sub-image with next spiral point
Figure GDA0003507052850000107
Extrapolated section of
Figure GDA0003507052850000108
Nearest neighbor, hypothetical graph
Figure GDA0003507052850000109
Slice of hypothetical drawing (5)
Figure GDA00035070528500001010
Then correspond to
Figure GDA00035070528500001011
Extrapolated section of
Figure GDA00035070528500001012
The position of (a). In this manner, the present embodiment semantically combines the corresponding slice semantic information from the input sub-images and hypothetical images for high quality extrapolated slice generation.
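A compact sketch of this "encoder - AdaIN - SPADE - decoder" layout (channel counts, kernel sizes and normalization details are assumptions; the implementation details later state 6 bottleneck residual blocks and two spatial adaptive normalization modules before the decoder's two transposed convolutions, which are simplified away here):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def adain(content, style):
        # Adaptive instance normalization: shift content features to the
        # channel-wise statistics of the style features (from the input sub-image).
        cm, cs = content.mean((2, 3), keepdim=True), content.std((2, 3), keepdim=True) + 1e-5
        sm, ss = style.mean((2, 3), keepdim=True), style.std((2, 3), keepdim=True)
        return (content - cm) / cs * ss + sm

    class SPADE(nn.Module):
        # Spatially-adaptive normalization conditioned on the nearest-neighbor slice.
        def __init__(self, ch, ctx_ch=3):
            super().__init__()
            self.norm = nn.InstanceNorm2d(ch, affine=False)
            self.mlp = nn.Sequential(nn.Conv2d(ctx_ch, 64, 3, padding=1), nn.ReLU())
            self.gamma, self.beta = nn.Conv2d(64, ch, 3, padding=1), nn.Conv2d(64, ch, 3, padding=1)

        def forward(self, x, ctx):
            ctx = F.interpolate(ctx, size=x.shape[-2:], mode="nearest")
            h = self.mlp(ctx)
            return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

    class SliceGeneratorSketch(nn.Module):
        # Toy slice generator: encode the hypothetical-graph slice, inject the
        # sub-image style via AdaIN, inject nearest-neighbor semantics via SPADE
        # before each upsampling step, and decode to the extrapolated slice.
        def __init__(self, ch=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU())
            self.spade1, self.spade2 = SPADE(ch), SPADE(ch)
            self.up1 = nn.ConvTranspose2d(ch, ch, 4, 2, 1)
            self.up2 = nn.ConvTranspose2d(ch, 3, 4, 2, 1)

        def forward(self, hyp_slice, sub_image, nn_slice):
            z = self.enc(hyp_slice)
            z = adain(z, self.enc(sub_image))                # fuse sub-image style
            z = F.relu(self.up1(self.spade1(z, nn_slice)))   # fuse nearest-neighbor semantics
            return torch.tanh(self.up2(self.spade2(z, nn_slice)))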
Furthermore, the slice generation network has no independent discriminator, considering the computational complexity problem.
Extrapolation operator. The purpose of the extrapolation operator is to stitch the extrapolated slice output by the slice generator back onto the extrapolated image of the current spiral point, completing one extrapolation operation:
[Equation: extrapolated image at the next spiral point = extrapolation operator(extrapolated image at the current point, extrapolated slice)]
Thus, the present embodiment completes one extrapolation at point pk.
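A minimal sketch of the stitching step (simple concatenation along the extrapolated side, an assumption consistent with the description above):

    import torch

    def stitch(current, new_slice, direction):
        # Extrapolation operator sketch: append the generated slice to the side of
        # the current extrapolated image indicated by `direction`, producing the
        # extrapolated image of the next spiral point.
        if direction == "top":
            return torch.cat([new_slice, current], dim=2)   # concat along height
        if direction == "bottom":
            return torch.cat([current, new_slice], dim=2)
        if direction == "left":
            return torch.cat([new_slice, current], dim=3)   # concat along width
        return torch.cat([current, new_slice], dim=3)       # "right"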
Shared slice generation network. The spiral generation network contains N slice generation networks, one for the image extrapolation at each point on the spiral curve; fig. 3 shows the combination of four slice generation networks within one turn of the spiral curve. When long-distance extrapolation of the input sub-image is required, the whole spiral generation network would need a large number of slice generation networks, and its parameter count would become very large. To address this problem, the present embodiment shares all slice generation network weights; that is, there is only one independent slice generation network in the whole spiral generation network.
(3) Design of spiral loss
Spiral loss objective function. The spiral loss objective function is a linear combination of a second adversarial loss function, an L1 loss function, a style loss function and the hue-color loss function. Its expression is:
[Equation: spiral loss objective function, a weighted sum of the second adversarial loss, the L1 loss, the style loss and the hue-color loss]
where λadv, λL1, λstyle and λhue are weights balancing the four types of losses; they are empirically set to 0.1, 10, 250 and 10, respectively, and in other embodiments may be set to other values, such as 0.2, 20, 250 and 10.
Second adversarial loss function. This embodiment designs a spiral discriminator DS and an extrapolation discriminator DE to discriminate the whole extrapolated image and the extrapolated area, respectively. The expression of the second adversarial loss function is:
[Equation: second adversarial loss function over DS and DE]
where F represents the spiral generation function, DS represents the spiral discriminator, DE represents the extrapolation discriminator, Ŷ represents the target image generated by the spiral process, Y_E represents the extrapolated area of the real image Y, and Ŷ_E represents the extrapolated area of Ŷ, i.e. the part of Ŷ selected by an extrapolation mask layer of the same size as Y.
F is the whole spiral generation function, which attempts to minimize the losses of the discriminators DS and DE during training, while DS and DE attempt to maximize the loss caused by F; this forms the adversarial game between the two. The spiral discriminator DS focuses on maintaining the consistency of the whole image, while the extrapolation discriminator DE emphasizes the continuity between slices within the extrapolated area.
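The precise adversarial formulation is given only as an equation placeholder; the following sketch uses a standard non-saturating GAN loss as a stand-in (an assumption, not the patent's exact expression) to show how the two discriminators see different inputs, DS the whole image and DE only the extrapolated area:

    import torch
    import torch.nn.functional as F

    def spiral_adv_losses(d_s, d_e, fake_full, real_full, extrap_mask):
        # d_s, d_e: discriminator modules returning per-image logits.
        # fake_full: spiral-generated target image; real_full: ground truth;
        # extrap_mask: 1 on the extrapolated area, 0 on the input sub-image region.
        fake_ext, real_ext = fake_full * extrap_mask, real_full * extrap_mask

        def d_loss(d, real, fake):
            real_logit, fake_logit = d(real), d(fake.detach())
            return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) +
                    F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))

        def g_loss(d, fake):
            logit = d(fake)
            return F.binary_cross_entropy_with_logits(logit, torch.ones_like(logit))

        loss_d = d_loss(d_s, real_full, fake_full) + d_loss(d_e, real_ext, fake_ext)
        loss_g = g_loss(d_s, fake_full) + g_loss(d_e, fake_ext)
        return loss_d, loss_g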
L1 loss function. The embodiment minimizes the reconstruction difference between the extrapolated image and the real image with the L1 loss function:
[Equation: L1 loss function]
Style loss function. The style loss compares Gram matrices of activation features:
[Equation: style loss function]
where the Gv × Gv Gram matrix is constructed from the activation map σv.
(4) Case of unknown extrapolated distance
Suppose only the input sub-image X and the output image size h′ × w′ × c are given, and the extrapolation distance m is unknown; then the specific position of X inside the target image cannot be known. Previous work is stuck in this case. In the method of this embodiment, thanks to the hypothetical graph generation strategy (the extrapolation mask is removed from the network input), template matching can be performed between the enlarged hypothetical graph and the input sub-image, so the approximate position of the input sub-image within the extrapolated image can be determined. In this way, the embodiment indirectly obtains a virtual extrapolation distance in each direction. In the experiments below, the present embodiment employs normalized cross-correlation template matching.
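A small sketch of this step using OpenCV's normalized cross-correlation matcher (the resizing and coordinate bookkeeping are illustrative assumptions):

    import cv2

    def estimate_extrapolation_distance(sub_image, hypothesis, out_h, out_w):
        # Locate the input sub-image inside the (enlarged) hypothetical graph via
        # normalized cross-correlation, then derive per-side extrapolation distances
        # (left, top, right, bottom) for an output of size out_h x out_w.
        hyp = cv2.resize(hypothesis, (out_w, out_h))          # enlarge hypothesis to target size
        res = cv2.matchTemplate(hyp, sub_image, cv2.TM_CCORR_NORMED)
        _, _, _, (x, y) = cv2.minMaxLoc(res)                  # best-match top-left corner
        h, w = sub_image.shape[:2]
        return (x, y, out_w - (x + w), out_h - (y + h))       # (left, top, right, bottom)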
(5) Implementation details
Network architecture. For the hypothetical graph generation network and the slice generator, the present embodiment adopts an encoder-decoder network structure similar to the CycleGAN method. The differences are that, in the slice generator, the original 8 residual blocks of the bottleneck part are replaced by 6, the adaptive instance normalization module is introduced before the bottleneck residual blocks, and two spatial adaptive normalization modules are introduced before the two transposed convolution layers of the decoder, forming the "encoder - adaptive instance normalization module - spatial adaptive normalization module - decoder" structure. For the hypothetical graph discriminator and the extrapolation discriminator, the present embodiment adopts the image-patch discriminator structure of the Pix2Pix method; inspired by the music method, the present embodiment adopts a network with a structure similar to DenseNet to implement the spiral discriminator.
Training details.
The hypothetical graph generation network is trained independently in advance. The generator and the discriminator are optimized with Adam using the same hyperparameters: learning rate α = 0.0002, β1 = 0.5 and β2 = 0.9. The present embodiment trains the slice generation network, the spiral discriminator and the extrapolation discriminator with the same configuration.
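For reference, a minimal PyTorch optimizer setup matching these reported hyperparameters (the parameter groups are placeholders):

    import torch

    def make_optimizers(generator, discriminator, lr=2e-4, betas=(0.5, 0.9)):
        # Adam with the hyperparameters reported above; generator and discriminator
        # use the same settings.
        opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=betas)
        opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
        return opt_g, opt_d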
(6) Specific effects
The high-quality image extrapolation system based on the spiral generation network provided by the embodiment of the invention has the following beneficial effects:
1. A hypothetical graph generation network is proposed to mimic human imagination: it generates a hypothetical graph used to guide the slice generation network to produce richer image details, and the hypothetical graph target loss function keeps the colors of the hypothetical graph closer to those of the input sub-image;
2. A slice generation network is provided which, using a slice operator, a slice generator and an extrapolation operator, gradually extrapolates the input sub-image in the four directions (up, down, left and right) to render the image under the semantic guidance of the hypothetical graph; the resulting target image has good semantic consistency and high-quality details, and remains essentially consistent with the original input sub-image in structure and texture;
3. A novel spiral generation network is proposed which, for the first time, treats the image extrapolation problem as a spiral growth process: the input sub-image gradually grows outward along the spiral curve under the action of the hypothetical graph generation network and the slice generation network, extrapolating one of the four sides (up, down, left, right) at a time until the complete image is obtained; owing to the network structures of the hypothetical graph generation network, the slice generation network, the spiral discriminator and the extrapolation discriminator, and the corresponding target loss function design, the spirally generated extrapolation area is realistic, consistent with the original sub-image in structure and texture, and in line with human imagination;
4. Across various image extrapolation settings and different datasets, the spiral generation network achieves the best metrics and visual effect on the image extrapolation problem.
(II) Experiments
To evaluate the performance of the image extrapolation method proposed in this embodiment, experiments were performed on eight datasets: CelebA-HQ (faces), Stanford Cars, CUB (birds), Flowers, Paris StreetView, Cityscapes, Place365 Desert Road and Place365 Sky, covering both objects (faces, cars, birds and flowers) and scenes (street view, city, road and sky).
For the Stanford Cars and CUB datasets, this embodiment uses the given bounding boxes to crop the objects, resizes the crops to 256 × 256, and discards severely distorted objects, making the data more suitable for the image extrapolation task. Table 1 lists the training and test set partitioning of the eight datasets; the default official partitioning is retained for the Cityscapes and Place365 Desert Road datasets, and samples are randomly selected for the other datasets.
The evaluation considers three different image extrapolation tasks: (1) four-side extrapolation from 128 × 128 to 256 × 256 on the CelebA-HQ, Stanford Cars, CUB and Flowers datasets; (2) two-side extrapolation from 256 × 256 to 512 × 256 on the Cityscapes and Place365 Sky datasets; (3) one-side extrapolation from 256 × 256 to 512 × 256 on the Paris StreetView and Place365 Desert Road datasets. This embodiment compares its method with the currently most advanced Boundless and SRN, where Boundless is compared in the one-side extrapolation case and SRN in all three cases. In addition, this embodiment handles the case of an unknown extrapolation distance on the CelebA-HQ dataset, which SRN cannot handle. All models of this embodiment are implemented in PyTorch v1.1 on a computer equipped with 4 NVIDIA GeForce GTX 1080Ti GPUs.
TABLE 1 partitioning of eight data sets on training and testing
[Table 1 content provided as an image in the original document]
(1) Quantitative evaluation
Following Boundless and SRN, this embodiment uses the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) and the Fréchet Inception Distance (FID) as metrics for semantic consistency and visual realism (higher PSNR and SSIM are better; lower FID is better). The results in Table 2 show that the spiral generation network of this embodiment outperforms Boundless and SRN in almost all cases. It should be noted that the hypothetical graph generation network alone (as a cGAN) performs worse than the final spiral generation network, and its FID score is extremely poor, confirming that the hypothetical graph by itself has poor visual quality.
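For reference, PSNR and SSIM can be computed with scikit-image as sketched below (FID requires a separate Inception-based implementation and is omitted here):

    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate_pair(pred, target):
        # pred, target: uint8 HxWx3 arrays; returns the (PSNR, SSIM) pair used
        # in the semantic-consistency / visual-realism comparison above.
        psnr = peak_signal_noise_ratio(target, pred, data_range=255)
        ssim = structural_similarity(target, pred, channel_axis=-1, data_range=255)
        return psnr, ssim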
TABLE 2 quantitative evaluation results in various cases
[Table 2 content provided as an image in the original document]
To compare the realism of the extrapolated outputs, this embodiment conducted a user study with A/B paired tests. The setup is similar to SRN. For each dataset, 40 paired results from the same input were randomly selected, inferred from the spiral generation network versus Boundless and from the spiral generation network versus SRN, respectively. Users were asked to select the more realistic image from each pair, with unlimited time. Each pair was judged by at least 3 different users. The results in Table 3 verify that the spiral generation network of this embodiment performs better than Boundless and SRN on all available datasets.
TABLE 3 user study results
[Table 3 content provided as an image in the original document]
(2) Qualitative evaluation
This embodiment shows the qualitative evaluation results of Boundless, SRN and the proposed spiral generation network in fig. 5, where: (a) four-side extrapolation results on the CelebA-HQ, Stanford Cars, CUB and Flowers datasets; (b) two-side extrapolation on the Cityscapes and Place365 Sky datasets; (c) one-side extrapolation on the Paris StreetView and Place365 Desert Road datasets. The method of this embodiment infers more reasonable results with semantic consistency and vivid details, avoiding meaningless content and chaotic backgrounds. Furthermore, fig. 6 (results of the unknown-boundary-distance case on the CelebA-HQ dataset) and Table 3 (CelebA-HQ) show that the spiral generation network also works well when the boundary distance is unknown.
(3) Necessity of the spiral
The spiral architecture of this embodiment is essential to the image extrapolation task in the following three aspects; various ablation experiments are carried out to verify the necessity of each (results are shown in Table 4 and fig. 7). Fig. 7 shows the qualitative evaluation results of the spiral-necessity ablation experiments, in which: (a) A, one-by-one; (b) A, horizontal-vertical; (c) A, vertical-horizontal; (d) B, no nearest neighbor slice; (e) B, sub-image slice; (f) C, simultaneous; (g) C, horizontal-vertical; (h) C, vertical-horizontal; (i) spiral generation network, counterclockwise; (j) spiral generation network, clockwise.
A. Turn-by-turn extrapolation: (1) extrapolating one direction completely at a time (A. one-by-one); (2) horizontal then vertical extrapolation (A. horizontal-vertical); (3) vertical then horizontal extrapolation (A. vertical-horizontal). The example in fig. 7 shows that breaking the turn-by-turn growth of slices in the four directions makes the generator struggle to coordinate small horizontal slices with large vertical slices, resulting in semantically inconsistent, confused content.
B. Dependence on directional slices from the neighboring turn: (1) no nearest neighbor slice input (B. no nearest neighbor slice); (2) the nearest neighbor slice replaced by the sub-image slice (B. sub-image slice). As shown in fig. 7d and 7e, blurred details and unrealistic textures appear in the portions far from the original sub-image area.
C. Correlation between adjacent slices: (1) the four directional slices generated simultaneously (C. simultaneous); (2) horizontal then vertical slices (C. horizontal-vertical); (3) vertical then horizontal slices (C. vertical-horizontal). Fig. 7f, 7g and 7h show that some slices within a spiral turn are affected by non-continuous slice generation. While the spiral generation network can generate the slices of one turn either counterclockwise or clockwise (the default), clockwise performs better in both the quantitative (Table 4) and qualitative (fig. 7) evaluations, indicating that inferring the image in a spiral fashion is effective.
TABLE 4 quantitative evaluation results of spiral necessity ablation experiments
[Table 4 content provided as an image in the original document]
(4) Analysis of slice generation networks
Ternary input of the slice generation network. The slice generation network of this embodiment adopts the new "encoder - adaptive instance normalization module - spatial adaptive normalization module - decoder" design: the hypothetical graph slice is encoded (encoder) into the latent space and fused with the style information of the sub-image (adaptive instance normalization) in that latent code; when the combined latent code is decoded back into image space (decoder), it is fused (spatial adaptive normalization module) with the semantic information of the nearest neighbor slice, yielding an extrapolated slice with style, semantic and context consistency.
This embodiment performs ablation studies on the three inputs of the slice generation network — the hypothetical graph slice, the sub-image and the nearest neighbor slice — to verify their efficacy and the corresponding structures. A baseline is constructed as an auto-encoding network with the hypothetical graph slice as its only input; the sub-image and the nearest neighbor slice are then added for comparison using the "encoder - adaptive instance normalization module - decoder" and the "encoder - spatial adaptive normalization module - decoder" structures, respectively. In addition, the hypothetical graph slice and the nearest neighbor slice are swapped for further analysis. Table 5 and fig. 8 show the results of the ablation experiments on the ternary input, demonstrating the advantage of the present configuration. In fig. 8: (a) baseline; (b) adding the sub-image; (c) adding the nearest neighbor slice; (d) hypothetical slice and nearest neighbor slice swapped; (e) the spiral generation network.
Visually, with the sub-image as an additional input, the style of the generated slice looks more harmonious with the input sub-image (fig. 8a and 8b); with the nearest neighbor slice as input, the generated slice also appears semantically more consistent with the sub-image (fig. 8a and 8c); if the hypothetical slice and the nearest neighbor slice are swapped, semantically inconsistent, distorted content appears (fig. 8d); with the full ternary input, the performance of the spiral generation network is improved (fig. 8e).
TABLE 5 quantitative evaluation results of the ternary-input ablation experiments on the slice generation network
Method PSNR SSIM FID
Baseline 13.95 0.5683 26.10
Adding sub-image 14.02 0.5652 24.41
Adding nearest neighbor slice 14.18 0.5727 24.64
Hypothetical slice and nearest neighbor slice swapped 13.98 0.5743 26.86
Spiral generation network 14.31 0.5775 23.64
Different slice sizes. This embodiment then investigates the effect of the slice size τ, conducting ablation experiments with five sizes τ = {4, 8, 16, 32, 64}. The results in Table 6 and fig. 9 show that small slices can lead to unclear texture (fig. 9a) while large slices can lead to more pronounced stitching blockiness (fig. 9d). Considering both effectiveness and efficiency, the present embodiment sets τ to 32.
TABLE 6 quantitative evaluation results of ablation experiments with different slice sizes of the slice generation network
τ 4 8 16 32 64
PSNR 13.58 13.70 13.80 14.05 14.31
SSIM 0.5486 0.5566 0.5632 0.5694 0.5775
FID 27.11 26.14 23.64 21.07 20.59
(5) Efficacy of hue-color loss
This embodiment finally analyzes the efficacy of the hue-color loss. For convenience, experiments are performed with the hypothetical graph generation network on the Flowers and Stanford Cars datasets, using the model with the hue-color loss removed as the baseline and comparing against a color loss and the L1 loss; the results are given in Table 7 and fig. 10. In fig. 10: (a) and (b) show qualitative evaluation results on the Flowers and Stanford Cars datasets respectively, from left to right: baseline, color loss, L1 loss, hue-color loss; (c) shows the corresponding total loss curves on the Flowers and Stanford Cars datasets during training.
The results in Table 7 show that the hue-color loss is more effective in terms of PSNR, SSIM and FID. Fig. 10a and 10b show the progressive inference results during training with the different losses; dark tones and bright colored spots appear in the baseline, the L1 loss and the color loss each mitigate only one of these problems, while the hue-color loss of this embodiment handles both well. The corresponding total loss curves in fig. 10c show that with the hue-color loss the total loss drops very quickly at the start, so the model identifies the correct colors early, which stabilizes the training process.
TABLE 7 quantitative evaluation results of the hypothetical graph generation network ablation experiments with different losses
[Table 7 content provided as an image in the original document]
(6) Conclusion
In the experiments, the method is tested on various object and scene datasets under different extrapolation settings, and the results show that the proposed spiral generation network achieves the best image extrapolation quality to date. In addition, a series of ablation and comparison experiments verify the effectiveness of the detailed design of the spiral generation network.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (4)

1. A high quality image extrapolation system based on a spiral generation network, wherein the spiral generation network comprises a hypothetical graph generation network, a slice generation network, a spiral discriminator, and an extrapolation discriminator;
the hypothetical graph generation network comprises a hypothetical graph generator and a hypothetical graph discriminator, and generates the hypothetical graph of the input sub-image through adversarial training between the hypothetical graph generator and the hypothetical graph discriminator under a hypothetical graph target loss function; the hypothetical graph generator adopts the generator structure of CycleGAN, and the hypothetical graph discriminator adopts the discriminator structure of Pix2Pix;
the slice generation network comprises a slice operator, a slice generator and an extrapolation operator; the slice operator is used for cutting a hypothetical graph slice out of the hypothetical graph and cutting a nearest neighbor slice out of the extrapolated image of the current spiral point; the slice generator is configured to generate an extrapolated slice from the nearest neighbor slice, the hypothetical graph slice and the input sub-image; the extrapolation operator is used for stitching the extrapolated slice back onto the extrapolated image of the current spiral point to obtain the extrapolated image of the next spiral point, thereby completing one spiral extrapolation step; the slice generator comprises an encoder, an adaptive instance normalization module, a spatial adaptive normalization module and a decoder; the hypothetical graph slice is input to the encoder, the style of the input sub-image is fused into its latent space by the adaptive instance normalization module, and the semantic information of the nearest neighbor slice is then introduced in the decoder by the spatial adaptive normalization module to generate the final extrapolated slice;
the spiral discriminator and the extrapolation discriminator are used, under the adversarial constraint of a spiral loss objective function, to output a target image after several spiral extrapolation steps;
the hypothetical graph target loss function is a linear combination of a first adversarial loss function, a hue-color loss function and a perceptual loss function; the expression of the hue-color loss function is:
[Equation: hue-color loss function]
where Ỹ is a reduced image of the real image Y, Î is the hypothetical graph, h × w is the size of the real image Y, ξ and γ are constants, and γ < 1;
the expression of the first adversarial loss function is:
[Equation: first adversarial loss function]
where G_I and D_I respectively represent the hypothetical graph generator and the hypothetical graph discriminator, and X represents the input sub-image;
the expression of the perceptual loss function is:
[Equation: perceptual loss function]
where N_u is the number of elements of the feature matrix in the u-th activation layer, and σ_u is the activation feature matrix of the u-th layer of the pre-trained VGG-19 model;
the spiral loss objective function is a linear combination of a second adversarial loss function, the L1 loss function, the style loss function and the hue-color loss function;
the expression of the second adversarial loss function is:
[Equation: second adversarial loss function]
where F represents the spiral generation function, DS represents the spiral discriminator, DE represents the extrapolation discriminator, Ŷ represents the target image generated by the spiral process, Y_E represents the extrapolated area of the real image Y, and Ŷ_E represents the extrapolated area of Ŷ, i.e. the part of Ŷ selected by an extrapolation mask layer of the same size as Y;
the expression of the L1 loss function is:
[Equation: L1 loss function]
the expression of the style loss function is:
[Equation: style loss function]
where the Gv × Gv Gram matrix is constructed from the activation map σv.
2. A high quality image extrapolation system based on a spiral generation network as claimed in claim 1, wherein the expression of the hypothetical graph target loss function is:
[Equation: hypothetical graph target loss function, a weighted sum of the first adversarial loss, the hue-color loss and the perceptual loss]
where the weights balance the three types of losses.
3. A high quality image extrapolation system based on a spiral generation network as claimed in claim 2, characterized in that: the input sub-image is superimposed with an extrapolation mask layer M and uniformly distributed noise Z, and the result is input into the hypothetical graph generation network to obtain the hypothetical graph.
4. A high quality image extrapolation system based on a spiral generation network as claimed in claim 2, wherein the expression of the spiral loss objective function is:
[Equation: spiral loss objective function, a weighted sum of the second adversarial loss, the L1 loss, the style loss and the hue-color loss]
where λadv, λL1, λstyle and λhue are weights used to balance these four types of losses.
CN202010768731.6A 2020-08-03 2020-08-03 High-quality image extrapolation system based on spiral generation network Active CN111915591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010768731.6A CN111915591B (en) 2020-08-03 2020-08-03 High-quality image extrapolation system based on spiral generation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010768731.6A CN111915591B (en) 2020-08-03 2020-08-03 High-quality image extrapolation system based on spiral generation network

Publications (2)

Publication Number Publication Date
CN111915591A CN111915591A (en) 2020-11-10
CN111915591B true CN111915591B (en) 2022-03-22

Family

ID=73287049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010768731.6A Active CN111915591B (en) 2020-08-03 2020-08-03 High-quality image extrapolation system based on spiral generation network

Country Status (1)

Country Link
CN (1) CN111915591B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102549334A (en) * 2009-10-01 2012-07-04 Opto设计股份有限公司 Color correction method for illumination light, and light source module and lighting device using this color correction method
CN108122264A (en) * 2016-11-28 2018-06-05 奥多比公司 Sketch is promoted to be converted to drawing
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5515409A (en) * 1994-12-22 1996-05-07 General Electric Company Helical interpolative algorithm for image reconstruction in a CT system
US7369712B2 (en) * 2003-09-30 2008-05-06 Fotonation Vision Limited Automated statistical self-calibrating detection and removal of blemishes in digital images based on multiple occurrences of dust in images
CN101313594A (en) * 2005-10-16 2008-11-26 米迪尔波得股份有限公司 Apparatus, system and method for increasing quality of digital image capture
CN109377448B (en) * 2018-05-20 2021-05-07 北京工业大学 Face image restoration method based on generation countermeasure network
CN110456355B (en) * 2019-08-19 2021-12-24 河南大学 Radar echo extrapolation method based on long-time and short-time memory and generation countermeasure network
CN110765878A (en) * 2019-09-20 2020-02-07 苏州大圜科技有限公司 Short-term rainfall prediction method
CN110568442B (en) * 2019-10-15 2021-08-20 中国人民解放军国防科技大学 Radar echo extrapolation method based on confrontation extrapolation neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102549334A (en) * 2009-10-01 2012-07-04 Opto设计股份有限公司 Color correction method for illumination light, and light source module and lighting device using this color correction method
CN108122264A (en) * 2016-11-28 2018-06-05 奥多比公司 Sketch is promoted to be converted to drawing
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Also Published As

Publication number Publication date
CN111915591A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN111292264B (en) Image high dynamic range reconstruction method based on deep learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN111368662B (en) Method, device, storage medium and equipment for editing attribute of face image
CN108648197B (en) Target candidate region extraction method based on image background mask
CN107767413A (en) A kind of image depth estimation method based on convolutional neural networks
CN111784602A (en) Method for generating countermeasure network for image restoration
CN108986058A (en) The image interfusion method of lightness Consistency Learning
DE102017009910A1 (en) Editing digital images using a neural network with an in-house build layer
CN108830796A (en) Based on the empty high spectrum image super-resolution reconstructing method combined and gradient field is lost of spectrum
CN110223370B (en) Method for generating complete human texture map from single-view picture
CN109712165A (en) A kind of similar foreground picture image set dividing method based on convolutional neural networks
CN116109798B (en) Image data processing method, device, equipment and medium
CN111275613A (en) Editing method for generating confrontation network face attribute by introducing attention mechanism
CN109087375B (en) Deep learning-based image cavity filling method
CN109903236A (en) Facial image restorative procedure and device based on VAE-GAN to similar block search
CN105096286A (en) Method and device for fusing remote sensing image
CN109523513A (en) Based on the sparse stereo image quality evaluation method for rebuilding color fusion image
CN107635136A (en) View-based access control model is perceived with binocular competition without with reference to stereo image quality evaluation method
CN110335350A (en) Virtual Terrain generation method based on features of terrain
CN113362422B (en) Shadow robust makeup transfer system and method based on decoupling representation
CN112686816A (en) Image completion method based on content attention mechanism and mask code prior
CN114943656A (en) Face image restoration method and system
CN116033279B (en) Near infrared image colorization method, system and equipment for night monitoring camera
CN111915591B (en) High-quality image extrapolation system based on spiral generation network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant