CN111915703A - Image generation method and device - Google Patents

Image generation method and device

Info

Publication number
CN111915703A
Authority
CN
China
Prior art keywords
image
semantic segmentation
edge line
determining
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910390118.2A
Other languages
Chinese (zh)
Other versions
CN111915703B (en)
Inventor
陈培
刘奎龙
刘宸寰
唐浩超
向为
高暐玥
陈鹏
杨昌源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201910390118.2A
Publication of CN111915703A
Application granted
Publication of CN111915703B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection

Abstract

The present disclosure relates to an image generation method and apparatus. The method comprises the following steps: receiving a first semantic segmentation image input by a user, wherein the first semantic segmentation image comprises at least one type of target scene object; determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object; and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is obtained by training on a plurality of sample images having the target style. From the semantic segmentation image input by the user, the method and apparatus can quickly generate an image that has the target style and rich, complete scene content.

Description

Image generation method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image generation method and apparatus.
Background
With the maturing of a new generation of machine learning algorithms represented by deep learning, artificial intelligence plays an increasingly important role in fields such as intelligent medical treatment, autonomous driving, and intelligent education. At present, artificial-intelligence-assisted drawing is a research focus in the field of artificial intelligence.
In the prior art, artificial-intelligence-assisted drawing systems can be divided into the following three categories. The first type helps the user complete simple drawing operations by executing rule-based instructions; because rule instructions cannot cover all possible situations, such a system can only output finished drawings in certain specific situations. The second type uses a deep neural network to imitate a painting style and output an image, for example converting an ordinary image into an image of a certain style based on a style transfer algorithm; such a system merely transfers image textures and cannot innovate on image content. The third type learns a large amount of image data with a deep learning algorithm to obtain the latent probability distribution of the image data, so that images that do not exist in the training set can be generated.
Therefore, an image generation method is needed to assist a user in generating an image with a target style, high quality and rich content.
Disclosure of Invention
In view of this, the present disclosure provides an image generation method and apparatus, so that an image with a target style and rich, complete scene content can be quickly generated for a user from a semantic segmentation image input by the user.
According to a first aspect of the present disclosure, there is provided an image generation method including: receiving a first semantic segmentation image input by a user, wherein the first semantic segmentation image comprises at least one type of target scene object; determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object; and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is trained on a plurality of sample images having the target style.
In one possible implementation, the first conditional generative adversarial network model is trained from the plurality of sample images by: performing semantic segmentation on each sample image to determine a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises multiple types of scene objects in the sample image; performing edge detection on each sample image to determine a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image; and training the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and second edge line image of each sample image.
In one possible implementation, determining a first edge line image corresponding to the first semantic segmentation image includes: determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image includes: determining a first spatial structure feature of the first semantic segmentation image; performing dimension reduction on the first spatial structure feature and determining a first feature point corresponding to the first semantic segmentation image in a feature space model, wherein the dimension of the reduced first spatial structure feature is the same as that of the feature space model, and the feature space model comprises a second feature point corresponding to the second semantic segmentation image of each sample image; determining, as a third semantic segmentation image, the second semantic segmentation image corresponding to a second feature point whose Euclidean distance from the first feature point in the feature space model is less than or equal to a threshold; and determining the first edge line image according to the second edge line image corresponding to the same sample image as the third semantic segmentation image.
In one possible implementation, the feature space model is built by: training an autoencoder model according to the second semantic segmentation image of each sample image, wherein the autoencoder model is used for extracting spatial structure features of semantic segmentation images; determining a second spatial structure feature of the second semantic segmentation image of each sample image according to the autoencoder model; and performing dimension reduction on the second spatial structure feature of the second semantic segmentation image of each sample image to obtain the feature space model.
In one possible implementation, determining a first spatial structure feature of the first semantic segmentation image comprises: determining the first spatial structure feature according to the autoencoder model.
In one possible implementation, performing dimension reduction on the spatial structure features includes: performing dimension reduction on the spatial structure features by using a PCA algorithm.
In one possible implementation, determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image includes: inputting the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is trained according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the target style is an animation style.
According to a second aspect of the present disclosure, there is provided an image generation apparatus comprising: a receiving module configured to receive a first semantic segmentation image input by a user, the first semantic segmentation image comprising at least one type of target scene object; a first determining module configured to determine a first edge line image corresponding to the first semantic segmentation image, the first edge line image comprising edge information of each type of target scene object; and a generation module configured to input the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is trained on a plurality of sample images having the target style.
In one possible implementation, the apparatus further includes: a second determining module configured to perform semantic segmentation on each sample image and determine a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises multiple types of scene objects in the sample image; a third determining module configured to perform edge detection on each sample image and determine a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image; and a first model training module configured to train the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and second edge line image of each sample image.
In one possible implementation, the first determining module includes: a first determining submodule configured to determine the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the first determining submodule includes: a first determining unit configured to determine a first spatial structure feature of the first semantic segmentation image; a second determining unit configured to perform dimension reduction on the first spatial structure feature and determine a first feature point corresponding to the first semantic segmentation image in a feature space model, wherein the dimension of the reduced first spatial structure feature is the same as that of the feature space model, and the feature space model includes a second feature point corresponding to the second semantic segmentation image of each sample image; a third determining unit configured to determine, as a third semantic segmentation image, the second semantic segmentation image corresponding to a second feature point in the feature space model whose Euclidean distance from the first feature point is less than or equal to a threshold; and a fourth determining unit configured to determine the first edge line image according to the second edge line image of the sample image corresponding to the third semantic segmentation image.
In one possible implementation, the apparatus further includes: a second model training module configured to train an autoencoder model according to the second semantic segmentation image of each sample image, the autoencoder model being used for extracting spatial structure features of semantic segmentation images; a fourth determining module configured to determine, according to the autoencoder model, a second spatial structure feature of the second semantic segmentation image of each sample image; and a fifth determining module configured to perform dimension reduction on the second spatial structure feature of the second semantic segmentation image of each sample image to determine the feature space model.
In a possible implementation manner, the first determining unit is specifically configured to: determine the first spatial structure feature according to the autoencoder model.
In one possible implementation, the apparatus further includes: and the data processing module is used for performing dimension reduction processing on the space structure characteristics by using a PCA algorithm.
In a possible implementation manner, the first determining submodule is specifically configured to: input the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is trained according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the target style is an animation style.
According to a third aspect of the present disclosure, there is provided an image generating apparatus comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the image generation method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the image generation method of the first aspect described above.
By receiving a first semantic segmentation image which is input by a user and comprises at least one type of target scene object, determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object, and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image which corresponds to the first semantic segmentation image and has a target style, wherein the first conditional generative adversarial network model is trained on a plurality of sample images having the target style, the present disclosure can quickly generate, from the semantic segmentation image input by the user, an image which has the target style and rich, complete scene content.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a schematic flow diagram of an image generation method of an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of a first semantically segmented image of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a first edge line image corresponding to the first semantically segmented image shown in FIG. 2 according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of an animation style image corresponding to the first semantically segmented image shown in FIG. 2 according to an embodiment of the disclosure;
fig. 5 shows a schematic structural diagram of an image generation apparatus according to an embodiment of the present disclosure;
fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Drawing is an activity with freedom, openness, expressiveness and artistry, and artificial-intelligence-assisted drawing has always been a research hotspot in the field of artificial intelligence. At present, as stated in the Background section, existing artificial-intelligence-assisted drawing systems cannot generate images with rich and complete scene content.
The image generation method provided by the present disclosure can be applied to artificial-intelligence-assisted drawing scenarios, so that an image having a target style and rich, complete scene content can be quickly generated for a user who only needs to input a semantic segmentation image. The following describes the image generation method provided by the present disclosure in detail, taking the generation of an image having an animation style as an example. It should be understood by those skilled in the art that the animation style is only one example of an application scenario of the present disclosure and does not constitute a limitation to the present disclosure; the image generation method provided by the present disclosure may also be applied to application scenarios in which images of other styles are generated.
Fig. 1 shows a schematic flow diagram of an image generation method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
Step S11, receiving a first semantic segmentation image input by a user, where the first semantic segmentation image includes at least one type of target scene object.
Step S12, determining a first edge line image corresponding to the first semantic segmentation image, where the first edge line image includes edge information of each type of target scene object.
Step S13, inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model, and generating an image having a target style corresponding to the first semantic segmentation image, where the first conditional generative adversarial network model is trained from a plurality of sample images having the target style.
Before the image with the target style is generated, the first conditional generative adversarial network model is trained on a plurality of sample images having the target style; the trained first conditional generative adversarial network model can then be used to generate images with the target style.
In one possible implementation, the first conditional generative adversarial network model is trained from the plurality of sample images by the following steps: performing semantic segmentation on each sample image to determine a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises multiple types of scene objects in the sample image; performing edge detection on each sample image to determine a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image; and training the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and second edge line image of each sample image.
In one possible implementation, the target style is an animation style.
The following describes in detail how the first conditional generative adversarial network model is trained from a plurality of sample images having an animation style (the target style).
First, a plurality of sample images having the animation style (target style) are acquired, for example by capturing frames from animation films and/or animation series.
In an example, the animation style may be further subdivided into different types of animation styles, such as a Japanese animation style, an American animation style, or a Chinese animation style, and a plurality of sample images having the same type of animation style are obtained for model training.
In one example, a plurality of sample images are screened, and only landscape sample images are selected for model training.
Secondly, semantic segmentation is performed on each sample image to determine a second semantic segmentation image of each sample image. For example, for each sample image, semantic segmentation labeling is performed on the various scene objects in the sample image, and different colors can be used to represent different types of scene objects, so as to obtain the second semantic segmentation image.
Each color area in the semantic segmentation image corresponds to one type of scene object, and semantic information of the image can be provided.
In one example, 9 categories of scene objects may be defined: sky, mountains, trees, grass, buildings, rivers, roads, rocks, others (scene objects that do not fall into the above 8 categories, e.g., people, vehicles, etc.).
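As an illustration of such a labeling scheme, a minimal Python sketch of a label-to-color mapping for the nine categories follows; the specific colors are assumptions chosen for illustration and are not taken from this disclosure.

```python
# Illustrative mapping from the 9 scene-object categories to annotation colors (RGB).
# The actual colors used for semantic segmentation labeling are not specified here.
SEMANTIC_CLASSES = {
    "sky":      (70, 130, 180),
    "mountain": (139, 115, 85),
    "tree":     (34, 139, 34),
    "grass":    (124, 252, 0),
    "building": (128, 128, 128),
    "river":    (65, 105, 225),
    "road":     (105, 105, 105),
    "rock":     (112, 128, 144),
    "other":    (0, 0, 0),  # scene objects outside the above 8 categories, e.g. people, vehicles
}
```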
Thirdly, edge detection is performed on each sample image to determine a second edge line image of each sample image.
In one example, the Canny edge detection algorithm is used to perform edge detection on each sample image to obtain the second edge line image of each sample image.
When performing edge detection on each sample image, other edge detection algorithms may also be used besides the Canny algorithm, which is not specifically limited in this disclosure.
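A minimal sketch of this step using OpenCV's Canny detector is given below; the function name and the two thresholds are illustrative assumptions rather than values specified in the disclosure.

```python
import cv2

def edge_line_image(sample_path, low=100, high=200):
    """Return a binary edge map for one sample image; thresholds are illustrative."""
    img = cv2.imread(sample_path)                 # sample image in BGR order
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Canny operates on a single channel
    return cv2.Canny(gray, low, high)             # second edge line image of the sample
```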
The edge line image includes edge information of each type of scene object, and can provide detail information of the image.
Fourthly, each sample image and the second semantic segmentation image and second edge line image corresponding to each sample image are scaled or cropped.
Because different sample images may come from different multimedia data (animation films or animation series), in order to ensure that the images used for model training have a consistent size, each sample image and its corresponding second semantic segmentation image and second edge line image are scaled or cropped to obtain an image data set of consistent size.
If each sample image and its corresponding second semantic segmentation image and second edge line image already have a consistent size, the scaling or cropping of the fourth step is not needed.
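A minimal sketch of this size-unification step is shown below; the target resolution and the interpolation choices are assumptions, since the disclosure does not specify them.

```python
import cv2

TARGET_SIZE = (512, 256)  # (width, height); the actual training resolution is not stated

def to_uniform_size(image, is_label=False, size=TARGET_SIZE):
    """Resize a sample image, segmentation map, or edge map to a common size."""
    # Nearest-neighbour keeps label/edge values intact; area interpolation suits photos.
    interp = cv2.INTER_NEAREST if is_label else cv2.INTER_AREA
    return cv2.resize(image, size, interpolation=interp)
```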
Fifthly, for each sample image, model training is performed using the data group "sample image - second semantic segmentation image of the sample image - second edge line image of the sample image" as training data, to learn the mapping relationship among the sample image, the second semantic segmentation image of the sample image, and the second edge line image of the sample image.
In an example, the first conditional generative adversarial network model includes a generator G and a discriminator D.
The model training process is described in detail for any data group "sample image y - second semantic segmentation image x1 of the sample image - second edge line image x2 of the sample image".
First, the second semantic segmentation image x1 and the second edge line image x2 of the sample image are input into the generator G.
Next, the generator G generates an image y' having the animation style (target style) according to the second semantic segmentation image x1, the second edge line image x2, and a random vector z, i.e., G(x1, x2, z) → y'.
Next, the discriminator D judges whether the generated image y' is the same as the sample image y, and returns the judgment result to the generator G, so that the generator G improves its own generation capability.
Finally, the generator G and the discriminator D play multiple rounds of this adversarial game, that is, they optimize the following min-max objective:
$$\min_G \max_D \; \mathbb{E}_{x_1,x_2,y}\big[\log D(x_1,x_2,y)\big] + \mathbb{E}_{x_1,x_2,z}\big[\log\big(1 - D(x_1,x_2,G(x_1,x_2,z))\big)\big]$$
thereby optimizing the generation capability of the generator G and obtaining, by training, the first conditional generative adversarial network model.
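The following PyTorch sketch illustrates one adversarial training step under these definitions. It is a minimal sketch, not the exact training procedure of this disclosure: it assumes the inputs x1, x2, z are concatenated along the channel dimension, that the discriminator scores (x1, x2, image) triples and outputs logits, and that the architectures of G and D are defined elsewhere.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(G, D, opt_G, opt_D, x1, x2, y):
    """One adversarial step on a batch: x1 segmentation, x2 edge map, y sample image."""
    b, _, h, w = y.shape
    z = torch.randn(b, 64, 1, 1, device=y.device).expand(-1, -1, h, w)  # random vector z, broadcast spatially
    fake = G(torch.cat([x1, x2, z], dim=1))                              # G(x1, x2, z) -> y'

    # Discriminator step: real triples should score high, generated triples low.
    opt_D.zero_grad()
    d_real = D(torch.cat([x1, x2, y], dim=1))
    d_fake = D(torch.cat([x1, x2, fake.detach()], dim=1))
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    loss_D.backward()
    opt_D.step()

    # Generator step: try to make the discriminator accept the generated image.
    opt_G.zero_grad()
    d_fake = D(torch.cat([x1, x2, fake], dim=1))
    loss_G = bce(d_fake, torch.ones_like(d_fake))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```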
After the artificial-intelligence-assisted drawing system has been trained to obtain the first conditional generative adversarial network model, when a user wants to generate an image with the animation style (target style) through the system, the user inputs a first semantic segmentation image comprising at least one type of target scene object into the system, based on the drawing creation requirements.
Fig. 2 illustrates a schematic diagram of a first semantic segmentation image of an embodiment of the present disclosure. As shown in fig. 2, the first semantic segmentation image includes five types of target scene objects: sky, mountain, tree, grass, and river. In the first semantic segmentation image, different classes of target scene objects may be represented by different colors.
In one possible implementation, determining a first edge line image corresponding to the first semantic segmentation image includes: determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
The manner of determining the first edge line image from the second semantic segmentation image and the second edge line image of each sample image includes at least two of the following.
The first manner is as follows:
In one possible implementation, determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image includes: determining a first spatial structure feature of the first semantic segmentation image; performing dimension reduction on the first spatial structure feature and determining a first feature point corresponding to the first semantic segmentation image in a feature space model, wherein the dimension of the reduced first spatial structure feature is the same as that of the feature space model, and the feature space model comprises a second feature point corresponding to the second semantic segmentation image of each sample image; determining, as a third semantic segmentation image, the second semantic segmentation image corresponding to a second feature point whose Euclidean distance from the first feature point in the feature space model is less than or equal to a threshold; and determining the first edge line image according to the second edge line image corresponding to the same sample image as the third semantic segmentation image.
Before the first edge line image is determined from the second semantic segmentation image and the second edge line image of each sample image, a feature space model is established from the second semantic segmentation image of each sample image.
In one possible implementation, the feature space model is built by: training an autoencoder model according to the second semantic segmentation image of each sample image, wherein the autoencoder model is used for extracting spatial structure features of semantic segmentation images; determining a second spatial structure feature of the second semantic segmentation image of each sample image according to the autoencoder model; and performing dimension reduction on the second spatial structure feature of the second semantic segmentation image of each sample image to obtain the feature space model.
In one possible implementation, the dimension reduction processing is performed on the spatial structure features, and includes: the spatial structure features are subjected to dimensionality reduction using a Principal Component Analysis (PCA) algorithm.
The autoencoder model used for extracting the spatial structure features of semantic segmentation images is trained from the second semantic segmentation image of each sample image. The second spatial structure feature of the second semantic segmentation image of each sample image is then extracted with the autoencoder model. Dimension reduction is performed on these second spatial structure features by the PCA algorithm to construct the feature space model, that is, to determine the second feature point corresponding to the second semantic segmentation image of each sample image in the feature space model.
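A minimal sketch of constructing the feature space model is given below, assuming `encoder` is the encoder part of the trained autoencoder and using scikit-learn's PCA for dimension reduction; the function name, feature layout, and number of components are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_feature_space(encoder, second_seg_images, n_components=32):
    """Build the feature space model from the second semantic segmentation images."""
    # `encoder` is assumed to return a spatial-structure feature array for one image.
    feats = np.stack([np.asarray(encoder(img)).ravel() for img in second_seg_images])
    pca = PCA(n_components=n_components)       # dimension reduction via PCA
    second_points = pca.fit_transform(feats)   # one second feature point per sample image
    return pca, second_points
```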
In one possible implementation, determining a first spatial structure feature of the first semantic segmentation image includes: determining the first spatial structure feature according to the autoencoder model.
For the first semantic segmentation image input by the user, its first spatial structure feature may be determined using the autoencoder model trained from the second semantic segmentation image of each sample image.
After the first spatial structure feature is determined, a PCA algorithm may be used to perform dimension reduction on the first spatial structure feature, and a corresponding first feature point of the first semantic segmentation image in the feature space model is determined, where the dimension of the first spatial structure feature after dimension reduction is the same as that of the feature space model.
In an example, the euclidean distance between feature points in the feature space model may represent the spatial structure difference between the semantically segmented images corresponding to the feature points.
In an example, in the feature space model, the second semantic segmentation image corresponding to a second feature point whose Euclidean distance from the first feature point is less than or equal to a threshold is determined as a third semantic segmentation image; that is, the spatial structure similarity between the third semantic segmentation image and the first semantic segmentation image is high. The size of the threshold can be determined according to actual conditions, and the specific value of the threshold is not limited in this disclosure.
For example, the threshold is x, and in the feature space model, the euclidean distance between a first feature point a corresponding to the first semantic segmentation image a and a second feature point corresponding to the second semantic segmentation image of each sample image is determined, and then 3 second feature points whose euclidean distance from the first feature point a is less than or equal to the threshold x are determined: and determining a second semantic segmentation image B corresponding to the second feature point B, a second semantic segmentation image C corresponding to the second feature point C and a second semantic segmentation image D corresponding to the second feature point D as third semantic segmentation images.
In one example, the second semantic segmentation images corresponding to a preset number of second feature points closest to the first feature point are determined as third semantic segmentation images. The preset number can be determined according to actual conditions, and is not particularly limited in the present disclosure.
For example, the preset number is 6, and in the feature space model, the euclidean distance between the first feature point and the second feature point corresponding to the second semantic segmentation image of each sample image is determined, and then the 6 second feature points closest to the euclidean distance between the first feature point a are determined: and determining a second semantic segmentation image B corresponding to the second feature point B, a second semantic segmentation image C corresponding to the second feature point C, a second semantic segmentation image D corresponding to the second feature point D, a second semantic segmentation image E corresponding to the second feature point E, a second semantic segmentation image F corresponding to the second feature point F and a second semantic segmentation image H corresponding to the second feature point H as third semantic segmentation images.
After the third semantic segmentation images are determined, an edge line image material library is determined according to second edge line images corresponding to the same sample image with each third semantic segmentation image. Since the third semantic segmentation image is most similar to the first semantic segmentation image in spatial structure, the first edge line image corresponding to the first semantic segmentation image may be determined based on the second edge line image in the edge line image material library.
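The retrieval step of this first manner can be sketched as follows, reusing the `pca` and `second_points` produced by the previous sketch; both the threshold variant and the nearest-neighbour variant described above are shown, and all names and default values are illustrative assumptions.

```python
import numpy as np

def candidate_edge_images(encoder, pca, second_points, second_edge_images,
                          first_seg_image, threshold=None, k=6):
    """Select second edge line images whose segmentation maps are closest to the input."""
    q = pca.transform(np.asarray(encoder(first_seg_image)).ravel()[None, :])  # first feature point
    dists = np.linalg.norm(second_points - q, axis=1)                         # Euclidean distances
    if threshold is not None:
        idx = np.flatnonzero(dists <= threshold)   # second feature points within the threshold
    else:
        idx = np.argsort(dists)[:k]                # the k nearest second feature points
    return [second_edge_images[i] for i in idx]    # edge line image material library
```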
The second manner is as follows:
In one possible implementation, determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image includes: inputting the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is trained according to the second semantic segmentation image and the second edge line image of each sample image.
For each sample image, model training is performed using the data group "second semantic segmentation image of the sample image - second edge line image of the sample image" as training data, learning the mapping relationship between the second semantic segmentation image and the second edge line image of the sample image, to obtain the second conditional generative adversarial network model.
The first semantic segmentation image input by the user is then input into the second conditional generative adversarial network model, which generates the first edge line image corresponding to the first semantic segmentation image.
Fig. 3 is a schematic diagram of a first edge line image corresponding to the first semantic segmentation image shown in fig. 2 according to an embodiment of the disclosure.
After the first edge line image corresponding to the first semantic segmentation image input by the user has been determined, the first semantic segmentation image and the first edge line image are input into the first conditional generative adversarial network model, and an image having the animation style (target style) corresponding to the first semantic segmentation image can be generated.
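An illustrative inference sketch of this final step is given below; the generator interface, tensor layout, and noise handling are assumptions kept consistent with the earlier training sketch rather than details taken from the disclosure.

```python
import torch

@torch.no_grad()
def generate_target_style_image(G1, first_seg, first_edge, z_channels=64):
    """Feed the first semantic segmentation image and the first edge line image to the
    trained first conditional GAN generator `G1` to obtain the target-style image."""
    b, _, h, w = first_seg.shape
    z = torch.randn(b, z_channels, 1, 1, device=first_seg.device).expand(-1, -1, h, w)
    return G1(torch.cat([first_seg, first_edge, z], dim=1))
```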
Fig. 4 is a schematic diagram illustrating an image with a cartoon style corresponding to the first semantic segmentation image illustrated in fig. 2 according to an embodiment of the disclosure.
By receiving a first semantic segmentation image which is input by a user and comprises at least one type of target scene object, determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object, and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image which corresponds to the first semantic segmentation image and has a target style, wherein the first conditional generative adversarial network model is trained on a plurality of sample images having the target style, the method can quickly generate, from the semantic segmentation image input by the user, an image which has the target style and rich, complete scene content.
Fig. 5 shows a schematic structural diagram of an image generation apparatus according to an embodiment of the present disclosure. The apparatus 50 shown in fig. 5 may be used to perform the steps of the method embodiment shown in fig. 1, the apparatus 50 comprising:
a receiving module 51, configured to receive a first semantic segmentation image input by a user, where the first semantic segmentation image includes at least one type of target scene object;
the first determining module 52 is configured to determine a first edge line image corresponding to the first semantic segmentation image, where the first edge line image includes edge information of each type of target scene object;
the generating module 53 is configured to input the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image with a target style corresponding to the first semantic segmentation image, where the first conditional generative adversarial network model is trained on a plurality of sample images having the target style.
In one possible implementation, the apparatus 50 further includes:
the second determining module is used for performing semantic segmentation on each sample image and determining a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises a plurality of types of scene objects in the sample image;
the third determining module is used for performing edge detection on each sample image and determining a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image;
and the first model training module is configured to train the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and second edge line image of each sample image.
In one possible implementation, the first determining module 52 includes:
and the first determining submodule is used for determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the first determining sub-module includes:
a first determining unit, configured to determine a first spatial structure feature of a first semantically segmented image;
the second determining unit is used for performing dimension reduction processing on the first space structure feature and determining a first feature point corresponding to the first semantic segmentation image in the feature space model, the dimension of the dimension-reduced first space structure feature is the same as that of the feature space model, and the feature space model comprises a second feature point corresponding to the second semantic segmentation image of each sample image;
a third determining unit, configured to determine a second semantic segmentation image corresponding to a second feature point, where an euclidean distance between the second feature point and the first feature point in the feature space model is less than or equal to a threshold, as a third semantic segmentation image;
and the fourth determining unit is used for determining the first edge line image according to the second edge line image of the sample image corresponding to the third semantic segmentation image.
In one possible implementation, the apparatus 50 further includes:
the second model training module is configured to train an autoencoder model according to the second semantic segmentation image of each sample image, the autoencoder model being used for extracting spatial structure features of semantic segmentation images;
the fourth determining module is configured to determine, according to the autoencoder model, a second spatial structure feature of the second semantic segmentation image of each sample image;
and the fifth determining module is used for performing dimension reduction processing on the second space structure characteristic of the second semantic segmentation image of each sample image to determine a characteristic space model.
In a possible implementation manner, the first determining unit is specifically configured to:
determine the first spatial structure feature according to the autoencoder model.
In one possible implementation, the apparatus 50 further includes:
and the data processing module is used for performing dimension reduction processing on the space structure characteristics by using a PCA algorithm.
In a possible implementation manner, the first determining submodule is specifically configured to:
and inputting the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is trained according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the target style is an animation style.
The apparatus 50 provided in the present disclosure can implement each step in the method embodiment shown in fig. 1, and implement the same technical effect, and is not described herein again to avoid repetition.
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 6, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The Memory may include a Memory, such as a Random-Access Memory (RAM), and may further include a non-volatile Memory, such as at least 1 disk Memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
And a memory for storing the program. In particular, the program may include program code including computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the image generation apparatus at a logic level. The processor executes the program stored in the memory and is specifically configured to perform: receiving a first semantic segmentation image input by a user, wherein the first semantic segmentation image comprises at least one type of target scene object; determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object; and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model to generate an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is trained on a plurality of sample images having the target style.
In one possible implementation, the processor is specifically configured to perform: performing semantic segmentation on each sample image to determine a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises multiple types of scene objects in the sample image; performing edge detection on each sample image to determine a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image; and training the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and second edge line image of each sample image.
In one possible implementation, the processor is specifically configured to perform: determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the processor is specifically configured to perform: determining a first spatial structure feature of the first semantic segmentation image; performing dimension reduction on the first spatial structure feature and determining a first feature point corresponding to the first semantic segmentation image in a feature space model, wherein the dimension of the reduced first spatial structure feature is the same as that of the feature space model, and the feature space model comprises a second feature point corresponding to the second semantic segmentation image of each sample image; determining, as a third semantic segmentation image, the second semantic segmentation image corresponding to a second feature point whose Euclidean distance from the first feature point in the feature space model is less than or equal to a threshold; and determining the first edge line image according to the second edge line image corresponding to the same sample image as the third semantic segmentation image.
In one possible implementation, the processor is specifically configured to perform: training an autoencoder model according to the second semantic segmentation image of each sample image, wherein the autoencoder model is used for extracting spatial structure features of semantic segmentation images; determining a second spatial structure feature of the second semantic segmentation image of each sample image according to the autoencoder model; and performing dimension reduction on the second spatial structure feature of the second semantic segmentation image of each sample image to obtain the feature space model.
In one possible implementation, the processor is specifically configured to perform: determining the first spatial structure feature according to the autoencoder model.
In one possible implementation, the processor is specifically configured to perform: performing dimension reduction on the spatial structure features by using a PCA algorithm.
In one possible implementation, the processor is specifically configured to perform: inputting the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is trained according to the second semantic segmentation image and the second edge line image of each sample image.
In one possible implementation, the target style is an animation style.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present specification may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The electronic device may execute the method executed in the method embodiment shown in fig. 1, and implement the functions of the method embodiment shown in fig. 1, which are not described herein again in this specification.
Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs including instructions, which when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the image generation method in the embodiment shown in fig. 1, and specifically perform the steps of the embodiment of the method shown in fig. 1.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. An image generation method, comprising:
receiving a first semantic segmentation image input by a user, wherein the first semantic segmentation image comprises at least one type of target scene object;
determining a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object;
and inputting the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model, and generating an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is obtained by training with a plurality of sample images having the target style.
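As a non-limiting illustration of the generation step in claim 1, the following Python sketch assumes a trained conditional GAN generator wrapped as a PyTorch module; the names generator, seg_map, and edge_map are illustrative and do not appear in the patent.

import torch

@torch.no_grad()
def generate_styled_image(generator: torch.nn.Module,
                          seg_map: torch.Tensor,
                          edge_map: torch.Tensor) -> torch.Tensor:
    # Concatenate the two conditions along the channel dimension:
    # (N, C_seg, H, W) and (N, 1, H, W) -> (N, C_seg + 1, H, W).
    condition = torch.cat([seg_map, edge_map], dim=1)
    # The trained conditional GAN generator maps the conditions to an image
    # in the target style.
    return generator(condition)

In a typical image-to-image translation setup the segmentation map would be one-hot encoded per object class before concatenation, but that detail is not mandated by the claim.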
2. The method of claim 1, wherein training the first conditional generative adversarial network model with the plurality of sample images comprises:
performing semantic segmentation on each sample image, and determining a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises multiple types of scene objects in the sample image;
performing edge detection on each sample image, and determining a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image;
and training to obtain the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and the second edge line image of each sample image.
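A hedged sketch of the sample preparation in claim 2 follows. run_semantic_segmentation stands in for any available segmentation model, and Canny is shown only as one possible edge detector; neither choice is prescribed by the claim.

import cv2

def prepare_training_triple(image_path: str, run_semantic_segmentation):
    # Load one sample image that already has the target style.
    image = cv2.imread(image_path)
    # Second semantic segmentation image: per-pixel scene-object labels,
    # produced by whatever segmentation model is available.
    seg_map = run_semantic_segmentation(image)
    # Second edge line image: edge information of the scene objects,
    # here obtained with a Canny detector on the grayscale image.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edge_map = cv2.Canny(gray, 100, 200)
    return image, seg_map, edge_map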
3. The method of claim 2, wherein determining the first edge line image corresponding to the first semantic segmentation image comprises:
and determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
4. The method of claim 3, wherein determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image comprises:
determining a first spatial structure feature of the first semantic segmentation image;
performing dimension reduction processing on the first spatial structure feature, and determining a first feature point corresponding to the first semantic segmentation image in a feature space model, wherein the dimension of the dimension-reduced first spatial structure feature is the same as that of the feature space model, and the feature space model comprises a second feature point corresponding to the second semantic segmentation image of each sample image;
determining, as a third semantic segmentation image, a second semantic segmentation image corresponding to a second feature point in the feature space model whose Euclidean distance to the first feature point is less than or equal to a threshold;
and determining the first edge line image according to the second edge line image of the sample image to which the third semantic segmentation image corresponds.
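The retrieval step of claim 4 can be pictured with the following sketch, assuming the reduced feature points are held in NumPy arrays; the variable names and the threshold value are illustrative only.

import numpy as np

def find_nearby_samples(first_point: np.ndarray,
                        second_points: np.ndarray,
                        threshold: float) -> np.ndarray:
    # Euclidean distance from the first feature point to every second feature
    # point in the feature space model; keep those within the threshold.
    distances = np.linalg.norm(second_points - first_point, axis=1)
    return np.where(distances <= threshold)[0]

The returned indices identify the matching sample images, whose second edge line images are then used to build the first edge line image.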
5. The method of claim 4, wherein the feature space model is built by:
training to obtain an autoencoder model according to the second semantic segmentation image of each sample image, wherein the autoencoder model is used for extracting spatial structure features of semantic segmentation images;
determining a second spatial structure feature of the second semantic segmentation image of each sample image according to the autoencoder model;
and performing dimension reduction processing on the second spatial structure feature of the second semantic segmentation image of each sample image to obtain the feature space model.
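One plausible way to assemble the feature space model of claim 5 is sketched below; encode stands in for the trained autoencoder's encoder, and the PCA component count is an arbitrary illustrative choice, not a value from the patent.

import numpy as np
from sklearn.decomposition import PCA

def build_feature_space(second_seg_maps, encode, n_components: int = 32):
    # Second spatial structure features: flatten each autoencoder code.
    codes = np.stack([np.ravel(encode(m)) for m in second_seg_maps])
    # Dimension reduction; the fitted PCA and the reduced points together
    # act as the feature space model.
    pca = PCA(n_components=n_components)
    second_points = pca.fit_transform(codes)
    return pca, second_points

At query time (claims 6 and 7), the same encoder and the fitted pca.transform would be applied to the first semantic segmentation image to obtain the first feature point.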
6. The method of claim 5, wherein determining a first spatial structure feature of the first semantic segmentation image comprises:
determining the first spatial structure feature according to the autoencoder model.
7. The method according to claim 4 or 5, wherein performing the dimension reduction processing on the spatial structure features comprises:
performing dimension reduction on the spatial structure features using a principal component analysis (PCA) algorithm.
8. The method of claim 3, wherein determining the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image comprises:
inputting the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is obtained by training according to the second semantic segmentation image and the second edge line image of each sample image.
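Claim 8 replaces the retrieval of claims 4 to 7 with a second conditional GAN that translates the segmentation image directly into edge lines; a bare-bones sketch, with edge_generator as an assumed, separately trained network, is:

import torch

@torch.no_grad()
def predict_edge_lines(edge_generator: torch.nn.Module,
                       seg_map: torch.Tensor) -> torch.Tensor:
    # The semantic segmentation image alone is the condition; the output is
    # the first edge line image.
    return edge_generator(seg_map)

The predicted edge line image would then be paired with the user's segmentation image and fed to the first conditional generative adversarial network model as in claim 1.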
9. The method of claim 1, wherein the target style is an animation style.
10. An image generation apparatus, comprising:
a receiving module, configured to receive a first semantic segmentation image input by a user, wherein the first semantic segmentation image comprises at least one type of target scene object;
a first determining module, configured to determine a first edge line image corresponding to the first semantic segmentation image, wherein the first edge line image comprises edge information of each type of target scene object;
and a generation module, configured to input the first semantic segmentation image and the first edge line image into a first conditional generative adversarial network model and generate an image with a target style corresponding to the first semantic segmentation image, wherein the first conditional generative adversarial network model is obtained by training with a plurality of sample images having the target style.
11. The apparatus of claim 10, further comprising:
a second determining module, configured to perform semantic segmentation on each sample image and determine a second semantic segmentation image of the sample image, wherein the second semantic segmentation image comprises a plurality of types of scene objects in the sample image;
a third determining module, configured to perform edge detection on each sample image and determine a second edge line image of the sample image, wherein the second edge line image comprises edge information of each type of scene object in the sample image;
and a first model training module, configured to train to obtain the first conditional generative adversarial network model according to the plurality of sample images and the second semantic segmentation image and the second edge line image of each sample image.
12. The apparatus of claim 11, wherein the first determining module comprises:
a first determining submodule, configured to determine the first edge line image according to the second semantic segmentation image and the second edge line image of each sample image.
13. The apparatus of claim 12, wherein the first determining submodule comprises:
a first determining unit, configured to determine a first spatial structure feature of the first semantic segmentation image;
a second determining unit, configured to perform dimension reduction processing on the first spatial structure feature, and determine a first feature point corresponding to the first semantic segmentation image in a feature space model, where the dimension of the first spatial structure feature after dimension reduction is the same as that of the feature space model, and the feature space model includes a second feature point corresponding to a second semantic segmentation image of each sample image;
a third determining unit, configured to determine, as a third semantic segmentation image, a second semantic segmentation image corresponding to a second feature point in the feature space model, where a Euclidean distance between the second feature point and the first feature point is less than or equal to a threshold;
and a fourth determining unit, configured to determine the first edge line image according to the second edge line image of the sample image corresponding to the third semantic segmentation image.
14. The apparatus of claim 13, further comprising:
a second model training module, configured to train to obtain an autoencoder model according to the second semantic segmentation image of each sample image, wherein the autoencoder model is used for extracting spatial structure features of semantic segmentation images;
a fourth determining module, configured to determine, according to the autoencoder model, a second spatial structure feature of the second semantic segmentation image of each sample image;
and a fifth determining module, configured to perform dimension reduction processing on the second spatial structure feature of the second semantic segmentation image of each sample image to determine the feature space model.
15. The apparatus according to claim 14, wherein the first determining unit is specifically configured to:
determine the first spatial structure feature according to the autoencoder model.
16. The apparatus of claim 13 or 14, further comprising:
a data processing module, configured to perform dimension reduction processing on the spatial structure features using a PCA algorithm.
17. The apparatus of claim 12, wherein the first determining submodule is specifically configured to:
input the first semantic segmentation image into a second conditional generative adversarial network model to determine the first edge line image, wherein the second conditional generative adversarial network model is obtained by training according to the second semantic segmentation image and the second edge line image of each sample image.
18. The apparatus of claim 10, wherein the target style is an animation style.
19. An image generation apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the image generation method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the image generation method of any of claims 1-9.
CN201910390118.2A 2019-05-10 2019-05-10 Image generation method and device Active CN111915703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910390118.2A CN111915703B (en) 2019-05-10 2019-05-10 Image generation method and device

Publications (2)

Publication Number Publication Date
CN111915703A (en) 2020-11-10
CN111915703B (en) 2023-05-09

Family

ID=73242255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910390118.2A Active CN111915703B (en) 2019-05-10 2019-05-10 Image generation method and device

Country Status (1)

Country Link
CN (1) CN111915703B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030095701A1 (en) * 2001-11-19 2003-05-22 Heung-Yeung Shum Automatic sketch generation
CN102496180A (en) * 2011-12-15 2012-06-13 李大锦 Method for automatically generating wash landscape painting image
CN108229504A (en) * 2018-01-29 2018-06-29 深圳市商汤科技有限公司 Method for analyzing image and device
CN109712068A (en) * 2018-12-21 2019-05-03 云南大学 Image Style Transfer and analogy method for cucurbit pyrography

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LINGYUN SUN et al.: "SmartPaint: a co-creative drawing system based on generative adversarial networks", Frontiers of Information Technology & Electronic Engineering *
M. ELAD et al.: "Style Transfer Via Texture Synthesis", IEEE Transactions on Image Processing *
LU SHENGRONG et al.: "A Painting-Effect Generation Algorithm", Computer Engineering and Applications *
WANG DEQIU: "Research and Application of an Automatic Face Caricature Generation Algorithm", China Master's Theses Full-text Database (Information Science and Technology) *
ZHAO YANDAN: "GPU-Accelerated Generation and Editing of Stylized Digital Images", China Doctoral Dissertations Full-text Database (Information Science and Technology) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100312A (en) * 2022-07-14 2022-09-23 猫小兜动漫影视(深圳)有限公司 Method and device for animating image
CN115100312B (en) * 2022-07-14 2023-08-22 猫小兜动漫影视(深圳)有限公司 Image cartoon method and device

Also Published As

Publication number Publication date
CN111915703B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN109886326B (en) Cross-modal information retrieval method and device and storage medium
CN108694217B (en) Video label determination method and device
CN108629414B (en) Deep hash learning method and device
CN110443266B (en) Object prediction method and device, electronic equipment and storage medium
KR102170620B1 (en) Method and system for generating training data to train classifiers with localizable features
CN109377509B (en) Image semantic segmentation labeling method and device, storage medium and equipment
CN110162657B (en) Image retrieval method and system based on high-level semantic features and color features
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN111695682A (en) Operation method, device and related product
CN111415371B (en) Sparse optical flow determination method and device
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
CN110969641A (en) Image processing method and device
CN116982089A (en) Method and system for image semantic enhancement
CN109978044B (en) Training data generation method and device, and model training method and device
CN116980541B (en) Video editing method, device, electronic equipment and storage medium
WO2021179751A1 (en) Image processing method and system
CN111915703B (en) Image generation method and device
CN113744280A (en) Image processing method, apparatus, device and medium
CN117252947A (en) Image processing method, image processing apparatus, computer, storage medium, and program product
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN116993978A (en) Small sample segmentation method, system, readable storage medium and computer device
CN116884004A (en) Image processing method, device, electronic equipment and storage medium
CN110807127A (en) Video recommendation method and device
CN112419249B (en) Special clothing picture conversion method, terminal device and storage medium
CN113987264A (en) Video abstract generation method, device, equipment, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant