CN112070853A - Image generation method and device
- Publication number
- CN112070853A (application number CN201910498731.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- candidate
- sub
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
Abstract
The application discloses an image generation method, an image generation apparatus, a computer device, and a readable storage medium. The method comprises the following steps: acquiring noise data; obtaining a feature image according to the noise data and a first data set; and obtaining a target image according to the feature image and a second data set. According to the method and the apparatus, the noise data can be generated directly into the feature image through the first data set. Because the feature image contains far less image detail than the target image, the first data set's generation process is simpler, the difficulty of optimizing the parameters in the first data set is reduced, and the probability that the first data set generates a feature image with a stable picture is greatly increased. The second data set then generates the complete target image directly from the feature image; because the picture of the feature image is stable, the probability of color distortion in the target image is reduced and the picture quality of the target image is improved.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image generation method, an image generation apparatus, a computer device, and a readable storage medium.
Background
In website and webpage design, the demand for image materials is very large. If every image material were designed and constructed by hand, the workload would be enormous and the efficiency very low, so image materials are now usually generated automatically with artificial intelligence techniques.
Existing methods for automatically generating image materials basically input random, normally distributed Gaussian noise into a Generative Adversarial Network (GAN) model, and the generative adversarial network model directly outputs a complete image material.
However, in practical application scenarios where the image material must contain rich image detail, generating the complete image material directly from Gaussian noise through the generative adversarial network model is difficult precisely because the complete image contains so much detail; the generated image material is therefore unstable, and the probability of color distortion in the image material is high.
Summary of the Application
In view of the above, the present application is made to provide an image generation method, apparatus, computer device and readable storage medium that overcome or at least partially solve the above problems.
In accordance with an aspect of the present application, there is provided an image generation method including:
acquiring noise data;
obtaining a feature image according to the noise data and a first data set, wherein the first data set is used to generate the feature image from the noise data;
obtaining a target image according to the feature image and a second data set, wherein the second data set is used to generate the target image from the feature image.
In accordance with another aspect of the present application, there is provided an image generation method including:
acquiring noise data;
generating a feature image based on a first neural network according to the noise data, wherein the feature image comprises a texture feature image or an edge feature image;
and generating a target image based on a second neural network according to the feature image.
In accordance with another aspect of the present application, there is provided an image generating apparatus including:
a first acquisition module for acquiring noise data;
a first generation module for obtaining a feature image from the noise data and a first data set, wherein the first data set is used to generate the feature image from the noise data;
a second generation module for obtaining a target image according to the feature image and a second data set, wherein the second data set is used to generate the target image from the feature image.
In accordance with another aspect of the present application, there is provided an image generating apparatus, the apparatus including:
a second acquisition module for acquiring noise data;
a third generation module, configured to generate a feature image based on a first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image;
and a fourth generation module for generating a target image based on a second neural network according to the feature image.
According to another aspect of the application, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements a method according to one or more of the above.
According to another aspect of the application, a computer-readable storage medium is provided, on which a computer program is stored, characterized in that the program, when executed by a processor, implements a method as described in one or more of the above.
According to the embodiment of the application, the noise data can be generated directly into the feature image through the first data set. Because the picture composition of the feature image is relatively simple, it contains far less image detail than the final target image, so the process of generating the feature image with the first data set is simpler, the difficulty of optimizing the relevant parameters in the first data set is reduced, and the probability that the first data set generates a feature image with a stable picture is greatly increased. Further, the embodiment of the application then uses the second data set to generate the complete target image directly from the feature image; because the picture of the feature image is stable, the probability of color distortion in the target image is reduced and the picture quality of the target image is improved.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the alternative embodiments. The drawings are only for purposes of illustrating alternative embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a system architecture diagram of an image generation method provided according to the present application;
FIG. 2 illustrates a diagram of an actual scene of an image generation method provided according to the present application;
FIG. 3 shows a flowchart of an embodiment of an image generation method according to an embodiment of the present application;
FIG. 4 illustrates a diagram of a specific example of an image generation method according to an embodiment of the present application;
FIG. 5 shows a flowchart of an embodiment of an image generation method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating the specific steps of an embodiment of an image generation method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a specific application scenario of another image generation method according to an embodiment of the present application;
FIG. 8 shows a flowchart of another embodiment of an image generation method according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating the specific steps of another embodiment of an image generation method according to an embodiment of the present application;
FIG. 10 shows a block diagram of an embodiment of an image generation apparatus according to an embodiment of the present application;
FIG. 11 shows a block diagram of another embodiment of an image generation apparatus according to an embodiment of the present application;
FIG. 12 illustrates an exemplary system that can be used to implement the various embodiments described in this disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To enable those skilled in the art to better understand the present application, the following description is made of the concepts related to the present application:
the noise data refers to data in which an error or an anomaly (deviation from an expected value) exists, and in this embodiment, the noise data may be gaussian noise (gaussian noise), which refers to a type of noise whose probability density function follows a gaussian distribution (i.e., a normal distribution), and is also a common noise of the digital image. Common gaussian noise includes heave noise, cosmic noise, thermal noise, shot noise, and so on. In the embodiment of the present application, the gaussian noise may exist in the form of a noise image, and may be processed by the first data set into a feature image of a preset size. Of course, in the embodiment of the present application, other types of noise, such as rayleigh noise, gamma noise, etc., may also be used.
Feature images mainly include texture feature images and edge feature images. A texture feature image describes the surface properties of the scene corresponding to an image or image region; it reflects the visual characteristics of homogeneous phenomena in the image and embodies the slowly varying or periodic arrangement of the surface structure of objects in the image. The two elements that make up a texture feature image are as follows:
(1) Texture primitives: the most basic units of an image are pixels, and a set of pixels with a certain shape and size, such as round spots, blocky spots, or the patterns of figured cloth, is called a texture primitive; a texture is the combination of many such primitives of certain shapes and sizes.
(2) The arrangement and combination of texture primitives into the texture feature image: differences in the density, periodicity, directionality, and so on of the primitive arrangement can change the appearance of the image greatly.
An edge feature image represents the sharply changing edges or discontinuous regions in an image. Since edges are the boundary lines between different regions of an image, an edge image can be a binary image, and the purpose of edge detection is to capture the regions where brightness changes sharply. Ideally, performing edge detection on an image yields an edge feature image composed of a series of continuous curves that represent the boundaries of objects.
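As an illustration only (the application does not mandate a particular edge detector), the following sketch extracts a binary edge feature image with the classical Canny detector via OpenCV; the file name and thresholds are assumptions:

```python
# A minimal sketch of extracting an edge feature image, assuming the
# classical Canny detector; the application does not prescribe one.
import cv2

def extract_edge_feature(path, low=100, high=200):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # load as grayscale
    # Canny captures regions of sharp brightness change and returns a
    # binary image whose white pixels trace object boundaries.
    return cv2.Canny(img, low, high)

# edges = extract_edge_feature("leaf.png")  # hypothetical file name
```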
It should be noted that a whole image may correspond to a pair consisting of an edge feature image and a texture feature image. For example, if an image material of a leaf is ultimately required, the generated feature images may include an edge feature image of the leaf and a texture feature image of the leaf.
It is to be understood that the edge feature image may capture the boundary lines of objects in the image, and the texture feature image may capture the surface-structure arrangement of the objects in the regions formed between those boundary lines. In the embodiment of the application, edge features and texture features can exist in the form of an edge feature image and a texture feature image, and combining one edge feature image with the corresponding texture feature image yields a complete image material.
The first data set and the second data set can each include one or more mathematical models. A mathematical model is a scientific or engineering model constructed with mathematical logic and mathematical language: a mathematical structure that expresses, exactly or approximately, the characteristics or quantitative dependencies of a given object system, described purely relationally by means of mathematical symbols. A mathematical model may be one equation or a set of algebraic, differential, integral, or statistical equations, or a combination of these, by which the interrelationships or causal relationships among the variables of the system are described quantitatively or qualitatively. Besides models described by equations, there are models described by other mathematical tools, such as algebra, geometry, topology, and mathematical logic, which describe the behavior and characteristics of a system rather than its actual structure.
The first data set is used to generate the feature image from the noise data, and the second data set is used to generate the target image from the feature image. In practical applications, the machine learning method adopted by the first data set may include a generative adversarial network model, and the machine learning method adopted by the second data set may include a Convolutional Neural Network (CNN) or the like.
Referring to fig. 1, a system architecture diagram of an image generation method provided by an embodiment of the present application is shown. The first data set may include a generative adversarial network model and the second data set may include a convolutional neural network model; the generative adversarial network model is mainly used to generate the feature image from the noise data, and the convolutional neural network model is mainly used to generate the target image from the feature image.
Specifically, the generative adversarial network model may include a generator (the first sub data set in this application) and a discriminator (the second sub data set in this application). It is a deep learning model that produces its preferred output through the mutual game learning of (at least) two modules in its framework: the generator and the discriminator.
First, a sample feature image set may be determined, so that the discriminator can call the sample feature images in it. The acquired noise data may then be input to the generator in the form of an image, and the generator generates a candidate image from the noise data. Owing to the limitations of the generator's own parameters, a candidate image may be "true" or "false": "true" means the candidate image is highly similar to a sample feature image, and "false" means it is not. To finally generate image material of good quality, a good generator should continuously optimize its parameters so that the probability of generating a "true" feature image is greater than a first set threshold.
In this embodiment of the application, the discriminator may be configured to discriminate whether the candidate image output by the generator is "true" or "false". Specifically, the candidate image output by the generator is matched against the sample feature images; if the candidate image is highly similar to a sample feature image, it is determined to be "true" and is a feature image, and if it is not highly similar, it is determined to be "false" and is not a feature image. In practical application, a sample feature image and the candidate image can each be sampled and input to the discriminator, which judges whether the candidate image is "true" from the two sampling results; after multiple rounds of iterative training, the generator learns to fool the discriminator, thereby improving the quality of the generator's output.
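For illustration, a minimal generator/discriminator pair is sketched below, assuming PyTorch, a 100-dimensional noise vector, and a 64x64 single-channel feature image; the layer sizes are assumptions, not the application's prescribed architecture:

```python
# A minimal generator/discriminator pair; all sizes are illustrative.
import torch
import torch.nn as nn

LATENT = 100  # length of the input noise vector (assumed)

class Generator(nn.Module):          # the "first sub data set"
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT, 256), nn.ReLU(),
            nn.Linear(256, 64 * 64), nn.Tanh(),   # pixel values in [-1, 1]
        )
    def forward(self, z):
        return self.net(z).view(-1, 1, 64, 64)    # candidate image

class Discriminator(nn.Module):      # the "second sub data set"
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),      # probability of "true"
        )
    def forward(self, img):
        return self.net(img.view(img.size(0), -1))
```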
Further, the feature image determined to be "true" by the discriminator may be input into the second data set, and the convolutional neural network model included in the second data set maps the feature image to a complete target image. The process is similar to combining the edge feature image and the texture feature image included in the feature image to obtain a complete image material, such as the target image in fig. 1.
For example, referring to fig. 2, which shows a diagram of an actual scene of an image generation method provided in the embodiment of the present application: if a target image 30 of the leaf type is ultimately required, a sample feature image set including edge feature images 10 and texture feature images 20 of leaves may be collected. In the edge feature image 10, the boundary lines of the leaf divide it into a plurality of edge regions 11, and the texture feature image 20 contains the surface-structure arrangement 21 of the leaf in each edge region 11. After random, Gaussian-distributed noise data is input into the first data set, the generator generates candidate images from the noise data, and the discriminator matches each candidate image against the edge feature images 10 and texture feature images 20 of leaves in the sample feature image set; if either match succeeds, the candidate image is determined to be "true". Finally, the "true" feature images are input into the second data set, and the convolutional neural network model in the second data set combines the leaf edge feature image 10 and leaf texture feature image 20 they contain to generate the target image 30 of the leaf.
In the embodiment of the present application, the noise data can be generated directly into a feature image through the first data set. Because the picture composition of a feature image is relatively simple, it contains far less image detail than the final target image, so the generative adversarial network model's generation process is more concise and the difficulty of optimizing its relevant parameters is reduced; the probability that the first data set generates a feature image with a stable picture is therefore greatly increased. Further, the embodiment of the present application uses the convolutional neural network model to generate the complete target image directly from the feature image. Because the picture of the feature image is stable, constructing the target image from it in another convolutional neural network model is relatively simple, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
Referring to fig. 3, a flowchart of an embodiment of an image generation method according to the embodiment of the present application is shown, where the method specifically may include the following steps:
Step 101: acquiring noise data.

In the embodiment of the present application, the noise data may be Gaussian noise, a type of noise whose probability density function follows a Gaussian (i.e., normal) distribution. In a specific implementation of the embodiment of the present application, the randn (random normal distribution) function may be used to obtain the Gaussian noise. The randn function generates a random number or matrix following the standard normal distribution; it is a MATLAB (Matrix Laboratory) function, and random Gaussian noise data can be obtained through the open-source randn function interface provided by MATLAB. Of course, the embodiment of the present application may also use other digital-image noise data, and this embodiment does not limit this.
It should be noted that, in the embodiment of the present application, to facilitate processing of the noise data in subsequent steps, the noise data may be converted into a noise image. Specifically, the random normally distributed array in the noise data is summed with a default input pixel to obtain output pixels, and the output pixels are then filled into a picture of a preset size to obtain a noise image of the preset size.
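A minimal sketch of this conversion, assuming NumPy (numpy.random.randn is the open-source equivalent of MATLAB's randn); the preset size, default input pixel, and noise scale are assumptions:

```python
# A sketch of step 101: generate Gaussian noise and fill it into a
# noise image of a preset size. All numeric values are assumed.
import numpy as np

def make_noise_image(height=64, width=64, base_pixel=128.0, sigma=20.0):
    # Random array following a standard normal distribution (the noise data).
    noise = np.random.randn(height, width) * sigma
    # Sum with a default input pixel value to obtain output pixels, then
    # clip so the result is a valid 8-bit noise image of the preset size.
    pixels = np.clip(base_pixel + noise, 0, 255)
    return pixels.astype(np.uint8)

noise_image = make_noise_image()  # 64x64 is an assumed preset size
```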
Step 102: obtaining a feature image according to the noise data and the first data set, wherein the first data set is used to generate the feature image from the noise data.
In the embodiment of the application, a first data set and a second data set can be preset. The first data set may comprise a generative adversarial network model, which is mainly used to generate the feature image from the noise data. Specifically, the generative adversarial network model may comprise a generator and a discriminator; it is a deep learning model that produces its preferred output through the mutual game learning of (at least) two modules in its framework, the generator and the discriminator, so that the noise data is generated into a feature image.
Specifically, a sample feature image set may be determined first, so that the discriminator can call the sample feature images in it. Then, the acquired normally distributed noise data may be input to the generator in the form of an image, and the generator generates a candidate image from the noise data. In this embodiment of the application, the discriminator may be configured to discriminate whether the candidate image output by the generator is "true" or "false": the candidate image output by the generator is matched against the sample feature images, and if the candidate image is highly similar to a sample feature image, it is determined to be "true" and is the feature image; if it is not highly similar, it is determined to be "false" and is not the feature image. In the embodiment of the application, the noise data is used directly to generate a feature image with little image detail, rather than a complete image with much detail; this simplifies the first data set's generation process, reduces the difficulty of optimizing the relevant parameters in the first data set, and lets the first data set generate a feature image with a stable picture.
Step 103: obtaining a target image according to the feature image and the second data set, wherein the second data set is used to generate the target image from the feature image.
In this step, the feature image determined to be "true" by the discriminator may be further input into the second data set, and the convolutional neural network model included in the second data set generates the complete target image from the feature image; the process is similar to combining the edge feature image and the texture feature image included in the feature image to obtain a complete target image. Because the picture of the feature image is stable, the probability of color distortion in the target image is reduced and the picture quality of the target image is improved.
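A minimal sketch of this mapping, assuming PyTorch and a two-channel feature image (one edge channel, one texture channel) mapped to a three-channel RGB target image; the architecture is illustrative only, not the application's prescribed model:

```python
# An image-to-image CNN standing in for the "second data set": it maps
# a 2-channel feature image (edge + texture) to a 3-channel target image.
import torch
import torch.nn as nn

class FeatureToImageNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Sigmoid(),
        )
    def forward(self, feature_image):
        return self.net(feature_image)  # complete target image

# Usage: stack one edge map and one texture map as the two channels.
target = FeatureToImageNet()(torch.rand(1, 2, 64, 64))  # shape (1, 3, 64, 64)
```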
For example, referring to fig. 4, which shows a diagram of a specific example of an image generation method according to the present application: four edge feature images a1, b1, c1, and d1 with different edges are obtained from the first data set, together with four corresponding texture feature images a2, b2, c2, and d2 with different textures; inputting the pairs a1-a2, b1-b2, c1-c2, and d1-d2 into the second data set then yields four different target images a3, b3, c3, and d3.
To sum up, an image generation method provided by the embodiment of the present application includes: acquiring noise data; obtaining a feature image according to the noise data and a first data set, where the first data set is used to generate the feature image from the noise data; and obtaining a target image according to the feature image and a second data set, where the second data set is used to generate the target image from the feature image. The noise data can be generated directly into the feature image through the first data set. Because the picture composition of the feature image is relatively simple, it contains far less image detail than the final target image, so the first data set's generation process is simpler, the difficulty of optimizing the relevant parameters is reduced, and the probability that the first data set generates a feature image with a stable picture is greatly increased. Further, the embodiment of the present application uses the second data set to generate the complete target image directly from the feature image. Because the picture of the feature image is stable, the construction process is relatively simple; when the stable feature image is input into the second data set to construct the target image, the probability of color distortion in the target image is reduced and the picture quality of the target image is improved.
Referring to fig. 5, a specific flowchart of an embodiment of an image generation method according to the embodiment of the present application is shown, and the method may specifically include the following steps:
Step 201: training the first data set.

In the embodiment of the present application, before the first data set is applied to the process of generating the feature image from the noise data, the first data set may be trained with training data so that its relevant parameters are tuned toward better values and it can generate feature images of better quality.
Specifically, the training data may be second sample feature images. Assuming the target image ultimately required is an image of a leaf, the second sample feature images may be obtained as follows: collect a large number of complete leaf images, extract the edge feature images and texture feature images of those leaf images, and take the extracted edge feature images and texture feature images as the second sample feature images.
In practical applications, the second sample feature image may include: an edge feature image and/or a texture feature image.
The edge feature image can be obtained by detecting the edges of a complete image; this operation is also called step detection. For example, edge feature images of leaves can be obtained by performing edge detection on images of various leaves.
The texture feature image can be obtained by the structural method, a spatial-domain method whose basic idea is that complex textures can be formed by repeatedly arranging and combining several simple texture primitives in some regular form. The structural description has two key points: first, determining the texture primitives; second, establishing the arrangement rules. A texture primitive is a connected set of pixels described by a group of attributes. To describe a texture structurally, the rules by which the texture primitives are arranged are established on the basis of the primitives themselves, and the arrangement rules and patterns can be defined by a formal grammar. Through the structural method, the texture feature image can be obtained from a complete image; for example, texture feature images of leaves can be obtained from images of various leaves in this way.
It should be noted that, if an edge feature image and a texture feature image are to be used together, the above extraction processes may each be performed on one complete image to obtain its edge feature image and texture feature image. In this case, in the embodiment of the present application, the edge feature image and the texture feature image are combined in pairs, and the pair corresponds to the complete image from which the features were extracted. For example, for a leaf image a, the edge feature image a1 and the texture feature image a2 are extracted according to the above methods; a1 and a2 are then combined as a pair, and a1 and a2 correspond to a. Of course, the edge feature image and the texture feature image may also be generated in other manners, which this embodiment does not limit.
In an embodiment of the present application, the first data set may be a generative adversarial network model including a generator and a discriminator. The embodiment of the present application may solve a loss function for the first data set through the second sample feature image and the first data set. A loss function maps a random event, or the value of a random variable related to it, to a non-negative real number representing the "risk" or "loss" of that event. In application, the loss function is usually associated with an optimization problem as a learning criterion: the model is solved and evaluated by minimizing the loss function, thereby optimizing the model's parameters. With a loss function for the first data set, the parameters of the generator and the discriminator can be optimized, achieving the aim of training the first data set. It should be noted that the generative adversarial network model is a system whose optimization minimum is not fixed: the training process does not aim at a minimum loss value, but at reaching a dynamic balance between the loss values of the generator and the discriminator through the loss function.
Specifically, in one implementation, training the first data set may include:
First, the discriminator is trained with second sample feature images labeled "true" or "false": a second sample feature image is input, the difference between the "true"/"false" value output by the discriminator and the image's original label is obtained, the discriminator's loss function is computed, and the weights of the relevant parameters in the discriminator are adjusted accordingly, so that the value output by the discriminator matches the original label of the second sample feature image as closely as possible. If a second sample feature image labeled "true" is input to the discriminator but the output is "false", the weight parameters of the functions that discriminate "false" can be reduced; if an image labeled "false" is input but the output is "true", the weight parameters of the functions that discriminate "true" can correspondingly be reduced. The training process may go through multiple iterations, each computing the loss function and adjusting the discriminator's relevant parameters according to it, until the discriminator's parameters are tuned to their best.
Second, the generator is trained with the second sample feature images. For example, random Gaussian-distributed noise data is input into the generator to obtain an initial image output by the generator; a similarity calculation between the initial image and a second sample feature image gives a loss value, a loss function for the generator is constructed from that loss value, and the generator's weight parameters are adjusted with the loss function so that the loss values produced in later iterations become smaller. This process too may go through multiple iterations, each computing the loss value and adjusting the generator's weight parameters according to the loss function, until they are tuned to their best. Of course, in the actual training process, noise data may also be extracted from a second sample feature image and used as the generator's input feature, with the corresponding second sample feature image as the output feature, before the training proceeds.
Third, the generator's loss function and the discriminator's loss function are fitted into a target loss function, the parameters of the generator and the discriminator are adjusted dynamically with the target loss function, and the degree of fit between the actual output distribution of the generator and discriminator and the preset sample distribution is observed in real time. The higher the degree of fit, the better the training effect of the first data set; therefore, the parameters of the generator and the discriminator at the highest degree of fit can be taken as their final parameters, completing the training of the first data set. A compressed code sketch of these three stages follows.
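The following sketch illustrates the three stages above, assuming PyTorch and the Generator/Discriminator shapes sketched earlier; the optimizer, learning rate, and binary cross-entropy criterion are common assumed choices, not mandated by the application:

```python
# A compressed sketch of adversarial training; G and D are the modules
# sketched earlier, real_images is a batch of second sample feature images.
import torch
import torch.nn as nn

def train_gan(G, D, real_images, epochs=100, latent=100):
    bce = nn.BCELoss()                            # "true"/"false" criterion
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    ones = torch.ones(real_images.size(0), 1)     # label "true"
    zeros = torch.zeros(real_images.size(0), 1)   # label "false"
    for _ in range(epochs):
        # Stage 1: train the discriminator on labeled samples.
        z = torch.randn(real_images.size(0), latent)
        fake = G(z).detach()                      # candidate images, G frozen
        loss_d = bce(D(real_images), ones) + bce(D(fake), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Stage 2: train the generator to make D output "true" on its images.
        z = torch.randn(real_images.size(0), latent)
        loss_g = bce(D(G(z)), ones)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        # Stage 3: the two losses balance dynamically rather than being
        # driven to a fixed minimum, as the description notes.
    return G, D
```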
In the embodiment of the present application, the second sample feature image may be obtained by extracting a feature image from the sample target image. For example, if the second sample feature image is an edge feature image and a texture feature image corresponding to a leaf image, the sample target image may be the leaf image.
In practical applications, where for example the second data set is a convolutional neural network model, training requires an input image and an output image. Since the sample feature image is extracted from a sample target image, which may be an existing image material, the sample feature image may be used as the input of the convolutional neural network model and the sample target image as its output during training. For example, the aforementioned pair (a1, a2) is input into the convolutional neural network model, the image output by the model is compared with a, a loss function is calculated, and the parameters of the convolutional neural network model are then adjusted.
Step 204: training the second data set.

In an embodiment of the present application, the second data set may be a convolutional neural network model. Training the second data set by using the second sample feature image and the sample target image may specifically include:
First, the second sample feature image is input into the second data set to obtain an initial complete image output by the second data set.
Second, a difference value between the initial complete image and the sample target image is calculated, and a loss function is obtained based on that difference value.
Third, through multiple iterations, the parameters of the second data set are adjusted with the loss function so that the difference between the initial complete image finally output by the second data set and the sample target image becomes as small as possible. Specifically, the parameters of the second data set may be adjusted through the loss function in various ways; in one implementation, they may be adjusted according to the loss function with the gradient back-propagation algorithm, as in the sketch below.
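A minimal sketch of these three steps, assuming PyTorch and a pixel-wise mean-squared-error loss (the application only requires "a difference value"; MSE is one assumed choice):

```python
# A sketch of training the "second data set" (a CNN): forward pass,
# difference-based loss, and gradient back-propagation.
import torch
import torch.nn as nn

def train_second_data_set(model, sample_features, sample_targets,
                          iterations=1000, lr=1e-3):
    criterion = nn.MSELoss()                       # the difference value
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iterations):
        initial = model(sample_features)           # step 1: initial complete image
        loss = criterion(initial, sample_targets)  # step 2: loss from difference
        optimizer.zero_grad()
        loss.backward()                            # step 3: back-propagate gradients
        optimizer.step()                           # adjust the parameters
    return model
```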
It should be noted that steps 201 and 204 are the preset processes of the first data set and the second data set, and the training of the first data set and the training of the second data set may be performed in either order.
Step 205: acquiring noise data.

This step may specifically refer to step 101, and is not described herein again.
Step 206: obtaining a feature image according to the noise data and the first data set.

Optionally, the feature image includes at least one of a texture feature image and an edge feature image.
In the embodiment of the present application, the edge feature image may capture the boundary lines of objects in the image, and the texture feature image may be the color filled into the regions formed between those boundary lines. Edge features and texture features can exist in the form of an edge feature image and a texture feature image, and combining one edge feature image with the corresponding texture feature image yields a complete image material.
In general, the feature image needs to include both a texture feature image and an edge feature image, so that an edge feature image and the corresponding texture feature image can be combined into a complete image material. In some other cases, however, the feature image may include only a texture feature image or only an edge feature image; for example, where the second data set includes a classifier for identifying the classification of an image, a texture feature image or an edge feature image alone may be input into the second data set, which then outputs its classification category.
Optionally, in a specific implementation of the embodiment of the present application, the first data set includes a first sub data set and a second sub data set, where the first sub data set is used to generate candidate images from the noise data and the second sub data set is used to identify the feature image among the candidate images. Step 206 may specifically include:
Sub-step 2061: generating the candidate image based on the noise data and the first sub data set.
In the embodiment of the present application, once the first sub data set (the generator) has been trained, inputting noise data into it causes it to output candidate images, a large proportion of which will be feature images.
Sub-step 2062: determining the feature image among the candidate images through the second sub data set.
In this step, because of the limitations of the training process on the first sub data set, the first sub data set cannot output feature images one hundred percent of the time, so the non-feature images among the candidate images need to be screened out. In the embodiment of the present application, the candidate images may be input into the second sub data set (the discriminator), which screens out the non-feature images.
Optionally, in a specific implementation of the embodiment of the present application, the second sub data set includes a first sample feature image. Sub-step 2062 may be specifically implemented by determining, through the second sub data set, a candidate image whose similarity to the first sample feature image is greater than or equal to a preset similarity threshold as the feature image.
In this step, the first sample feature image may be completely the same as, partially the same as, or completely different from the second sample feature image, and the two may belong to the same image classification; this application does not limit this.
Specifically, the second sub data set may call the first sample feature image, perform a similarity calculation between the first sample feature image and the input candidate image, and determine a candidate image whose similarity to the first sample feature image is greater than or equal to the preset similarity threshold as a real image highly similar to the first sample feature image, that is, as the feature image. A minimal sketch follows.
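For illustration, the screening can be sketched with a cosine-similarity score over flattened pixels; both the metric and the threshold value are assumptions, since the application only requires some similarity measure and a preset threshold:

```python
# A sketch of sub-step 2062: keep a candidate only if it is highly
# similar to at least one first sample feature image.
import numpy as np

def is_feature_image(candidate, sample_features, threshold=0.9):
    c = candidate.ravel().astype(np.float64)
    for sample in sample_features:
        s = sample.ravel().astype(np.float64)
        sim = np.dot(c, s) / (np.linalg.norm(c) * np.linalg.norm(s) + 1e-12)
        if sim >= threshold:   # highly similar: determined "true"
            return True        # the candidate is a feature image
    return False               # determined "false": screened out
```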
Step 207: obtaining a target image according to the feature image and the second data set.

This step may specifically refer to step 103, and is not described herein again.
To sum up, an image generation method provided by the embodiment of the present application includes: acquiring noise data; obtaining a feature image according to the noise data and a first data set, where the first data set is used to generate the feature image from the noise data; and obtaining a target image according to the feature image and a second data set, where the second data set is used to generate the target image from the feature image. The noise data can be generated directly into the feature image through the first data set. Because the picture composition of the feature image is relatively simple, it contains far less image detail than the final target image, so the first data set's generation process is simpler, the difficulty of optimizing the relevant parameters is reduced, and the probability that the first data set generates a feature image with a stable picture is greatly increased. Further, the embodiment of the present application uses the second data set to generate the complete target image directly from the feature image; because the picture of the feature image is stable, the probability of color distortion in the target image is reduced and the picture quality of the target image is improved.
Referring to fig. 6, a specific flowchart of another embodiment of an image generation method according to the embodiment of the present application is shown, and the method may specifically include the following steps:
The step of acquiring noise data can refer to step 101 described above, and is not described here again.
In this step, the target category may be determined in advance, and specifically, the target category may be determined in various manners.
The target category may be the specific category of object the user desires: for example, if the user desires an image of a leaf, the leaf may be the target category; if the user desires an image of a fish, the fish may be the target category. The specific target category may be selected according to user requirements, and the embodiment of the present application does not limit this. When training the first data set and the second data set, the training process described above may be performed with the second sample feature images and sample target images of the respective target category. For example, for the leaf category, a large number of leaf images are collected, the feature images extracted from them are used as leaf sample feature images, and the leaf sample feature images are used to train a first data set; then, when noise data is input into that first data set, a leaf feature image corresponding to the leaf target category is obtained. Similarly, corresponding first and second data sets can be trained for the fish and other categories. In practical applications, different target categories may correspond to different first and second data sets.
For example, in one implementation, referring to fig. 7, which shows a diagram of a specific application scenario of another image generation method provided in the present application, two target categories of data sets are provided: a first data set and a second data set of target category A, and a first data set and a second data set of target category B. The target category may be specified according to requirements. Suppose a user determines target category A at a client, for example by selecting it in a menu on a terminal page or entering it on the page, and then sends an image generation request to a server. After receiving the request, the server may obtain target category A from it and select the data sets corresponding to target category A, namely the first data set and the second data set of target category A. The noise data is then input into the first data set of target category A to obtain a feature image of target category A. Assuming, for instance, that the service provider provides the first data set and the second data set of the leaf category, the noise data is likewise routed to those data sets to obtain a leaf feature image. A minimal routing sketch follows.
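The category-to-data-set routing can be sketched as below; the registry layout, names, and placeholder models are hypothetical, not the application's API:

```python
# A sketch of routing a request to category-specific data sets.
MODEL_REGISTRY = {
    # target category -> (first data set, second data set); the lambdas
    # stand in for trained models and are placeholders only.
    "leaf": (lambda noise: noise, lambda feature: feature),
    "fish": (lambda noise: noise, lambda feature: feature),
}

def generate_for_category(category, noise):
    first_ds, second_ds = MODEL_REGISTRY[category]  # pick the category's data sets
    feature_image = first_ds(noise)                 # feature image of the category
    return second_ds(feature_image)                 # target image of the category

target = generate_for_category("leaf", noise=[0.1, 0.2])  # illustrative call
```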
In another implementation, the target category may not be set; that is, random images may be collected, the feature images extracted from the random images constructed into second sample feature images, and the second sample feature images used to train the first data set, so that when noise data is input into the first data set, a random feature image is obtained.
In this step, referring to fig. 7, the feature image of the target category a is input into the second data set of the target category a, and the target image generation process is performed to obtain the target image of the target category a.
In another implementation, the target category may not be set; that is, when noise data is input into the first data set, a random feature image is obtained, and the random feature image may then be input into the second data set to obtain a random target image.
To sum up, in the embodiment of the present application, noise data can be generated directly into a feature image through the first data set. Because the picture composition of the feature image is relatively simple, it contains far less image detail than the final target image, so the first data set's generation process is simpler, the difficulty of optimizing the relevant parameters is reduced, and the probability of generating a feature image with a stable picture is greatly increased. Moreover, target images of the corresponding categories can be generated according to different target categories, making the generated images more accurate.
Referring to fig. 8, a flowchart of an embodiment of an image generation method according to the embodiment of the present application is shown, where the method specifically includes the following steps:
Step 401: acquiring noise data.

This step may specifically refer to step 101, and is not described herein again.
Step 402: generating a feature image based on a first neural network according to the noise data.

Neural Networks (NNs), also called connection models, are mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
In the embodiment of the present application, the first neural network may include a generative adversarial network model, which is mainly used to generate the feature image from the noise data. Specifically, the generative adversarial network model may include a generator and a discriminator; it is a deep learning model that produces its preferred output through the mutual game learning of (at least) two modules in its framework, the generator and the discriminator, so that the noise data is generated into a feature image.
Specifically, a sample feature image set may be determined first, so that the discriminator can call the sample feature images in it. Then, the acquired noise data, for example normally distributed noise, may be input to the generator in the form of an image, and the generator generates a candidate image from the noise data. In this embodiment of the application, the discriminator may be configured to discriminate whether the candidate image output by the generator is "true" or "false": the candidate image is matched against the sample feature images, and if it is highly similar to one, it is determined to be "true" and is the feature image; if it is not highly similar, it is determined to be "false" and is not the feature image. In the embodiment of the application, the noise data is used directly to generate a feature image with little image detail rather than a complete image with much detail, which simplifies the first neural network's generation process, reduces the difficulty of optimizing the relevant parameters in the first neural network, and lets the first neural network generate a feature image with a stable picture. The feature image is at least one of the texture feature image and the edge feature image described in the foregoing embodiments.
Step 403: generating a target image based on a second neural network according to the feature image.
Specifically, the second neural network may be a convolutional neural network, which includes a feature extractor composed of convolutional layers and sub-sampling layers. In a convolutional layer of a convolutional neural network, one neuron is connected to only some of the neurons in the adjacent layer, and a convolutional layer usually contains several feature planes (feature maps); each feature plane is composed of neurons arranged in a rectangle, and the neurons of the same feature plane share weights, the shared weights being a convolution kernel. The convolution kernel can be initialized as a matrix of small random values, and during network training it learns suitable weights. Sharing weights (convolution kernels) brings the immediate benefit of reducing the connections between the layers of the network while also reducing the risk of over-fitting. Sub-sampling, also called pooling, commonly takes two forms, mean sub-sampling (mean pooling) and maximum-value sub-sampling (max pooling), and can be viewed as a special convolution process. Convolution and sub-sampling greatly simplify the complexity of the model, reduce its parameters, and improve the convolutional neural network's processing capability in the image field. A minimal sketch follows.
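For illustration, such a feature extractor can be sketched as below, assuming PyTorch; the channel counts and kernel sizes are illustrative only:

```python
# A sketch of the convolutional and sub-sampling (pooling) layers the
# paragraph describes.
import torch
import torch.nn as nn

extractor = nn.Sequential(
    # Neurons share one convolution kernel per feature plane: 16 kernels
    # here produce 16 feature maps from a single-channel input.
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),   # max-value sub-sampling halves each spatial dimension
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AvgPool2d(2),   # mean sub-sampling, the other common form
)

maps = extractor(torch.rand(1, 1, 64, 64))  # output shape: (1, 32, 16, 16)
```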
In this step, the feature image output by the first neural network may be further input into the second neural network, and the second neural network generates a complete target image from the feature image. For example, when the feature image includes an edge feature image and a texture feature image, the two are combined to obtain a complete target image. Because the picture of the feature image is relatively stable, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
To sum up, an image generation method provided by the embodiment of the present application includes: acquiring noise data; generating a feature image based on the first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image; and generating a target image based on the second neural network according to the feature image. The noise data can be directly generated into the feature image through the first neural network. Because the picture composition of the feature image is relatively simple, the image details in the feature image are greatly reduced compared with the final target image, so the process of generating the feature image by the first neural network is simpler, the difficulty of optimizing the related parameters in the first neural network is reduced, and the probability that the first neural network generates a feature image with a relatively stable picture is greatly improved. Further, the embodiment of the application directly generates the complete target image from the feature image by utilizing the second neural network. Because the picture of the feature image is relatively stable, the construction process is relatively simple; when the relatively stable feature image is input into the second neural network for constructing the target image, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
Referring to fig. 9, a specific flowchart of an embodiment of an image generation method according to the embodiment of the present application is shown, and the method may specifically include the following steps:
Step 501, acquiring noise data.
This step may specifically refer to step 101, which is not described herein again.
Step 502, generating a candidate image based on the noise data and the first sub-neural network.
In the embodiment of the present application, if the first sub-neural network (e.g., the generator) has been trained, then after the noise data is input into the first sub-neural network, the first sub-neural network outputs candidate images, and a large proportion of the candidate images may be feature images.
It should be noted that the candidate image may include at least one of an edge feature image and a texture feature image; that is, it may be an edge feature image, a texture feature image, or both an edge feature image and a texture feature image.
Optionally, step 502 may specifically include:
Substep 5021, converting the noise data into a first feature map based on the fully-connected layer of the first sub-neural network.
In the embodiment of the present application, the fully connected layer (FC) functions as a "classifier" in the whole neural network; that is, the fully connected layer maps the learned "distributed feature representation" to the sample label space. In practical use, the fully connected layer can be realized by a convolution operation: each neuron in the fully connected layer is fully connected with all neurons in the adjacent layer, and the fully connected layer can integrate the local information with category distinction in the convolutional layers or pooling layers.
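The remark that a fully connected layer can be realized by a convolution operation can be checked numerically: a convolution whose kernel spans the entire feature map computes the same function as a fully connected layer over the flattened map. A minimal check, with all sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

fc = nn.Linear(16 * 4 * 4, 10)
conv = nn.Conv2d(16, 10, kernel_size=4)              # kernel spans the whole 4x4 feature map
conv.weight.data = fc.weight.data.view(10, 16, 4, 4)  # reuse the FC weights
conv.bias.data = fc.bias.data

x = torch.randn(2, 16, 4, 4)
out_fc = fc(x.flatten(1))
out_conv = conv(x).flatten(1)
print(torch.allclose(out_fc, out_conv, atol=1e-6))   # True: identical computation
```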
In each convolutional layer of the neural network, the data exists in three dimensions. The three-dimensional data can be viewed as a stack of two-dimensional pictures, where each two-dimensional picture is called a feature map; in the neural network, a feature map can exist in the form of a feature matrix. For example, if the data is a color picture, the data may include 3 feature maps: a feature map reflecting the red channel, a feature map reflecting the green channel, and a feature map reflecting the blue channel. There are several convolution kernels between the convolutional layers of the neural network, and each feature map output by the previous convolutional layer is convolved with each convolution kernel of the current convolutional layer to generate a feature map that is input to the next convolutional layer.
The first sub-neural network may be a generator in a generative adversarial network. The purpose of the generator is to process the aforementioned noise data into a feature image through convolution operations, so the generator needs to combine more and more features into a feature image.
Therefore, in this step, the noise data may be first converted into the first feature map through the fully-connected layer of the first sub-neural network, thereby obtaining a preliminary feature map.
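A minimal sketch of sub-step 5021 in PyTorch, assuming the noise vector is projected by a fully connected layer and reshaped into a small preliminary feature map; the dimensions (a 100-dimensional noise vector mapped to a 4x4 map with 256 channels) are illustrative assumptions.

```python
import torch
import torch.nn as nn

fc = nn.Linear(100, 256 * 4 * 4)
z = torch.randn(8, 100)                        # a batch of noise data
first_feature_map = fc(z).view(8, 256, 4, 4)   # preliminary (first) feature map
print(first_feature_map.shape)                 # torch.Size([8, 256, 4, 4])
```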
It should be noted that, in the case of simultaneously using the edge feature and the texture feature, the first feature map may include two feature maps, that is, the first feature map may include a first feature map corresponding to the edge feature image and a first feature map corresponding to the texture feature image.
Substep 5022, performing the first convolution processing on the first feature map based on the convolutional layers of the first sub-neural network to obtain the candidate image.
In this step, a plurality of convolutional layers may be preset in the first sub-neural network, and each convolutional layer may perform one first convolution operation. Based on the plurality of convolutional layers of the first sub-neural network, the first feature map may be subjected to a preset number of first convolution processes, realizing multiple rounds of iterative convolution on the first feature map. The size of the first feature map is increased through the first convolution processing: features are extracted from the first feature map corresponding to the noise data and combined to obtain a first feature map of increased size, which contains more features than the original first feature map. This continues until the first convolution processing has been performed the preset number of times and the size of the first feature map reaches a first preset size threshold, at which point the first feature map is determined to be the candidate image.
Specifically, the preset number of times may be set to be not less than 10; that is, the number of convolutional layers is not less than 10. The larger the preset number of times, the higher the accuracy of the obtained candidate image, but the more time and calculation resources are consumed; therefore, the preset number of times may be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is a multiple (for example, twice) of the size of the feature map output by the previous convolutional layer.
It should be noted that the candidate image may include two images, that is, the candidate image may include a candidate image corresponding to the edge feature image and a candidate image corresponding to the texture feature image.
Specifically, sub-step 5022 may include:
Substep 50221, in the first sub-neural network, performing the first convolution processing on the first feature map to obtain a first candidate feature map of increased size.
In this step, the first feature map output from the fully-connected layer may be further input to a convolutional layer located in a layer next to the fully-connected layer, and the convolutional layer may perform a first convolution process on the first feature map to obtain a first candidate feature map with an increased size.
Specifically, each convolutional layer in the first sub-neural network is composed of a plurality of convolution units, and the parameters of each convolution unit are optimized through a back-propagation algorithm. The convolution operation in the convolutional layer aims at extracting different input features: the first convolutional layer may only extract some low-level features such as edges, lines and corners, while networks with more layers can iteratively extract more complex features from the low-level features. In this step, the first feature map is increased in size by the first convolution processing; the purpose is to extract features from the first feature map corresponding to the noise data and combine them to obtain a first candidate feature map of increased size, which contains more features than the original first feature map.
Before the first convolution processing is performed on the first feature map, batch normalization (BN) processing may be performed on the first feature map so that the mean value of the first feature map is 0 and the variance is 1. The idea of batch normalization is to normalize the inputs of the current layer so that their mean is 0 and their variance is 1. Batch normalization has the advantage of accelerating convergence; a convolutional neural network with batch normalization is affected very little by weight initialization, has very good stability, and markedly improves convolution performance.
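The effect of batch normalization described above can be checked directly. In the following sketch the channel count and input statistics are arbitrary; in training mode, before the affine scale/shift parameters learn anything, each output has mean near 0 and variance near 1.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = 5 + 3 * torch.randn(16, 8, 32, 32)   # input with arbitrary mean/variance
y = bn(x)
print(y.mean().item(), y.var().item())   # approximately 0.0 and 1.0
```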
In addition, after the first feature map is subjected to the first convolution processing by the convolutional layer, the convolved first candidate feature map can be activated through a preset activation function, so as to increase the number of activated neurons in the first sub-neural network. For example, a rectified linear unit (ReLU) may be used as the activation function. Compared with traditional neural network activation functions, the ReLU function has the following advantages: 1. it imitates the biological principle: in a neural network using the ReLU function, approximately 50% of the neurons are active; 2. it gives more efficient gradient descent and back propagation: the problems of gradient explosion and gradient vanishing are avoided; 3. it simplifies the calculation process: the influence of other complex activation functions, such as exponential functions, is avoided, and the sparsity of activations reduces the overall calculation cost of the neural network.
In sub-step 50222, the step of performing the first convolution processing on the first candidate feature map to obtain a first candidate feature map of increased size is repeated a preset number of times, until the size of the first candidate feature map reaches a first preset size threshold, at which point the first candidate feature map is determined to be the candidate image.
In this step, the first candidate feature map of increased size obtained in sub-step 50221 may be further input into the next convolutional layer, and its size is further increased by the first convolution processing performed by that layer, so as to extract more features from the first candidate feature map and combine them into a first candidate feature map of further increased size, which contains more features. The first candidate feature map of further increased size can in turn be input into the next convolutional layer for the first convolution processing, and so on through multiple rounds of iterative convolution, until the size of the first candidate feature map reaches the first preset size threshold, at which point the first candidate feature map is determined to be the candidate image.
The number of iterations of the multi-round iterative convolution may be set to be not less than 10, for example. The larger the number of iterations, the higher the accuracy of the obtained candidate image, but the more time and calculation resources are consumed; therefore, the number of iterations can be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is a multiple (for example, twice) of the size of the feature map output by the previous convolutional layer.
In this step, before the first convolution processing is performed on the first candidate feature map, batch normalization processing may be performed on the first candidate feature map so that the mean value of the first candidate feature map is 0 and the variance is 1. In addition, after the first candidate feature map is subjected to the first convolution processing by the convolution layer, the convolved first candidate feature map can be activated through a preset activation function, so that the activation number of the neurons in the first sub-neural network is increased.
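Putting sub-steps 50221 and 50222 together, the iterative first convolution processing, with batch normalization before and an activation after each round, can be sketched as follows. Transposed convolutions that double the feature-map size, the 4-to-64 size schedule, and the channel counts are assumptions for exposition; the embodiment only fixes the overall behavior (the size grows each round until the first preset size threshold is reached).

```python
import torch
import torch.nn as nn

def upsample_block(c_in, c_out):
    return nn.Sequential(
        nn.BatchNorm2d(c_in),                                     # batch normalization first
        nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),  # doubles H and W
        nn.ReLU(),                                                # activation afterwards
    )

x = torch.randn(1, 256, 4, 4)   # first feature map from the fully connected layer
channels = [256, 128, 64, 32, 16]
first_preset_size = 64          # first preset size threshold (assumed)
for c_in, c_out in zip(channels, channels[1:]):
    x = upsample_block(c_in, c_out)(x)
    if x.shape[-1] >= first_preset_size:   # threshold reached: x is the candidate image
        break
print(x.shape)                  # torch.Size([1, 16, 64, 64])
```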
Step 503, determining the feature image in the candidate images according to the second sub-neural network.
In this step, due to the limitations of the training process of the first sub-neural network, the first sub-neural network cannot be guaranteed to output feature images one hundred percent of the time, and therefore the non-feature images among the candidate images need to be screened out. In the embodiment of the present application, the candidate images may be input into the second sub-neural network (the discriminator), and the discriminator screens out the non-feature images among the candidate images.
It should be noted that, in the embodiment of the present application, the feature image may include two images, that is, the feature image may include a feature image corresponding to the edge feature image and a feature image corresponding to the texture feature image.
Optionally, the second sub-neural network includes: a sample feature image; step 503 may specifically include:
Sub-step 5031, converting the candidate image into a second feature map based on the fully connected layer of the second sub-neural network.
In the embodiment of the present application, the fully connected layer (FC) functions as a "classifier" in the entire neural network. In practical use, the fully connected layer can be realized by a convolution operation: each neuron in the fully connected layer is fully connected with all neurons in the adjacent layer, and the fully connected layer can integrate the local information with category distinction in the convolutional layers or pooling layers.
The second sub-neural network may be the discriminator in the generative adversarial network. The purpose of the discriminator is to screen the feature image out of the candidate images output by the generator. To do this, the discriminator needs to obtain, through convolution operations, a target feature of the candidate image that can accurately reflect the image properties of the candidate image; the discriminator then performs a similarity calculation between the target feature and the sample feature image, and, according to the similarity, the candidate image corresponding to a target feature whose similarity with the sample feature image is greater than or equal to a preset similarity threshold is determined as the feature image.
Therefore, in this step, the candidate image can first be converted into the second feature map through the fully connected layer of the second sub-neural network, so as to obtain a preliminary feature map.
It should be noted that the second feature map may include two feature maps, that is, the second feature map may include a second feature map corresponding to the edge feature image and a second feature map corresponding to the texture feature image.
Sub-step 5032, performing a second convolution process on the second feature map based on the convolution layer of the second sub-neural network to obtain a second candidate feature map.
In this step, a plurality of convolutional layers may be preset in the second sub-neural network, and each convolutional layer may perform one second convolution operation. Based on the plurality of convolutional layers of the second sub-neural network, the second feature map may be subjected to a preset number of second convolution processes, realizing multiple rounds of iterative convolution on the second feature map; the size of the second feature map is reduced by the second convolution processing, so as to obtain the second candidate feature map. The purpose is to extract, through the second convolution processing, target features that accurately reflect the candidate image from the second feature map and to screen out the non-target features other than the target features; the reduced-size second candidate feature map contains fewer but more accurate features than the original second feature map.
The preset number of times may be set to be not less than 10. The larger the preset number of times, the higher the accuracy of the obtained target features, but the more time and calculation resources are consumed; therefore, the preset number of times can be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is reduced in size by a factor of two compared with the feature map output by the previous convolutional layer.
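A sketch of this second convolution processing, assuming strided convolutions that halve the feature-map size each round; the channel schedule, the LeakyReLU activation, and the 64-to-4 size path are illustrative assumptions rather than the embodiment's fixed design.

```python
import torch
import torch.nn as nn

def downsample_block(c_in, c_out):
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),  # halves H and W
        nn.LeakyReLU(0.2),
    )

x = torch.randn(1, 3, 64, 64)   # second feature map derived from the candidate image
for c_in, c_out in [(3, 32), (32, 64), (64, 128), (128, 256)]:
    x = downsample_block(c_in, c_out)(x)
print(x.shape)                  # torch.Size([1, 256, 4, 4]): second candidate feature map
```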
Optionally, sub-step 5032 may specifically include:
Sub-step 50321, in the second sub-neural network, performing the second convolution processing on the second feature map to obtain a second candidate feature map of reduced size.
In this step, the second feature map output by the fully-connected layer may be further input to a convolutional layer located in a layer next to the fully-connected layer, and in the convolutional layer, second convolution processing may be performed on the second feature map to obtain a second candidate feature map with a reduced size.
Specifically, the size of the second feature map is reduced by the convolution operation, and the purpose is to extract the target feature from the second feature map and screen out the non-target features except the target feature through the second convolution processing, so that a second candidate feature map with a reduced size is obtained, and the second candidate feature map with the reduced size contains fewer features but more accurate features than the original second feature map.
Before the second convolution processing is performed on the second feature map, batch normalization processing may be performed on the second feature map, so that the mean value of the second feature map is 0 and the variance is 1. In addition, after the convolutional layer performs the second convolution processing on the second feature map, the convolved second candidate feature map can be activated through a preset activation function, so as to increase the number of activated neurons in the second sub-neural network.
Sub-step 50322, repeating the second convolution processing on the second candidate feature map a preset number of times to obtain a second candidate feature map of reduced size, until the size of the second candidate feature map reaches a second preset size threshold, at which point the similarity between the second candidate feature map and the sample feature image is calculated.
Wherein the second preset size threshold is smaller than the first preset size threshold.
In this step, the second candidate feature map of reduced size obtained in sub-step 50321 may be further input into the next convolutional layer, which further reduces its size through a convolution operation, so that more irrelevant non-target features are screened out of the second candidate feature map by the second convolution processing, yielding a second candidate feature map of further reduced size that contains fewer features than the one obtained in sub-step 50321. The second candidate feature map of further reduced size can in turn be input into the next convolutional layer for convolution, through multiple rounds of iterative convolution, until its size reaches the smaller second preset size threshold. When the size of the second candidate feature map reaches this threshold, the irrelevant non-target features in it have almost all been screened out, so calculating the similarity between the second candidate feature map and the sample feature image at this point reduces the interference of irrelevant non-target features and improves the calculation accuracy.
The number of iterations of the multi-round iterative convolution is generally not less than 10. The larger the number of iterations, the higher the accuracy of the obtained target features, but the more time and calculation resources are consumed; therefore, the number of iterations can be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is reduced in size by a factor of two compared with the feature map output by the previous convolutional layer.
In this step, before the second convolution processing is performed on the second candidate feature map, batch normalization processing may be performed on the second candidate feature map so that its mean value is 0 and its variance is 1. In addition, after the convolutional layer performs the second convolution processing on the second candidate feature map, the convolved second candidate feature map can be activated through a preset activation function, so as to increase the number of activated neurons in the second sub-neural network.
Sub-step 5033, determining the similarity between the second candidate feature map and the sample feature image.
In this step, when the size of the second candidate feature map reaches the smaller second preset size threshold, the irrelevant non-target features in the second candidate feature map have almost all been screened out, so calculating the similarity between the second candidate feature map and the sample feature image reduces the interference of irrelevant non-target features and improves the calculation accuracy.
Sub-step 5034, determining the candidate image corresponding to a second candidate feature map whose similarity with the sample feature image is greater than or equal to a preset similarity threshold as the feature image.
In this step, the second candidate feature map may be completely the same as, partially the same as, or completely different from the sample feature image, and the second candidate feature map and the sample feature image may belong to the same image category, which is not limited in this application.
Specifically, the second sub-neural network may call the sample feature image, perform similarity calculation on the sample feature image and the input second candidate feature map, and determine the candidate image corresponding to the second candidate feature map whose similarity with the sample feature image is greater than or equal to a preset similarity threshold as a real image highly similar to the sample feature image, that is, determine the candidate image as the feature image.
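The embodiment does not fix the similarity metric. As one illustrative choice, cosine similarity on the flattened maps together with a preset threshold realizes sub-steps 5033 and 5034; both the metric and the threshold value below are assumptions.

```python
import torch
import torch.nn.functional as F

def is_feature_image(candidate_map, sample_map, threshold=0.8):
    # Cosine similarity between flattened second candidate feature maps
    # and the sample feature image; threshold is the preset similarity threshold.
    sim = F.cosine_similarity(candidate_map.flatten(1), sample_map.flatten(1))
    return sim >= threshold   # True: the corresponding candidate image is a feature image

cand = torch.randn(4, 256, 4, 4)                          # second candidate feature maps
sample = torch.randn(1, 256, 4, 4).expand(4, -1, -1, -1)  # sample feature image, broadcast
print(is_feature_image(cand, sample))
```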
And step 504, generating a target image based on the second neural network according to the feature image.
This step may specifically refer to step 403, which is not described herein again.
Optionally, step 504 may specifically include:
Sub-step 5041, converting the feature image into a third feature map based on the fully connected layer of the second neural network.
In the embodiment of the present application, the fully connected layer plays the role of a "classifier" in the whole neural network; that is, the fully connected layer maps the learned "distributed feature representation" to the sample label space. In practical use, the fully connected layer can be realized by a convolution operation: each neuron in the fully connected layer is fully connected with all neurons in the adjacent layer, and the fully connected layer can integrate the local information with category distinction in the convolutional layers or pooling layers.
The second neural network can be a convolutional neural network, whose purpose is to process the feature image into the target image through convolution operations. Therefore, the second neural network needs to extract the accurate target features in the feature image, screen out the irrelevant non-target features, and then combine the accurate target features into a complete target image.
Therefore, in this step, the feature image can first be converted into a third feature map through the fully connected layer of the second neural network, thereby obtaining a preliminary feature map.
It should be noted that the third feature map may include two feature maps, that is, the third feature map may include a third feature map corresponding to the edge feature image and a third feature map corresponding to the texture feature image.
Sub-step 5042, performing a third convolution process on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map.
In this step, the third feature map output by the fully-connected layer may be further input to a convolutional layer located in a layer next to the fully-connected layer, and in the convolutional layer, third convolution processing may be performed on the third feature map to obtain a third candidate feature map with a reduced size.
Specifically, the size of the third feature map is reduced by the convolution operation, and the purpose is to extract the target feature from the third feature map and screen out non-target features other than the target feature through the third convolution processing, so that a third candidate feature map with a reduced size is obtained, and the third candidate feature map with the reduced size contains fewer features but more accurate features than the original third feature map.
Before the third convolution processing is performed on the third feature map, batch normalization processing may be performed on the third feature map so that its mean value is 0 and its variance is 1. In addition, after the convolutional layer performs the third convolution processing on the third feature map, the convolved third candidate feature map may be activated through a preset activation function, so as to increase the number of activated neurons in the second neural network.
Further, the third candidate feature map of reduced size may be input into the next convolutional layer, which further reduces its size through a convolution operation, so that more irrelevant non-target features are screened out of the third candidate feature map by the third convolution processing and fewer but more accurate target features are extracted, yielding a third candidate feature map of further reduced size. The third candidate feature map of further reduced size can in turn be input into the next convolutional layer for convolution, through multiple rounds of iterative convolution, until its size reaches a third preset size threshold, at which point the third convolution processing is stopped.
The number of iterations of the multi-round iterative convolution may be set to be not less than 10. The larger the number of iterations, the higher the accuracy of the obtained target features, but the more time and calculation resources are consumed; therefore, the number of iterations can be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is reduced in size by a factor of two compared with the feature map output by the previous convolutional layer.
Sub-step 5043, performing a fourth convolution process on the third candidate feature map based on the second convolution layer in the second neural network, and obtaining a target image.
In this step, since the accurate target features in the feature image have already been obtained through the third convolution processing and the irrelevant non-target features have been screened out, the third candidate feature map whose size has reached the third preset size threshold may be further input into the next convolutional layer, where the fourth convolution processing is performed to obtain a third candidate feature map of increased size; the purpose of the fourth convolution processing is to combine the target features into a third candidate feature map of increased size.
Before the fourth convolution processing is performed on the third candidate feature map, batch normalization processing may be performed on the third candidate feature map so that its mean value is 0 and its variance is 1. In addition, after the convolutional layer performs the fourth convolution processing on the third candidate feature map, the convolved third candidate feature map may be activated through a preset activation function, so as to increase the number of activated neurons in the second neural network.
In the case that the third feature map includes a third feature map corresponding to the edge feature image and a third feature map corresponding to the texture feature image, the third candidate feature map includes a third candidate feature map corresponding to the edge feature image and a third candidate feature map corresponding to the texture feature image.
In the process of performing the fourth convolution processing on the third candidate feature map, the third candidate feature map corresponding to the edge feature image and the third candidate feature map corresponding to the texture feature image may be merged to obtain the target image.
Further, the obtained third candidate feature map of increased size is input into the next convolutional layer, which further increases its size through a convolution operation. The third candidate feature map of further increased size can in turn be input into the next convolutional layer for convolution, through multiple rounds of iterative convolution, until its size reaches a fourth preset size threshold, and the third candidate feature map whose size reaches the fourth preset size threshold is determined as the target image.
In the case that the third candidate feature map includes a third candidate feature map corresponding to the edge feature image and a third candidate feature map corresponding to the texture feature image, when the sizes of the two third candidate feature maps both reach the fourth preset size threshold, the two third candidate feature maps whose sizes reach the fourth preset size threshold can be merged, so as to obtain the target image.
The number of iterations of the multi-round iterative convolution may be not less than 10. The larger the number of iterations, the higher the precision of the obtained target image, but the more time and calculation resources are consumed; therefore, the number of iterations can be set according to the actual demand to achieve the desired effect. In addition, the feature map output by the next convolutional layer is a multiple (for example, twice) of the size of the feature map output by the previous convolutional layer.
In this step, before the fourth convolution processing is performed on the third candidate feature map, batch normalization processing may be performed on the third candidate feature map so that its mean value is 0 and its variance is 1. In addition, after the convolutional layer performs the fourth convolution processing on the third candidate feature map, the convolved third candidate feature map may be activated through a preset activation function, so as to increase the number of activated neurons in the second neural network.
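Sub-steps 5042 and 5043 for a single feature image thus form an encode-then-decode pipeline: the third convolution processing shrinks the third feature map down to the third preset size threshold, and the fourth convolution processing grows it back up to the target-image size. A hedged sketch, with all sizes and channel counts assumed for illustration:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                                            # third convolution processing
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),            # 64 -> 32
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),           # 32 -> 16
)
decoder = nn.Sequential(                                            # fourth convolution processing
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),   # 32 -> 64
)

feature_image = torch.randn(1, 3, 64, 64)
third_candidate = encoder(feature_image)   # size reaches the third preset size threshold
target_image = decoder(third_candidate)    # size reaches the fourth preset size threshold
print(third_candidate.shape, target_image.shape)
```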
Optionally, in the case that the feature image includes a texture feature image and an edge feature image, the scheme of sub-step 5041 to sub-step 5043 may be specifically implemented by the following scheme of sub-step A1 to sub-step A7.
Sub-step A1, converting the texture feature image into a first sub-feature map based on a fully connected layer of the second neural network.
Sub-step A2, converting the edge feature image into a second sub-feature map based on the fully connected layer of the second neural network.
In the case where the feature image includes a texture feature image and an edge feature image, the fully connected layer of the second neural network may convert the texture feature image into a first sub-feature map and convert the edge feature image into a second sub-feature map.
Sub-step A3, performing the third convolution processing on the first sub-feature map based on the first convolution layer in the second neural network to obtain a first sub-candidate feature map.
Sub-step A4, performing the third convolution processing on the second sub-feature map based on the first convolution layer in the second neural network to obtain a second sub-candidate feature map.
Under the condition that the feature image comprises a texture feature image and an edge feature image, a first sub-feature map corresponding to the texture feature image and a second sub-feature map corresponding to the edge feature image are obtained.
After the first sub-feature map and the second sub-feature map are obtained, a first sub-candidate feature map with the size reduced from the first sub-feature map and a second sub-candidate feature map with the size reduced from the second sub-feature map are obtained through a third convolution operation. The first sub-candidate feature map comprises accurate features capable of reflecting texture feature images, and the second sub-candidate feature map comprises accurate features capable of reflecting edge feature images.
Sub-step A5, merging the first sub-candidate feature map and the second sub-candidate feature map through the fourth convolution operation based on the second convolution layer in the second neural network, so as to obtain the target image.
After the first sub-candidate feature map and the second sub-candidate feature map are obtained, a third sub-candidate feature map of increased size is obtained from the first sub-candidate feature map and a fourth sub-candidate feature map of increased size is obtained from the second sub-candidate feature map through the fourth convolution operation, and the third sub-candidate feature map and the fourth sub-candidate feature map are combined to generate the target image; that is, the second neural network generates a complete target image from the edge feature image and the texture feature image.
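Sub-steps A1 through A5 can be sketched as a two-branch network: each branch encodes one feature image with the third convolution processing, the encodings are merged, and the fourth convolution processing decodes the merged maps into one target image. All layer shapes below are assumptions for exposition, not the embodiment's fixed architecture.

```python
import torch
import torch.nn as nn

class TwoBranchGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        def encoder():                       # third convolution processing (per branch)
            return nn.Sequential(
                nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            )
        self.texture_enc = encoder()
        self.edge_enc = encoder()
        self.decoder = nn.Sequential(        # fourth convolution processing (after merging)
            nn.ConvTranspose2d(128, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, texture_img, edge_img):
        t = self.texture_enc(texture_img)    # first sub-candidate feature map
        e = self.edge_enc(edge_img)          # second sub-candidate feature map
        return self.decoder(torch.cat([t, e], dim=1))  # merge, then decode to target image

net = TwoBranchGenerator()
target = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(target.shape)                          # torch.Size([1, 3, 64, 64])
```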
To sum up, an image generation method provided by the embodiment of the present application includes: acquiring noise data; generating a feature image based on the first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image; and generating a target image based on the second neural network according to the feature image. The noise data can be directly generated into the feature image through the first neural network. Because the picture composition of the feature image is relatively simple, the image details in the feature image are greatly reduced compared with the final target image, so the process of generating the feature image by the first neural network is simpler, the difficulty of optimizing the related parameters in the first neural network is reduced, and the probability that the first neural network generates a feature image with a relatively stable picture is greatly improved. Further, the embodiment of the application directly generates the complete target image from the feature image by utilizing the second neural network. Because the picture of the feature image is relatively stable, the construction process is relatively simple; when the relatively stable feature image is input into the second neural network for constructing the target image, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
Referring to fig. 10, a block diagram of an embodiment of an image generating apparatus according to an embodiment of the present application is shown, where the apparatus may specifically include:
a first obtaining module 601, configured to obtain noise data;
a first generation module 602, configured to obtain a feature image according to the noise data and the first data set; wherein the first data set is used to generate the feature image from the noise data;
Optionally, the first data set includes: a first sub data set and a second sub data set, wherein the first sub data set is used to generate a candidate image according to the noise data, and the second sub data set is used to identify the feature image from the candidate image;
the first generating module 602 includes:
a generation sub-module for generating the candidate image according to the noise data and the first sub-data set;
a determining sub-module for determining the feature image of the candidate image from the second sub-data set.
Optionally, the second sub data set includes: a first sample feature image; the determining submodule is further configured to determine, according to a second sub-data set, the candidate image, of which the similarity with the first sample feature image is greater than or equal to a preset similarity threshold, as the feature image.
Optionally, the first generating module 602 includes:
a first category acquisition module, configured to obtain the feature image corresponding to the target category according to the noise data and the first data set.
A second generating module 603, configured to obtain a target image according to the feature image and the second data set; wherein the second data set is used to generate the target image from the feature image.
Optionally, the second generating module 603 includes:
a category determining module, configured to obtain a target image corresponding to the target category according to the feature image and the second data set.
Optionally, the apparatus further comprises:
a first sample acquisition module, configured to acquire a second sample feature image;
and the first training module is used for training the first data set according to the second sample characteristic image.
The first sample acquisition module is further configured to acquire a sample target image corresponding to the second sample feature image;
and the second training module is used for training the second data set according to the second sample characteristic image and the sample target image.
Optionally, the feature image includes: at least one of texture feature images and edge feature images.
In summary, an image generating apparatus provided in an embodiment of the present application is configured to: acquire noise data; obtain a feature image according to the noise data and the first data set, where the first data set is used to generate the feature image from the noise data; and obtain a target image according to the feature image and the second data set, where the second data set is used to generate the target image from the feature image. The noise data can be directly generated into the feature image through the first data set. Because the picture composition of the feature image is relatively simple, the image details in the feature image are greatly reduced compared with the final target image, so the process of generating the feature image through the first data set is simpler, the difficulty of optimizing the related parameters in the first data set is reduced, and the probability of generating a feature image with a relatively stable picture is greatly improved. Further, the embodiment of the application directly generates the complete target image from the feature image by utilizing the second data set. Because the picture of the feature image is relatively stable, the construction process is relatively simple; when the relatively stable feature image is input into the second data set for constructing the target image, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
Referring to fig. 11, a block diagram of an embodiment of an image generating apparatus according to an embodiment of the present application is shown, where the apparatus may specifically include:
a second obtaining module 701, configured to obtain noise data;
a third generating module 702, configured to generate a feature image based on a first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image;
Optionally, the first neural network includes: a first sub-neural network and a second sub-neural network; the third generating module 702 includes:
a first sub-neural network sub-module for generating the candidate image based on the noise data and the first sub-neural network;
optionally, the first sub-neural network sub-module is further configured to convert the noise data into a first feature map based on a fully connected layer of the first sub-neural network;
and performing first convolution processing on the first feature map based on the convolution layer of the first sub-neural network to obtain the candidate image.
A second sub-neural network sub-module for determining the feature images in the candidate images according to the second sub-neural network.
Optionally, the second sub-neural network includes: a sample feature image; the second sub-neural network sub-module is further used for converting the candidate image into a second feature map based on a full connection layer of the second sub-neural network;
performing second convolution processing on the second feature map based on the convolution layer of the second sub-neural network to obtain a second candidate feature map;
determining a similarity between the second candidate feature map and the sample feature image;
and determining the candidate image corresponding to the second candidate feature map whose similarity with the sample feature image is greater than or equal to a preset similarity threshold as the feature image.
A fourth generating module 703, configured to generate a target image based on the second neural network according to the feature image.
Optionally, the fourth generating module 703 is further configured to convert the feature image into a third feature map based on the fully connected layer of the second neural network;
performing third convolution processing on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map;
and performing fourth convolution processing on the third candidate feature map based on a second convolution layer in the second neural network to obtain a target image.
Optionally, in the case that the feature image includes a texture feature image and an edge feature image, the fourth generating module 703 is further configured to convert the texture feature image into a first sub-feature map based on the fully connected layer of the second neural network;
converting the edge feature image into a second sub-feature map based on a fully connected layer of the second neural network;
performing third convolution processing on the first sub-feature map based on the first convolution layer in the second neural network to obtain a first sub-candidate feature map;
performing third convolution processing on the second sub-feature map based on the first convolution layer in the second neural network to obtain a second sub-candidate feature map;
and merging the first sub-candidate feature map and the second sub-candidate feature map according to the fourth convolution operation based on a second convolution layer in the second neural network to obtain the target image.
In summary, an image generating apparatus provided in an embodiment of the present application is configured to: acquire noise data; obtain a feature image according to the noise data and the first data set, where the first data set is used to generate the feature image from the noise data; and obtain a target image according to the feature image and the second data set, where the second data set is used to generate the target image from the feature image. The noise data can be directly generated into the feature image through the first data set. Because the picture composition of the feature image is relatively simple, the image details in the feature image are greatly reduced compared with the final target image, so the process of generating the feature image through the first data set is simpler, the difficulty of optimizing the related parameters in the first data set is reduced, and the probability of generating a feature image with a relatively stable picture is greatly improved. Further, the embodiment of the application directly generates the complete target image from the feature image by utilizing the second data set. Because the picture of the feature image is relatively stable, the construction process is relatively simple; when the relatively stable feature image is input into the second data set for constructing the target image, the probability of color distortion in the target image is reduced, and the picture quality of the target image is improved.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Embodiments of the disclosure may be implemented as a system using any suitable hardware, firmware, software, or any combination thereof, in a desired configuration. Fig. 12 schematically illustrates an exemplary system (or apparatus) 1600 that can be used to implement various embodiments described in this disclosure.
For one embodiment, fig. 12 illustrates an exemplary system 1600 having one or more processors 1602, a system control module (chipset) 1604 coupled to at least one of the processor(s) 1602, system memory 1606 coupled to the system control module 1604, non-volatile memory (NVM)/storage 1608 coupled to the system control module 1604, one or more input/output devices 1610 coupled to the system control module 1604, and a network interface 1612 coupled to the system control module 1604.
The processor 1602 may include one or more single-core or multi-core processors, and the processor 1602 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the system 1600 can function as a browser as described in embodiments herein.
In some embodiments, system 1600 may include one or more computer-readable media (e.g., system memory 1606 or NVM/storage 1608) having instructions and one or more processors 1602, which in conjunction with the one or more computer-readable media, are configured to execute the instructions to implement modules to perform the actions described in this disclosure.
For one embodiment, the system control module 1604 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1602 and/or any suitable device or component in communication with the system control module 1604.
The system control module 1604 may include a memory controller module to provide an interface to the system memory 1606. The memory controller module may be a hardware module, a software module, and/or a firmware module.
For one embodiment, the system control module 1604 may include one or more input/output controllers to provide an interface to the NVM/storage 1608 and input/output device(s) 1610.
For example, NVM/storage 1608 may be used to store data and/or instructions. The NVM/storage 1608 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1608 may include storage resources that are physically part of the device on which system 1600 is installed or may be accessed by the device and not necessarily part of the device. For example, the NVM/storage 1608 may be accessed over a network via the input/output device(s) 1610.
Input/output device(s) 1610 can provide an interface for system 1600 to communicate with any other suitable devices; the input/output devices 1610 can include communication components, audio components, sensor components, and the like. Network interface 1612 can provide an interface for system 1600 to communicate over one or more networks, and system 1600 can communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1602 may be packaged together with logic for one or more controllers (e.g., memory controller modules) of the system control module 1604. For one embodiment, at least one of the processor(s) 1602 may be packaged together with logic for one or more controllers of the system control module 1604 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1602 may be integrated on the same die with logic for one or more controllers of system control module 1604. For one embodiment, at least one of the processor(s) 1602 may be integrated on the same die with logic for one or more controllers of system control module 1604 to form a system on a chip (SoC).
In various embodiments, system 1600 may be, but is not limited to being: a browser, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 1600 may have more or fewer components and/or different architectures. For example, in some embodiments, system 1600 includes one or more cameras, keyboards, Liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, Application Specific Integrated Circuits (ASICs), and speakers.
Wherein, if the display includes a touch panel, the display screen may be implemented as a touch screen display to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The present application further provides a non-volatile readable storage medium, where one or more modules (programs) are stored in the storage medium, and when the one or more modules are applied to a terminal device, the one or more modules may cause the terminal device to execute instructions (instructions) of method steps in the present application.
In one example, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to the embodiments of the present application when executing the computer program.
There is also provided in one example a computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements a method as one or more of the embodiments of the application.
An embodiment of the application discloses an image generation method and an image generation device. Example 1 includes an image generation method, including:
acquiring noise data;
obtaining a characteristic image according to the noise data and the first data set; wherein the first set of data is used to generate the feature image from the noise data;
obtaining a target image according to the characteristic image and a second data set; wherein the second data set is used to generate the target image from the feature image.
Example 2 includes the method of example 1, the first data set comprising: a first sub data set and a second sub data set, wherein the first sub data set is used to generate a candidate image according to the noise data, and the second sub data set is used to identify the feature image from the candidate image;
said obtaining a feature image from said noise data and a first data set, comprising:
generating the candidate image according to the noise data and the first sub data set;
determining the feature image in the candidate image through the second sub data set.
Example 3 includes the method of example 2, the second set of sub-data comprising: a first sample feature image;
the determining the feature image of the candidate image by the second sub data set includes:
and determining, according to the second sub data set, the candidate image whose similarity with the first sample feature image is greater than or equal to a preset similarity threshold as the feature image.
Example 4 includes the method of example 1, further comprising:
acquiring a second sample characteristic image;
training the first data set according to the second sample feature image.
Example 5 includes the method of example 1, further comprising:
acquiring a sample target image corresponding to the second sample characteristic image;
training the second data set according to the second sample feature image and the sample target image.
Example 6 includes the method of example 1, the obtaining a feature image from the noise data and a first data set, comprising:
obtaining a characteristic image corresponding to a target category according to the noise data and the first data set;
the obtaining a target image according to the feature image and the second data set comprises:
and obtaining a target image corresponding to the target category according to the characteristic image and the second data set.
Example 7 includes the method of any one of examples 1-6, wherein the feature image includes: at least one of texture feature images and edge feature images.
Example 8, an image generation method, comprising:
acquiring noise data;
generating a feature image based on a first neural network according to the noise data, wherein the feature image comprises a texture feature image or an edge feature image;
and generating a target image based on a second neural network according to the characteristic image.
Example 9, the method of example 8, the first neural network comprising: a first sub-neural network and a second sub-neural network;
generating a feature image based on a first neural network from the noise data, comprising:
generating the candidate image according to the noise data and the first sub-neural network;
determining the feature images in the candidate images according to the second sub-neural network.
Example 10, the method of example 9, the generating the candidate image from the noise data and the first sub-neural network, comprising:
converting the noise data into a first feature map based on a fully-connected layer of the first sub-neural network;
and performing first convolution processing on the first feature map based on the convolution layer of the first sub-neural network to obtain the candidate image.
Example 11, the method of example 9, the second sub-neural network comprising: a sample feature image;
the determining the feature image in the candidate image according to the second sub-neural network includes:
converting the candidate image into a second feature map based on a fully connected layer of the second sub-neural network;
performing second convolution processing on the second feature map based on the convolution layer of the second sub-neural network to obtain a second candidate feature map;
determining a similarity between the second candidate feature map and the sample feature image;
determining, as the feature image, the candidate image corresponding to a second candidate feature map whose similarity to the sample feature image is greater than or equal to a preset similarity threshold.
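A sketch of the example 11 selection step, following the order stated in the text (fully-connected layer first, then convolution). All dimensions are assumptions, `sample_feature_map` is a hypothetical reference tensor shaped like the network output, and the threshold comparison reuses cosine similarity as in the earlier filtering sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondSubNet(nn.Module):
    """Example 11 sketch: candidate image -> second candidate feature map."""
    def __init__(self, img_pixels: int = 64 * 64):
        super().__init__()
        # Fully-connected layer: candidate image -> second feature map.
        self.fc = nn.Linear(img_pixels, 32 * 16 * 16)
        # Convolution layer: -> second candidate feature map.
        self.conv = nn.Conv2d(32, 16, 3, padding=1)

    def forward(self, candidate: torch.Tensor) -> torch.Tensor:
        fmap = self.fc(candidate.flatten(1)).view(-1, 32, 16, 16)
        return self.conv(fmap)

def select_feature_images(net: SecondSubNet,
                          candidates: torch.Tensor,
                          sample_feature_map: torch.Tensor,
                          threshold: float = 0.8) -> torch.Tensor:
    """Keep candidates whose feature map is similar enough to the sample."""
    maps = net(candidates).flatten(1)                 # (N, 16*16*16)
    ref = sample_feature_map.flatten().unsqueeze(0)   # (1, 16*16*16)
    sims = F.cosine_similarity(maps, ref, dim=1)
    return candidates[sims >= threshold]
```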
Example 12, the method of example 8, the generating, from the feature image, a target image based on a second neural network, comprising:
converting the feature image into a third feature map based on a fully connected layer of the second neural network;
performing third convolution processing on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map;
performing fourth convolution processing on the third candidate feature map based on a second convolution layer in the second neural network to obtain the target image.
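A minimal sketch of the example 12 second neural network: one fully-connected layer producing the third feature map, a first convolution producing the third candidate feature map, and a second convolution producing the target image. Channel counts, spatial sizes, and the three-channel output are assumptions.

```python
import torch
import torch.nn as nn

class SecondNet(nn.Module):
    """Example 12 sketch: feature image -> target image."""
    def __init__(self, feat_pixels: int = 64 * 64):
        super().__init__()
        # Fully-connected layer: feature image -> third feature map.
        self.fc = nn.Linear(feat_pixels, 64 * 32 * 32)
        # First convolution layer: -> third candidate feature map.
        self.conv1 = nn.Conv2d(64, 32, 3, padding=1)
        # Second convolution layer: -> target image (3 channels assumed).
        self.conv2 = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, feature_image: torch.Tensor) -> torch.Tensor:
        fmap = self.fc(feature_image.flatten(1)).view(-1, 64, 32, 32)
        cand = torch.relu(self.conv1(fmap))   # third candidate feature map
        return torch.tanh(self.conv2(cand))   # target image
```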
Example 13, the method of example 12, the feature image comprising: a texture feature image and an edge feature image, the converting the feature image into a third feature map based on a fully connected layer of the second neural network, comprising:
converting the texture feature image into a first sub-feature map based on a fully connected layer of the second neural network;
converting the edge feature image into a second sub-feature map based on a fully connected layer of the second neural network;
the performing third convolution processing on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map comprises:
performing third convolution processing on the first sub-feature map based on the first convolution layer in the second neural network to obtain a first sub-candidate feature map;
performing third convolution processing on the second sub-feature map based on the first convolution layer in the second neural network to obtain a second sub-candidate feature map;
the performing fourth convolution processing on the third candidate feature map based on the second convolution layer in the second neural network to obtain a target image comprises:
merging the first sub-candidate feature map and the second sub-candidate feature map through the fourth convolution processing, based on the second convolution layer in the second neural network, to obtain the target image.
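A sketch of the example 13 two-branch variant: the texture and edge feature images pass through the fully-connected stage separately, share the first convolution layer, and the second convolution layer merges the two sub-candidate feature maps into the target image. Merging by channel concatenation before the fourth convolution is one plausible reading of "merging"; all sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class FusionSecondNet(nn.Module):
    """Example 13 sketch: texture + edge feature images -> target image."""
    def __init__(self, feat_pixels: int = 64 * 64):
        super().__init__()
        self.fc_texture = nn.Linear(feat_pixels, 64 * 32 * 32)
        self.fc_edge = nn.Linear(feat_pixels, 64 * 32 * 32)
        # First convolution layer, applied to each branch separately.
        self.conv1 = nn.Conv2d(64, 32, 3, padding=1)
        # Second convolution layer, merging both sub-candidate feature maps.
        self.conv2 = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, texture: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        t = torch.relu(self.conv1(
            self.fc_texture(texture.flatten(1)).view(-1, 64, 32, 32)))
        e = torch.relu(self.conv1(
            self.fc_edge(edge.flatten(1)).view(-1, 64, 32, 32)))
        merged = torch.cat([t, e], dim=1)      # concatenation: (N, 64, 32, 32)
        return torch.tanh(self.conv2(merged))  # target image
```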
Example 14, an image generation apparatus, the apparatus comprising:
a first acquisition module for acquiring noise data;
a first generation module for obtaining a feature image according to the noise data and a first data set; wherein the first data set is used to generate the feature image from the noise data;
a second generation module for obtaining a target image according to the feature image and a second data set; wherein the second data set is used to generate the target image from the feature image.
Example 15, an image generation apparatus, the apparatus comprising:
a second acquisition module for acquiring noise data;
a third generation module, configured to generate a feature image based on a first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image;
a fourth generation module for generating a target image based on a second neural network according to the feature image.
Example 16, a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method as in one or more of examples 1-7 when executing the computer program.
Example 17, a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in one or more of examples 1-7.
Example 18, a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method as in one or more of examples 8-13 when executing the computer program.
Example 19, a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in one or more of examples 8-13.
Although certain examples have been illustrated and described for purposes of description, a wide variety of alternate and/or equivalent implementations may be substituted to achieve the same objectives without departing from the scope of the present application. This application is intended to cover any adaptations or variations of the embodiments discussed herein. It is therefore intended that the embodiments described herein be limited only by the claims and their equivalents.
Claims (19)
1. An image generation method, comprising:
acquiring noise data;
obtaining a feature image according to the noise data and a first data set; wherein the first data set is used to generate the feature image from the noise data;
obtaining a target image according to the feature image and a second data set; wherein the second data set is used to generate the target image from the feature image.
2. The method of claim 1, wherein the first data set comprises a first sub data set and a second sub data set, the first sub data set being used to generate a candidate image according to the noise data, and the second sub data set being used to identify the feature image from the candidate image;
the obtaining a feature image according to the noise data and a first data set comprises:
generating the candidate image according to the noise data and the first sub data set;
determining the feature image in the candidate image through the second sub data set.
3. The method of claim 2, wherein the second sub data set comprises: a first sample feature image;
the determining the feature image in the candidate image through the second sub data set comprises:
determining, according to the second sub data set, a candidate image whose similarity to the first sample feature image is greater than or equal to a preset similarity threshold as the feature image.
4. The method of claim 1, further comprising:
acquiring a second sample feature image;
training the first data set according to the second sample feature image.
5. The method of claim 4, further comprising:
acquiring a sample target image corresponding to the second sample feature image;
training the second data set according to the second sample feature image and the sample target image.
6. The method of claim 1, wherein the obtaining a feature image according to the noise data and the first data set comprises:
obtaining a feature image corresponding to a target category according to the noise data and the first data set;
the obtaining a target image according to the feature image and the second data set comprises:
obtaining a target image corresponding to the target category according to the feature image and the second data set.
7. The method of any one of claims 1-6, wherein the feature image comprises at least one of a texture feature image and an edge feature image.
8. An image generation method, comprising:
acquiring noise data;
generating a feature image based on a first neural network according to the noise data, wherein the feature image comprises a texture feature image or an edge feature image;
generating a target image based on a second neural network according to the feature image.
9. The method of claim 8, wherein the first neural network comprises: a first sub-neural network and a second sub-neural network;
generating a feature image based on a first neural network from the noise data, comprising:
generating a candidate image according to the noise data and the first sub-neural network;
determining the feature image in the candidate image according to the second sub-neural network.
10. The method of claim 9, wherein generating the candidate image from the noise data and the first sub-neural network comprises:
converting the noise data into a first feature map based on a fully-connected layer of the first sub-neural network;
performing first convolution processing on the first feature map based on the convolution layer of the first sub-neural network to obtain the candidate image.
11. The method of claim 9, wherein the second sub-neural network comprises: a sample feature image;
the determining the feature image in the candidate image according to the second sub-neural network includes:
converting the candidate image into a second feature map based on a fully connected layer of the second sub-neural network;
performing second convolution processing on the second feature map based on the convolution layer of the second sub-neural network to obtain a second candidate feature map;
determining a similarity between the second candidate feature map and the sample feature image;
determining, as the feature image, the candidate image corresponding to a second candidate feature map whose similarity to the sample feature image is greater than or equal to a preset similarity threshold.
12. The method of claim 8, wherein generating the target image based on a second neural network from the feature image comprises:
converting the feature image into a third feature map based on a fully connected layer of the second neural network;
performing third convolution processing on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map;
performing fourth convolution processing on the third candidate feature map based on a second convolution layer in the second neural network to obtain the target image.
13. The method of claim 12, wherein the feature image comprises: a texture feature image and an edge feature image, the converting the feature image into a third feature map based on a fully connected layer of the second neural network, comprising:
converting the texture feature image into a first sub-feature map based on a fully connected layer of the second neural network;
converting the edge feature image into a second sub-feature map based on a fully connected layer of the second neural network;
the performing third convolution processing on the third feature map based on the first convolution layer in the second neural network to obtain a third candidate feature map comprises:
performing third convolution processing on the first sub-feature map based on the first convolution layer in the second neural network to obtain a first sub-candidate feature map;
performing third convolution processing on the second sub-feature map based on the first convolution layer in the second neural network to obtain a second sub-candidate feature map;
the performing fourth convolution processing on the third candidate feature map based on the second convolution layer in the second neural network to obtain a target image comprises:
merging the first sub-candidate feature map and the second sub-candidate feature map through the fourth convolution processing, based on the second convolution layer in the second neural network, to obtain the target image.
14. An image generation apparatus, characterized in that the apparatus comprises:
a first acquisition module for acquiring noise data;
a first generation module for obtaining a feature image according to the noise data and a first data set; wherein the first data set is used to generate the feature image from the noise data;
a second generation module for obtaining a target image according to the feature image and a second data set; wherein the second data set is used to generate the target image from the feature image.
15. An image generation apparatus, characterized in that the apparatus comprises:
a second acquisition module for acquiring noise data;
a third generation module, configured to generate a feature image based on a first neural network according to the noise data, where the feature image includes a texture feature image or an edge feature image;
a fourth generation module for generating a target image based on a second neural network according to the feature image.
16. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to one or more of claims 1-7 when executing the computer program.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to one or more of claims 1-7.
18. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to one or more of claims 8-13 when executing the computer program.
19. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to one or more of claims 8-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910498731.6A CN112070853A (en) | 2019-06-10 | 2019-06-10 | Image generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910498731.6A CN112070853A (en) | 2019-06-10 | 2019-06-10 | Image generation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112070853A true CN112070853A (en) | 2020-12-11 |
Family
ID=73658279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910498731.6A Pending CN112070853A (en) | 2019-06-10 | 2019-06-10 | Image generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070853A (en) |
2019-06-10: application CN201910498731.6A filed (CN); published as CN112070853A; legal status: active, Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102324102A (en) * | 2011-10-08 | 2012-01-18 | 北京航空航天大学 | Method for automatically filling structure information and texture information of hole area of image scene |
CN106910166A (en) * | 2016-08-31 | 2017-06-30 | 湖南拓视觉信息技术有限公司 | A kind of image processing method and device |
US20180137343A1 (en) * | 2016-11-16 | 2018-05-17 | Beijing Kuangshi Technology Co., Ltd. | Facial image generating method, facial image generating apparatus, and facial image generating device |
KR20190062283A (en) * | 2017-11-28 | 2019-06-05 | 한국전자통신연구원 | Method and apparatus for traning of generative adversarial network using selective loss function |
CN108537742A (en) * | 2018-03-09 | 2018-09-14 | 天津大学 | A kind of panchromatic sharpening method of remote sensing images based on generation confrontation network |
CN108764135A (en) * | 2018-05-28 | 2018-11-06 | 北京微播视界科技有限公司 | Image generating method, device and electronic equipment |
CN109345604A (en) * | 2018-08-01 | 2019-02-15 | 深圳大学 | Image processing method, computer equipment and storage medium |
CN109472837A (en) * | 2018-10-24 | 2019-03-15 | 西安电子科技大学 | The photoelectric image conversion method of confrontation network is generated based on condition |
Non-Patent Citations (2)
Title |
---|
余思泉; 韩志; 唐延东; 吴成东: "Texture synthesis method based on generative adversarial networks" (基于对抗生成网络的纹理合成方法), Infrared and Laser Engineering (红外与激光工程), no. 02, 25 February 2018 (2018-02-25) *
陈文兵; 管正雄; 陈允杰: "Data augmentation method based on conditional generative adversarial network" (基于条件生成式对抗网络的数据增强方法), Journal of Computer Applications (计算机应用), no. 11, 10 November 2018 (2018-11-10) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541910A (en) * | 2020-12-23 | 2021-03-23 | 中国工程物理研究院机械制造工艺研究所 | End face gap detection method, device, equipment and medium based on deep learning |
CN112541910B (en) * | 2020-12-23 | 2023-07-11 | 中国工程物理研究院机械制造工艺研究所 | End face gap detection method, device, equipment and medium based on deep learning |
CN113361583A (en) * | 2021-06-01 | 2021-09-07 | 珠海大横琴科技发展有限公司 | Countermeasure sample detection method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10943145B2 (en) | Image processing methods and apparatus, and electronic devices | |
US11798132B2 (en) | Image inpainting method and apparatus, computer device, and storage medium | |
CN111652321B (en) | Marine ship detection method based on improved YOLOV3 algorithm | |
Kim et al. | Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment | |
US11232286B2 (en) | Method and apparatus for generating face rotation image | |
CN111402130B (en) | Data processing method and data processing device | |
CN106683048B (en) | Image super-resolution method and device | |
US10621764B2 (en) | Colorizing vector graphic objects | |
CN113066017B (en) | Image enhancement method, model training method and equipment | |
CN112651438A (en) | Multi-class image classification method and device, terminal equipment and storage medium | |
CN110610143B (en) | Crowd counting network method, system, medium and terminal for multi-task combined training | |
CN109815770A | Two-dimensional code detection method, apparatus and system | |
CN111797841B (en) | Visual saliency detection method based on depth residual error network | |
CN111382616B (en) | Video classification method and device, storage medium and computer equipment | |
ur Rehman et al. | DeepRPN-BIQA: Deep architectures with region proposal network for natural-scene and screen-content blind image quality assessment | |
CN110532959B (en) | Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network | |
Oyama et al. | Influence of image classification accuracy on saliency map estimation | |
TWI769603B (en) | Image processing method and computer readable medium thereof | |
WO2022072199A1 (en) | Sparse optical flow estimation | |
CN109740539A | 3D object identification method based on extreme learning machine and fusion convolutional network | |
CN116524312A | Infrared small target detection method based on attention fusion feature pyramid network | |
CN112818774A (en) | Living body detection method and device | |
CN112070853A (en) | Image generation method and device | |
CN113011253A (en) | Face expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN111079930A (en) | Method and device for determining quality parameters of data set and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||