CN116433920A - Image generation method and device based on depth feature guidance and storage medium

Image generation method and device based on depth feature guidance and storage medium

Info

Publication number
CN116433920A
Authority
CN
China
Prior art keywords
image
aesthetic
real
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310182359.4A
Other languages
Chinese (zh)
Inventor
郑义杰
邱国平
钟志鹏
周泽宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Peng Cheng Laboratory
Original Assignee
Shenzhen University
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University, Peng Cheng Laboratory
Priority to CN202310182359.4A
Publication of CN116433920A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention discloses an image generation method, device, and storage medium based on depth feature guidance. The method comprises the following steps: acquiring a real image and inputting it into a convolutional neural network; extracting depth features with color gradient from the real image through the convolutional neural network, learning these depth features, and thereby training an aesthetic knowledge guidance network; forming a generative adversarial network (GAN) model comprising a connected generator and discriminator; and having the generator, under the guidance of the aesthetic knowledge guidance network, generate an image with a color gradient between adjacent color blocks. The method gives adjacent color blocks intermediate color transitions, so that the generated image exhibits a color gradient.

Description

Image generation method and device based on depth feature guidance and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to an image generation method, device and storage medium based on depth feature guidance.
Background
With the popularization and development of photographing devices, acquiring images has become simpler and simpler, and images have become an indispensable carrier for conveying information because they intuitively express a user's emotion and carry rich content. However, "beauty" is a subjective experience, and quantifying visual aesthetics is a very challenging task for a computer. To address this task, the field of computational aesthetics has developed. Its aim is to enable the computer, by simulating the human visual system and aesthetic thinking and by studying concrete computational methods for aesthetics, to independently complete the processes of understanding, inference, and modeling of aesthetics, that is, quantitative analytical computation, and to provide objectively feasible aesthetic solutions in specific practical applications.
In recent years, generative models have become increasingly important and popular in the field of machine learning owing to their applicability across different domains. As the name implies, a generative model can, by training on data samples and building an appropriate model, generate results that conform to a desired target data distribution. Generative models have many applications, such as image generation models, video generation models, and music generation models. For image generation, early models included the Gaussian mixture model (Gaussian Mixture Model, GMM), the Markov random field (Markov Random Field, MRF), deep belief networks (Deep Belief Networks, DBNs), and restricted Boltzmann machines (Restricted Boltzmann Machines, RBMs). Classical image generation models developed in recent years on the basis of deep neural networks mainly include PixelRNN and PixelCNN among autoregressive models; NICE, RealNVP, and Glow among flow-based models; variational auto-encoders (VAE); generative adversarial networks (Generative Adversarial Networks, GAN); and diffusion models. The most prominent tool among current neural art generation models is the generative adversarial network model.
Compared with other generative models, the generative adversarial network model has certain advantages and good theoretical support. However, in the process of generating an image, the model merely extracts several color blocks of different colors from the real image and splices them together. If the generated image is a landscape image, the appearance of objects under illumination must be considered: the color an object presents varies with its distance from the light source, that is, the region of an object close to the light source presents a deeper color, while the color of a region far from the light source becomes gradually lighter. Even within a shadow there is a transition from light to dark. The images generated by the prior art therefore lack intermediate color transitions between adjacent color blocks, that is, they lack the color gradient effect.
In view of this, the prior art is still to be improved and developed.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention aims to provide an image generation method, device, and storage medium based on depth feature guidance, so as to solve the technical problem that images generated by the prior art lack intermediate color transitions between adjacent color blocks.
The technical scheme adopted for solving the technical problems is as follows:
In a first aspect, the present invention provides an image generation method based on depth feature guidance, comprising:
acquiring a real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network;
forming a generative adversarial network model, wherein the generative adversarial network model comprises a generator and a discriminator which are connected;
the generator generating, under the guidance of the aesthetic knowledge guidance network, an image with a color gradient between adjacent color blocks.
In one implementation, the acquiring a real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network comprises:
inputting the real images into an image database, setting aesthetic scores according to the texture features, color features, and brightness features of the real images, and setting a plurality of different aesthetic tags by the aesthetic scores;
classifying the real images in the image database according to the plurality of different aesthetic tags to obtain image datasets corresponding to the aesthetic tags.
In one implementation, the acquiring the real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network comprises:
the convolutional neural network learning the depth features with color gradient through a regression supervised learning loss function, the functional expression of the regression supervised learning loss function being:

$L_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$

wherein N is the number of samples, $y_i$ is the prediction result of the model for the i-th sample, and $\hat{y}_i$ is the actual aesthetic label of the i-th sample.
In one implementation, the forming a generative adversarial network model, the generative adversarial network model comprising a generator and a discriminator which are connected, comprises:
the generator receiving input sound information and generating an image matched with the sound information;
inputting the image matched with the sound information and a real training dataset into the discriminator for comparison;
eliminating generated images whose data distribution is inconsistent with that of the real training data, and outputting generated images whose data distribution is consistent with that of the real training data.
In one implementation, the generator receiving input sound information and generating an image matched with the sound information comprises:
the generator generating the image matched with the sound information through a generator loss function, the functional expression of the generator loss function being:

$L_G = \mathbb{E}_{\hat{x} \sim G(z)}\left[\log\left(1 - D\left(\hat{x}\right)\right)\right]$

wherein G(z) is the generated image, $\mathbb{E}_{\hat{x} \sim G(z)}$ is the mathematical expectation over generated image samples taken from G(z), $\hat{x} \sim G(z)$ denotes one generated image sample taken from G(z), and $D(\hat{x})$ is the discriminator's predicted probability that the generated image belongs to the real images.
In one implementation, the inputting the image matched with the sound information and the real training dataset into the discriminator for comparison comprises:
the discriminator comparing the image matched with the sound information with the real training dataset through a discriminator loss function, the functional expression of the discriminator loss function being:

$L_D = -\mathbb{E}_{x \sim I_{real}}\left[\log D(x)\right] - \mathbb{E}_{\hat{x} \sim G(z)}\left[\log\left(1 - D\left(\hat{x}\right)\right)\right]$

wherein $I_{real}$ is the input image, $\mathbb{E}_{x \sim I_{real}}$ is the mathematical expectation over input image samples taken from $I_{real}$, $x \sim I_{real}$ denotes one input image sample taken from $I_{real}$, and D(x) is the discriminator's predicted probability that the input image belongs to the real images.
In one implementation, the generator generating, under the guidance of the aesthetic knowledge guidance network, an image with a color gradient between adjacent color blocks comprises:
the generator generating an image with a color gradient between adjacent color blocks through an aesthetic knowledge guidance network loss function, the functional expression of the aesthetic knowledge guidance network loss function being:

$L_{Aes} = \lambda_3 L_s + \lambda_4 L_f$

wherein $\lambda_3$ and $\lambda_4$ are constant terms, $L_s$ is the score-level loss function, and $L_f$ is the feature-level loss function.
In one implementation, the generator generating an image with a color gradient between adjacent color blocks through the aesthetic knowledge guidance network loss function comprises:
setting a first threshold and a second threshold on the aesthetic score, wherein the first threshold is larger than the second threshold, classifying the aesthetic tags according to the first threshold and the second threshold, and outputting from the image database the image dataset whose aesthetic tags are greater than the first threshold and the image dataset whose aesthetic tags are less than the second threshold;
setting the score-level loss function, the functional expression of the score-level loss function being:

$L_s = \frac{1}{batch}\sum_{i=1}^{batch}\left(1 - Score\left(G\left(z_i\right)\right)\right)$

wherein batch is the number of samples in a single image batch, $G(z_i)$ is the image generated by the generator from each random variable $z_i$, and $Score(G(z_i))$ denotes the regression prediction score obtained by passing the generated image through the aesthetic knowledge guidance network;
setting the feature-level loss function, the functional expression of the feature-level loss function being:

$L_f = \left\|f_{G(z)} - f_h\right\|_2 - \left\|f_{G(z)} - f_l\right\|_2$

wherein $f_h$ denotes the features obtained from images with an aesthetic score greater than the first threshold, $f_l$ denotes the features obtained from images with an aesthetic score less than the second threshold, and $f_{G(z)}$ denotes the features of the generated image.
In a second aspect, the present invention provides an image generation apparatus based on depth feature guidance, including: a memory and a processor; the memory stores a depth feature guidance based image generation program which, when executed by the processor, is operable to implement the operations of the depth feature guidance based image generation method as described above.
In a third aspect, the present invention provides a storage medium, which is a computer-readable storage medium storing a depth feature guidance based image generation program which, when executed by a processor, is adapted to carry out the operations of the depth feature guidance based image generation method as described above.
Compared with the prior art, the invention has the beneficial effects that:
according to the image generation method based on depth feature guidance, the depth features with color gradient in the real image are learned through the convolutional neural network, the aesthetic knowledge guidance network is trained, the generator generates the image with color gradient between adjacent color blocks under the guidance of the aesthetic knowledge guidance network, and compared with the real image, the generated image is high in fidelity, the technical problem that the generated image lacks intermediate color transition between the adjacent color blocks in the prior art is solved, and the generation quality of the image is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the image generation method based on depth feature guidance provided by the invention;
FIG. 2 is a flow chart of classifying the real image dataset provided by the invention;
FIG. 3 is a flow chart of image generation by the generative adversarial network model provided by the invention;
FIG. 4 is an overall framework diagram of the image generation network provided by the invention;
FIG. 5 is a training framework diagram of the aesthetic knowledge guidance network provided by the invention;
FIG. 6 is an image generation framework diagram of the generative adversarial network model provided by the invention;
FIG. 7 is a schematic block diagram of the aesthetic knowledge guidance network provided by the invention;
FIG. 8 is a functional schematic diagram of the image generation device based on depth feature guidance provided by the invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The present application provides an image generating method, device and storage medium based on depth feature guidance, and for making the purposes, technical solutions and effects of the present application clearer and more specific, the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Exemplary method
In the prior art, generative models have many applications, such as image generation models, video generation models, and music generation models. Early image generation models included Gaussian mixture models, Markov random fields, deep belief networks, and restricted Boltzmann machines. Classical image generation models developed in recent years on the basis of deep neural networks mainly include PixelRNN and PixelCNN among autoregressive models; NICE, RealNVP, and Glow among flow-based models; variational auto-encoders; generative adversarial networks; and diffusion models. Currently, the most prominent tool among neural art generation models is the generative adversarial network model.
Compared with other generative models, the generative adversarial network model has certain advantages and good theoretical support. However, in the process of generating an image, the model merely extracts several color blocks of different colors from the real image and splices them together. If the generated image is a landscape image, the appearance of objects under illumination must be considered: the color an object presents varies with its distance from the light source, that is, the region of an object close to the light source presents a deeper color, while the color of a region far from the light source becomes gradually lighter. Even within a shadow there is a transition from light to dark.
Although in the prior art the generated image is fitted to the real data distribution as far as possible under the action of the discriminator, it lacks transitions of intermediate colors between adjacent color blocks; that is, the generated image lacks the color gradient effect, so the quality of images generated by the prior art is low.
Since the generative adversarial network models of the prior art do not consider intermediate color transitions between adjacent color blocks of an image, the generated images lack the color gradient effect. This embodiment therefore provides an image generation method based on depth feature guidance and proposes an image generation framework guided by aesthetic attribute knowledge (AesGAN). The framework consists of a generator, a discriminator, and an aesthetic knowledge guidance network, and the aesthetic knowledge guidance network is used to guide the generator to generate images with a color gradient between adjacent color blocks.
As shown in fig. 1, an embodiment of the present invention provides an image generating method based on depth feature guidance, where the image generating method based on depth feature guidance includes the following steps:
step S100: the method comprises the steps of obtaining a real image, inputting the obtained real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, learning the depth features with color gradient, and training an aesthetic sense knowledge guidance network.
It should be noted that in the convolutional neural network, all the images are regarded as a matrix, and a plurality of arranged pixels are disposed in the matrix, and these pixels constitute an image that can be seen by human vision.
Extracting depth characteristics with color gradient from a real image, namely digitizing the real image to form a matrix, wherein each pixel in the matrix has corresponding numerical values for expressing colors, different numerical values reflect the depths of different colors, and extracting corresponding numerical values represented by a plurality of pixels showing the color gradient.
To better understand the depth of an image, it is possible to exemplify, for example: if the image is a black-and-white image, the image is called a gray image, the shade of gray color is reflected by different values, namely 0-255, the smaller the value is, the darker the represented color is, and the larger the value is, the lighter the represented color is; if the image is a color image, it includes three primary colors of RGB, i.e., R has a value, G has a value, and B has a value, and the value expressed by R, G, B reflects the depth of the image. Specifically, the real image used in this embodiment is a color image that has not been subjected to a color removal process.
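As a minimal illustration of this digitization (the array shapes and values below are assumptions chosen for readability, not taken from the patent), the following NumPy sketch builds a tiny grayscale patch and an RGB patch whose adjacent pixels form a gradient:

```python
import numpy as np

# Grayscale: one value per pixel in [0, 255]; smaller is darker.
gray_patch = np.array([
    [0, 64, 128, 192, 255],  # a dark-to-light gradient from left to right
    [0, 64, 128, 192, 255],
], dtype=np.uint8)

# Color: three values (R, G, B) per pixel. Here R is fixed at 255 while G
# ramps up, giving a red-to-yellow gradient across the patch.
color_patch = np.stack([
    np.full((2, 5), 255, dtype=np.uint8),                     # R channel
    np.tile(np.linspace(0, 255, 5, dtype=np.uint8), (2, 1)),  # G channel ramps
    np.zeros((2, 5), dtype=np.uint8),                         # B channel
], axis=-1)  # shape (2, 5, 3)

# Differences between horizontally adjacent pixels expose the gradation:
# a smooth transition shows small, even steps rather than abrupt jumps.
print(np.diff(gray_patch.astype(np.int16), axis=1))
```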
Before step S100 is carried out, the real images in the image database need to be classified. As shown in fig. 2, fig. 2 is a flow chart of classifying the real image dataset, which specifically comprises the following steps:
Step S001: inputting the real images into the image database, setting aesthetic scores according to the texture features, color features, and brightness features of the real images, and setting a plurality of different aesthetic tags by the aesthetic scores.
Specifically, the real image is aesthetically scored by texture features, color features, and brightness features, such as:
1) If the texture of the real image is clear and complete, the score obtained by the texture features is higher; if the texture of the real image is blurred and incomplete, the score obtained by the texture features is lower.
2) If the color of the real image is gorgeous, the score obtained by the color characteristics is higher; if the color of the real image is dull, the score obtained by the color feature is lower.
3) If the brightness of the real image is higher, the score obtained by the brightness characteristic is higher; if the brightness of the real image is low, the score obtained by the brightness feature is low.
The aesthetic score of the real image is comprehensively evaluated through the texture feature, the color feature, and the brightness feature.
Of course, it should be understood that the evaluation of the aesthetic score of a real image may incorporate other evaluation features in addition to the texture, color, and brightness features; this embodiment presents only one implementation and should not be construed as limiting.
Step S002: the real images in the image database are classified according to the plurality of different aesthetic tags to obtain an image dataset corresponding to the aesthetic tags.
The total score of the aesthetic score is determined; the total score used in this embodiment is 10. Within the total score range, several thresholds are set for classifying the aesthetic tags, namely a first threshold $y_1$, a second threshold $y_2$, and a third threshold $y_3$; refer specifically to fig. 4, which is the overall framework diagram of the image generation network. The first threshold $y_1$ is 5, the second threshold $y_2$ is 3, and the third threshold $y_3$ is 4. The dataset whose aesthetic tags are greater than the first threshold $y_1$ and the dataset whose aesthetic tags are less than the second threshold $y_2$ are collected and input into the aesthetic knowledge guidance network, while images whose aesthetic tags are equal to the third threshold $y_3$ are not input into the aesthetic knowledge guidance network.
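A sketch of this split is given below; the variable names (Y1, Y2, Y3, scored_images) are hypothetical, and each real image is assumed to carry a scalar aesthetic score on the 10-point scale described above:

```python
Y1, Y2, Y3 = 5.0, 3.0, 4.0  # first, second, and third thresholds

def split_by_aesthetic_tag(scored_images):
    """scored_images: iterable of (image, aesthetic_score) pairs."""
    high, low = [], []
    for image, score in scored_images:
        if score > Y1:
            high.append(image)   # fed to the aesthetic knowledge guidance network
        elif score < Y2:
            low.append(image)    # fed to the aesthetic knowledge guidance network
        # images whose tag equals Y3 are not input into the
        # aesthetic knowledge guidance network
    return high, low
```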
In step S100, the convolutional neural network learns the depth features with color gradient through a regression supervised learning loss function, the functional expression of the regression supervised learning loss function being:

$L_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$

wherein N is the number of samples, $y_i$ is the prediction result of the model for the i-th sample, and $\hat{y}_i$ is the actual aesthetic label of the i-th sample.
The convolutional neural network learns the depth features with color gradient in the real images through the regression supervised learning loss function, so that the trained network can guide the direction in which the generator generates images.
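One training step of the aesthetic knowledge guidance network might look like the sketch below. The ResNet-18 backbone, the Adam optimizer, and the learning rate are assumptions (the description specifies only a convolutional neural network), with the regression supervised loss realized as the mean squared error above:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

aesthetic_net = resnet18(num_classes=1)   # regress a single aesthetic score
optimizer = torch.optim.Adam(aesthetic_net.parameters(), lr=1e-4)
mse = nn.MSELoss()                        # (1/N) * sum_i (y_i - y_hat_i)^2

def train_step(images: torch.Tensor, aesthetic_labels: torch.Tensor) -> float:
    """images: (N, 3, H, W) float tensor; aesthetic_labels: (N,) float tensor."""
    optimizer.zero_grad()
    predictions = aesthetic_net(images).squeeze(1)  # y_i, shape (N,)
    loss = mse(predictions, aesthetic_labels)       # regression supervised loss
    loss.backward()
    optimizer.step()
    return loss.item()
```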
Step S200: a generated countermeasure network model is formed, the generated countermeasure network model including a generator and a arbiter that are connected.
The generation of the countermeasure network model is composed of a generator and a discriminator. The goal of the generator is to try to learn to generate new data like a sample of real data in an attempt to learn that the sample distribution of the generated data is consistent with the sample distribution of the real data. The discriminator is a classifier, and the objective of the discriminator is to judge the true image as a true sample as accurately as possible, and to judge the false image generated by the generator as a false sample. The generation of the countermeasure network model requires simultaneous training of the generator and the arbiter.
Specifically, as shown in fig. 3, fig. 3 is a flowchart for generating an image of an impedance network model; when the input signal is sound information, the sound information is specifically random sound, and the generator and the discriminator will perform the following steps:
step S201: the generator receives input sound information and generates an image matching the sound information.
Specifically, the generator generates the image matched with the sound information through a generator loss function, the functional expression of the generator loss function being:

$L_G = \mathbb{E}_{\hat{x} \sim G(z)}\left[\log\left(1 - D\left(\hat{x}\right)\right)\right]$

wherein G(z) is the generated image, $\mathbb{E}_{\hat{x} \sim G(z)}$ is the mathematical expectation over generated image samples taken from G(z), $\hat{x} \sim G(z)$ denotes one generated image sample taken from G(z), and $D(\hat{x})$ is the discriminator's predicted probability that the generated image belongs to the real images.
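A minimal PyTorch sketch of this generator loss; the small eps guard is an implementation assumption added for numerical stability:

```python
import torch

def generator_loss(d_fake_probs: torch.Tensor) -> torch.Tensor:
    """E_{x_hat ~ G(z)}[log(1 - D(x_hat))].

    d_fake_probs holds the discriminator outputs D(x_hat) in (0, 1) for a
    batch of generated images; minimizing this pushes D(x_hat) toward 1,
    i.e. pushes the generated images toward being judged real.
    """
    eps = 1e-8
    return torch.log(1.0 - d_fake_probs + eps).mean()
```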
Step S202: and inputting the image matched with the sound information and the real training data set into the discriminator for comparison.
Specifically, the arbiter compares the image matched with the sound information with the real training data set through a arbiter loss function, and the function expression of the arbiter loss function is as follows:
Figure BDA0004102709580000117
wherein I is real In order to input an image of the subject,
Figure BDA0004102709580000118
for x is from I real The mathematical expectation of one input image sample taken from (x-I) real For x is from I real D (x) is the probability prediction value of the discriminator for the input image belonging to the real image.
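The corresponding discriminator loss, sketched under the same assumptions (probability outputs, eps guard):

```python
import torch

def discriminator_loss(d_real_probs: torch.Tensor,
                       d_fake_probs: torch.Tensor) -> torch.Tensor:
    """-E_{x ~ I_real}[log D(x)] - E_{x_hat ~ G(z)}[log(1 - D(x_hat))].

    Minimizing pushes D(x) toward 1 for input (real) images and D(x_hat)
    toward 0 for generated images.
    """
    eps = 1e-8
    real_term = torch.log(d_real_probs + eps).mean()
    fake_term = torch.log(1.0 - d_fake_probs + eps).mean()
    return -(real_term + fake_term)
```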
Step S203: and eliminating the image of which the distribution of the generated data is inconsistent with that of the real training data, and outputting the image of which the distribution of the generated data is consistent with that of the real training data.
Sound information is input to the generator, and the generator outputs a generated image matched with the sound information. The generated image matched with the sound information and the training dataset are then input into the discriminator, whose loss function is an adversarial loss, so that the data distribution of the generated image matched with the sound information becomes consistent with the distribution of the real training data.
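Putting the two losses together, one alternating training step can be sketched as follows; it assumes the random sound information has already been encoded as a latent tensor sound_z, and it reuses the two loss sketches above:

```python
import torch

def gan_training_step(generator, discriminator, g_opt, d_opt,
                      real_images: torch.Tensor, sound_z: torch.Tensor):
    """One alternating update of discriminator and generator."""
    # 1) Discriminator step: learn to separate real images from generated ones.
    d_opt.zero_grad()
    fake_images = generator(sound_z).detach()   # block gradients into G
    d_loss = discriminator_loss(discriminator(real_images),
                                discriminator(fake_images))
    d_loss.backward()
    d_opt.step()

    # 2) Generator step: update G so its images are judged real.
    g_opt.zero_grad()
    g_loss = generator_loss(discriminator(generator(sound_z)))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```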
Step S300: the generator generates an image with a color gradient between adjacent color blocks under the direction of the aesthetic knowledge guidance network.
Specifically, step S300 combines step S100 and step S200. The guidance modes of the aesthetic knowledge guidance network for the generator include aesthetic score guidance, feature guidance, and joint score-feature guidance, and the appropriate mode can be selected according to the actual situation. The generated image has high fidelity, and adjacent color blocks exhibit the color gradient effect, which improves the quality of image generation.
As one implementation of this embodiment, the generator generates an image with a color gradient between adjacent color blocks through an aesthetic knowledge guidance network loss function, the functional expression of which is:

$L_{Aes} = \lambda_3 L_s + \lambda_4 L_f$

wherein $\lambda_3$ and $\lambda_4$ are constant terms, $L_s$ is the score-level loss function, and $L_f$ is the feature-level loss function. By calculation, $\lambda_3$ and $\lambda_4$ can take constant values, namely $\lambda_3 = 1$ and $\lambda_4 = 1$.
From the above, the aesthetic knowledge guidance network loss function is the weighted sum of the score-level loss function and the feature-level loss function.
In the aesthetic knowledge guidance network, a first threshold $y_1$ and a second threshold $y_2$ are set on the aesthetic score, wherein the first threshold $y_1$ is 5 and the second threshold $y_2$ is 3, so that the first threshold $y_1$ is greater than the second threshold $y_2$. The aesthetic tags are classified according to the first threshold $y_1$ and the second threshold $y_2$, and the image dataset whose aesthetic tags are greater than the first threshold $y_1$ and the image dataset whose aesthetic tags are less than the second threshold $y_2$ are output from the image database.
The score-level loss function is set, its functional expression being:

$L_s = \frac{1}{batch}\sum_{i=1}^{batch}\left(1 - Score\left(G\left(z_i\right)\right)\right)$

wherein batch is the number of samples in a single image batch, $G(z_i)$ is the image generated by the generator from each random variable $z_i$, and $Score(G(z_i))$ denotes the regression prediction score obtained by passing the generated image through the aesthetic knowledge guidance network.
In order that the score $L_s$ of the generated image approaches the aesthetic score as closely as possible, the term $1 - Score(G(z_i))$ is minimized, so that the score obtained by the generated image takes its maximum value, close to 1.
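A sketch of the score-level loss, assuming the aesthetic knowledge guidance network outputs a regression score normalized to [0, 1]:

```python
import torch

def score_level_loss(generated_images: torch.Tensor, aesthetic_net) -> torch.Tensor:
    """L_s = (1/batch) * sum_i (1 - Score(G(z_i))).

    Minimizing 1 - Score(...) drives the predicted aesthetic score of the
    generated images toward its maximum value, close to 1.
    """
    scores = aesthetic_net(generated_images).squeeze(1)  # Score(G(z_i)), (batch,)
    return (1.0 - scores).mean()
```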
The feature-level loss function is set, its functional expression being:

$L_f = \left\|f_{G(z)} - f_h\right\|_2 - \left\|f_{G(z)} - f_l\right\|_2$

namely:

$L_f = \sqrt{\sum_j \left(f_{G(z),j} - f_{h,j}\right)^2} - \sqrt{\sum_j \left(f_{G(z),j} - f_{l,j}\right)^2}$

wherein $f_h$ denotes the features obtained from images with an aesthetic score greater than the first threshold $y_1$, $f_l$ denotes the features obtained from images with an aesthetic score less than the second threshold $y_2$, and $f_{G(z)}$ denotes the features of the generated image. By calculating the modular length (L2 norm) of the feature-difference vectors, the distances in the aesthetic feature space between the generated image, the images with an aesthetic score greater than 5, and the images with an aesthetic score less than 3 are determined.
In the aesthetic feature space, the images with an aesthetic score greater than 5 (yielding $f_h$), the generated image G(z), and the images with an aesthetic score less than 3 (yielding $f_l$) are input to the aesthetic knowledge guidance network. Guided by the aesthetic knowledge guidance network, the depth features corresponding to the generated image are drawn toward the image features with a high aesthetic score and pushed away from the image features with a low aesthetic score.
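Assuming the contrastive form given above (pull toward high-score features, push away from low-score features), the feature-level loss can be sketched as:

```python
import torch

def feature_level_loss(f_gen: torch.Tensor,
                       f_high: torch.Tensor,
                       f_low: torch.Tensor) -> torch.Tensor:
    """L_f = ||f_gen - f_high||_2 - ||f_gen - f_low||_2.

    f_gen: flattened features of the generated image; f_high / f_low:
    features of images scoring above the first / below the second threshold,
    all taken from the aesthetic knowledge guidance network's feature space.
    """
    return torch.norm(f_gen - f_high, p=2) - torch.norm(f_gen - f_low, p=2)
```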
The aesthetic knowledge guidance network loss function comprises comparisons, at both the score level and the feature level, between the generated image and images of different aesthetic grades, and is obtained as the sum of the score-level loss function and the feature-level loss function. Under the joint action of the discriminator and the aesthetic knowledge guidance network, the images produced by the generator have high fidelity and a color gradient effect between adjacent color blocks, which solves the technical problem that images generated by the prior art lack intermediate color transitions between adjacent color blocks and improves the quality of image generation.
The aesthetic knowledge guidance network loss function is combined with the generator loss function to obtain the total generator loss function, the functional expression of the total generator loss function being:

$L = \lambda_1 L_G + \lambda_2 L_{Aes}$

wherein $\lambda_1$ and $\lambda_2$ are constant terms. By calculation, $\lambda_1$ and $\lambda_2$ can take constant values, namely $\lambda_1 = 1$ and $\lambda_2 = 0.01$.
From the above, the total generator loss function is the weighted sum of the generator loss function and the aesthetic knowledge guidance network loss function; in other words, the generator generates an image with a color gradient effect between adjacent color blocks under the guidance of the aesthetic knowledge guidance network.
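The combined objective then follows directly, using the constant terms the description gives ($\lambda_1 = 1$, $\lambda_2 = 0.01$, $\lambda_3 = \lambda_4 = 1$):

```python
import torch

LAMBDA_1, LAMBDA_2 = 1.0, 0.01   # total-loss weights from the description
LAMBDA_3, LAMBDA_4 = 1.0, 1.0    # aesthetic-loss weights from the description

def total_generator_loss(l_g: torch.Tensor,
                         l_s: torch.Tensor,
                         l_f: torch.Tensor) -> torch.Tensor:
    """L = lambda_1 * L_G + lambda_2 * L_Aes,
    with L_Aes = lambda_3 * L_s + lambda_4 * L_f."""
    l_aes = LAMBDA_3 * l_s + LAMBDA_4 * l_f
    return LAMBDA_1 * l_g + LAMBDA_2 * l_aes
```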
The image generating method based on depth feature guidance according to the present embodiment is summarized below, and specifically includes the following steps:
as shown in fig. 5, fig. 5 is a training frame diagram of an aesthetic knowledge guidance network. And inputting a real image into the convolutional neural network, extracting depth characteristics with color gradient in the real image, and learning the depth characteristics with color gradient to train an aesthetic knowledge guiding network.
As shown in fig. 6, fig. 6 generates a frame map for an image of an impedance network model. The generator receives an input signal and generates an image that matches the input signal. The discriminator compares the image matched with the input signal with the real image, and if the training data distribution of the image matched with the input signal and the training data distribution of the real image are consistent, the image is judged to be the real image; if there is a difference in the training data distribution between the image matching the input signal and the real image, it is determined as a false image.
As shown in fig. 7, fig. 7 is a schematic block diagram of an aesthetic knowledge guidance network. The input image comprises aesthetic scores greater than a first threshold y 1 Image of (c), generated image, and aesthetic score less than a second threshold y 2 Calculating an aesthetic score greater than a first threshold y from the feature level loss function 1 The image and aesthetic score being less than the second threshold y 2 Such that the generator, under the direction of the aesthetic knowledge guidance network, generates an image with high aesthetic score, high fidelity and color grading effect between adjacent color patches.
Exemplary apparatus
Based on the above embodiment, the present invention further provides an image generating device based on depth feature guidance, and a functional schematic diagram thereof may be specifically shown in fig. 8.
The image generation device based on depth feature guidance comprises: a processor, a memory, an interface, a display screen, and a communication module connected through a system bus. The processor of the device provides computing and control capabilities; the memory of the device comprises a storage medium and an internal memory; the storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the storage medium; the interface is used for connecting external equipment, such as mobile terminals and computers; the display screen is used for displaying the corresponding image generation information based on depth feature guidance; and the communication module is used for communication between the aesthetic knowledge guidance network and the generative adversarial network.
The computer program is for implementing a depth feature guided based image generation method when executed by a processor.
It will be appreciated by those skilled in the art that the functional schematic shown in fig. 8 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and does not limit the image generation device based on depth feature guidance to which the present inventive arrangements are applied; a particular device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided an image generation apparatus based on depth feature guidance, including: a memory and a processor; the memory stores a depth feature guidance based image generation program which, when executed by the processor, is operable to implement the operations of the depth feature guidance based image generation method as described above.
In one embodiment, a storage medium is provided, the storage medium being a computer readable storage medium storing a depth feature guidance based image generation program which, when executed by a processor, is operable to implement the depth feature guidance based image generation method as described above.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program is stored on a non-volatile storage medium and, when executed, performs the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. An image generation method based on depth feature guidance, characterized in that the method comprises the following steps:
acquiring a real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network;
forming a generative adversarial network model, wherein the generative adversarial network model comprises a generator and a discriminator which are connected;
the generator generating, under the guidance of the aesthetic knowledge guidance network, an image with a color gradient between adjacent color blocks.
2. The image generation method based on depth feature guidance according to claim 1, wherein the acquiring a real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network comprises:
inputting the real images into an image database, setting aesthetic scores according to the texture features, color features, and brightness features of the real images, and setting a plurality of different aesthetic tags by the aesthetic scores;
classifying the real images in the image database according to the plurality of different aesthetic tags to obtain image datasets corresponding to the aesthetic tags.
3. The image generation method based on depth feature guidance according to claim 1, wherein the acquiring the real image, inputting the acquired real image into a convolutional neural network, extracting depth features with color gradient from the real image through the convolutional neural network, and learning the depth features with color gradient to train an aesthetic knowledge guidance network comprises:
the convolutional neural network learning the depth features with color gradient through a regression supervised learning loss function, the functional expression of the regression supervised learning loss function being:

$L_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$

wherein N is the number of samples, $y_i$ is the prediction result of the model for the i-th sample, and $\hat{y}_i$ is the actual aesthetic label of the i-th sample.
4. The image generation method based on depth feature guidance according to claim 1, wherein the forming a generative adversarial network model, the generative adversarial network model comprising a generator and a discriminator which are connected, comprises:
the generator receiving input sound information and generating an image matched with the sound information;
inputting the image matched with the sound information and a real training dataset into the discriminator for comparison;
eliminating generated images whose data distribution is inconsistent with that of the real training data, and outputting generated images whose data distribution is consistent with that of the real training data.
5. The image generation method based on depth feature guidance according to claim 4, wherein the generator receiving input sound information and generating an image matched with the sound information comprises:
the generator generating the image matched with the sound information through a generator loss function, the functional expression of the generator loss function being:

$L_G = \mathbb{E}_{\hat{x} \sim G(z)}\left[\log\left(1 - D\left(\hat{x}\right)\right)\right]$

wherein G(z) is the generated image, $\mathbb{E}_{\hat{x} \sim G(z)}$ is the mathematical expectation over generated image samples taken from G(z), $\hat{x} \sim G(z)$ denotes one generated image sample taken from G(z), and $D(\hat{x})$ is the discriminator's predicted probability that the generated image belongs to the real images.
6. The image generation method based on depth feature guidance according to claim 5, wherein the inputting the image matched with the sound information and the real training dataset into the discriminator for comparison comprises:
the discriminator comparing the image matched with the sound information with the real training dataset through a discriminator loss function, the functional expression of the discriminator loss function being:

$L_D = -\mathbb{E}_{x \sim I_{real}}\left[\log D(x)\right] - \mathbb{E}_{\hat{x} \sim G(z)}\left[\log\left(1 - D\left(\hat{x}\right)\right)\right]$

wherein $I_{real}$ is the input image, $\mathbb{E}_{x \sim I_{real}}$ is the mathematical expectation over input image samples taken from $I_{real}$, $x \sim I_{real}$ denotes one input image sample taken from $I_{real}$, and D(x) is the discriminator's predicted probability that the input image belongs to the real images.
7. The image generation method based on depth feature guidance according to claim 2, wherein the generator generating, under the guidance of the aesthetic knowledge guidance network, an image with a color gradient between adjacent color blocks comprises:
the generator generating an image with a color gradient between adjacent color blocks through an aesthetic knowledge guidance network loss function, the functional expression of the aesthetic knowledge guidance network loss function being:

$L_{Aes} = \lambda_3 L_s + \lambda_4 L_f$

wherein $\lambda_3$ and $\lambda_4$ are constant terms, $L_s$ is the score-level loss function, and $L_f$ is the feature-level loss function.
8. The image generation method based on depth feature guidance according to claim 7, wherein the generator generating an image with a color gradient between adjacent color blocks through the aesthetic knowledge guidance network loss function comprises:
setting a first threshold and a second threshold on the aesthetic score, wherein the first threshold is larger than the second threshold, classifying the aesthetic tags according to the first threshold and the second threshold, and outputting from the image database the image dataset whose aesthetic tags are greater than the first threshold and the image dataset whose aesthetic tags are less than the second threshold;
setting the score-level loss function, the functional expression of the score-level loss function being:

$L_s = \frac{1}{batch}\sum_{i=1}^{batch}\left(1 - Score\left(G\left(z_i\right)\right)\right)$

wherein batch is the number of samples in a single image batch, $G(z_i)$ is the image generated by the generator from each random variable $z_i$, and $Score(G(z_i))$ denotes the regression prediction score obtained by passing the generated image through the aesthetic knowledge guidance network;
setting the feature-level loss function, the functional expression of the feature-level loss function being:

$L_f = \left\|f_{G(z)} - f_h\right\|_2 - \left\|f_{G(z)} - f_l\right\|_2$

wherein $f_h$ denotes the features obtained from images with an aesthetic score greater than the first threshold, $f_l$ denotes the features obtained from images with an aesthetic score less than the second threshold, and $f_{G(z)}$ denotes the features of the generated image.
9. An image generation device based on depth feature guidance, characterized by comprising: a memory and a processor; the memory stores an image generation program based on depth feature guidance which, when executed by the processor, is operable to implement the image generation method based on depth feature guidance according to any one of claims 1-8.
10. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing an image generation program based on depth feature guidance which, when executed by a processor, is operable to implement the operations of the image generation method based on depth feature guidance according to any one of claims 1-8.
CN202310182359.4A 2023-02-17 2023-02-17 Image generation method and device based on depth feature guidance and storage medium Pending CN116433920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310182359.4A CN116433920A (en) 2023-02-17 2023-02-17 Image generation method and device based on depth feature guidance and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310182359.4A CN116433920A (en) 2023-02-17 2023-02-17 Image generation method and device based on depth feature guidance and storage medium

Publications (1)

Publication Number Publication Date
CN116433920A (en) 2023-07-14

Family

ID=87083813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310182359.4A Pending CN116433920A (en) 2023-02-17 2023-02-17 Image generation method and device based on depth feature guidance and storage medium

Country Status (1)

Country Link
CN (1) CN116433920A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036862A (en) * 2023-08-21 2023-11-10 武汉纺织大学 Image generation method based on Gaussian mixture variation self-encoder
CN117036862B (en) * 2023-08-21 2024-03-22 武汉纺织大学 Image generation method based on Gaussian mixture variation self-encoder

Similar Documents

Publication Publication Date Title
CN108090902B (en) Non-reference image quality objective evaluation method based on multi-scale generation countermeasure network
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
US8692830B2 (en) Automatic avatar creation
US10600171B2 (en) Image-blending via alignment or photometric adjustments computed by a neural network
CN108898579A (en) A kind of image definition recognition methods, device and storage medium
EP3989104A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
WO2023151289A1 (en) Emotion identification method, training method, apparatus, device, storage medium and product
US11537277B2 (en) System and method for generating photorealistic synthetic images based on semantic information
TWI415011B (en) Facial identification method and system using thereof
CN114723643B (en) Low-light image enhancement method based on reinforcement learning and aesthetic evaluation
EP3772038A1 (en) Augmented reality display method of simulated lip makeup
JPWO2018203549A1 (en) Signal change device, method, and program
CN112200736B (en) Image processing method based on reinforcement learning and model training method and device
CN114021524B (en) Emotion recognition method, device, equipment and readable storage medium
CN109145871A (en) Psychology and behavior recognition methods, device and storage medium
CN110782448A (en) Rendered image evaluation method and device
CN116433920A (en) Image generation method and device based on depth feature guidance and storage medium
CN111598153B (en) Data clustering processing method and device, computer equipment and storage medium
JP2018163444A (en) Information processing apparatus, information processing method and program
CN112116589A (en) Method, device and equipment for evaluating virtual image and computer readable storage medium
US20240153271A1 (en) Method and apparatus for selecting cover of video, computer device, and storage medium
JP2009151350A (en) Image correction method and device
CN112767038B (en) Poster CTR prediction method and device based on aesthetic characteristics
CN112069916B (en) Face beauty prediction method, device and system and readable storage medium
CN111724054A (en) Evaluation method, device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination