CN112348739B - Image processing method, device, equipment and storage medium

Publication number: CN112348739B
Application number: CN202011364910.XA
Authority: CN (China)
Legal status: Active
Inventor: 陈树东 (Chen Shudong)
Assignee: Guangzhou Boguan Information Technology Co., Ltd.
Other versions: CN112348739A (Chinese)

Application CN202011364910.XA was filed by Guangzhou Boguan Information Technology Co., Ltd., published as CN112348739A, and granted as CN112348739B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map

Abstract

The application provides an image processing method, device, equipment and storage medium, which are used for extracting an image to be processed; inputting the image to be processed into a trained image style migration model, and performing image style migration processing based on a target style to obtain a target image. Specifically, global maximum pooling and global average pooling are respectively performed on the image to be processed to obtain a first image; convolution processing is performed on the first image to obtain a second image; and the second image is coded to obtain the target image. In this scheme, when the style migration model performs image style migration based on the target style, global maximum pooling and global average pooling are performed on the image to be processed, so that the detail features of the image can be fully extracted, the style migration effect is effectively improved, a high-quality target image is obtained, and the user experience is improved.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to Artificial Intelligence (AI) technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
The image style migration technology migrates information such as texture, color and shape in a scene I to a scene II with another style, so that scene II obtains new style characteristics such as color, texture and shape while keeping its original content. With the development of Artificial Intelligence (AI) technology, style migration is widely applied in many areas such as computer vision, for example, rendering landscape paintings in a specific artistic style, colorizing black-and-white photos, and changing various attributes of human faces.
Current image style migration techniques, such as face animation stylization, mainly adopt generative adversarial network (GAN) technology to implement one-to-one style conversion. However, these techniques easily ignore detail features during migration, so the migrated image cannot clearly retain detail features such as hair shape, resulting in a poor image style migration effect.
Disclosure of Invention
The embodiments of the application provide an image processing method, an image processing apparatus, image processing equipment and a storage medium to effectively improve the image style migration effect.
In a first aspect, an embodiment of the present application provides an image processing method, including:
extracting an image to be processed;
inputting an image to be processed into a trained image style migration model, and performing image style migration processing based on a target style to obtain a target image, wherein the style of the target image is the target style;
inputting an image to be processed into a trained image style migration model, and performing image style migration processing based on a target style to obtain a target image, wherein the image style migration processing comprises the following steps:
respectively carrying out global maximum pooling and global average pooling on the image to be processed to obtain a first image;
performing convolution processing on the first image to obtain a second image;
and carrying out channel coding on the second image to obtain a target image.
Optionally, the first image includes: the image after global maximum pooling and the image after global average pooling;
the image style migration model comprises a global maximum pooling layer, a global average pooling layer, a convolution layer and an encoding layer;
the global maximum pooling layer is used for performing global maximum pooling on the image to be processed to obtain an image subjected to global maximum pooling;
the global average pooling layer is used for performing global average pooling on the image to be processed to obtain an image subjected to global average pooling;
the convolution layer is used for performing convolution processing on the first image to obtain a second image;
and the coding layer is used for coding the second image to obtain the target image.
Optionally, the method further comprises:
acquiring a source sample image and a reference sample image, wherein the style of the reference sample image is a target style;
inputting the source sample image into a first generator, converting the style of the source sample image into a target style, and obtaining a target sample image;
respectively inputting the target sample image and the reference sample image into a first discriminator to obtain a first discrimination result;
adjusting training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result in the first discriminator meets a preset condition, and stopping adjustment;
and obtaining a trained image style migration model according to the adjusted first generator.
Optionally, the first generator includes: the system comprises a first global maximum pooling layer, a first global average pooling layer, a first auxiliary classifier, a first convolution layer, a plurality of first fully-connected layers and a first decoder;
inputting the source sample image into a first generator, converting the style of the source sample image into a target style, and obtaining a target sample image, wherein the method comprises the following steps:
inputting the source sample image into a first global maximum pooling layer, and processing the source sample image based on a first weight vector to obtain a first feature image;
inputting the source sample image into a first global average pooling layer, processing the source sample image based on a second weight vector to obtain a second feature image,
inputting the first weight vector and the second weight vector into a first auxiliary classifier to obtain a first classification result;
inputting the first characteristic image and the second characteristic image into the first convolution layer to obtain a third characteristic image;
inputting the third characteristic image to a plurality of first full-connection layers to obtain a plurality of parameter vectors;
inputting the third characteristic image into a first decoder, and carrying out channel coding on the third characteristic image according to a plurality of parameter vectors to obtain a target sample image;
the first weight vector and the second weight vector are used for adjusting the ratio of each feature in the source sample image.
Optionally, adjusting training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result in the first discriminator meets a preset condition, and stopping the adjustment, including:
adjusting training parameters in the first generator and the first discriminator according to the first discrimination result and the first classification result to obtain an adjusted first generator and an adjusted first discriminator;
converting the style of the source sample image into a target style according to the adjusted first generator to obtain a first target sample image;
inputting the first target sample image and the reference sample image into the adjusted first discriminator respectively, and outputting a second discrimination result, wherein the second discrimination result comprises: a first loss value corresponding to the first target sample image;
and when the first loss value is determined to be smaller than the preset value, determining that the judgment result in the first discriminator meets the preset condition, and stopping adjustment.
Optionally, the first discriminator includes: the second global maximum pooling layer, the second global average pooling layer, the second auxiliary classifier and the first classifier;
inputting the target sample image and the reference sample image into a first discriminator respectively to obtain a first discrimination result, wherein the method comprises the following steps:
respectively inputting the target sample image and the reference sample image into a second global maximum pooling layer, and processing the images based on a third weight vector to respectively obtain a fourth characteristic image and a fifth characteristic image;
respectively inputting the target sample image and the reference sample image into a second global average pooling layer, processing the images based on a fourth weight vector, and respectively obtaining a sixth characteristic image and a seventh characteristic image output by the second global average pooling layer;
inputting the third weight vector and the fourth weight vector into a second auxiliary classifier to obtain a second classification result;
inputting the fourth feature image, the fifth feature image, the sixth feature image and the seventh feature image into a first classifier, and obtaining a first judgment result output by the first classifier;
the third weight vector and the fourth weight vector are used for adjusting the specific gravity of each feature in the target sample image and the reference sample image.
Optionally, the training parameters in the first generator include: a first weight vector, a second weight vector, a plurality of parameter vectors;
the training parameters in the first discriminator include: a third weight vector and a fourth weight vector;
according to the first discrimination result and the first classification result, training parameters in the first generator and the first discriminator are adjusted to obtain the adjusted first generator and the adjusted first discriminator, and the method comprises the following steps:
adjusting the first weight vector, the second weight vector and the plurality of parameter vectors according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first generator;
and adjusting the third weight vector and the fourth weight vector according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first discriminator.
Optionally, the method further comprises:
inputting the target sample image into a second generator to convert the style of the target sample image into the style of the source sample image to obtain an intermediate sample image;
and respectively inputting the intermediate sample image and the source sample image into a second discriminator to obtain a second discrimination result.
Optionally, adjusting training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result meets a preset condition, and stopping the adjustment, including:
adjusting training parameters in the first generator, the first discriminator, the second generator and the second discriminator according to the first discrimination result and the second discrimination result to obtain an adjusted first generator, an adjusted first discriminator, an adjusted second generator and an adjusted second discriminator;
processing the source sample image according to the adjusted first generator to convert the style of the source sample image into a target style and obtain a second target sample image;
processing the second target sample image according to the adjusted second generator to convert the style of the second target sample image into the style of the source sample image, and obtaining an intermediate sample image;
inputting the second target sample image into the adjusted first discriminator, and outputting a third discrimination result, wherein the third discrimination result is a second loss value of the second target sample image;
inputting the intermediate sample image into the adjusted second discriminator, and outputting a fourth discrimination result, wherein the fourth discrimination result is a third loss value of the intermediate sample image;
and when the sum of the second loss value and the third loss value is smaller than the preset value, determining that the first judgment result and the second judgment result meet the preset condition, and stopping adjustment.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the extraction module is used for extracting an image to be processed;
the processing module is used for inputting the image to be processed into the trained image style migration model, and performing image style migration processing based on a target style to obtain a target image, wherein the style of the target image is the target style;
the processing module comprises:
the first processing unit is used for respectively performing global maximum pooling and global average pooling on the image to be processed to obtain a first image;
the second processing unit is used for performing convolution processing on the first image to obtain a second image;
and the third processing unit is used for carrying out channel coding on the second image to obtain a target image.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing program instructions;
a processor for calling and executing the program instructions in the memory to perform the image processing method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored thereon; the computer program, when executed by a processor, implements the method as set forth in any one of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.
The embodiments of the application provide an image processing method, an image processing apparatus, image processing equipment and a storage medium: an image to be processed is extracted; the image to be processed is input into a trained image style migration model, and image style migration processing is performed based on a target style to obtain a target image. Specifically, global maximum pooling and global average pooling are respectively performed on the image to be processed to obtain a first image; convolution processing is performed on the first image to obtain a second image; and the second image is coded to obtain the target image. When the style migration model performs image style migration based on the target style, global maximum pooling and global average pooling are performed on the image to be processed, so that the detail features of the image can be fully extracted, the style migration effect is effectively improved, a high-quality target image is obtained, and the user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art according to these drawings.
Fig. 1 is a diagram illustrating a scene of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an image style migration model according to an embodiment of the present application;
fig. 3 is a flowchart of an image processing method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 5 is a flowchart of an image processing method according to another embodiment of the present application;
FIG. 6 is a flowchart of an image processing method according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the various parts of the embodiments of the present application and in the drawings are used for distinguishing similar objects and not necessarily for describing a particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method flow diagrams referred to in the following embodiments of the present application are exemplary only, and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be broken down and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
The functional blocks in the block diagrams referred to in the embodiments described below are only functional entities, and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processors and/or microcontrollers.
With the development of Artificial Intelligence (AI) technology, AI is widely applied to many computer vision tasks, such as rendering scenic pictures in a special artistic style, colorizing black-and-white photos, and changing various attributes of human faces. Style migration refers to migrating information such as texture, color and shape in a scene I to a scene II with another style, so that scene II obtains new style characteristics such as color, texture and shape while keeping its original content; it belongs to the field of Image-to-Image Translation in computer vision. Current image style migration techniques, such as face animation stylization, mainly adopt GAN technology to implement one-to-one style conversion. However, these techniques cannot clearly retain detail features, such as hair shape, during migration, resulting in a poor image style migration effect.
Based on the above problems, the embodiments of the present application provide an image processing method, apparatus, device, and storage medium. When image style migration is performed based on a target style through an image style migration model, global maximum pooling and global average pooling are performed on the image to be processed, so that the detail features of the image can be sufficiently extracted, the style migration effect is effectively improved, a high-quality target image is obtained, and the user experience is improved.
Fig. 1 is a diagram illustrating an example of a scene for image style migration according to an embodiment of the present application. The image processing method can be implemented by a processing device of a terminal installed with the corresponding software/client, such as a processor, executing the corresponding software code, or by the processing device of the terminal executing the software code in combination with other hardware entities. Examples of the terminal include a desktop computer, a notebook, a personal digital assistant (PDA), a smart phone, a tablet computer, and a game machine. This embodiment is explained with the terminal device 101 as the execution subject.
In practical application, the terminal device 101 has a trained image style migration model, and is configured to perform style migration processing on an image to be processed based on a target style, so as to obtain a target image.
Specifically, the image to be processed is input into the terminal device 101, the terminal device 101 processes the image to be processed based on the target style, and the style of the image to be processed is transferred to the target style, so as to obtain the target image of the target style. In an embodiment, the target style may be multiple, each target style corresponds to a different image style migration model, and after the user selects a target style in the terminal device 101 according to a requirement during image processing, the terminal device 101 migrates the style of the image to be processed into the target style according to the image style migration model corresponding to the selected target style.
In another embodiment, the user may also input a reference image with a target style, and the terminal device 101 acquires an image feature in the reference image by recognizing the style of the reference image, and then migrates the style of the image to be processed into the target style by using the image style migration model according to the image feature.
The following describes the technical solution of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic structural diagram of an image style migration model according to an embodiment of the present application. As shown in fig. 2, the image style migration model 20 includes a global maximum pooling layer 201 and a global average pooling layer 202, a convolutional layer 203, and an encoding layer 204.
The global maximum pooling layer 201 is used for performing global maximum pooling on the image to be processed to obtain an image subjected to global maximum pooling;
the global average pooling layer 202 is configured to perform global average pooling on the image to be processed, so as to obtain a globally average-pooled image;
the convolution layer 203 is used for performing convolution processing on the image subjected to the global maximum pooling processing and the image subjected to the global average pooling processing to obtain a second image;
and the coding layer 204 is configured to perform channel coding on the second image to obtain a target image.
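For illustration, the following is a minimal PyTorch sketch of how these four modules fit together. The class name, the re-weighting of the feature map by its pooled per-channel statistics, the 1 × 1 convolution, and the way gamma/beta are produced are assumptions made for the sketch; the patent does not publish reference code.

```python
import torch
import torch.nn as nn

class StyleMigrationCore(nn.Module):
    """Sketch of the model in Fig. 2: pooling layers 201/202, conv layer 203, coding layer 204."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # conv layer 203: fuses the channel-merged pooled branches back to `channels`
        self.conv = nn.Conv2d(channels * 2, channels, kernel_size=1)
        # coding layer 204: fully connected layers producing the gamma/beta vectors
        self.fc_gamma = nn.Linear(channels, channels)
        self.fc_beta = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # layers 201/202: per-channel global statistics; multiplying them back onto
        # the feature map keeps the spatial size, matching the 48x48x256 example below
        max_stat = torch.amax(x, dim=(2, 3), keepdim=True)       # (N, C, 1, 1)
        avg_stat = torch.mean(x, dim=(2, 3), keepdim=True)       # (N, C, 1, 1)
        first = torch.cat([x * max_stat, x * avg_stat], dim=1)   # "first image", 2C channels
        second = self.conv(first)                                # "second image", C channels
        # channel coding, see formula (1) later in the text: y = gamma * x + beta
        stats = second.mean(dim=(2, 3))                          # (N, C)
        gamma = self.fc_gamma(stats).view(n, c, 1, 1)
        beta = self.fc_beta(stats).view(n, c, 1, 1)
        return gamma * second + beta                             # target image features

model = StyleMigrationCore(256)
out = model(torch.randn(1, 256, 48, 48))   # out: (1, 256, 48, 48)
```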
The following describes the process of image style migration by using each module in the image style migration model with reference to specific steps:
fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 3, the image processing method provided in the embodiment of the present application specifically includes the following steps:
s301, extracting an image to be processed.
In one embodiment, acquiring the image to be processed may specifically include the following steps:
(1) acquiring an image to be processed from an input image;
wherein, the image to be processed includes: a real head portrait, etc., wherein the style of the image to be processed is a source domain style.
(2) Enlarging the detected face frame in the image to be processed;
Specifically, a face region in the image to be processed is identified, and the face region is enlarged to N times its original size.
In one embodiment, the magnification factor may be set according to the parameters of the image style migration model, for example, the magnification factor may be 1.25 times; in another embodiment, the magnification factor may also be adjusted according to the requirement of generating the target image, and this embodiment is not limited thereto.
(3) Cropping the enlarged image to be processed.
Because the input image often has a complex background, this step keeps only the face part that needs to be converted, thereby reducing the influence of other image information in the background on the style conversion process.
In some embodiments, before inputting the image to be processed into the trained image style migration model for style migration processing, the image to be processed needs to be preprocessed, where the preprocessing includes: at least one of an alignment process and a normalization process.
Specifically, the alignment process includes: padding the image to be processed along its longest edge so that its length and width are equal.
The normalization process includes: acquiring the pixel values of the image to be processed, scaling the image to a preset size, for example 192 × 192, and then normalizing the values to the interval [0, 1].
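A minimal sketch of this preprocessing; black padding and center placement are assumptions, since neither is specified above:

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image, size: int = 192) -> np.ndarray:
    """Alignment (pad to square along the longest edge), scaling, normalization."""
    w, h = img.size
    side = max(w, h)                                        # align to the longest edge
    canvas = Image.new("RGB", (side, side))                 # black padding (assumed)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))   # centered (assumed)
    canvas = canvas.resize((size, size), Image.BILINEAR)    # scale to 192 x 192
    return np.asarray(canvas, dtype=np.float32) / 255.0     # values into [0, 1]
```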
Further, after the image is preprocessed, the image at this point has only 3 channels, that is, the size of the image to be processed is 192 × 192 × 3, so the image needs to be input into several convolution layers for downsampling, thereby increasing the channel features of the image to be processed. The number of convolution layers and the downsampling factor are not specifically limited in the embodiments of the present application; taking an image to be processed of size 192 × 192 × 3, 3 convolution layers and a downsampling factor of 4 as an example, the sampling process outputs an image of size 48 × 48 × 3.
Optionally, after the convolved image to be processed is obtained, it may be input into several residual layers (ResBlocks) to increase the channel features of the image, so that the image features are fully extracted. Illustratively, its channels can be increased to 256, i.e., the resulting image size is 48 × 48 × 256.
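A sketch of this downsampling stage under the example numbers (3 convolution layers, 4× downsampling, residual layers raising the channels to 256); the per-layer channel widths and the plain two-convolution residual block are assumptions:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain two-convolution residual layer; the exact block used is not specified."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

# 3 convolution layers, two of them stride-2 for 4x downsampling (192 -> 48),
# followed by residual layers raising the channels to 256: 192x192x3 -> 48x48x256
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),    # 96 x 96
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # 48 x 48
    nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.ReLU(inplace=True),
    ResBlock(256), ResBlock(256), ResBlock(256), ResBlock(256),
)
```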
S302, inputting the image to be processed into the trained image style migration model, and performing image style migration processing based on the target style to obtain the target image.
This step is explained below with reference to steps S3021 to S3023:
S3021, respectively performing global maximum pooling and global average pooling on the image to be processed to obtain a first image.
Wherein the first image includes: the image after global maximum pooling and the image after global average pooling.
For ease of understanding, please refer to fig. 2: the image to be processed obtained in step S301 is divided into two paths. The first path is input into the global maximum pooling layer 201 of the style migration model 20, which performs global maximum pooling on the image to obtain the max-pooled image; the second path is input into the global average pooling layer 202 of the style migration model 20, which performs global average pooling on the image to obtain the average-pooled image.
Taking the size of the image input into the two pooling layers as 48 × 48 × 256 for example, the sizes of the image after global maximum pooling and the image after global average pooling are both 48 × 48 × 256.
Further, the image after the global average pooling and the image after the global maximum pooling are channel-merged to obtain a first image, wherein the size of the first image is 48 × 48 × 512.
Through global maximum pooling and global average pooling, global information and local information of the image to be processed can be fully extracted, so that the style migration effect can be effectively improved, a high-quality style migration image is obtained, and the user experience is improved.
S3022, performing convolution processing on the first image to obtain a second image.
Further, the first image is input into the convolution layer 203, which performs upsampling processing, thereby reducing the feature channels of the first image and outputting a second image. It should be noted that the number of convolution layers and the sampling factor are not specifically limited in the embodiments of the present application; for example, the number of convolution layers may be 1, and the size of the output second image is 48 × 48 × 256.
S3023, performing channel coding on the second image to obtain a target image.
Specifically, the second image is used as the input of the coding layer 204, which channel-codes the second image to obtain the target image.
The encoding layer 204 may include: a plurality of fully-connected layers and a plurality of decoders, it is understood that the number of fully-connected layers and decoders in the embodiments of the present application is not particularly limited, and for example, the number of fully-connected layers may be 3, and the number of decoders may be 4.
On one hand, the second image is input into the fully connected layers, which perform full-connection processing on the image features of the second image and process it into column vectors. Continuing the example, the second image is a feature map of size 48 × 48 × 256; after full-connection processing, the output column vectors are multiple parameter vectors of size 1 × 1 × 256, where the parameter vectors include: a gamma parameter (gamma) and a beta parameter (beta).
On the other hand, the second image is input into the decoders, and each decoder channel-codes the second image according to the parameter vectors output by the fully connected layers to obtain the target image. Specifically, the second image is channel-coded by a 1 × 1 × 256 gamma parameter vector and a 1 × 1 × 256 beta parameter vector; illustratively, the second image is channel-coded according to the following formula (1):
y=gamma*x+beta (1)
wherein y is the image characteristic of the target image, and x is the image characteristic of the second image.
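A sketch of formula (1) as a per-channel operation, using the running 48 × 48 × 256 example; treating gamma and beta as one vector pair per image is an assumption (the text mentions multiple decoders and parameter vectors):

```python
import torch

def channel_code(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Formula (1): y = gamma * x + beta, applied per channel.

    x:     (N, 256, 48, 48) second-image features
    gamma: (N, 256) scale vectors from the fully connected layers
    beta:  (N, 256) shift vectors from the fully connected layers
    """
    return gamma.view(*gamma.shape, 1, 1) * x + beta.view(*beta.shape, 1, 1)
```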
Since the size of the target image obtained through the above processing differs from that of the image to be processed, the target image needs to be processed into an image of the same size as the image to be processed. Continuing the above example, the obtained target image has a size of 48 × 48 × 256 and needs to be processed into an image of size 192 × 192 × 3.
Specifically, the obtained target image is input into a deconvolution layer, which extracts the image features of the target image to obtain a convolved target image; it can be understood that the size of the convolved target image is the same as that of the image to be processed.
Optionally, the convolved target image may be input into a layer normalization (LN) layer to normalize the features output by the convolution layer, so that the shape and style features of the image can be effectively transferred.
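A sketch of this output stage; the kernel sizes, the single deconvolution, and the sigmoid output (matching the [0, 1] input normalization) are assumptions:

```python
import torch.nn as nn

# Output stage: deconvolution restores the 192 x 192 size, an LN layer normalizes
# the convolved features, and a final convolution returns to 3 channels.
decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 64, kernel_size=4, stride=4),  # 48x48x256 -> 192x192x64
    nn.LayerNorm([64, 192, 192]),                          # LN normalization layer
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),            # back to 192x192x3
    nn.Sigmoid(),                                          # match the [0, 1] input range (assumed)
)
```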
The image processing method provided by this embodiment extracts an image to be processed, inputs it into a trained image style migration model, and performs image style migration based on a target style to obtain a target image; processing the image through a trained model improves the efficiency of the style migration process and the user experience. In addition, when the style migration model performs image style migration based on the target style, global maximum pooling and global average pooling are respectively performed on the image to be processed, so that the detail features of the image can be fully extracted, the style migration effect is effectively improved, and a high-quality target image is obtained.
In practical applications, the image style migration model is obtained by adjusting an image processing apparatus. Specifically, before the image style migration model performs style migration based on the target style to obtain the target image, the image processing apparatus needs to be adjusted, and the trained image style migration model is then obtained from the adjusted image processing apparatus. First, the image processing apparatus is described with reference to a specific embodiment:
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in fig. 4, an image processing apparatus 40 provided in an embodiment of the present application includes: a first generator 41 and a first discriminator 42.
Specifically, the first generator 41 includes: a first global maximum pooling layer 411 and a first global average pooling layer 412, a first auxiliary classifier 413, a first convolutional layer 414, a plurality of first fully connected layers 415, a first decoder 416.
The first discriminator 42 includes: a second global maximum pooling layer 421 and a second global average pooling layer 422, a second auxiliary classifier 423, a first classifier 424.
Further, the process of adjusting the image processing apparatus to obtain the trained image style migration model with reference to fig. 4 and the specific embodiment is described:
fig. 5 is a flowchart of an image processing method according to another embodiment of the present application. As shown in fig. 5, the method provided by the embodiment of the present application includes the following steps:
s501, acquiring a source sample image and a reference sample image.
The source sample image is a real face image of a source domain style, and the reference sample image is an image of a target style, wherein the reference sample image of the target style may include: animation-style images, and animal-style images.
It should be noted that the process of obtaining the source sample image and the reference sample image is similar to step S301 in the embodiment shown in fig. 3; specific reference may be made to the above, and details are not repeated here.
S502, inputting the source sample image into a first generator, converting the style of the source sample image into a target style, and obtaining a target sample image.
For ease of understanding, referring to fig. 4, the process by which the first generator 41 converts the style of the source sample image into the target style to obtain the target sample image includes the following steps:
(1) the source sample image is input to the first global maximum pooling layer 411, and the source sample image is processed based on the first weight vector to obtain a first feature image.
The first weight vector is obtained by adjusting the first generator, the first weight vector is used for adjusting the proportion of each feature in the source sample image, and the number of the first weight vectors is the same as the number of channels of the source sample image. Specifically, taking the size of the source sample image input to the first generator 41 as 192 × 192 × 256, which includes 256 channels, the number of the first weight vectors is 256, and each first weight vector is used to control the proportion of the feature of its corresponding channel, so as to obtain the first feature image according to the proportion of the feature of each channel.
(2) The source sample image is input to the first global average pooling layer 412, and the source sample image is processed based on the second weight vector to obtain a second feature image.
Correspondingly, a second weight vector is obtained by adjusting the first generator 41, the second weight vector is used for adjusting the proportion of each feature in the source sample image, and the number of the second weight vectors is the same as the number of channels of the source sample image. Specifically, taking the size of the source sample image input to the first generator 41 as 192 × 192 × 256, which includes 256 channels, the number of the second weight vectors is 256, and each second weight vector is used to control the proportion of the feature of its corresponding channel, so as to obtain the second feature image according to the proportion of the feature of each channel.
It should be noted that, in the embodiment of the present application, the execution sequence of the two steps (1) and (2) is not specifically limited.
(3) The first weight vector and the second weight vector are input to the first auxiliary classifier 413, and a first classification result is obtained.
Specifically, the first weight vector and the second weight vector are combined, and the combined weight vector is input to the first auxiliary classifier 413. For example, in combination with the above, if the number of the first weight vectors and the number of the second weight vectors are both 256, the number of the combined weight vectors is 512.
The first classification result indicates the importance of the different vectors in the combined weight vector, where each vector indicates the magnitude of the weight value of its corresponding feature channel. The combined weight vector is input into the first auxiliary classifier 413, which determines the importance of each vector in the combined weight vector and outputs the first classification result.
(4) And inputting the first characteristic image and the second characteristic image into the first convolution layer to obtain a third characteristic image.
Specifically, the first feature image after the global maximum pooling and the second feature image after the global average pooling are subjected to channel merging to obtain a first image. In the example where the sizes of the images input to the two pooling layers are both 48 × 48 × 256, if the sizes of the image subjected to the global maximum pooling process and the image subjected to the global average pooling process are 48 × 48 × 256, the size of the merged first image is 48 × 48 × 512.
Further, the merged first image is used as an input for the first convolution layer 414, and the first convolution layer 414 is used to perform upsampling processing, thereby reducing the feature channel of the first image and outputting a third feature image. It should be noted that, the number and sampling multiple of the first convolution layers 414 are not specifically limited in this embodiment, and for example, the number of the first convolution layers 414 may be 1, and the size of the output third feature image is 48 × 48 × 256.
By performing global maximum pooling and global average pooling on the image, global information and local information of the image can be sufficiently extracted, a more accurate image style migration model can be trained based on the acquired image information, and a high-quality style migration image can be generated through the image style migration model.
(5) And inputting the third characteristic image to a plurality of first full-connection layers to obtain a plurality of parameter vectors.
(6) And inputting the third characteristic image into a first decoder, and carrying out channel coding on the third characteristic image according to a plurality of parameter vectors to obtain a target sample image.
The plurality of parameter vectors are obtained by adjusting the first generator 41. Steps (5) and (6) are similar to step S3023 in the embodiment shown in fig. 3, and reference may be made to the embodiment shown in fig. 3 for details, which are not repeated herein.
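Steps (1) to (4) can be sketched as follows. Reading the "weight vector" of each pooling branch as the weight of a small fully connected classifier over the pooled features (a CAM-style attention formulation) is an assumption; the patent text does not pin down the exact operation:

```python
import torch
import torch.nn as nn

class CAMBranch(nn.Module):
    """One weighted pooling branch of the first generator: the weight of a small
    fully connected classifier over the pooled features serves as the per-channel
    weight vector that re-weights the feature map (assumed reading)."""

    def __init__(self, channels: int, mode: str = "max"):
        super().__init__()
        self.fc = nn.Linear(channels, 1, bias=False)  # its weight is the "weight vector"
        self.mode = mode

    def forward(self, x: torch.Tensor):
        pooled = torch.amax(x, dim=(2, 3)) if self.mode == "max" \
            else torch.mean(x, dim=(2, 3))             # (N, C) pooled statistics
        logit = self.fc(pooled)                        # contribution to the classification result
        weight = self.fc.weight.view(1, -1, 1, 1)      # per-channel weight vector
        return x * weight, logit                       # weighted feature image, aux logit

feat = torch.randn(1, 256, 48, 48)             # encoder output in the running example
max_branch = CAMBranch(256, "max")             # first global maximum pooling layer 411
avg_branch = CAMBranch(256, "avg")             # first global average pooling layer 412
f1, logit_max = max_branch(feat)               # first feature image
f2, logit_avg = avg_branch(feat)               # second feature image
aux_cls = torch.cat([logit_max, logit_avg], 1)           # first classification result (413)
third = nn.Conv2d(512, 256, 1)(torch.cat([f1, f2], 1))   # third feature image (414)
```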
S503, inputting the target sample image and the reference sample image into a first discriminator respectively to obtain a first discrimination result.
The first discrimination result includes: the loss value of the image style migration model. For convenience of understanding, please continue to refer to fig. 4; the first discriminator 42 obtains the first discrimination result according to the target sample image and the reference sample image, which specifically includes the following steps:
(1) the target sample image and the reference sample image are respectively input to the second global maximum pooling layer 421, and the images are processed based on the third weight vector, so as to respectively obtain a fourth feature image and a fifth feature image.
The third weight vector is obtained by adjusting the first discriminator; it is used for adjusting the proportion of each feature in the target sample image and the reference sample image, and the number of third weight vectors is the same as the number of channels of the target sample image and the reference sample image. Specifically, taking the example that the sizes of the target sample image and the reference sample image input into the first discriminator 42 are both 192 × 192 × 256, each including 256 channels, the number of third weight vectors is 256, and each third weight vector is used to control the proportion of the feature of its corresponding channel, so as to obtain the fourth feature image corresponding to the target sample image and the fifth feature image corresponding to the reference sample image according to the proportion of the feature of each channel.
(2) And respectively inputting the target sample image and the reference sample image into a second global average pooling layer 422, processing the images based on the fourth weight vector, and respectively obtaining a sixth feature image and a seventh feature image output by the second global average pooling layer.
Correspondingly, a fourth weight vector is obtained by adjusting the first discriminator 42, the fourth weight vector is used for adjusting the specific gravity of each feature in the target sample image and the reference sample image, and the number of the fourth weight vectors is the same as the number of channels of the target sample image and the reference sample image. Specifically, taking the size of the source sample image input to the first discriminator 42 as 192 × 192 × 256, which includes 256 channels, the number of the fourth weight vectors is 256, and each fourth weight vector is used to control the proportion of the feature of its corresponding channel, so as to obtain the sixth feature image corresponding to the target sample image and the seventh feature image corresponding to the reference sample image according to the proportion of the feature of each channel.
It should be noted that, the execution order of the steps (1) and (2) is not specifically limited in the embodiments of the present application.
Optionally, before the target sample image and the reference sample image are input into the first discriminator 42, processing such as enlargement and cropping needs to be performed on each of them, so that the influence of other image information in the background on the style conversion process can be reduced.
Further, in some embodiments, before inputting the target sample image and the reference sample image into the first discriminator 42, the image to be processed needs to be preprocessed, where the preprocessing includes: at least one of an alignment process and a normalization process.
Specifically, the processes and principles of the amplifying, cutting, and preprocessing of the target sample image and the reference sample image are similar to those of step S301 in the embodiment shown in fig. 3, which may be referred to in the foregoing embodiment specifically, and are not repeated herein.
Optionally, before the images are input into the first discriminator 42, downsampling needs to be performed on the target sample image and the reference sample image, and their channel counts increased. Specifically, the target sample image and the reference sample image are respectively input into convolution layers; the number of convolution layers and the sampling factor are not specifically limited in the embodiments of the present application. For example, with 4 convolution layers, a sampling factor of 8, and both the target sample image and the reference sample image of size 192 × 192 × 3, the sampling process reduces both images to 24 × 24 × 3. Both images are then respectively input into several residual layers (ResBlocks) to increase the channel features of the images; the application does not limit the increased channel count. Illustratively, the number of channels can be increased to 512, i.e., the resulting image size is 48 × 48 × 512. By sampling the image and adding channel features, the image features can be fully extracted.
(3) And inputting the third weight vector and the fourth weight vector into a second auxiliary classifier to obtain a second classification result.
Specifically, the third weight vector and the fourth weight vector are combined, and the combined weight vector is input to the second auxiliary classifier 423. For example, in combination with the above, if the number of the third weight vectors and the number of the fourth weight vectors are 256, the number of the combined weight vectors is 512.
The second classification result indicates the importance of the different vectors in the combined weight vector, where each vector indicates the weight value of its corresponding feature channel. The combined weight vector is input into the second auxiliary classifier 423, which determines the importance of each component in the combined weight vector and outputs the second classification result.
(4) The fourth feature image, the fifth feature image, the sixth feature image, and the seventh feature image are input to the first classifier 424, and a first determination result output by the first classifier 424 is obtained.
Among them, the first classifier 424 may include: a convolutional layer and a fully connected layer.
Specifically, the fourth feature image and the sixth feature image are combined to obtain a feature image corresponding to the target sample image, and the fifth feature image and the seventh feature image are combined to obtain a feature image corresponding to the reference sample image. It should be noted that, taking the example that the sizes of the target sample images input into the two pooling layers and the reference sample images input into the two pooling layers are both 48 × 48 × 512, and the sizes of the image after the global maximum pooling process and the image after the global average pooling process are both 48 × 48 × 512, the size of the feature image corresponding to the target sample image is 48 × 48 × 512, and the size of the feature image corresponding to the reference sample image is also 48 × 48 × 512.
Further, the merged feature image corresponding to the target sample image and the merged feature image corresponding to the reference sample image are respectively input to the convolutional layer, the convolutional layer performs down-sampling processing on the two, channels of the two are reduced to 256, that is, the size of the feature image corresponding to the target sample image and the size of the feature image corresponding to the reference sample image which are output after passing through the convolutional layer are both 48 × 48 × 256.
On one hand, the feature image corresponding to the target sample image is input into the fully connected layer, which outputs a first feature vector of that feature image. The first feature vector is processed by a sigmoid activation function to obtain the first probability that the target sample image comes from the source domain and the first label probability; a first cross entropy is then determined according to the first probability and the first label probability, where the first cross entropy is the loss value of the first generator.
On the other hand, the feature image corresponding to the reference sample image is input into the fully connected layer, which outputs a second feature vector of that feature image. The second feature vector is processed by the sigmoid activation function to obtain the second probability that the reference sample image comes from the target domain and the second label probability; a second cross entropy is then determined according to the second probability and the second label probability, where the second cross entropy is the loss value of the first discriminator.
Further, a loss value of the image style migration model in the embodiment of the application is determined according to the first cross entropy and the second cross entropy.
Specifically, the loss value of the image style migration model can be obtained according to the following formula (2):
L = L_G + L_D (2)
where L_G is the loss value of the first generator and L_D is the loss value of the first discriminator.
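A sketch of this loss computation; reading formula (2) as the plain sum of binary cross-entropy terms is an assumption:

```python
import torch
import torch.nn.functional as F

def style_migration_losses(d_fake: torch.Tensor, d_real: torch.Tensor):
    """d_fake: discriminator probabilities (after sigmoid) for target sample images;
    d_real: discriminator probabilities for reference sample images."""
    # first cross entropy: the generator wants its outputs judged as the target domain
    loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    # second cross entropy: the discriminator separates references from generated images
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake.detach(), torch.zeros_like(d_fake))
    return loss_g + loss_d, loss_g, loss_d   # formula (2): L = L_G + L_D
```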
S504, determining whether the discrimination result in the first discriminator meets the preset condition.
Specifically, when the loss value of the image style migration model is greater than or equal to the preset value, indicating that the image processing apparatus does not yet meet the requirement, the parameters of each structure of the image processing apparatus need to be adjusted.
It should be noted that, for the preset value of the loss value, the preset value may be set according to actual requirements, and the embodiment of the present application is not specifically limited.
S505, if the discrimination result does not meet the preset condition, adjusting the training parameters in the first generator and the first discriminator according to the first discrimination result.
Wherein the training parameters in the first generator include: a first weight vector, a second weight vector, a plurality of parameter vectors;
the training parameters in the first discriminator include: a third weight vector, a fourth weight vector.
According to the first discrimination result, adjusting the training parameters in the first generator and the first discriminator comprises:
adjusting a first weight vector, a second weight vector and a plurality of parameter vectors according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first generator;
and adjusting the third weight vector and the fourth weight vector according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first discriminator.
It should be noted that the goal of adjusting the training parameters is to make the loss value of the image style migration model smaller and smaller. The specific adjustment process includes: calculating the derivative of the loss value with respect to each structure in the first generator and the first discriminator, and determining the training parameters corresponding to each structure according to a preset learning rate.
Specifically, the parameters of the current structure can be obtained according to the following formula (3).
w1=w0+b*a (3)
where w0 is the initialized weight vector, a is the derivative of the loss value with respect to the current structure, and b is a preset learning rate. The magnitude of the learning rate b is not specifically limited in the embodiments of the present application; for example, b may be set to 1.
Specifically, the derivative of each structure is computed backwards in sequence from the loss value, and the training parameters of all structures of the network are updated once according to the derivative a. Illustratively, adjusting the first weight vector, the second weight vector, and the plurality of parameter vectors according to the loss value to obtain the adjusted first generator includes:
the first weight vector of the first global max pooling layer 411 in the first generator 41 is adjusted, the second weight vector of the first global mean pooling layer 412 in the first generator 41 is adjusted, and the parameter vector of the fully connected layer 415 in the first generator 41 is adjusted.
Specifically, taking as an example that the initial value of the first weight vector is w10, the initial value of the second weight vector is w20, and the initial value of the parameter vector is w30: according to the loss value, the derivative of the first global maximum pooling layer 411 is a1, the derivative of the first global average pooling layer 412 is a2, and the derivative of the fully-connected layer 415 is a3.
Further, the adjusted training parameters of each structure are obtained according to the derivative of each structure:
specifically, the adjusted first weight vector is: w is a11=w10+b*a1The adjusted second weight vector is: w is a21=w20+b*a2The adjusted parameter vector is: w is a31=w30+b*a3
Further, adjusting the third weight vector and the fourth weight vector according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first discriminator, including:
and adjusting the third weight vector and the fourth weight vector according to the loss value to obtain the adjusted first discriminator.
It is understood that the training parameter adjustment process of the first discriminator is similar to the training parameter adjustment process of the first generator, and is not described herein again.
S506, obtaining a second discrimination result according to the adjusted first generator and the adjusted first discriminator.
The following describes in detail the process of obtaining a trained image style migration model with reference to specific steps:
(1) and according to the adjusted first generator, converting the style of the source sample image into a target style to obtain a first target sample image.
According to the above steps, the parameters of all structures in the image processing apparatus are adjusted to obtain an adjusted first generator; further, the source sample image is input into the adjusted first generator to obtain a first target sample image.
(2) Respectively inputting the first target sample image and the reference sample image into the adjusted first discriminator, and outputting a second discrimination result, wherein the second discrimination result comprises: a first loss value corresponding to the first target sample image.
The first loss value corresponding to the first target sample image is the loss value of the adjusted image style migration model.
It is understood that the method and principle of steps (1) and (2) are similar to the process of steps S501 to S505, and reference may be made to the above steps.
And S507, if the discrimination result in the first discriminator meets the preset condition, stopping adjustment to obtain a trained image style migration model.
And when the first loss value is smaller than the preset value, determining that the discrimination result in the first discriminator meets the preset condition, stopping adjustment, and determining the currently adjusted image style migration model as the trained image style migration model.
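A compact Python sketch of the outer loop implied by steps S501 to S507, where adjustment repeats until the loss value falls below the preset value; train_one_round and the threshold value are placeholders, since the patent leaves both unspecified:

PRESET_VALUE = 0.1  # illustrative threshold; set according to actual requirements

def train_until_converged(train_one_round):
    # train_one_round is assumed to perform one generator/discriminator
    # adjustment (S501-S506) and return the current loss value.
    loss = float("inf")
    while loss >= PRESET_VALUE:   # preset condition not met: keep adjusting
        loss = train_one_round()
    return loss                   # discrimination result meets the preset condition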
According to the image processing method provided by the embodiment of the application, the generator generates the target sample image, the discriminator obtains the discrimination result corresponding to the target sample image, and the training parameters in the generator and the discriminator are continuously adjusted according to the discrimination result. The generator thereby continuously improves its image style migration effect, the discriminator continuously improves its discrimination ability, and the two realize adversarial learning, until the discrimination result in the discriminator meets the preset condition, whereupon the trained image style migration model is obtained from the adjusted first generator. In this way, the style migration effect of the image style migration model can be further improved, so that the model focuses more on the detail features of the image and the generated images are more vivid.
In addition, when the style migration model carries out image style migration processing based on the target style, the global maximum pool processing and the global average pool processing are respectively carried out on the image to be processed, so that the detail characteristics of the image can be fully extracted, the style migration effect is effectively improved, the target image with high quality is obtained, and the user experience is improved.
In some embodiments, since the sample pictures input for training are not paired data, that is, the source sample image and the reference sample image are not paired, two generators and two discriminators are required to obtain a trained image style migration model. The first generator is used for converting the style of the source sample image into the target style to obtain a target image, and the second generator is used for restoring the style of the target image to the style of the source sample image to obtain an intermediate sample image. The first discriminator is used for judging whether the target image generated by the first generator is a real image in the target style or a generated image in the target style, and the second discriminator is used for judging whether the intermediate sample image generated by the second generator is a real image from the source domain or a generated image in the source domain. Through continuous cyclic adversarial learning, the first generator continuously adjusts its training parameters to improve the quality of the generated images and deceive the first discriminator, while the first discriminator also continuously adjusts its training parameters to improve its discrimination ability. Correspondingly, the second generator continuously improves the quality of its generated images to deceive the second discriminator, and the second discriminator continuously improves its discrimination ability, until a balanced state is reached; at this point, the training is finished, and the first generator with the adjusted parameters is output, thereby obtaining the image style migration model. Since a cyclic conversion path is provided in this scheme, a trained image style migration model can be obtained using unpaired training data.
In combination with the above, the image processing apparatus 40 provided in the embodiment shown in fig. 4 may further include: a second generator 43 and a second discriminator 44.
Specifically, the second generator 43 includes: a third global maximum pooling layer 431 and a third global average pooling layer 432, a third auxiliary classifier 433, a second convolutional layer 434, a plurality of second fully connected layers 435, a second decoder 436.
The second discriminator 44 includes: a fourth global maximum pooling layer 441 and a fourth global average pooling layer 442, a fourth auxiliary classifier 443, and a second classifier 444.
The following describes, with reference to fig. 6, the process by which the first generator 41, the first discriminator 42, the second generator 43 and the second discriminator 44 obtain a trained image style migration model using unpaired training data:
fig. 6 is a flowchart of an image processing method according to another embodiment of the present application. As shown in fig. 6, the method comprises the steps of:
S601, acquiring a source sample image and a reference sample image.
The source sample image is in a source domain style, and the style of the reference sample image is the target style.
S602, inputting the source sample image into a first generator, converting the style of the source sample image into a target style, and obtaining a target sample image.
And S603, respectively inputting the target sample image and the reference sample image into a first discriminator to obtain a first discrimination result.
And S604, inputting the target sample image into a second generator so as to convert the style of the target sample image into the style of the source sample image and obtain an intermediate sample image.
And S605, inputting the intermediate sample image and the source sample image into a second discriminator respectively to obtain a second discrimination result.
And S606, judging whether the discrimination results meet a preset requirement.
And S607, if the first discrimination result and the second discrimination result do not meet the preset requirement, adjusting the training parameters in the first generator, the first discriminator, the second generator and the second discriminator to obtain the adjusted first generator, the adjusted first discriminator, the adjusted second generator and the adjusted second discriminator.
And S608, processing the source sample image according to the adjusted first generator to convert the style of the source sample image into a target style, and obtaining a second target sample image.
And S609, processing the second target sample image according to the adjusted second generator so as to convert the style of the second target sample image into the style of the source sample image and obtain an intermediate sample image.
S610, inputting the second target sample image into the adjusted first discriminator, outputting a third discrimination result, inputting the intermediate sample image into the adjusted second discriminator, and outputting a fourth discrimination result.
The third discrimination result is a second loss value of the second target sample image, and the fourth discrimination result is a third loss value of the intermediate sample image.
And S611, if the discrimination results meet the preset requirement, stopping adjustment and outputting the image style migration model.
It should be noted that steps S601 to S611 are similar to the method and principle of steps S501 to S507 in the embodiment shown in fig. 5, and please refer to the embodiment shown in fig. 5 for details, which are not repeated herein.
In the scheme provided by the embodiment of the application, two generators and two discriminators are provided, so that the sample pictures need not be paired data; that is, the source sample image and the reference sample image are input unpaired. The first discriminator judges whether the target image generated by the first generator is a real image in the target style or a generated image in the target style, and the second discriminator judges whether the intermediate sample image generated by the second generator is a real image from the source domain or a generated image in the source domain. Through continuous cyclic adversarial learning, the training parameters of each structure in the image processing apparatus are continuously adjusted to obtain the image style migration model. Since a cyclic conversion path is provided in this scheme, a trained image style migration model can be obtained using unpaired training data.
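For illustration, the following Python (PyTorch) sketch outlines one possible training step for the two-generator / two-discriminator cyclic scheme described above. It is a schematic under common assumptions for unpaired adversarial training; the helper bce, the cycle-consistency term, and all names (G1, G2, D1, D2, opt_g, opt_d) are illustrative and not taken from the patent:

import torch
import torch.nn.functional as F

def bce(logits, target_is_real):
    target = torch.ones_like(logits) if target_is_real else torch.zeros_like(logits)
    return F.binary_cross_entropy_with_logits(logits, target)

def train_step(x_src, y_ref, G1, G2, D1, D2, opt_g, opt_d):
    # Generator update: G1 maps source -> target style, G2 maps back to source style.
    fake_y = G1(x_src)                         # target sample image
    recon_x = G2(fake_y)                       # intermediate sample image
    loss_g = (bce(D1(fake_y), True)            # deceive the first discriminator
              + bce(D2(recon_x), True)         # deceive the second discriminator
              + (recon_x - x_src).abs().mean())  # keep the cyclic conversion path closed
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Discriminator update: judge real images against generated ones.
    loss_d = (bce(D1(y_ref), True) + bce(D1(fake_y.detach()), False)
              + bce(D2(x_src), True) + bce(D2(recon_x.detach()), False))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_d.item()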
Fig. 7 is a schematic structural diagram of an image processing apparatus according to yet another embodiment of the present application. As shown in fig. 7, the image processing apparatus 70 may include:
an extraction module 71, configured to extract an image to be processed;
the processing module 72 is configured to input an image to be processed into the trained image style migration model, and perform image style migration processing based on a target style to obtain a target image, where the style of the target image is the target style;
the processing module 72 includes:
a first processing unit 721, configured to perform global maximum pool processing and global average pool processing on an image to be processed, respectively, to obtain a first image;
a second processing unit 722, configured to perform convolution processing on the first image to obtain a second image;
the third processing unit 723 is configured to encode the second image according to a plurality of parameter vectors to obtain a target image, where the plurality of parameter vectors are obtained by training the style migration model.
Optionally, the first image includes: the image after global maximum pooling and the image after global average pooling;
the image style migration model comprises a global maximum pooling layer, a global average pooling layer, a convolution layer and an encoding layer;
the global maximum pooling layer is used for performing global maximum pooling on the image to be processed to obtain an image subjected to global maximum pooling;
the global average pooling layer is used for performing global average pooling on the image to be processed to obtain an image subjected to global average pooling;
the convolution layer is used for performing convolution processing on the first image to obtain a second image;
and the coding layer is used for carrying out channel coding on the second image to obtain a target image.
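As a rough Python (PyTorch) sketch of the inference path these modules describe: global maximum pooling and global average pooling are applied in parallel to obtain the first image, a convolution produces the second image, and a coding stage produces the target image. The channel sizes, the way the pooled statistics are recombined with the features, and the module names are assumptions made for illustration only:

import torch
import torch.nn as nn

class StyleMigrationSketch(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # convolution layer: fuses the two pooled branches into the second image
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        # coding layer: channel-codes the second image into the target image
        self.encode = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, feat):
        # feat: feature map of the image to be processed, shape (N, C, H, W)
        w_max = torch.amax(feat, dim=(2, 3), keepdim=True)      # global maximum pooling
        w_avg = torch.mean(feat, dim=(2, 3), keepdim=True)      # global average pooling
        first = torch.cat([feat * w_max, feat * w_avg], dim=1)  # "first image"
        second = self.conv(first)                               # "second image"
        return self.encode(second)                              # target image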
It should be noted that, for the implementation principle and the technical effect of the image processing apparatus provided in this embodiment, reference may be made to the foregoing embodiments, and details are not described here again.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device of this embodiment may be the computer (or a component usable in the computer) mentioned in the foregoing method embodiments. The electronic device may be configured to implement the method corresponding to the computer described in the above method embodiments; for details, refer to the description in the above method embodiments.
The electronic device may comprise one or more processors 801, which may also be referred to as processing units and may perform certain control or processing functions. The processor 801 may be a general-purpose processor or a special-purpose processor, for example a baseband processor or a central processor. The baseband processor may be configured to process data, and the central processor may be configured to control the electronic device, execute a software program, and process data of the software program.
In one possible design, the processor 801 may also store instructions 803 or data (e.g., test parameters). The instructions 803 may be executed by the processor 801, so that the electronic device executes the method corresponding to the computer device or the network device described in the above method embodiments.
In yet another possible design, the electronic device may include circuitry that may implement the functionality of transmitting or receiving or communicating in the foregoing method embodiments.
In one possible implementation, the electronic device may include one or more memories 802 having instructions 804 stored thereon, which are executable on the processor 801 to cause the electronic device to perform the methods described in the above method embodiments.
In one possible implementation, the memory 802 may also have data stored therein. The processor 801 and the memory 802 may be provided separately or may be integrated together.
In one possible implementation, the electronic device may also include a transceiver 805 and/or an antenna 808. The processor 801, which may be referred to as a processing unit, controls the electronic device. The transceiver 805 may be referred to as a transceiving unit, a transceiver, a transceiving circuit, or the like, and is used for implementing the transceiving functions of the electronic device.
For specific implementation processes of the processor 801 and the transceiver 805, reference may be made to the related descriptions of the above embodiments, and details are not described herein again.
The processor 801 and transceiver 805 described herein may be implemented on an Integrated Circuit (IC), an analog IC, a Radio Frequency Integrated Circuit (RFIC), a mixed signal IC, an Application Specific Integrated Circuit (ASIC), a Printed Circuit Board (PCB), an electronic device, or the like.
For the implementation principle and the technical effect of the electronic device provided by this embodiment, reference may be made to the foregoing embodiments, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is used for implementing the image processing method according to any one of the above embodiments when executed by a processor.
Embodiments of the present invention further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the image processing method provided in any of the foregoing embodiments.
In the embodiments described above, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash disk, a removable hard disk, a read-only memory, a magnetic disk or an optical disk, or the like.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks, and so forth. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. An image processing method, comprising:
extracting an image to be processed; inputting the image to be processed into a trained image style migration model, and performing image style migration processing based on a target style to obtain a target image, wherein the style of the target image is the target style;
inputting the image to be processed into a trained image style migration model, and performing image style migration processing based on a target style to obtain a target image, wherein the image style migration processing comprises the following steps: respectively carrying out global maximum pool processing and global average pool processing on the image to be processed to obtain a first image; performing convolution processing on the first image to obtain a second image; performing channel coding on the second image to obtain the target image;
the image processing method further includes: obtaining a source sample image and a reference sample image, wherein the style of the reference sample image is the target style; inputting the source sample image into a first generator, converting the style of the source sample image into the target style, and obtaining a target sample image; inputting the target sample image and the reference sample image into a first discriminator respectively to obtain a first discrimination result; adjusting the training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result in the first discriminator meets a preset condition, and stopping adjustment; obtaining the trained image style migration model according to the adjusted first generator;
the first generator includes: the system comprises a first global maximum pooling layer, a first global average pooling layer, a first auxiliary classifier, a first convolution layer, a plurality of first fully-connected layers and a first decoder;
inputting a source sample image into a first generator, converting the style of the source sample image into a target style, and obtaining a target sample image, wherein the method comprises the following steps:
inputting the source sample image to a first global maximum pooling layer, and processing the source sample image based on a first weight vector to obtain a first feature image; inputting the source sample image to the first global average pooling layer, processing the source sample image based on a second weight vector to obtain a second feature image, and inputting the first weight vector and the second weight vector to the first auxiliary classifier to obtain a first classification result; inputting the first characteristic image and the second characteristic image into the first convolution layer to obtain a third characteristic image; inputting the third feature image to the plurality of first fully-connected layers to obtain a plurality of parameter vectors; inputting the third feature image to the first decoder, and performing channel coding on the third feature image according to the multiple parameter vectors to obtain the target sample image; the first weight vector and the second weight vector are used for adjusting the proportion of each feature in the source sample image.
2. The method of claim 1, wherein the first image comprises: the image after global maximum pooling and the image after global average pooling;
the image style migration model comprises a global maximum pooling layer, a global average pooling layer, a convolution layer and an encoding layer;
the global maximum pooling layer is used for performing global maximum pooling on the image to be processed to obtain an image subjected to global maximum pooling;
the global average pooling layer is used for performing global average pooling on the image to be processed to obtain an image subjected to global average pooling;
the convolution layer is used for performing convolution processing on the first image to obtain the second image;
and the coding layer is used for coding the second image to obtain the target image.
3. The method according to claim 1, wherein the adjusting training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result in the first discriminator meets a preset condition and stops adjusting comprises:
adjusting training parameters in the first generator and the first discriminator according to the first discrimination result and the first classification result to obtain an adjusted first generator and an adjusted first discriminator;
converting the style of the source sample image into a target style according to the adjusted first generator to obtain a first target sample image;
inputting the first target sample image and the reference sample image into the adjusted first discriminator, and outputting a second discrimination result, where the second discrimination result includes: a first loss value corresponding to the first target sample image;
and when the first loss value is determined to be smaller than a preset value, determining that the judgment result in the first discriminator meets a preset condition, and stopping adjustment.
4. The method of claim 3, wherein the first discriminator comprises: the second global maximum pooling layer, the second global average pooling layer, the second auxiliary classifier and the first classifier;
the inputting the target sample image and the reference sample image into a first discriminator respectively to obtain a first discrimination result includes:
inputting the target sample image and the reference sample image into a second global maximum pooling layer respectively, and processing the images based on a third weight vector to obtain a fourth feature image and a fifth feature image respectively;
inputting the target sample image and the reference sample image to the second global average pooling layer respectively, processing the images based on a fourth weight vector, and obtaining a sixth feature image and a seventh feature image output by the second global average pooling layer respectively;
inputting the third weight vector and the fourth weight vector to the second auxiliary classifier to obtain a second classification result;
inputting the fourth feature image, the fifth feature image, the sixth feature image and the seventh feature image to the first classifier to obtain a first judgment result output by the first classifier;
wherein the third weight vector and the fourth weight vector are used for adjusting the proportion of each feature in the target sample image and the reference sample image.
5. The method of claim 4, wherein the training parameters in the first generator comprise: the first weight vector, the second weight vector, the plurality of parameter vectors;
the training parameters in the first discriminator include: the third weight vector, the fourth weight vector;
the adjusting training parameters in the first generator and the first discriminator according to the first discrimination result and the first classification result to obtain an adjusted first generator and an adjusted first discriminator includes:
adjusting the first weight vector, the second weight vector and the plurality of parameter vectors according to the first judgment result, the first classification result and the second classification result to obtain an adjusted first generator;
and adjusting the third weight vector and the fourth weight vector according to the first discrimination result, the first classification result and the second classification result to obtain an adjusted first discriminator.
6. The method according to any one of claims 1-5, further comprising:
inputting the target sample image into a second generator to convert the style of the target sample image into the style of the source sample image to obtain an intermediate sample image;
and respectively inputting the intermediate sample image and the source sample image into a second discriminator to obtain a second discrimination result.
7. The method according to claim 6, wherein the adjusting the training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result satisfies a preset condition and stopping the adjustment comprises:
adjusting training parameters in the first generator, the first discriminator, the second generator and the second discriminator according to the first discrimination result and the second discrimination result to obtain an adjusted first generator, an adjusted first discriminator, an adjusted second generator and an adjusted second discriminator;
processing the source sample image according to the adjusted first generator to convert the style of the source sample image into a target style and obtain a second target sample image;
processing the second target sample image according to the adjusted second generator to convert the style of the second target sample image into the style of a source sample image to obtain an intermediate sample image;
inputting the second target sample image into the adjusted first discriminator, and outputting a third discrimination result, wherein the third discrimination result is a second loss value of the second target sample image;
inputting the intermediate sample image into the adjusted second discriminator, and outputting a fourth discrimination result, wherein the fourth discrimination result is a third loss value of the intermediate sample image;
and when the sum of the second loss value and the third loss value is smaller than a preset value, determining that the first judgment result and the second judgment result meet a preset condition, and stopping adjustment.
8. An image processing apparatus characterized by comprising:
the extraction module is used for extracting an image to be processed;
the processing module is used for inputting the image to be processed into a trained image style migration model and carrying out image style migration processing based on a target style to obtain a target image, wherein the style of the target image is the target style;
the processing module comprises:
the first processing unit is used for respectively carrying out global maximum pool processing and global average pool processing on the image to be processed to obtain a first image;
the second processing unit is used for performing convolution processing on the first image to obtain a second image;
the third processing unit is used for carrying out channel coding on the second image to obtain the target image;
the extraction module is further configured to: obtaining a source sample image and a reference sample image, wherein the style of the reference sample image is the target style;
the processing module is further configured to: inputting the source sample image into a first generator, converting the style of the source sample image into the target style, and obtaining a target sample image; inputting the target sample image and the reference sample image into a first discriminator respectively to obtain a first discrimination result; adjusting the training parameters in the first generator and the first discriminator according to the first discrimination result until the discrimination result in the first discriminator meets a preset condition, and stopping adjustment; obtaining the trained image style migration model according to the adjusted first generator; the first generator includes: the system comprises a first global maximum pooling layer, a first global average pooling layer, a first auxiliary classifier, a first convolution layer, a plurality of first fully-connected layers and a first decoder;
the processing module is specifically configured to: inputting the source sample image to a first global maximum pooling layer, and processing the source sample image based on a first weight vector to obtain a first feature image; inputting the source sample image to the first global average pooling layer, processing the source sample image based on a second weight vector to obtain a second feature image, and inputting the first weight vector and the second weight vector to the first auxiliary classifier to obtain a first classification result; inputting the first characteristic image and the second characteristic image into the first convolution layer to obtain a third characteristic image; inputting the third feature image to the plurality of first fully-connected layers to obtain a plurality of parameter vectors; inputting the third feature image to the first decoder, and performing channel coding on the third feature image according to the multiple parameter vectors to obtain the target sample image; the first weight vector and the second weight vector are used for adjusting the proportion of each feature in the source sample image.
9. The apparatus of claim 8, wherein the first image comprises: the image after global maximum pooling and the image after global average pooling;
the image style migration model comprises a global maximum pooling layer, a global average pooling layer, a convolution layer and an encoding layer;
the global maximum pooling layer is used for performing global maximum pooling on the image to be processed to obtain an image subjected to global maximum pooling;
the global average pooling layer is used for performing global average pooling on the image to be processed to obtain an image subjected to global average pooling;
the convolution layer is used for performing convolution processing on the first image to obtain the second image;
and the coding layer is used for carrying out channel coding on the second image to obtain the target image.
10. An electronic device, comprising: a memory for storing program instructions;
a processor for calling and executing program instructions in said memory, performing the image processing method of any of claims 1 to 7.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the image processing method of any one of claims 1 to 7.
CN202011364910.XA 2020-11-27 2020-11-27 Image processing method, device, equipment and storage medium Active CN112348739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364910.XA CN112348739B (en) 2020-11-27 2020-11-27 Image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112348739A CN112348739A (en) 2021-02-09
CN112348739B true CN112348739B (en) 2021-09-28


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205449A (en) * 2021-05-21 2021-08-03 珠海金山网络游戏科技有限公司 Expression migration model training method and device and expression migration method and device
CN113436062A (en) * 2021-07-28 2021-09-24 北京达佳互联信息技术有限公司 Image style migration method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800732A (en) * 2019-01-30 2019-05-24 北京字节跳动网络技术有限公司 The method and apparatus for generating model for generating caricature head portrait
CN110163286A (en) * 2019-05-24 2019-08-23 常熟理工学院 Hybrid pooling-based domain adaptive image classification method
CN110532955A (en) * 2019-08-30 2019-12-03 中国科学院宁波材料技术与工程研究所 Example dividing method and device based on feature attention and son up-sampling
CN110930297A (en) * 2019-11-20 2020-03-27 咪咕动漫有限公司 Method and device for migrating styles of face images, electronic equipment and storage medium
CN111784566A (en) * 2020-07-01 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, migration model training method, device, medium and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308679B (en) * 2018-08-13 2022-08-30 深圳市商汤科技有限公司 Image style conversion method and device, equipment and storage medium
CN111127378A (en) * 2019-12-23 2020-05-08 Oppo广东移动通信有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111476708B (en) * 2020-04-03 2023-07-14 广州市百果园信息技术有限公司 Model generation method, model acquisition method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant