WO2020258902A1 - Image generating and neural network training method, apparatus, device, and medium - Google Patents

Image generating and neural network training method, apparatus, device, and medium Download PDF

Info

Publication number
WO2020258902A1
Authority
WO
WIPO (PCT)
Prior art keywords
network unit
unit block
layer
network
content
Prior art date
Application number
PCT/CN2020/076835
Other languages
French (fr)
Chinese (zh)
Inventor
黄明杨
张昶旭
刘春晓
石建萍
Original Assignee
商汤集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 商汤集团有限公司 filed Critical 商汤集团有限公司
Priority to KR1020217017354A (published as KR20210088656A)
Priority to JP2021532473A (published as JP2022512340A)
Publication of WO2020258902A1

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS: G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks; G06N3/08 Learning methods
    • G06F ELECTRIC DIGITAL DATA PROCESSING: G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation; G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL: G06T11/00 2D [Two Dimensional] image generation

Definitions

  • the present disclosure relates to the field of image processing, and in particular to an image generation method and neural network training method, device, electronic equipment, and computer storage medium.
  • In related approaches, image generation may convert one real image into another real image, with the realism of the generated image then judged subjectively by human vision.
  • neural network-based image generation methods have emerged in related technologies.
  • the neural network can usually be trained based on paired data, and then the content image can be styled through the trained neural network.
  • Here, paired data refers to a content image and a style image used for training that have the same content characteristics but different style characteristics.
  • this method is not easy to implement.
  • the embodiments of the present disclosure are expected to provide a technical solution for image generation.
  • An embodiment of the present disclosure provides an image generation method. The method includes: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting style features of a style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, feeding the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • the embodiments of the present disclosure also propose a neural network training method.
  • The method includes: extracting the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting the style features of the style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feeding the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; discriminating the generated image to obtain an identification result; and adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • an embodiment of the present disclosure also provides an image generation device.
  • the device includes a first extraction module, a second extraction module, and a first processing module.
  • The first extraction module is used to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the second extraction module is used to extract the style features of the style image; and the first processing module is used to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, to feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and to obtain the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • The embodiments of the present disclosure also provide a neural network training device, which includes a third extraction module, a fourth extraction module, a second processing module, and an adjustment module. The third extraction module is used to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the fourth extraction module is used to extract the style features of the style image.
  • The second processing module is used to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, to feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, to obtain the generated image output by the second neural network after each second network unit block processes its respective input features, and to discriminate the generated image to obtain an identification result, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; the adjustment module is used to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • The embodiments of the present disclosure also propose an electronic device, including a processor and a memory for storing a computer program that can run on the processor, wherein the processor is configured, when running the computer program, to execute any one of the above image generation methods or any one of the above neural network training methods.
  • the embodiments of the present disclosure also propose a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing image generation methods or any of the foregoing neural network training methods is implemented.
  • In the embodiments of the present disclosure, the content features of the content image are extracted by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the style features of the style image are extracted; the content features respectively output by the first network unit blocks of each layer are correspondingly fed forward into the sequentially connected multi-layer second network unit blocks in the second neural network, the style features are fed forward from the first-layer second network unit block among the multi-layer second network unit blocks, and the generated image output by the second neural network is obtained after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • In this way, both the content image and the style image can be determined according to actual needs, and they do not need to form a paired image, which is easy to implement. In addition, during image generation, the first network unit block of each layer of the first neural network extracts content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
  • FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of the structure of a neural network pre-trained in an embodiment of the disclosure
  • FIG. 3 is an exemplary structural diagram of a content encoder according to an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of an exemplary structure of a CRB in an embodiment of the disclosure.
  • FIG. 5 is an exemplary structural diagram of a generator of an embodiment of the disclosure.
  • Fig. 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the disclosure
  • Fig. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure.
  • FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by the application embodiment of the disclosure.
  • Fig. 9a is a schematic structural diagram of a residual block of a content encoder in an application embodiment of the present disclosure.
  • Fig. 9b is a schematic structural diagram of a residual block of the generator in an application embodiment of the present disclosure.
  • FIG. 9c is a schematic structural diagram of the FADE module of the application embodiment of the disclosure.
  • FIG. 10 is a schematic diagram of the composition structure of an image generating device according to an embodiment of the disclosure.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
  • FIG. 12 is a schematic diagram of the composition structure of a neural network training device according to an embodiment of the disclosure.
  • the terms "including”, “including” or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or device including a series of elements not only includes what is clearly stated Elements, but also include other elements not explicitly listed, or elements inherent to the implementation of the method or device. Without more restrictions, the element defined by the sentence “including a" does not exclude the existence of other related elements (such as steps or steps in the method) in the method or device that includes the element.
  • the unit in the device for example, the unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • It should be noted that the image generation method and neural network training method provided by the embodiments of the present disclosure include a series of steps, but are not limited to the recorded steps. Similarly, the image generation device and neural network training device provided in the embodiments of the present disclosure include a series of modules, but are not limited to the explicitly recorded modules, and may also include modules needed to obtain relevant information or perform processing based on information.
  • the embodiments of the present disclosure can be applied to a computer system composed of a terminal and a server, and can operate with many other general-purpose or special-purpose computing system environments or configurations.
  • the terminal can be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a network personal computer, a vehicle-mounted device, a small computer system, etc.
  • The server may be a server computer system, a small computer system, a large computer system, or a distributed cloud computing environment including any of the above systems, etc.
  • Electronic devices such as terminals and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
  • program modules may be located on a storage medium of a local or remote computing system including a storage device.
  • an image generation method is proposed.
  • The applicable scenarios of the embodiments of the present disclosure include but are not limited to automatic driving, image generation, image synthesis, computer vision, deep learning, machine learning, etc.
  • FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure. As shown in FIG. 1, the method may include:
  • Step 101: Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer.
  • the content image may be an image that requires style conversion; for example, the content image may be obtained from a local storage area or the content image may be obtained from the network.
  • the content image may be an image taken by a mobile terminal or a camera.
  • The format of the content image can be Joint Photographic Experts Group (JPEG), Bitmap (BMP), Portable Network Graphics (PNG), or another format; it should be noted that the format and source of the content image are merely exemplified here, and the embodiment of the present disclosure does not limit them.
  • content features and style features can be extracted.
  • The content feature is used to characterize the content information of the image; for example, the content feature represents the object position, object shape, object size, etc. in the image. The style feature is used to represent the style information of the image; for example, the style feature represents style information such as weather, day, night, and painting style.
  • the style conversion may refer to the conversion of the style feature of the content image into another style feature.
  • the conversion of the style feature of the content image may be the conversion from day to night, and from night to day.
  • The conversion between weather styles can be from sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, cloudy to rainy, rainy to cloudy, sunny to snowy, snowy to sunny, cloudy to snowy, snowy to cloudy, snowy to rainy, or rainy to snowy, etc. The conversion between painting styles can be from oil painting to ink painting, ink painting to oil painting, oil painting to sketch, sketch to oil painting, sketch to ink painting, or ink painting to sketch, etc.
  • the first neural network is a network for extracting content features of content images, and the embodiment of the present disclosure does not limit the type of the first neural network.
  • the first neural network includes sequentially connected multi-layer first network unit blocks.
  • Specifically, the content image can be fed forward from the first-layer first network unit block of the multi-layer first network unit blocks. The data processing direction corresponding to the feedforward input runs from the input end of the neural network to its output end, corresponding to the forward propagation process; in the feedforward input process, the output result of the upper-layer network unit block serves as the input of the next-layer network unit block.
  • The first network unit block of each layer of the first neural network can extract content features from its input data; that is, the output result of the first network unit block of each layer is the content feature of that layer, and the content features output by different first network unit blocks in the first neural network are different.
  • the representation mode of the content feature of the content image may be a content feature map or other representation mode, which is not limited in the embodiment of the present disclosure.
  • Optionally, each first network unit block in the first neural network consists of multiple neural network layers organized in a residual structure, so that the content features of the content image can be extracted based on the multiple neural network layers organized in a residual structure in each first network unit block.
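  • As a minimal sketch (assuming a PyTorch-style implementation; class and variable names are illustrative, not from the disclosure), the multi-layer extraction of step 101 can be expressed as a stack of blocks whose intermediate outputs are all retained:

```python
import torch.nn as nn

class FirstNetwork(nn.Module):
    """Sketch of step 101: sequentially connected first network unit
    blocks whose per-layer content features are all retained."""
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)  # multi-layer first network unit blocks

    def forward(self, content_image):
        features = []
        x = content_image
        for block in self.blocks:      # feedforward: upper layer feeds the next
            x = block(x)
            features.append(x)         # content feature output by this layer
        return features                # one content feature per layer
```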
  • Step 102: Extract the style features of the style image.
  • the style image is an image with the target style feature
  • the target style feature represents the style feature to which the content image needs to be converted
  • the style image can be set as needed.
  • After acquiring the content image, the target style feature to be converted to can be determined, and the style image can then be selected according to that demand.
  • the style image can be obtained from the local storage area or the network.
  • the style image can be an image taken through a mobile terminal or camera;
  • The format of the style image can be JPEG, BMP, PNG, or another format; it should be noted that the format and source of the style image are merely exemplified here, and the embodiment of the present disclosure does not limit them.
  • The style feature of the content image differs from the style feature of the style image; the purpose of performing style conversion on the content image is to make the generated image obtained after conversion have the content features of the content image and the style features of the style image.
  • In some embodiments, extracting the style features of the style image includes: extracting features of the style image distribution; and sampling the features of the style image distribution to obtain the style features, where the style features include the mean and standard deviation of the features of the style image distribution.
  • the style characteristics of the style image can be accurately extracted, which is conducive to accurate style conversion of the content image.
  • at least one layer of convolution operation may be performed on the style image to obtain the characteristics of the style image distribution.
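  • A hedged sketch of this step (channel widths, strides, and the two-layer depth are illustrative assumptions): convolutions extract the distribution features, and the per-channel mean and standard deviation of those features form the style feature.

```python
import torch.nn as nn

class StyleFeatureExtractor(nn.Module):
    """Sketch of step 102: convolutions extract features of the style
    image distribution; the style feature is the per-channel mean and
    standard deviation of those features."""
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, style_image):
        feat = self.conv(style_image)   # features of the style image distribution
        mean = feat.mean(dim=(2, 3))    # per-channel mean over spatial positions
        std = feat.std(dim=(2, 3))      # per-channel standard deviation
        return mean, std                # the style feature (mean, standard deviation)
```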
  • Step 103: Correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features; the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • Here, the second neural network includes sequentially connected multi-layer second network unit blocks, and the output result of the previous network unit block in the second neural network is the input of the next network unit block. Optionally, each second network unit block in the second neural network consists of multiple neural network layers organized in a residual structure, so that the input features can be processed based on the multiple neural network layers organized in a residual structure in each second network unit block.
  • step 101 to step 103 can be implemented by a processor in an electronic device.
  • The processor can be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
  • It can be seen that both the content image and the style image can be determined according to actual needs, and they do not need to form a paired image, which is easy to implement. In addition, during image generation, the first network unit block of each layer of the first neural network extracts the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
  • Further, the style of the style image can be determined according to actual needs and is not limited to the style characteristics of the style images used when training the neural network. That is to say, even if night-style training images were used when training the neural network, when generating images based on the trained neural network one can choose a content image together with a snowy-style, rainy-style, or other style image, and generate images that meet the actually needed style rather than only night-style images, improving the generalization and universality of the image generation method.
  • Further, style images with different style characteristics can be set according to user needs, so that generated images with different style characteristics can be obtained for one content image, for example, a night style, a cloudy style, and a rainy style. That is, based on the same content image, generated images of multiple styles can be obtained rather than only one, which improves the applicability of the image generation method.
  • In some embodiments, the number of layers of first network unit blocks in the first neural network and the number of layers of second network unit blocks in the second neural network may be the same, and the first network unit block of each layer of the first neural network forms a one-to-one correspondence with the second network unit block of each layer of the second neural network.
  • In some embodiments, the corresponding feedforward input of the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network includes: in response to i sequentially taking values from 1 to T, feeding the content features output by the first network unit block of the i-th layer forward into the second network unit block of the (T-i+1)-th layer, where i is a positive integer and T represents the number of layers of first network unit blocks in the first neural network. In particular, the content features output by the first network unit block of the last layer are input to the second network unit block of the first layer.
  • In this way, the content feature received by each second network unit block in the second neural network is the output feature of a corresponding first network unit block of the first neural network, and the content feature received varies with the position of the second network unit block in the second neural network. The second neural network takes the style features as input; as the style features deepen from the lower-layer second network unit blocks toward the higher-layer ones, more and more content features are integrated, so that the semantic information of each layer of the content image is gradually merged on the basis of the style features, and the resulting image can retain the multi-layer semantic information of the content image together with the style feature information.
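  • A compact sketch of this wiring (block call signatures are assumptions for illustration; the first network blocks take one input, while the second network blocks take the previous feature plus a content feature):

```python
def generate_image(first_blocks, second_blocks, content_image, style_feature):
    """Sketch of the layer correspondence: the content feature of the
    i-th first network unit block (1-indexed in the text, 0-indexed
    here) is fed into the (T-i+1)-th second network unit block; the
    style feature enters the first-layer second network unit block."""
    T = len(first_blocks)
    assert len(second_blocks) == T
    content_features = []
    x = content_image
    for block in first_blocks:                 # forward pass of the first network
        x = block(x)
        content_features.append(x)
    out = style_feature                        # fed into the first-layer second block
    for j, block in enumerate(second_blocks):  # j = 0 .. T-1
        out = block(out, content_features[T - 1 - j])  # layer j+1 gets feature T-j
    return out                                 # generated image from the last layer
```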
  • In some embodiments, the feature processing of the first-layer second network unit block includes: multiplying the content feature from the last-layer first network unit block with the style feature to obtain the intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block and that intermediate feature to obtain the output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • In some embodiments, before the multiplication, a convolution operation may be performed on the content feature from the last-layer first network unit block; that is, the convolution operation is performed on that content feature first, and the result of the convolution is then multiplied with the style feature.
  • In some embodiments, the input feature processing of a middle-layer second network unit block includes: multiplying the input content feature with the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; adding the input content feature and that intermediate feature to obtain the output feature of the middle-layer second network unit block; and inputting the output feature of the middle-layer second network unit block into the next-layer second network unit block. It can be seen that performing the above multiplication and addition operations makes it convenient to fuse the output features of the upper-layer second network unit block with the corresponding content features.
  • Here, a middle-layer second network unit block is any second network unit block in the second neural network other than the first-layer and last-layer second network unit blocks. The second neural network may contain one middle-layer second network unit block or multiple ones; the above description takes a single middle-layer second network unit block as an example to explain its data processing procedure.
  • In some embodiments, the middle-layer second network unit block performs a convolution operation on the received content feature before multiplying it with the output feature of the upper-layer second network unit block.
  • In some embodiments, the input feature processing of the last-layer second network unit block includes: multiplying the content feature from the first-layer first network unit block with the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block and that intermediate feature to obtain the generated image.
  • In some embodiments, before the last-layer second network unit block multiplies the content feature from the first-layer first network unit block with the output feature of the upper-layer second network unit block, it performs a convolution operation on the content feature from the first-layer first network unit block.
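  • The per-block processing described above can be summarized in a single hedged sketch (the class below is illustrative; the broadcasting of the style vector and the placement of the convolution follow the description above):

```python
import torch.nn as nn

class SecondUnitBlock(nn.Module):
    """Sketch of the multiply-then-add processing of one second network
    unit block: the content feature is convolved, multiplied with the
    feature arriving from the previous layer (the style feature in the
    first layer), and the content feature is then added to the product."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, prev_feature, content_feature):
        if prev_feature.dim() == 2:                  # style vector: broadcast over H, W
            prev_feature = prev_feature[:, :, None, None]
        intermediate = self.conv(content_feature) * prev_feature  # multiplication
        return content_feature + intermediate        # addition -> output feature
```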
  • FIG. 2 is a schematic structural diagram of a neural network pre-trained in an embodiment of the disclosure.
  • The pre-trained neural network includes a content encoder, a style encoder, and a generator; the content encoder is used to extract the content features of the content image using the first neural network described above, the style encoder is used to extract the style features of the style image, and the generator is used to fuse, using the second neural network, the style features with the content features output by the first network unit blocks of each layer.
  • the first neural network can be used as the content encoder
  • the second neural network can be used as the generator
  • the neural network used for style feature extraction on the style image can be used as the style encoder.
  • The image to be processed (i.e., the content image) can be input into the content encoder, where it is processed by the multi-layer first network unit blocks of the first neural network and each layer of first network unit block outputs a content feature; the style image can likewise be input into the style encoder, which extracts the style features of the style image.
  • In one example, the first network unit block is a residual block (Residual Block, RB), and the content feature output by the first network unit block of each layer is a content feature map.
  • Fig. 3 is a schematic diagram of an exemplary structure of the content encoder according to the embodiment of the disclosure.
  • The residual blocks of the content encoder can be marked as CRB, and the content encoder includes seven layers of CRB; in CRB(A, B), A represents the number of input channels and B represents the number of output channels. In Figure 3, the input of CRB(3, 64) is the content image, the first to seventh CRBs are arranged from bottom to top, and the first-layer CRB to the seventh-layer CRB can output seven content feature maps respectively.
  • FIG. 4 is an exemplary structural diagram of the CRB of an embodiment of the disclosure.
  • In FIG. 4, sync BN represents a synchronized batch normalization (BN) layer, ReLU represents a rectified linear unit layer, Conv represents a convolutional layer, and the circled plus sign represents a summation operation; the CRB structure shown in Figure 4 is that of a standard residual block. In this way, a standard residual network structure can be used to extract content features, which facilitates extracting the content features of content images while reducing semantic information loss.
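  • A sketch of such a block (illustrative; the disclosure's synchronized BN is replaced with plain BN so the example stays self-contained):

```python
import torch.nn as nn

class CRB(nn.Module):
    """Sketch of a standard residual block as in FIG. 4: BN -> ReLU -> Conv
    applied twice, plus a convolutional skip connection (three convolutions
    in total, matching the description of Fig. 9a later in the text)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(),
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
        )
        self.skip = nn.Conv2d(in_channels, out_channels, 1)  # skip connection

    def forward(self, x):
        return self.body(x) + self.skip(x)  # summation operation
```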
  • the multi-layer second network unit block of the second neural network can be used for processing; for example, the second network unit block is RB.
  • FIG. 5 is an exemplary structural diagram of the generator of the embodiment of the disclosure.
  • The residual blocks in the generator can be denoted as GB, and the generator can include seven layers of GB, where the input of each GB layer includes the output of one CRB layer of the content encoder. In the generator, the first-layer GB to the seventh-layer GB, arranged from top to bottom, are GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(512), GB ResBlk(256), GB ResBlk(128), and GB ResBlk(64); in GB ResBlk(C) in Figure 5, C represents the number of channels. The first-layer GB also receives the style features, and the first-layer GB to the seventh-layer GB receive the content feature maps output by the seventh-layer CRB to the first-layer CRB, respectively; after each GB layer processes its input features, the output of the seventh-layer GB is used to obtain the generated image.
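  • The channel configuration above can be written down directly; the following fragment reuses the SecondUnitBlock sketch given earlier and omits the upsampling and channel adaptation a full generator would need between blocks:

```python
gb_channels = [1024, 1024, 1024, 512, 256, 128, 64]   # C in GB ResBlk(C), top to bottom
generator_blocks = [SecondUnitBlock(c) for c in gb_channels]
# generator_blocks[0] receives the style features and the 7th-layer CRB feature map;
# generator_blocks[6] receives the 1st-layer CRB feature map and yields the generated image.
```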
  • In this way, the structural information of the content image can be encoded to generate multiple content feature maps of different levels; the content encoder extracts more abstract features in the deep layers, while retaining much structural information in the shallow layers.
  • the image generation method of the embodiment of the present disclosure can be applied to various image generation scenarios, for example, can be applied to scenarios such as image entertainment data generation, automatic driving model training test data generation, and the like.
  • Figure 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the present disclosure.
  • In FIG. 6, the first column represents content images, the second column represents style images, and the third column represents the generated images obtained by the image generation method of the embodiments of the present disclosure; the images in the same row form one group of content image, style image, and generated image. From the first row to the last row, the style conversions are day to night, night to day, sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, sunny to snowy, and snowy to sunny. As can be seen from FIG. 6, the images generated by the image generation method of the embodiments of the present disclosure retain the content information of the content image and the style information of the style image.
  • In the training process of the neural network in the embodiments of the present disclosure, not only the forward propagation process from input to output is involved, but also the back propagation process from output to input; the training process can use the forward process to generate images and use the backward process to adjust the network parameters of the neural network.
  • FIG. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure. As shown in FIG. 7, the process may include:
  • Step 701: Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer.
  • Step 702: Extract the style features of the style image.
  • Step 703: Correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features; the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • The implementation of steps 701 to 703 in this embodiment is the same as that of steps 101 to 103, and will not be repeated here.
  • Step 704: Discriminate the generated image to obtain an identification result.
  • the output image generated by the generator needs to be identified.
  • the purpose of discriminating the generated image is to determine the probability that the generated image is a real image; in practical applications, this step can be implemented using a discriminator or the like.
  • Step 705: Adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • Specifically, the network parameters of the first neural network and/or the second neural network can be adjusted through the backward process according to the content image, style image, generated image, and identification result, after which the forward process can be used to obtain a new generated image and identification result. By repeating the above forward and backward processes, the neural network is iteratively optimized until the predetermined training completion conditions are met, yielding a trained neural network for image generation.
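  • One iteration of this forward/backward cycle might look as follows (a sketch only: run_generator, adversarial_loss, and auxiliary_losses are hypothetical helpers standing in for steps 703 and 705, not names from the disclosure):

```python
def train_step(first_net, style_net, second_net, discriminator,
               optimizers, content_image, style_image):
    """Sketch of one forward/backward iteration of steps 701-705."""
    content_feats = first_net(content_image)                          # step 701
    style_feat = style_net(style_image)                               # step 702
    generated = run_generator(second_net, content_feats, style_feat)  # step 703
    identification = discriminator(generated)                         # step 704
    loss = adversarial_loss(identification) \
        + auxiliary_losses(content_image, style_image, generated)
    for opt in optimizers:
        opt.zero_grad()
    loss.backward()                      # backward process
    for opt in optimizers:
        opt.step()                       # step 705: adjust network parameters
    return loss
```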
  • steps 701 to 705 can be implemented by a processor in an electronic device.
  • The aforementioned processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, or microprocessor.
  • both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which is easy to implement.
  • In the process of image generation, the first network unit blocks of each layer of the first neural network can be used to extract the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information compared with the content image; in turn, the trained neural network can better maintain the semantic information of the content image.
  • the parameters of the above-mentioned multiplication operation and/or addition operation used in the second network unit block of each layer can be adjusted.
  • In some embodiments, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result includes: determining a generative adversarial network (Generative Adversarial Net, GAN) loss according to the content image, style image, generated image, and identification result, where the GAN loss is used to characterize the difference in content characteristics between the generated image and the content image and the difference in style characteristics between the generated image and the style image (in one example, the generative adversarial network includes a generator and a discriminator); and, in response to the GAN loss not meeting a first predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network.
  • In specific implementation, the network parameters of the first neural network and/or the second neural network can be adjusted based on the GAN loss, and a minimax strategy can be adopted. Here, the first predetermined condition may represent a predetermined training completion condition; it can be understood from the meaning of the GAN loss that training the neural network based on it enables images generated by the trained network to well maintain the content characteristics of the content image and the style characteristics of the style image.
  • In some embodiments, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: determining a style loss according to the generated image and the style image; and, in response to the style loss not meeting a second predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the style loss, where the style loss is used to characterize the difference in style characteristics between the generated image and the style image.
  • In some embodiments, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: determining a content loss according to the generated image and the content image; and, in response to the content loss not meeting a third predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the content loss, where the content loss is used to characterize the difference in content characteristics between the generated image and the content image.
  • In some embodiments, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: determining a feature matching loss according to the style image and the output features of each middle-layer second network unit block; and, in response to the feature matching loss not meeting a fourth predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the feature matching loss, where the feature matching loss is used to characterize the difference between the output features of each middle-layer second network unit block and the style features of the style image.
  • Here, the aforementioned second, third, and fourth predetermined conditions may represent predetermined training completion conditions; it can be understood from the meaning of the style loss, content loss, and feature matching loss that training the neural network based on them enables images generated by the trained network to better maintain the corresponding content and style characteristics.
  • In practical applications, the neural network can be trained based on one or more of the foregoing losses; the trained neural network is obtained when each loss meets its corresponding predetermined condition, and the accuracy of the style conversion of the trained neural network is thereby higher.
  • In the embodiments of the present disclosure, the GAN loss, style loss, content loss, and feature matching loss can each be represented by a loss function.
  • In some embodiments, the training process of the neural network can be implemented based on the content encoder, style encoder, generator, discriminator, etc., while image generation based on the trained neural network can be implemented based on the content encoder, style encoder, and generator.
  • FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by the application embodiment of the disclosure.
  • the input of the content encoder is the image to be processed (that is, the content image), which is used to extract the content characteristics of the content image;
  • The style encoder is responsible for extracting the style features of the style image;
  • The generator combines the content features output by the first network unit blocks of different layers with the style features to generate high-quality images.
  • the discriminator used in the neural network training process is not shown in FIG. 8.
  • In FIG. 8, the content encoder includes multiple layers of residual blocks, where CRB-1, CRB-2, ..., CRB-T respectively represent the first-layer to T-th-layer residual blocks of the content encoder; the generator likewise includes multiple layers of residual blocks, where GB-1, ..., GB-T-1, GB-T respectively represent the first-layer to T-th-layer residual blocks of the generator.
  • The output result of the i-th layer residual block of the content encoder is input to the (T-i+1)-th layer residual block of the generator. The input of the style encoder is the style image, from which the style features are extracted and then input into the first-layer residual block of the generator; the output image is obtained based on the output result of the T-th layer residual block GB-T of the generator.
  • Here, f_i is defined as the content feature map output by the i-th layer residual block of the content encoder, and f̂_i denotes the output feature of the i-th layer residual block of the generator; the i-th residual block of the content encoder corresponds to the (T-i+1)-th layer residual block of the generator, and the corresponding features have the same number of channels. N denotes the batch size, C_i represents the number of channels, and H_i and W_i indicate the height and width, respectively. The activation value at position (n, c, h, w), with n ∈ [1, N], c ∈ [1, C_i], h ∈ [1, H_i], w ∈ [1, W_i], can be expressed as formula (1). The parameters μ and σ both correspond to the i-th residual block of the generator and respectively represent the mean and standard deviation of the features output by the previous-layer residual block (that is, a residual block of the second neural network); μ and σ can be calculated according to formula (2).
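  • The bodies of formulas (1) and (2) do not survive in this text and are not reproduced verbatim; as a hedged reconstruction, assuming the SPADE-style denormalization that the surrounding description of μ, σ, γ, and β matches, they would take roughly the following form:

```latex
% Assumed reconstruction, not verbatim from the disclosure.
% Formula (1): the modulated activation value at position (n, c, h, w):
\gamma_i^{c,h,w} \cdot \frac{\hat{f}_i^{\,n,c,h,w} - \mu_i^{c}}{\sigma_i^{c}} + \beta_i^{c,h,w}
% Formula (2): batch statistics of the previous-layer generator features:
\mu_i^{c} = \frac{1}{N H_i W_i} \sum_{n,h,w} \hat{f}_i^{\,n,c,h,w}, \qquad
\sigma_i^{c} = \sqrt{\frac{1}{N H_i W_i} \sum_{n,h,w} \bigl(\hat{f}_i^{\,n,c,h,w}\bigr)^2 - \bigl(\mu_i^{c}\bigr)^2}
```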
  • It should be noted that the image generation method of the application embodiment of the present disclosure is feature-adaptive, that is, the modulation parameters can be calculated directly from the content features of the content image, whereas in related image generation methods the modulation parameters are fixed.
  • Figure 9a is a schematic structural diagram of the residual block of the content encoder in the application embodiment of the disclosure.
  • In FIG. 9a, BN represents the BN layer, ReLU represents the ReLU layer, Conv represents the convolutional layer, and the circled plus sign represents the summation operation. The structure of each residual block CRB of the content encoder is that of a standard residual block, and each residual block of the content encoder includes three convolutional layers, one of which is used for the skip connection.
  • Fig. 9b is a schematic structural diagram of the residual block of the generator in an application embodiment of the present disclosure. As shown in Fig. 9b, a FADE module replaces each BN layer in the standard residual block to obtain the structure of each residual-block layer GB of the generator; F1, F2, and F3 represent the first, second, and third FADE modules, respectively. In each residual block of the generator, the input of every FADE module includes the corresponding content feature map output by the content encoder; referring to Figure 9b, among the three FADE modules of each residual block, the inputs of F1 and F2 also include the output features of the previous residual block of the second neural network, and the input of F3 also includes the features obtained after processing by F1, the ReLU layer, and the convolutional layer in turn.
  • Fig. 9c is a schematic diagram of the structure of the FADE module of the application embodiment of the present disclosure, in which the circled multiplication sign represents the multiplication operation and the circled plus sign represents the addition operation; Conv denotes a convolutional layer and BN a BN layer; γ and β represent the modulation parameters of each residual block of the generator. It can be seen that FADE takes the content feature map as input and derives the denormalization parameters from convolutional features.
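  • A sketch of such a module (illustrative; the kernel sizes and the affine-free BN are assumptions consistent with the figure description):

```python
import torch.nn as nn

class FADE(nn.Module):
    """Sketch of the FADE module of Fig. 9c: BN normalizes the incoming
    feature, and per-pixel modulation parameters gamma and beta, derived
    from the content feature map by convolutions, denormalize it."""
    def __init__(self, channels, content_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)  # parameter-free normalization
        self.to_gamma = nn.Conv2d(content_channels, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(content_channels, channels, 3, padding=1)

    def forward(self, x, content_map):
        gamma = self.to_gamma(content_map)  # modulation parameter gamma
        beta = self.to_beta(content_map)    # modulation parameter beta
        return gamma * self.bn(x) + beta    # multiplication then addition
```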
  • In this way, the trained neural network adaptively transforms the content image under the control of the style image.
  • In the application embodiment of the present disclosure, the style encoder is proposed based on the variational auto-encoder (Variational Auto-Encoder, VAE).
  • The output of the style encoder is a mean vector and a standard deviation vector, and the latent code z is derived by resampling from the encoded style image using a uniformly distributed random vector with the same size as z.
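  • A minimal sketch of this resampling (the text above describes the random vector as uniformly distributed, so torch.rand_like is used; common VAE implementations would instead draw a normal vector with torch.randn_like):

```python
import torch

def sample_latent_code(mean, std):
    """Sketch of deriving the latent code z by resampling from the
    encoded style image."""
    eps = torch.rand_like(std)   # random vector with the same size as z
    return mean + std * eps      # latent code z
```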
  • various parts of the entire neural network can be jointly trained.
  • Specifically, the loss function of the entire neural network can be calculated with reference to formula (3), optimized based on the minimax strategy, and the training of the neural network thereby realized.
  • In formula (3), G represents the generator, D represents the discriminator, and L_VAE(E_s, G) represents the style loss; in one example, the style loss can be a Kullback-Leibler (KL) divergence loss, and L_VAE(E_s, G) can be calculated according to formula (4), where KL(·) represents the KL divergence and λ_0 represents the hyperparameter in L_VAE(E_s, G).
  • L_GAN(E_s, E_c, G, D) represents the generative adversarial network loss, used in the adversarial training of the generator and discriminator; it can be calculated according to formula (5).
  • L_VGG(E_s, E_c, G) represents the content loss; in one example, the content loss may be a VGG (Visual Geometry Group) loss, and L_VGG(E_s, E_c, G) can be calculated according to formula (6).
  • L_FM(E_s, E_c, G) represents the feature matching loss, which can be calculated according to formula (7).
  • the VGG loss has different weights in different layers.
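  • Assembling these terms, the jointly trained objective of formulas (3)-(7) might be combined as follows (a sketch; all weights shown are illustrative defaults, not values from the disclosure):

```python
def overall_loss(gan_loss, style_kl_loss, vgg_layer_losses, fm_loss,
                 lambda0=1.0, vgg_layer_weights=None):
    """Sketch of combining the GAN loss, the KL-divergence style loss
    (weighted by the hyperparameter lambda0), the per-layer-weighted VGG
    content loss, and the feature matching loss."""
    if vgg_layer_weights is None:
        vgg_layer_weights = [1.0] * len(vgg_layer_losses)  # different weights per layer
    vgg_loss = sum(w * l for w, l in zip(vgg_layer_weights, vgg_layer_losses))
    return gan_loss + lambda0 * style_kl_loss + vgg_loss + fm_loss
```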
  • In some embodiments, the first neural network is trained with multi-scale discriminators, where the discriminators at different scales have exactly the same structure; the discriminator at the coarsest scale has the largest receptive field, while the discriminator at the finest scale can distinguish higher-resolution images.
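  • A sketch of such multi-scale discrimination (the structure and scale count are illustrative assumptions):

```python
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    """Sketch: structurally identical discriminators applied to
    progressively downsampled inputs; the coarsest scale has the largest
    effective receptive field, the finest judges higher-resolution detail."""
    def __init__(self, make_discriminator, num_scales=3):
        super().__init__()
        self.discriminators = nn.ModuleList(
            make_discriminator() for _ in range(num_scales))
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, image):
        results = []
        for disc in self.discriminators:
            results.append(disc(image))
            image = self.downsample(image)  # halve resolution for the next scale
        return results
```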
  • It should be understood that the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • FIG. 10 is a schematic diagram of the composition structure of an image generation device according to an embodiment of the disclosure. As shown in FIG. 10, the device includes: a first extraction module 1001, a second extraction module 1002, and a first processing module 1003, wherein:
  • the first extraction module 1001 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
  • the second extraction module 1002 is used to extract style features of style images
  • The first processing module 1003 is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, to feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and to obtain the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • In some embodiments, the first processing module 1003 is configured to, in response to i sequentially taking values from 1 to T, feed the content features output by the first network unit block of the i-th layer forward into the second network unit block of the (T-i+1)-th layer, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  • In some embodiments, the first-layer second network unit block is used to multiply the content feature from the last-layer first network unit block with the style feature to obtain the intermediate feature of the first-layer second network unit block; to add the content feature from the last-layer first network unit block and that intermediate feature to obtain the output feature of the first-layer second network unit block; and to input the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • In some embodiments, the first-layer second network unit block is also used to perform a convolution operation on the content feature from the last-layer first network unit block before that content feature is multiplied with the style feature.
  • In some embodiments, a middle-layer second network unit block is used to multiply the input content feature with the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; to add the input content feature and that intermediate feature to obtain the output feature of the middle-layer second network unit block; and to input the output feature of the middle-layer second network unit block into the next-layer second network unit block.
  • In some embodiments, the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature with the output feature of the upper-layer second network unit block.
  • In some embodiments, the last-layer second network unit block is used to multiply the content feature from the first-layer first network unit block with the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block, and to add the content feature from the first-layer first network unit block and that intermediate feature to obtain the generated image.
  • the last-layer second network unit block is further used to perform a convolution operation on the content feature from the first-layer first network unit block before that content feature is multiplied with the output feature of the previous-layer second network unit block.
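Taken together, the first-, middle-, and last-layer second network unit blocks described above apply one arithmetic pattern: an optional convolution of the incoming content feature, an element-wise multiplication with the other input (the style feature for the first layer, otherwise the previous block's output), and an element-wise addition of the content feature. Below is a minimal PyTorch sketch of that pattern; the module name, channel count, and the assumption that all tensors share one spatial size are illustrative and not taken from the disclosure.

```python
import torch
import torch.nn as nn

class SecondUnitBlockSketch(nn.Module):
    """Illustrative fusion step of one second network unit block:
    conv(content) * other + content, where `other` is the style
    feature (first layer) or the previous block's output (later
    layers). Shapes are assumed to match; the patent does not fix them."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content_feat, other_feat):
        intermediate = self.conv(content_feat) * other_feat  # multiplication
        return content_feat + intermediate                   # addition

block = SecondUnitBlockSketch(channels=8)
content = torch.randn(1, 8, 16, 16)
style_or_prev = torch.randn(1, 8, 16, 16)
out = block(content, style_or_prev)
print(out.shape)  # torch.Size([1, 8, 16, 16])
```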
  • the second extraction module 1002 is configured to extract features of the style image distribution, and to sample the features of the style image distribution to obtain the style feature, where the style feature includes the mean and standard deviation of the features of the style image distribution.
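A hedged sketch of this extraction-and-sampling step follows; the depth and widths of the convolutional stack are assumptions, and only the mean/standard-deviation statistics mirror the text.

```python
import torch
import torch.nn as nn

class StyleEncoderSketch(nn.Module):
    """Illustrative style encoder: a small convolutional stack extracts
    features of the style image distribution; the style feature is the
    per-channel mean and standard deviation of those features."""

    def __init__(self, out_channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, style_image):
        f = self.features(style_image)            # (N, C, H, W)
        mean = f.mean(dim=(2, 3))                 # (N, C)
        std = f.std(dim=(2, 3))                   # (N, C)
        return mean, std

enc = StyleEncoderSketch()
mu, sigma = enc(torch.randn(1, 3, 64, 64))
print(mu.shape, sigma.shape)  # torch.Size([1, 64]) torch.Size([1, 64])
```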
  • the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or the second network unit block is configured to process the features input into the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
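For reference, a generic residual organization of neural network layers, of the kind either unit block might use, can look like the sketch below (the specific layers are assumptions, not the disclosed topology).

```python
import torch
import torch.nn as nn

class ResidualUnitSketch(nn.Module):
    """Generic residual structure: the block's input is added back to
    the output of its stacked layers (a skip connection)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual (skip) connection

x = torch.randn(1, 16, 32, 32)
print(ResidualUnitSketch(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```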
  • the first extraction module 1001, the second extraction module 1002, and the first processing module 1003 can all be implemented by processors.
  • the aforementioned processor can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
  • the functional modules in this embodiment can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or in the form of a software function module.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this embodiment, in essence, or the part of it that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
  • the computer program instructions corresponding to an image generation method or a neural network training method in this embodiment can be stored on a storage medium such as an optical disk, a hard disk, or a USB flash drive.
  • FIG. 11 shows an electronic device 11 provided by an embodiment of the present disclosure.
  • the electronic device 11 includes a memory 111 and a processor 112, wherein the memory 111 is used to store a computer program, and the processor 112 is configured to execute the computer program stored in the memory to implement any image generation method or any neural network training method in the foregoing embodiments.
  • the various components in the electronic device 11 may be coupled together through a bus system. It can be understood that the bus system is used to realize the connection and communication between these components.
  • the bus system also includes a power bus, a control bus, and a status signal bus.
  • for clarity, the various buses are all labeled as the bus system in FIG. 11.
  • the aforementioned memory 111 may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the foregoing types of memories, and it provides instructions and data to the processor 112.
  • the aforementioned processor 112 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, the electronic component used to implement the above processor function may also be another component, which is not specifically limited in the embodiments of the present disclosure.
  • FIG. 12 is a schematic diagram of the composition structure of a neural network training apparatus according to an embodiment of the disclosure. As shown in FIG. 12, the apparatus includes a third extraction module 1201, a fourth extraction module 1202, a second processing module 1203, and an adjustment module 1204, wherein:
  • the third extraction module 1201 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
  • the fourth extraction module 1202 is configured to extract a style feature of the style image;
  • the second processing module 1203 is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, to feed the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respective input features, and to discriminate the generated image to obtain an identification result, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks;
  • the adjustment module 1204 is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • the second processing module 1203 is configured to feed the content features output by the i-th layer first network unit block forward into the (T-i+1)-th layer second network unit block as i takes the values 1 to T in sequence, where i is a positive integer and T denotes the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  • the first-layer second network unit block among the second network unit blocks is used to multiply the content feature from the last-layer first network unit block with the style feature to obtain an intermediate feature of the first-layer second network unit block; to add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and to input the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • the first-layer second network unit block is further used to perform a convolution operation on the content feature from the last-layer first network unit block before that content feature is multiplied with the style feature.
  • a middle-layer second network unit block among the second network unit blocks is used to multiply the input content feature with the output feature of the previous-layer second network unit block to obtain an intermediate feature of the middle-layer second network unit block; to add the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and to input the output feature of the middle-layer second network unit block into the next-layer second network unit block.
  • the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before the input content feature is multiplied with the output feature of the previous-layer second network unit block.
  • the last-layer second network unit block among the second network unit blocks is used to multiply the content feature from the first-layer first network unit block with the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block, and to add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  • the last-layer second network unit block is further used to perform a convolution operation on the content feature from the first-layer first network unit block before that content feature is multiplied with the output feature of the previous-layer second network unit block.
  • the adjustment module 1204 is configured to adjust the multiplication operation parameter and/or the addition operation parameter.
  • the adjustment module 1204 is configured to determine a generative adversarial network loss according to the content image, the style image, the generated image, and the identification result; and, in response to the generative adversarial network loss not satisfying a first predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss, wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image and the style feature difference between the generated image and the style image.
  • the adjustment module 1204 is further configured to determine a style loss according to the generated image and the style image; and, in response to the style loss not satisfying a second predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the style loss, wherein the style loss is used to characterize the style feature difference between the generated image and the style image.
  • the adjustment module 1204 is further configured to determine a content loss according to the generated image and the content image; and, in response to the content loss not satisfying a third predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the content loss, wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
  • the adjustment module 1204 is further configured to determine a feature matching loss according to the output feature of each middle-layer second network unit block among the second network unit blocks and the style image; and, in response to the feature matching loss not satisfying a fourth predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss, wherein the feature matching loss is used to characterize the difference between the output feature of each middle-layer second network unit block and the style feature of the style image.
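The four losses above are specified only by what they characterize, not by formula. The sketch below shows one conventional way each could be instantiated; a non-saturating adversarial loss and L1 feature distances are concrete choices assumed here, not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(disc_fake_logits):
    """Non-saturating GAN loss on the discriminator's output for the
    generated image (one conventional choice; not specified in the text)."""
    return F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))

def content_loss(gen_content_feat, content_feat):
    """Characterizes the content feature difference between the
    generated image and the content image."""
    return F.l1_loss(gen_content_feat, content_feat)

def style_loss(gen_style_feat, style_feat):
    """Characterizes the style feature difference, here on the
    mean/std style statistics described earlier."""
    return F.l1_loss(gen_style_feat, style_feat)

def feature_matching_loss(mid_feats, style_feats):
    """Sums differences between each middle-layer second network unit
    block's output feature and a matching style-image feature of the
    same shape (the pairing is an assumption)."""
    return sum(F.l1_loss(m, s) for m, s in zip(mid_feats, style_feats))
```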
  • the fourth extraction module 1202 is configured to extract features of the style image distribution, and to sample the features of the style image distribution to obtain the style feature, where the style feature includes the mean and standard deviation of the features of the style image distribution.
  • the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or the second network unit block is configured to process the features input into the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
  • the third extraction module 1201, the fourth extraction module 1202, the second processing module 1203, and the adjustment module 1204 can all be implemented by a processor, and the processor can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the embodiment of the present disclosure further provides a computer storage medium, such as the memory 111 including a computer program, which can be executed by the processor 112 of the electronic device 11 to complete the steps described in the foregoing method.
  • the computer-readable storage medium can be an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disk, or a CD-ROM; it can also be any of a variety of devices including one of or any combination of the foregoing memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
  • the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any image generation method or any neural network training method in the foregoing embodiments is implemented.
  • the technical solution of the present disclosure, in essence, or the part of it that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present disclosure.

Abstract

An image generating method, a neural network training method, an apparatus, an electronic device, and a computer storage medium. The image generating method comprises: extracting content features of a content image by using multiple first network unit blocks sequentially connected in a first neural network, to obtain the content features output by the first network unit blocks (101); extracting a style feature of a style image (102); correspondingly feeding the content features respectively output by the first network unit blocks forward into multiple second network unit blocks sequentially connected in a second neural network, feeding the style feature forward from the first of the multiple second network unit blocks, and obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, where the multiple first network unit blocks correspond to the multiple second network unit blocks (103).

Description

Image generation and neural network training method, apparatus, device, and medium
Cross-reference to related applications
This disclosure is filed on the basis of the Chinese patent application with application number 201910551145.3 and filing date June 24, 2019, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of image processing, and in particular to an image generation method and a neural network training method, apparatus, electronic device, and computer storage medium.
Background
One approach to image generation is to generate a second image from a real image and then to judge subjectively, by human vision, whether the generated image looks more realistic. With the application of neural networks, neural-network-based image generation methods have emerged in the related art: a neural network is usually trained on paired data, and the trained neural network then performs style conversion on a content image. Here, paired data refers to a content image and a style image used for training that have the same content features, the style image differing from the content image in style features. However, such paired data rarely occurs in actual scenarios, so this method is not easy to implement.
Summary
The embodiments of the present disclosure are intended to provide a technical solution for image generation.
In a first aspect, an embodiment of the present disclosure provides an image generation method. The method includes: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer; extracting a style feature of a style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, feeding the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, and obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
In a second aspect, an embodiment of the present disclosure further provides a neural network training method. The method includes: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer; extracting a style feature of a style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, feeding the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, and obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; discriminating the generated image to obtain an identification result; and adjusting network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
In a third aspect, an embodiment of the present disclosure further provides an image generation apparatus. The apparatus includes a first extraction module, a second extraction module, and a first processing module, where the first extraction module is configured to extract content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer; the second extraction module is configured to extract a style feature of a style image; and the first processing module is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, to feed the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, and to obtain a generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
In a fourth aspect, an embodiment of the present disclosure further provides a neural network training apparatus. The apparatus includes a third extraction module, a fourth extraction module, a second processing module, and an adjustment module, where the third extraction module is configured to extract content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer; the fourth extraction module is configured to extract a style feature of a style image; the second processing module is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, to feed the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respective input features, and to discriminate the generated image to obtain an identification result, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; and the adjustment module is configured to adjust network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including a processor and a memory for storing a computer program that can run on the processor, where the processor is configured, when running the computer program, to execute any one of the above image generation methods or any one of the above neural network training methods.
In a sixth aspect, an embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the above image generation methods or any one of the above neural network training methods is implemented.
In the image generation method and neural network training method, apparatus, electronic device, and computer storage medium proposed by the embodiments of the present disclosure, content features of a content image are extracted by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer; a style feature of a style image is extracted; the content features respectively output by the first network unit blocks of each layer are correspondingly fed forward into sequentially connected multi-layer second network unit blocks in a second neural network, the style feature is fed forward from the first-layer second network unit block of the multi-layer second network unit blocks, and a generated image output by the second neural network is obtained after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks. In the embodiments of the present disclosure, both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which makes the method easy to implement. In addition, in the image generation process, the first network unit blocks of each layer of the first neural network extract the content features of the content image multiple times, thereby retaining more of the semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Description of the drawings
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a flowchart of an image generation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a pre-trained neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an exemplary structure of a content encoder according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an exemplary structure of a CRB according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an exemplary structure of a generator according to an embodiment of the present disclosure;
FIG. 6 shows several exemplary groups of content images, style images, and generated images in the embodiments of the present disclosure;
FIG. 7 is a flowchart of a neural network training method according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed in an application embodiment of the present disclosure;
FIG. 9a is a schematic structural diagram of a residual block of the content encoder in an application embodiment of the present disclosure;
FIG. 9b is a schematic structural diagram of a residual block of the generator in an application embodiment of the present disclosure;
FIG. 9c is a schematic structural diagram of the FADE module in an application embodiment of the present disclosure;
FIG. 10 is a schematic diagram of the composition structure of an image generation apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of the composition structure of a neural network training apparatus according to an embodiment of the present disclosure.
Detailed description
The embodiments of the present disclosure are described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments provided here are only used to explain the embodiments of the present disclosure, not to limit them. In addition, the embodiments provided below are some of the embodiments for implementing the present disclosure, rather than all of them; where no conflict arises, the technical solutions described in the embodiments of the present disclosure can be implemented in any combination.
It should be noted that, in the embodiments of the present disclosure, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recorded elements but also other elements that are not explicitly listed, or elements inherent to implementing the method or apparatus. Without further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other related elements in the method or apparatus that includes that element (for example, steps in the method or units in the apparatus, where a unit may be, for example, part of a circuit, part of a processor, part of a program or software, and so on).
For example, the image generation method and the neural network training method provided by the embodiments of the present disclosure contain a series of steps, but they are not limited to the recorded steps. Similarly, the image generation apparatus and the neural network training apparatus provided by the embodiments of the present disclosure include a series of modules, but the apparatuses provided by the embodiments of the present disclosure are not limited to the explicitly recorded modules and may also include modules that need to be set up to obtain relevant information or to perform processing based on information.
The term "and/or" in this document merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, and B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set formed by A, B, and C.
The embodiments of the present disclosure can be applied to a computer system composed of a terminal and a server, and can operate together with many other general-purpose or special-purpose computing system environments or configurations. Here, the terminal may be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a network personal computer, an in-vehicle device, a small computer system, and so on; the server may be a server computer system, a small computer system, a large computer system, a distributed cloud computing technology environment including any of the above systems, and so on.
Electronic devices such as terminals and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, in which tasks are executed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on storage media of local or remote computing systems including storage devices.
Based on the above, in some embodiments of the present disclosure, an image generation method is proposed. The scenarios to which the embodiments of the present disclosure can be applied include, but are not limited to, automatic driving, image generation, image synthesis, computer vision, deep learning, machine learning, and so on.
FIG. 1 is a flowchart of an image generation method according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include:
Step 101: Extract content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer.
Here, the content image may be an image that requires style conversion; illustratively, the content image may be obtained from a local storage area or from a network. For example, the content image may be an image captured by a mobile terminal or a camera. The format of the content image may be Joint Photographic Experts Group (JPEG), Bitmap (BMP), Portable Network Graphics (PNG), or another format. It should be noted that the format and source of the content image are merely exemplified here; the embodiments of the present disclosure do not limit the format and source of the content image.
For an image, content features and style features can be extracted. The content features characterize the content information of the image; for example, the content features represent the positions, shapes, and sizes of objects in the image. The style features characterize the style information of the image; for example, the style features characterize weather, daytime, nighttime, painting style, and other style information.
In the embodiments of the present disclosure, style conversion may refer to converting the style feature of the content image into another style feature. Illustratively, the conversion of the style feature of the content image may be any one of: conversion from day to night, conversion from night to day, conversion between different weather styles, conversion between different painting styles, conversion from a real image to a computer-graphics (CG) image, and conversion from a CG image to a real image. The conversion between different weather styles may be conversion from sunny to rainy, from rainy to sunny, from sunny to cloudy, from cloudy to sunny, from cloudy to rainy, from rainy to cloudy, from sunny to snowy, from snowy to sunny, from cloudy to snowy, from snowy to cloudy, from snowy to rainy, from rainy to snowy, and so on. The conversion between different painting styles may be conversion from oil painting to ink painting, from ink painting to oil painting, from oil painting to sketch, from sketch to oil painting, from sketch to ink painting, from ink painting to sketch, and so on.
Here, the first neural network is a network for extracting the content features of the content image, and the embodiments of the present disclosure do not limit the type of the first neural network. The first neural network includes sequentially connected multi-layer first network unit blocks; among the multi-layer first network unit blocks of the first neural network, the content image can be fed forward from the first-layer first network unit block. The data processing direction corresponding to feedforward input is the direction from the input end of the neural network to the output end, corresponding to the forward-propagation process; in the feedforward input process, the output result of the previous-layer network unit block of the neural network serves as the input of the next-layer network unit block.
For the first neural network, the first network unit block of each layer can extract content features from the input data; that is, the output result of the first network unit block of each layer of the first neural network is the content feature corresponding to that layer's first network unit block, and the content features output by different first network unit blocks in the first neural network are different.
Optionally, the content features of the content image may be represented as content feature maps or in another representation, which is not limited in the embodiments of the present disclosure.
It can be understood that, through the successive extraction of content features by the first network unit blocks of each layer of the first neural network, semantic information of the content image from low level to high level can be obtained. Optionally, each layer's first network unit block in the first neural network consists of multiple neural network layers organized in a residual structure, so that the content features of the content image can be extracted based on the multiple neural network layers organized in a residual structure in each layer's first network unit block.
Step 102: Extract a style feature of a style image.
Here, the style image is an image with a target style feature, where the target style feature is the style feature to which the content image needs to be converted; the style image can be set according to actual needs. In the embodiments of the present disclosure, after the content image is obtained, the target style feature to convert to can be determined, and the style image can then be selected according to that need.
In practical applications, the style image can be obtained from a local storage area or a network; for example, the style image may be an image captured by a mobile terminal or a camera. The format of the style image may be JPEG, BMP, PNG, or another format. It should be noted that the format and source of the style image are merely exemplified here; the embodiments of the present disclosure do not limit the format and source of the style image.
In the embodiments of the present disclosure, the style feature of the content image differs from the style feature of the style image; the purpose of performing style conversion on the content image may be to make the generated image obtained after style conversion have the content feature of the content image and the style feature of the style image.
For example, a day-style content image can be converted into a night-style generated image; or a sunny-style content image can be converted into a rainy-style generated image; or an ink-painting-style content image can be converted into an oil-painting-style generated image; or a CG-style image can be converted into a real-image-style generated image; and so on.
For the implementation of this step, illustratively, extracting the style feature of the style image includes: extracting features of the style image distribution, and sampling the features of the style image distribution to obtain the style feature, where the style feature includes the mean and standard deviation of the features of the style image distribution. Here, by sampling the features of the style image distribution, the style feature of the style image can be extracted accurately, which is conducive to accurate style conversion of the content image. In practical applications, at least one layer of convolution operation can be performed on the style image to obtain the features of the style image distribution.
Step 103: Correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, feed the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, and obtain a generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
Here, the second neural network includes sequentially connected multi-layer second network unit blocks, and the output result of the previous-layer network unit block in the second neural network is the input of the next-layer network unit block. Optionally, each layer's second network unit block in the second neural network consists of multiple neural network layers organized in a residual structure, so that the input features can be processed based on the multiple neural network layers organized in a residual structure in each layer's second network unit block.
In practical applications, steps 101 to 103 can be implemented by a processor in an electronic device; the processor can be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), an FPGA, a central processing unit (CPU), a controller, a microcontroller, and a microprocessor.
It can be seen that, in the embodiments of the present disclosure, both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which makes the method easy to implement. In addition, in the image generation process, the first network unit blocks of each layer of the first neural network extract the content features of the content image multiple times, thereby retaining more of the semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
In addition, when images are generated based on the neural network structure in the embodiments of the present disclosure, the style of the style image can be determined according to actual needs; the style feature of the style image is not restricted to the style features of the style images used when training the neural network. That is to say, even if night-style training images were used when training the neural network, a content image together with a snowy-style, rainy-style, or other-style style image can be selected when generating images based on the trained neural network, so that images in the actually required style are generated, rather than only night-style images, which improves the generalization and universality of the image generation method.
Furthermore, multiple style images with different style features can be set according to user needs, and generated images with different style features can then be obtained for one content image. For example, when generating images based on the trained neural network, for the same content image, a night-style image, a cloudy-style image, and a rainy-style image can be input to the trained neural network separately, so as to convert the style of the content image into a night style, a cloudy style, and a rainy style respectively. That is, generated images of multiple styles can be obtained based on the same content image, rather than only one style of image, which improves the applicability of the image generation method.
In the embodiments of the present disclosure, the number of layers of the first network unit blocks of the first neural network and the number of layers of the second network unit blocks of the second neural network may be the same, and the first network unit blocks of each layer of the first neural network form a one-to-one correspondence with the second network unit blocks of each layer of the second neural network.
As an implementation, correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network includes: as i takes the values 1 to T in sequence, feeding the content features output by the i-th layer first network unit block forward into the (T-i+1)-th layer second network unit block, where i is a positive integer and T denotes the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network. That is, the content features output by the first-layer first network unit block are input into the last-layer second network unit block, and the content features output by the last-layer first network unit block are input into the first-layer second network unit block.
In the embodiments of the present disclosure, the content features received by the second network unit blocks of each layer in the second neural network are the output features of the first network unit blocks of each layer of the first neural network, and the content features received by a second network unit block differ depending on its position in the second neural network. It can be seen that the second neural network takes the style feature as input; as the style feature goes deeper, from the low-layer second network unit blocks of the second neural network to the high-layer second network unit blocks, more content features can be fused, and the semantic information of each layer of the content image can be gradually fused on the basis of the style feature, so that the resulting generated image can retain the multi-layer semantic information of the content image as well as the style feature information.
作为一种实现方式,各所述第二网络单元块中的首层第二网络单元块对输入的特征处理,包括:可以将来自末层第一网络单元块的内容特征和风格特征进行乘法运算,得到所述首层第二网络单元块的中间特征;将来自末层第一网络单元块的内容特征与首层第二网络单元块的中间特征进行加法运算,得到首层第二网络单元块的输出特征;将首层第二网络单元块的输出特征输入第二层第二网络单元块。可以看出,通过进行上述乘法运算和加法运算,便于实现风格特征和末层第一网络单元块的内容特征的融合。As an implementation manner, the feature processing of the first-level second network unit block in each of the second network unit blocks includes: the content feature and the style feature from the last-level first network unit block can be multiplied , Obtain the intermediate feature of the first-level second network unit block; add the content feature from the first-level first network unit block of the last layer and the intermediate feature of the first-level second network unit block to obtain the first-level second network unit block The output characteristics of the first layer of the second network unit block input the output characteristics of the second layer of the second network unit block. It can be seen that, by performing the above-mentioned multiplication and addition operations, it is convenient to realize the fusion of the style feature and the content feature of the first network unit block of the last layer.
可选地,在将来自末层第一网络单元块的内容特征和风格特征进行乘法运算前,可以对来自末层第一网络单元块的内容特征进行卷积运算。也即,可以先对来自末层第一网络单元块的内容特征进行卷积运算,再将卷积运算的结果与风格特征进行乘法运算。Optionally, before multiplying the content feature and style feature from the first network unit block of the last layer, a convolution operation may be performed on the content feature from the first network unit block of the last layer. That is, it is possible to first perform a convolution operation on the content features from the first network unit block of the last layer, and then perform a multiplication operation on the result of the convolution operation and the style feature.
作为一种实现方式,各第二网络单元块中的中间层第二网络单元块对输入的特征处理,包括:可以对输入的内容特征和上一层第二网络单元块的输出特征进行乘法运算,得到中间层第二网络单元块的中间特征;将输入的内容特征与中间层第二网络单元块的中间特征进行加法运算,得到中间层第二网络单元块的输出特征;将中间层第二网络单元块的输出特征输入下一层第二网络单元块。可以看出,通过进行上述乘法运算和加法运算,便于实现上一层第二网络单元块的输出特征和相应内容特征的融合。As an implementation manner, the input feature processing of the middle layer second network unit block in each second network unit block includes: the input content feature and the output feature of the upper layer second network unit block can be multiplied , Get the intermediate feature of the second network unit block of the middle layer; add the input content feature and the intermediate feature of the second network unit block of the middle layer to obtain the output feature of the second network unit block of the middle layer; The output feature of the network unit block is input to the second network unit block of the next layer. It can be seen that by performing the above-mentioned multiplication operation and addition operation, it is convenient to realize the fusion of the output characteristics of the second network unit block of the upper layer and the corresponding content characteristics.
需要说明的是,中间层第二网络单元块为第二神经网络中除去首层第二网络单元块和末层第二网络单元块之外的其他第二网络单元块,在第二神经网络中,可以有一个中间第二网络单元块,也可以有多个第二网络单元块;上述记载的内容仅仅是以一个中间层第二网络单元块为例,对中间层第二网络单元块的数据处理过程进行了说明。It should be noted that the second network unit block in the middle layer is the second network unit block in the second neural network except the first layer second network unit block and the last layer second network unit block. In the second neural network , There can be one intermediate second network unit block, or there can be multiple second network unit blocks; the above-mentioned content is only an intermediate second network unit block as an example, the data of the intermediate second network unit block The processing procedure is explained.
可选地,中间层第二网络单元块在对所述输入的内容特征和上一层第二网络单元块的输出特征进行乘法运算前,对所述接收的内容特征进行卷积运算。Optionally, the intermediate layer second network unit block performs a convolution operation on the received content feature before multiplying the input content feature and the output feature of the upper layer second network unit block.
As an implementation, the processing of input features by the last-layer second network unit block among the second network unit blocks includes: multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.

It can be seen that performing the above multiplication and addition operations facilitates the fusion of the output feature of the previous-layer second network unit block with the content feature of the first-layer first network unit block; through the data processing of the second network unit blocks of all layers, the generated image can then fuse the style feature with the content features of the first network unit blocks of all layers.

Optionally, before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block, the last-layer second network unit block performs a convolution operation on the content feature from the first-layer first network unit block. The shared multiply-then-add pattern of these three cases is illustrated in the sketch below.
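The three variants above differ only in which modulating feature (the style feature for the first-layer block, or the previous-layer block's output for the other blocks) is combined with which content feature. As a minimal, hypothetical PyTorch-style sketch of that shared pattern (the patent publishes no code; layer sizes and broadcastable shapes are assumptions):

```python
import torch
import torch.nn as nn

class FusionStep(nn.Module):
    """One second-network-unit-block fusion step as described above.

    `modulator` is the style feature for the first-layer block, or the
    previous-layer block's output feature for middle/last-layer blocks;
    shapes are assumed broadcastable against the content feature."""
    def __init__(self, channels: int):
        super().__init__()
        # Optional convolution applied to the content feature
        # before the multiplication (see the optional step above).
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content: torch.Tensor, modulator: torch.Tensor) -> torch.Tensor:
        intermediate = self.conv(content) * modulator  # multiplication step
        return content + intermediate                  # addition step
```

For the last-layer block, the result of the addition is taken as the generated image; for the other layers it is passed on as the block's output feature.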
In practical applications, a neural network for image generation can be pre-trained; the pre-trained neural network is described below by way of example with reference to the accompanying drawings. Figure 2 is a schematic structural diagram of a pre-trained neural network according to an embodiment of the present disclosure. As shown in Figure 2, the pre-trained neural network includes a content encoder, a style encoder, and a generator, where the content encoder is used to extract the content features of the content image by means of the above first neural network, the style encoder is used to extract the style features of the style image, and the generator is used to fuse, by means of the above second neural network, the style features with the content features output by the first network unit blocks of each layer.
In actual implementation, the first neural network can serve as the content encoder, the second neural network as the generator, and the neural network used to extract style features from the style image as the style encoder. Referring to Figure 2, the image to be processed (i.e., the content image) can be input to the content encoder, in which the multi-layer first network unit blocks of the first neural network perform the processing and each layer of first network unit block can output a content feature; the style image can also be input into the style encoder, which extracts the style features of the style image. Exemplarily, the first network unit block is a residual block (Residual Block, RB), and the content feature output by each layer of first network unit block is a content feature map.
Figure 3 is an exemplary structural schematic diagram of the content encoder according to an embodiment of the present disclosure. As shown in Figure 3, a residual block of the content encoder can be denoted as CRB, and the content encoder includes seven layers of CRB. In the notation CRB(A,B) of Figure 3, A represents the number of input channels and B the number of output channels. In Figure 3, the input of CRB(3,64) is the content image; the first-layer CRB to the seventh-layer CRB, arranged from bottom to top, are CRB(3,64), CRB(64,128), CRB(128,256), CRB(256,512), CRB(512,1024), CRB(1024,1024), and CRB(1024,1024), and the first-layer CRB to the seventh-layer CRB can respectively output seven content feature maps.
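Under the channel configuration of Figure 3, the encoder can be assembled as in the following sketch. Here `make_crb` is a hypothetical factory for one residual block with the given input/output channels, standing in for the CRB of Figure 4; this illustrates the layer layout only, not an implementation from the patent:

```python
import torch.nn as nn

# (input channels, output channels) of the first- to seventh-layer CRBs
CRB_CHANNELS = [(3, 64), (64, 128), (128, 256), (256, 512),
                (512, 1024), (1024, 1024), (1024, 1024)]

def build_content_encoder(make_crb) -> nn.ModuleList:
    return nn.ModuleList(make_crb(i, o) for i, o in CRB_CHANNELS)

def encode(crbs: nn.ModuleList, content_image):
    features = []
    for crb in crbs:                   # first-layer to seventh-layer CRB
        content_image = crb(content_image)
        features.append(content_image)  # the seven content feature maps
    return features
```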
Figure 4 is an exemplary structural schematic diagram of the CRB according to an embodiment of the present disclosure. In Figure 4, sync BN denotes a synchronized BN layer, Rectified Linear Unit (ReLu) denotes a ReLu layer, Conv denotes a convolutional layer, and ⊕ denotes a summation operation; the structure of the CRB shown in Figure 4 is that of a standard residual block.
Referring to Figures 3 and 4, the embodiments of the present disclosure can adopt a standard residual network structure to extract content features, which facilitates extracting the content features of the content image and reduces the loss of semantic information. In the generator, the multi-layer second network unit blocks of the second neural network can be used for processing; exemplarily, the second network unit block is an RB.
Figure 5 is an exemplary structural schematic diagram of the generator according to an embodiment of the present disclosure. As shown in Figure 5, a residual block in the generator can be denoted as GB, and the generator can include seven layers of GB, where the input of each layer of GB includes the output of one layer of CRB of the content encoder. In the generator, the first-layer GB to the seventh-layer GB, arranged from top to bottom, are GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(512), GB ResBlk(256), GB ResBlk(128), and GB ResBlk(64); in the notation GB ResBlk(C) of Figure 5, C represents the number of channels. The first-layer GB is used to receive the style features, and the first-layer GB to the seventh-layer GB are used to correspondingly receive the content feature maps output by the seventh-layer CRB to the first-layer CRB; after each layer of GB processes its input features, the output of the seventh-layer GB can be used to obtain the generated image.
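The correspondence described above (the i-th-layer CRB feeding the (T-i+1)-th-layer GB, with the style features entering the first-layer GB) amounts to consuming the content feature maps in reverse order. A hedged sketch, with the `gb(h, f)` call signature assumed:

```python
GB_CHANNELS = [1024, 1024, 1024, 512, 256, 128, 64]  # first- to seventh-layer GB

def generate(gb_blocks, content_features, style_feature):
    """gb_blocks: first- to seventh-layer GB; content_features:
    outputs of the first- to seventh-layer CRBs, in encoder order."""
    h = style_feature
    for gb, f in zip(gb_blocks, reversed(content_features)):
        h = gb(h, f)   # generator layer i receives CRB layer T - i + 1
    return h           # output of the seventh-layer GB yields the image
```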
It can be seen that the structural information of the content image can be encoded on the basis of the multi-layer residual blocks of the content encoder, so as to generate multiple content feature maps at different levels; the content encoder can extract more abstract features in its deep layers while retaining a large amount of structural information in its shallow layers.
The image generation method of the embodiments of the present disclosure can be applied to various image generation scenarios, for example, to scenarios such as the generation of entertainment image data and the generation of training and test data for autonomous driving models.
The effect of the image generation method of the embodiments of the present disclosure is described below with reference to the accompanying drawings. Figure 6 shows several exemplary groups of content images, style images, and generated images in the embodiments of the present disclosure. As shown in Figure 6, the first column represents content images, the second column style images, and the third column the generated images obtained by the image generation method of the embodiments of the present disclosure; the images in one row represent one group of content image, style image, and generated image. The style conversions from the first row to the last row are, respectively, day to night, night to day, sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, sunny to snowy, and snowy to sunny. It can be seen from Figure 6 that the generated images obtained by the image generation method of the embodiments of the present disclosure can retain the content information of the content image as well as the style information of the style image.
The training process of the neural network in the embodiments of the present disclosure involves not only the forward propagation process from input to output but also the back propagation process from output to input; in the training process of the neural network of the present disclosure, the forward process can be used to generate images and the backward process can be used to adjust the network parameters of the neural network. The neural network training method involved in the embodiments of the present disclosure is described below.
Figure 7 is a flowchart of a neural network training method according to an embodiment of the present disclosure. As shown in Figure 7, the process may include the following steps.
Step 701: Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer.
Step 702: Extract the style features of the style image.
Step 703: Correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
Steps 701 to 703 of this embodiment are implemented in the same manner as steps 101 to 103, which will not be repeated here.
Step 704: Discriminate the generated image to obtain a discrimination result.
In the embodiments of the present disclosure, unlike the testing method of the neural network (i.e., the method of generating images based on the trained neural network), the output image produced by the generator during training still needs to be discriminated. Here, the purpose of discriminating the generated image is to determine the probability that the generated image is a real image; in practical applications, this step can be implemented by means of a discriminator or the like.
Step 705: Adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
In practical applications, the network parameters of the first neural network and/or the second neural network can be adjusted in the backward process according to the content image, the style image, the generated image, and the discrimination result, and the generated image and discrimination result can then be obtained again in the forward process. By alternating the above forward and backward processes multiple times, the neural network is iteratively optimized until a predetermined training completion condition is satisfied, at which point the trained neural network for image generation is obtained. One such iteration is sketched below.
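Abstractly, one training iteration chains the forward pass of steps 701 to 704 with a backward pass for step 705. The sketch below is hypothetical scaffolding only: the `model` and `discriminator` interfaces, the optimizer setup, and the loss composition are assumptions rather than the patent's procedure:

```python
def train_step(model, discriminator, opt_g, opt_d, content_img, style_img):
    # Forward process: steps 701-703 produce the generated image
    generated = model(content_img, style_img)

    # Step 704: discriminate, then update the discriminator
    d_loss = discriminator.loss(real=style_img, fake=generated.detach())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Step 705 (backward process): adjust the network parameters of the
    # first and/or second neural network from the combined losses
    g_loss = model.total_loss(content_img, style_img, generated,
                              discriminator(generated))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Iterating `train_step` until the predetermined training completion condition is met corresponds to the alternation of forward and backward processes described above.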
In practical applications, steps 701 to 705 can be implemented by a processor in an electronic device, and the above processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.
In the embodiments of the present disclosure, both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a paired image, which facilitates implementation. In the image generation part of the neural network training process, the first network unit blocks of each layer of the first neural network can be used to extract the content features of the content image multiple times, thereby retaining more of the semantic information of the content image, so that the generated image retains more semantic information compared with the content image; in turn, the trained neural network has better performance in preserving the semantic information of the content image.
Regarding the implementation of adjusting the network parameters of the second neural network, exemplarily, the parameters of the above multiplication operation and/or addition operation used in the second network unit blocks of each layer can be adjusted.
As an implementation, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result includes: determining a generative adversarial network (Generative Adversarial Net, GAN) loss according to the content image, the style image, the generated image, and the discrimination result, where the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image as well as the style feature difference between the generated image and the style image; in one example, the generative adversarial network includes a generator and a discriminator; and, in response to the generative adversarial network loss not satisfying a first predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss. In practical applications, the network parameters of the first neural network and/or the second neural network can be adjusted based on the generative adversarial network loss by adopting a minimax strategy.
Here, the first predetermined condition may represent a predetermined training completion condition. It can be understood from the meaning of the generative adversarial network loss that training the neural network based on this loss enables the generated image obtained with the trained neural network to better preserve the content features of the content image and the style features of the style image.
Optionally, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further includes: determining a style loss according to the generated image and the style image; and, in response to the style loss not satisfying a second predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the style loss, where the style loss is used to characterize the difference between the style features of the generated image and those of the style image.
Optionally, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further includes: determining a content loss according to the generated image and the content image; and, in response to the content loss not satisfying a third predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the content loss, where the content loss is used to characterize the content feature difference between the generated image and the content image.
Optionally, adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further includes: determining a feature matching loss according to the output features of the middle-layer second network unit blocks among the second network unit blocks and the style image; and, in response to the feature matching loss not satisfying a fourth predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the feature matching loss, where the feature matching loss is used to characterize the difference between the output features of the middle-layer second network unit blocks and the style features of the style image.
The above second, third, and fourth predetermined conditions may represent predetermined training completion conditions. It can be understood from the meanings of the style loss, the content loss, and the feature matching loss that training the neural network based on the style loss, the content loss, or the feature matching loss enables the generated image obtained with the trained neural network to better preserve the content features of the content image.
In the embodiments of the present disclosure, the neural network can be trained based on one or more of the above losses. When the neural network is trained based on one loss, the trained neural network is obtained once that loss satisfies the corresponding predetermined condition; when the neural network is trained based on multiple losses, the trained neural network is obtained once all of the above losses satisfy their corresponding predetermined conditions. When training the neural network based on multiple losses, the losses can be considered comprehensively from all aspects of the training, so the trained neural network performs style conversion with higher accuracy.
In the embodiments of the present disclosure, the generative adversarial network loss, the style loss, the content loss, and the feature matching loss can each be expressed by a loss function.
The embodiments of the present disclosure are further described below through a specific application embodiment.
In this application embodiment, the training process of the neural network can be implemented based on the content encoder, the style encoder, the generator, the discriminator, and the like, while the process of generating images based on the trained neural network can be implemented based on the content encoder, the style encoder, the generator, and the like.
Figure 8 is a schematic structural diagram of the framework of the image generation method proposed in the application embodiment of the present disclosure. As shown in Figure 8, the input of the content encoder is the image to be processed (i.e., the content image), and the content encoder is used to extract the content features of the content image; the style encoder is responsible for extracting the style features of the style image; and the generator fuses the style features with the content features of the first network unit blocks of different layers, thereby generating a high-quality image. It should be noted that the discriminator used in the neural network training process is not shown in Figure 8.
Specifically, referring to Figure 8, the content encoder includes multiple layers of residual blocks, where CRB-1, CRB-2, ..., CRB-T respectively denote the first-layer to T-th-layer residual blocks of the content encoder; the generator also includes multiple layers of residual blocks, where GB-1, ..., GB-T-1, GB-T respectively denote the first-layer to T-th-layer residual blocks of the generator. For i between 1 and T, the output of the i-th-layer residual block of the content encoder is input into the (T-i+1)-th-layer residual block of the generator. The input of the style encoder is the style image, from which the style features are extracted, and the style features are input into the first-layer residual block of the generator. The output image is obtained based on the output of the T-th-layer residual block GB-T of the generator.
In the application embodiment of the present disclosure, $f^i$ is defined as the content feature map output by the $i$-th-layer residual block of the content encoder, and $\hat{f}^i$ denotes the feature output by the $i$-th residual block of the generator. Here, the $i$-th residual block of the content encoder corresponds to the $(T-i+1)$-th-layer residual block of the generator; $\hat{f}^i$ has the same number of channels as $f^i$; $N$ denotes the batch size, $C_i$ the number of channels, and $H_i$ and $W_i$ the height and the width, respectively. The activation value ($n\in[1,N]$, $c\in[1,C_i]$, $h\in[1,H_i]$, $w\in[1,W_i]$) can be expressed as formula (1):

$$\gamma^i_{c,h,w}\,\frac{\hat{f}^{\,i-1}_{n,c,h,w}-\mu^i_c}{\sigma^i_c}+\beta^i_{c,h,w}\tag{1}$$

where $\mu^i_c$ and $\sigma^i_c$ both correspond to the $i$-th residual block of the generator and respectively denote the mean and the standard deviation of the features output by the previous-layer residual block (i.e., a residual block of the second neural network); $\mu^i_c$ and $\sigma^i_c$ can be calculated according to formula (2):

$$\mu^i_c=\frac{1}{N H_i W_i}\sum_{n,h,w}\hat{f}^{\,i-1}_{n,c,h,w},\qquad \sigma^i_c=\sqrt{\frac{1}{N H_i W_i}\sum_{n,h,w}\Big(\hat{f}^{\,i-1}_{n,c,h,w}\Big)^2-\big(\mu^i_c\big)^2}\tag{2}$$

$\gamma^i$ and $\beta^i$ are the parameters of the $i$-th residual block of the generator and can be obtained from a single-layer convolution of $f^i$. The image generation method of this application embodiment is therefore feature-adaptive, that is, the modulation parameters are computed directly from the content features of the content image, whereas in related image generation methods the modulation parameters are fixed.
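Formulas (1) and (2) amount to a batch-normalization-style whitening of the incoming generator feature followed by a spatially varying affine transform whose parameters come from the content feature map. A minimal sketch (NCHW layout, matching spatial sizes of the two inputs, and the epsilon term are assumptions):

```python
import torch
import torch.nn as nn

class FADE(nn.Module):
    def __init__(self, feat_channels: int, content_channels: int):
        super().__init__()
        # gamma and beta are obtained from single-layer convolutions of f_i
        self.to_gamma = nn.Conv2d(content_channels, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(content_channels, feat_channels, 3, padding=1)

    def forward(self, h, f_i, eps: float = 1e-5):
        # formula (2): per-channel mean/std over batch and spatial dims
        # of the feature h coming from the previous generator block
        mu = h.mean(dim=(0, 2, 3), keepdim=True)
        var = h.pow(2).mean(dim=(0, 2, 3), keepdim=True) - mu.pow(2)
        sigma = var.clamp_min(eps).sqrt()
        # formula (1): normalize, then modulate with content-derived parameters
        return self.to_gamma(f_i) * (h - mu) / sigma + self.to_beta(f_i)
```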
In the application embodiment of the present disclosure, the content encoder is denoted as $E_c$ and the style encoder as $E_s$; the latent distribution of the style image $x_s$ is encoded by $E_s$, e.g., $z=E_s(x_s)$.

Using $\chi_c$ and $\chi_s$ to denote the content image domain and the style image domain, respectively, the training samples $(x_c,x_s)$ are drawn from the marginal distributions $P(x_c)$ and $P(x_s)$ in an unsupervised learning setting.
Figure 9a is a schematic structural diagram of a residual block of the content encoder in the application embodiment of the present disclosure. As shown in Figure 9a, BN denotes a BN layer, ReLu denotes a ReLu layer, Conv denotes a convolutional layer, and ⊕ denotes a summation operation. Each residual block CRB of the content encoder has the structure of a standard residual block and includes three convolutional layers, one of which is used for the skip connection.
In the application embodiment of the present disclosure, the generator and the content encoder have the same number of residual block layers. Figure 9b is a schematic structural diagram of a residual block of the generator in the application embodiment of the present disclosure. As shown in Figure 9b, on the basis of the standard residual block, the BN layers are replaced with FADE modules to obtain the structure of each layer of residual block GB of the generator. In Figure 9b, F1, F2, and F3 respectively denote the first, second, and third FADE modules. In each residual block of the generator, the input of each FADE module includes the corresponding content feature map output by the content encoder; referring to Figure 9b, among the three FADE modules of each residual block of the generator, the inputs of F1 and F2 further include the output feature of the previous-layer residual block of the second neural network, and the input of F3 further includes the feature obtained after sequential processing by F1, a ReLu layer, and a convolutional layer.
Figure 9c is a schematic structural diagram of the FADE module of the application embodiment of the present disclosure. As shown in Figure 9c, the dashed box indicates the structure inside the FADE module, ⊗ denotes a multiplication operation, ⊕ denotes an addition operation, Conv denotes a convolutional layer, and BN denotes a BN layer; γ and β denote the modulation parameters of each residual block of the generator. It can be seen that FADE takes the content feature map as input and can derive the denormalization parameters from the convolved features.
In the application embodiment of the present disclosure, through the careful design of the connection structure between the content encoder and the generator, the trained neural network adaptively transforms the content image under the control of the style image.
As an implementation, the style encoder is proposed on the basis of a variational auto-encoder (Variational Auto-Encoder, VAE). The output of the style encoder is a mean vector $\mu$ and a standard deviation vector $\sigma$; the latent code $z$ is obtained by re-sampling from $\mathcal{N}(\mu,\sigma)$ after the style image is encoded.

Since the sampling operation is not differentiable, the reparameterization trick can be used here to convert the sampling into a differentiable operation. Let $\eta$ be a random vector of the same size as $z$; here, $\eta\sim\mathcal{N}(\eta|0,1)$, so $z$ can be re-parameterized as $z=\mu+\sigma\odot\eta$. Through this operation, the style encoder can be trained with back propagation, and the entire network can be trained as an end-to-end model.
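The re-parameterization step simply moves the randomness into a fixed-noise variable; a minimal sketch:

```python
import torch

def reparameterize(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """z = mu + sigma * eta with eta ~ N(0, I): the sampling becomes a
    differentiable function of mu and sigma, so gradients can flow
    back into the style encoder."""
    eta = torch.randn_like(sigma)
    return mu + sigma * eta
```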
In the application embodiment of the present disclosure, all parts of the entire neural network can be trained jointly. For the training of the neural network, on the basis of minimax-strategy optimization, the overall loss function can be calculated with reference to formula (3), thereby implementing the training of the network:

$$\min_{E_s,E_c,G}\max_{D}\;\mathcal{L}_{VAE}(E_s,G)+\mathcal{L}_{GAN}(E_s,E_c,G,D)+\mathcal{L}_{VGG}(E_s,E_c,G)+\mathcal{L}_{FM}(E_s,E_c,G)\tag{3}$$
Here, $G$ denotes the generator, $D$ denotes the discriminator, and $\mathcal{L}_{VAE}(E_s,G)$ denotes the style loss; exemplarily, the style loss can be a Kullback-Leibler divergence (KL divergence) loss. $\mathcal{L}_{VAE}(E_s,G)$ can be calculated according to formula (4):

$$\mathcal{L}_{VAE}(E_s,G)=\lambda_0\,\mathrm{KL}\big(q(z|x_s)\,\|\,p_\eta(z)\big)\tag{4}$$

where $\mathrm{KL}(\cdot)$ denotes the KL divergence and $\lambda_0$ denotes the hyperparameter in $\mathcal{L}_{VAE}(E_s,G)$.
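With a Gaussian posterior $q(z|x_s)$ and a standard-normal prior, the KL term of formula (4) has the usual closed form. A sketch under those assumptions (the log-variance parameterization is an implementation choice, not stated in the text):

```python
import torch

def kl_style_loss(mu: torch.Tensor, logvar: torch.Tensor, lambda0: float) -> torch.Tensor:
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form, scaled by lambda_0
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return lambda0 * kl
```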
$\mathcal{L}_{GAN}(E_s,E_c,G,D)$ denotes the generative adversarial network loss, which is used in the adversarial training of the generator and the discriminator; $\mathcal{L}_{GAN}(E_s,E_c,G,D)$ can be calculated according to formula (5):

$$\mathcal{L}_{GAN}(E_s,E_c,G,D)=\lambda_1\Big(\mathbb{E}_{x_s}\big[\log D(x_s)\big]+\mathbb{E}_{x_c,x_s}\big[\log\big(1-D\big(G(E_c(x_c),z)\big)\big)\big]\Big)\tag{5}$$

where $\mathbb{E}[\cdot]$ denotes the mathematical expectation, $D(\cdot)$ the discriminator, $G(\cdot)$ the generator, $E_c(x_c)$ the encoding of the content image by the content encoder, and $\lambda_1$ the hyperparameter in $\mathcal{L}_{GAN}(E_s,E_c,G,D)$.
$\mathcal{L}_{VGG}(E_s,E_c,G)$ denotes the content loss; exemplarily, the content loss can be a VGG (Visual Geometry Group) loss. $\mathcal{L}_{VGG}(E_s,E_c,G)$ can be calculated according to formula (6):

$$\mathcal{L}_{VGG}(E_s,E_c,G)=\lambda_2\sum_{m=1}^{M}\frac{\lambda_m}{N_m}\big\|\Phi^{(m)}(x_c)-\Phi^{(m)}(\hat{x}_{c\to s})\big\|_1\tag{6}$$

where $\Phi^{(m)}$ denotes the activation map of the $m$-th layer selected from a total of $M$ layers, $N_m$ denotes the number of elements of $\Phi^{(m)}$, $\lambda_2$ and $\lambda_m$ are the corresponding hyperparameters in $\mathcal{L}_{VGG}(E_s,E_c,G)$, $\hat{x}_{c\to s}$ denotes the output image obtained through the generator, $\hat{x}_{c\to s}=G(E_c(x_c),z)$, and $\|\cdot\|_1$ denotes the 1-norm.
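One common realization of formula (6) compares activations of a fixed, pre-trained VGG-19 at a few selected layers. The layer indices, the per-layer weights, and the torchvision loading call below are illustrative assumptions, not values given in the text:

```python
import torch.nn.functional as F
from torchvision import models

VGG_LAYERS = {3: 1/32, 8: 1/16, 17: 1/8, 26: 1/4, 35: 1.0}  # index -> lambda_m

_vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def vgg_loss(x_c, generated, lambda2: float):
    loss, h_c, h_g = 0.0, x_c, generated
    for idx, layer in enumerate(_vgg):
        h_c, h_g = layer(h_c), layer(h_g)
        if idx in VGG_LAYERS:
            # 1-norm of the activation difference; F.l1_loss averages
            # over elements, which supplies the 1/N_m normalization
            loss = loss + VGG_LAYERS[idx] * F.l1_loss(h_g, h_c)
        if idx >= max(VGG_LAYERS):
            break
    return lambda2 * loss
```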
$\mathcal{L}_{FM}(E_s,E_c,G)$ denotes the feature matching loss; $\mathcal{L}_{FM}(E_s,E_c,G)$ can be calculated according to formula (7):

$$\mathcal{L}_{FM}(E_s,E_c,G)=\lambda_3\sum_{i=1}^{Q}\frac{1}{N_i}\big\|D_k^{(i)}(x_s)-D_k^{(i)}(\hat{x}_{c\to s})\big\|_1\tag{7}$$

where $D_k^{(i)}$ denotes the $i$-th layer of the $k$-th scale of the discriminator (the multi-scale discriminator has $k$ different scales), $N_i$ denotes the total number of elements in the $i$-th layer of the discriminator, and $Q$ denotes the number of layers. In all of the above loss functions, the $\lambda_*$ are the corresponding weights; the VGG loss has different weights at different layers.
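Formula (7) can be sketched as an L1 comparison of the discriminator's intermediate feature maps on the style image and on the generated image; the list-of-features interface is an assumption about how the discriminator exposes its layers:

```python
import torch.nn.functional as F

def feature_matching_loss(feats_real, feats_fake, lambda3: float):
    """feats_*: Q intermediate feature maps from one discriminator scale,
    first to Q-th layer; F.l1_loss averages over elements (the 1/N_i term)."""
    loss = sum(F.l1_loss(fake, real.detach())
               for real, fake in zip(feats_real, feats_fake))
    return lambda3 * loss
```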
In the application embodiment of the present disclosure, the first neural network is trained based on a multi-scale discriminator, where the discriminators at the different scales have exactly the same structure; the discriminator at the coarsest scale has the largest receptive field, and with a larger receptive field the discriminator can distinguish higher-resolution images.
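The multi-scale setup can be sketched as the same discriminator architecture applied to progressively downsampled copies of the input, so that the coarsest scale sees the largest receptive field. The number of scales and the pooling configuration below are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, make_d, num_scales: int = 3):
        super().__init__()
        # identical architecture at every scale
        self.scales = nn.ModuleList(make_d() for _ in range(num_scales))

    def forward(self, x):
        outputs = []
        for d in self.scales:
            outputs.append(d(x))
            # halve resolution for the next, coarser scale
            x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1)
        return outputs
```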
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
On the basis of the image generation method proposed in the foregoing embodiments, an embodiment of the present disclosure proposes an image generation apparatus. Figure 10 is a schematic diagram of the composition structure of the image generation apparatus according to an embodiment of the present disclosure. As shown in Figure 10, the apparatus includes a first extraction module 1001, a second extraction module 1002, and a first processing module 1003, where:
the first extraction module 1001 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;

the second extraction module 1002 is configured to extract the style features of the style image; and

the first processing module 1003 is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
Optionally, the first processing module 1003 is configured to, for i taking values 1 to T in sequence, feed the content feature output by the i-th-layer first network unit block forward into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T denotes the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.

Optionally, the first-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the last-layer first network unit block by the style feature to obtain an intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.

Optionally, the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying that content feature by the style feature.

Optionally, a middle-layer second network unit block among the second network unit blocks is configured to multiply the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the middle-layer second network unit block; add the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and input the output feature of the middle-layer second network unit block into the next-layer second network unit block.

Optionally, the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature by the output feature of the previous-layer second network unit block.

Optionally, the last-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.

Optionally, the last-layer second network unit block is configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying that content feature by the output feature of the previous-layer second network unit block.

Optionally, the second extraction module 1002 is configured to extract the features of the style image distribution and sample the features of the style image distribution to obtain the style features, the style features including the mean and standard deviation of the features of the style image distribution.

Optionally, the first network unit block is configured to extract the content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or the second network unit block is configured to process the features input into the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
In practical applications, the first extraction module 1001, the second extraction module 1002, and the first processing module 1003 can all be implemented by a processor, and the above processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.
In addition, the functional modules in this embodiment can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
Specifically, the computer program instructions corresponding to an image generation method or a neural network training method in this embodiment can be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive; when the computer program instructions in the storage medium corresponding to an image generation method or a neural network training method are read or executed by an electronic device, any image generation method or any neural network training method of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, refer to Figure 11, which shows an electronic device 11 provided by an embodiment of the present disclosure. The electronic device 11 includes a memory 111 and a processor 112, where the memory 111 is configured to store a computer program, and the processor 112 is configured to execute the computer program stored in the memory to implement any image generation method or any neural network training method of the foregoing embodiments.
The components in the electronic device 11 can be coupled together through a bus system. It can be understood that the bus system is used to implement connection and communication between these components. In addition to a data bus, the bus system also includes a power bus, a control bus, and a status signal bus. However, for the sake of clarity, the various buses are all labeled as the bus system in Figure 11.
In practical applications, the above memory 111 can be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (Hard Disk Drive, HDD), or a solid-state drive (Solid-State Drive, SSD); or a combination of the above kinds of memories, and it provides instructions and data to the processor 112.
The above processor 112 can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that, for different devices, the electronic component used to implement the above processor function can also be something else, which is not specifically limited in the embodiments of the present disclosure.
Figure 12 is a schematic diagram of the composition structure of a neural network training apparatus according to an embodiment of the present disclosure. As shown in Figure 12, the apparatus includes a third extraction module 1201, a fourth extraction module 1202, a second processing module 1203, and an adjustment module 1204, where:
the third extraction module 1201 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;

the fourth extraction module 1202 is configured to extract the style features of the style image;

the second processing module 1203 is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, obtain the generated image output by the second neural network after each second network unit block processes its respective input features, and discriminate the generated image to obtain a discrimination result, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; and

the adjustment module 1204 is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
Optionally, the second processing module 1203 is configured to, for i taking values 1 to T in sequence, feed the content feature output by the i-th-layer first network unit block forward into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T denotes the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.

Optionally, the first-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the last-layer first network unit block by the style feature to obtain an intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.

Optionally, the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying that content feature by the style feature.

Optionally, a middle-layer second network unit block among the second network unit blocks is configured to multiply the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the middle-layer second network unit block; add the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and input the output feature of the middle-layer second network unit block into the next-layer second network unit block.

Optionally, the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature by the output feature of the previous-layer second network unit block.

Optionally, the last-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.

Optionally, the last-layer second network unit block is further configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying that content feature by the output feature of the previous-layer second network unit block.
Optionally, the adjustment module 1204 is configured to adjust the multiplication operation parameters and/or addition operation parameters.

Optionally, the adjustment module 1204 is configured to determine a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and, in response to the generative adversarial network loss not satisfying a first predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss, where the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image as well as the style feature difference between the generated image and the style image.

Optionally, the adjustment module 1204 is further configured to determine a style loss according to the generated image and the style image; and, in response to the style loss not satisfying a second predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the style loss, where the style loss is used to characterize the difference between the style features of the generated image and those of the style image.

Optionally, the adjustment module 1204 is further configured to determine a content loss according to the generated image and the content image; and, in response to the content loss not satisfying a third predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the content loss, where the content loss is used to characterize the content feature difference between the generated image and the content image.

Optionally, the adjustment module 1204 is further configured to determine a feature matching loss according to the output features of the middle-layer second network unit blocks among the second network unit blocks and the style image; and, in response to the feature matching loss not satisfying a fourth predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss, where the feature matching loss is used to characterize the difference between the output features of the middle-layer second network unit blocks and the style features of the style image.
Optionally, the fourth extraction module 1202 is configured to extract features of the style image distribution and to sample the features of the style image distribution to obtain the style features, where the style features include the mean and standard deviation of the features of the style image distribution.
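A minimal sketch of this sampling step, assuming PyTorch; the encoder that produces style_feat_map is outside the scope of the sketch.

```python
# Obtain style features as the per-channel mean and standard deviation of
# features extracted from the style image (the encoder is not specified here).
import torch

def style_statistics(style_feat_map: torch.Tensor):
    # style_feat_map: (N, C, H, W) features of the style image distribution.
    mean = style_feat_map.mean(dim=(2, 3), keepdim=True)
    std = style_feat_map.std(dim=(2, 3), keepdim=True)
    return mean, std
```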
Optionally, the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or the second network unit block is configured to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
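A network unit block whose layers are organized in a residual structure could be sketched as follows, assuming PyTorch; the two-convolution body is an illustrative choice.

```python
# A sketch of a network unit block organized in a residual structure.
import torch
import torch.nn as nn

class ResidualUnitBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The block's input is added to its body's output (residual connection).
        return x + self.body(x)
```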
In practical applications, the third extraction module 1201, the fourth extraction module 1202, the second processing module 1203, and the adjustment module 1204 may all be implemented by a processor, and the processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor.

In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the foregoing method embodiments; for their specific implementation, reference may be made to the descriptions of those method embodiments, which, for brevity, are not repeated here.

In an exemplary embodiment, the embodiments of the present disclosure further provide a computer storage medium, for example the memory 111 including a computer program, where the computer program may be executed by the processor 112 of the electronic device 11 to complete the steps of the foregoing methods. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disc, or a CD-ROM; it may also be a device including one of, or any combination of, the foregoing memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.

The embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the image generation methods or any one of the neural network training methods of the foregoing embodiments is implemented.

The foregoing descriptions of the various embodiments tend to emphasize the differences between them; for their identical or similar aspects, the embodiments may be referred to one another, and, for brevity, these are not repeated herein.

The features disclosed in the method or product embodiments provided in the embodiments of the present disclosure may, in the absence of conflict, be combined arbitrarily to obtain new method or product embodiments.

From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the embodiments of the present disclosure are not limited to the specific implementations described above; those specific implementations are merely illustrative rather than restrictive. Under the teaching of the present invention, those of ordinary skill in the art may devise many further forms without departing from the purpose of the embodiments of the present disclosure and the scope protected by the claims, and all of these fall within the protection of the embodiments of the present disclosure.

Claims (52)

  1. An image generation method, the method comprising:
    extracting content features of a content image by using sequentially connected multiple layers of first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer;
    extracting style features of a style image; and
    correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multiple layers of second network unit blocks in a second neural network, and feeding forward the style features into a first-layer second network unit block among the multiple layers of second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respectively input features, wherein the multiple layers of first network unit blocks correspond to the multiple layers of second network unit blocks.
  2. The method according to claim 1, wherein correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multiple layers of second network unit blocks in the second neural network comprises:
    in response to i taking values from 1 to T in sequence, feeding forward the content features output by the i-th layer first network unit block into the (T-i+1)-th layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  3. The method according to claim 1 or 2, wherein the processing of input features by the first-layer second network unit block among the second network unit blocks comprises:
    multiplying the content feature from the last-layer first network unit block by the style features to obtain an intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain an output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into a second-layer second network unit block.
  4. The method according to claim 3, wherein the method further comprises: before multiplying the content feature from the last-layer first network unit block by the style features, performing a convolution operation on the content feature from the last-layer first network unit block.
  5. The method according to any one of claims 1 to 4, wherein the processing of input features by an intermediate-layer second network unit block among the second network unit blocks comprises:
    multiplying the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the intermediate-layer second network unit block; adding the input content feature to the intermediate feature of the intermediate-layer second network unit block to obtain an output feature of the intermediate-layer second network unit block; and inputting the output feature of the intermediate-layer second network unit block into the next-layer second network unit block.
  6. The method according to claim 5, wherein the method further comprises: before multiplying the input content feature by the output feature of the previous-layer second network unit block, performing a convolution operation on the input content feature.
  7. The method according to any one of claims 1 to 6, wherein the processing of input features by the last-layer second network unit block among the second network unit blocks comprises:
    multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  8. The method according to claim 7, wherein the method further comprises: before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block, performing a convolution operation on the content feature from the first-layer first network unit block.
  9. The method according to any one of claims 1 to 8, wherein extracting the style features of the style image comprises: extracting features of the style image distribution; and
    sampling the features of the style image distribution to obtain the style features, the style features including the mean and standard deviation of the features of the style image distribution.
  10. The method according to any one of claims 1 to 9, wherein the extraction of content features of the content image by the first network unit block comprises: extracting content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or,
    the processing of its input features by the second network unit block comprises: processing the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
  11. A neural network training method, the method comprising:
    extracting content features of a content image by using sequentially connected multiple layers of first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer;
    extracting style features of a style image;
    correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multiple layers of second network unit blocks in a second neural network, and feeding forward the style features into a first-layer second network unit block among the multiple layers of second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respectively input features, wherein the multiple layers of first network unit blocks correspond to the multiple layers of second network unit blocks;
    discriminating the generated image to obtain a discrimination result; and
    adjusting network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
  12. The method according to claim 11, wherein correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multiple layers of second network unit blocks in the second neural network comprises:
    in response to i taking values from 1 to T in sequence, feeding forward the content features output by the i-th layer first network unit block into the (T-i+1)-th layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  13. The method according to claim 11 or 12, wherein the processing of input features by the first-layer second network unit block among the second network unit blocks comprises:
    multiplying the content feature from the last-layer first network unit block by the style features to obtain an intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain an output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into a second-layer second network unit block.
  14. The method according to claim 13, wherein the method further comprises:
    before multiplying the content feature from the last-layer first network unit block by the style features, performing a convolution operation on the content feature from the last-layer first network unit block.
  15. The method according to any one of claims 11 to 14, wherein the processing of input features by an intermediate-layer second network unit block among the second network unit blocks comprises:
    multiplying the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the intermediate-layer second network unit block; adding the input content feature to the intermediate feature of the intermediate-layer second network unit block to obtain an output feature of the intermediate-layer second network unit block; and inputting the output feature of the intermediate-layer second network unit block into the next-layer second network unit block.
  16. The method according to claim 15, wherein the method further comprises: before multiplying the input content feature by the output feature of the previous-layer second network unit block, performing a convolution operation on the input content feature.
  17. The method according to any one of claims 11 to 16, wherein the processing of input features by the last-layer second network unit block among the second network unit blocks comprises:
    multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  18. The method according to claim 17, wherein the method further comprises: before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block, performing a convolution operation on the content feature from the first-layer first network unit block.
  19. The method according to any one of claims 13 to 18, wherein adjusting the network parameters of the second neural network comprises: adjusting the multiplication operation parameters and/or the addition operation parameters.
  20. The method according to any one of claims 11 to 19, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result comprises:
    determining a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and
    in response to the generative adversarial network loss not satisfying a first predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss, wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image and the style feature difference between the generated image and the style image.
  21. The method according to claim 20, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises: determining a style loss according to the generated image and the style image; and
    in response to the style loss not satisfying a second predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the style loss, wherein the style loss is used to characterize the difference between the style features of the generated image and those of the style image.
  22. The method according to claim 20 or 21, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises:
    determining a content loss according to the generated image and the content image; and
    in response to the content loss not satisfying a third predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the content loss, wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
  23. The method according to any one of claims 20 to 22, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises:
    determining a feature matching loss according to the output features of the intermediate-layer second network unit blocks among the second network unit blocks and the style image; and
    in response to the feature matching loss not satisfying a fourth predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the feature matching loss, wherein the feature matching loss is used to characterize the difference between the output features of the intermediate-layer second network unit blocks and the style features of the style image.
  24. The method according to any one of claims 11 to 23, wherein extracting the style features of the style image comprises: extracting features of the style image distribution; and
    sampling the features of the style image distribution to obtain the style features, the style features including the mean and standard deviation of the features of the style image distribution.
  25. The method according to any one of claims 11 to 24, wherein the extraction of content features of the content image by the first network unit block comprises: extracting content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or,
    the processing of its input features by the second network unit block comprises: processing the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
  26. An image generation apparatus, the apparatus comprising a first extraction module, a second extraction module, and a first processing module, wherein:
    the first extraction module is configured to extract content features of a content image by using sequentially connected multiple layers of first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer;
    the second extraction module is configured to extract style features of a style image; and
    the first processing module is configured to correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multiple layers of second network unit blocks in a second neural network, and to feed forward the style features into a first-layer second network unit block among the multiple layers of second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respectively input features, wherein the multiple layers of first network unit blocks correspond to the multiple layers of second network unit blocks.
  27. The apparatus according to claim 26, wherein the first processing module is configured to, in response to i taking values from 1 to T in sequence, feed forward the content features output by the i-th layer first network unit block into the (T-i+1)-th layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  28. The apparatus according to claim 26 or 27, wherein the first-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the last-layer first network unit block by the style features to obtain an intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain an output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into a second-layer second network unit block.
  29. The apparatus according to claim 28, wherein the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying the content feature from the last-layer first network unit block by the style features.
  30. The apparatus according to any one of claims 26 to 29, wherein an intermediate-layer second network unit block among the second network unit blocks is configured to multiply the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the intermediate-layer second network unit block; add the input content feature to the intermediate feature of the intermediate-layer second network unit block to obtain an output feature of the intermediate-layer second network unit block; and input the output feature of the intermediate-layer second network unit block into the next-layer second network unit block.
  31. The apparatus according to claim 30, wherein the intermediate-layer second network unit block is further configured to perform a convolution operation on the input content feature before multiplying the input content feature by the output feature of the previous-layer second network unit block.
  32. The apparatus according to any one of claims 26 to 31, wherein the last-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  33. The apparatus according to claim 32, wherein the last-layer second network unit block is configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block.
  34. The apparatus according to any one of claims 26 to 33, wherein the second extraction module is configured to extract features of the style image distribution and to sample the features of the style image distribution to obtain the style features, the style features including the mean and standard deviation of the features of the style image distribution.
  35. The apparatus according to any one of claims 26 to 34, wherein the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or,
    the second network unit block is configured to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
  36. A neural network training apparatus, the apparatus comprising a third extraction module, a fourth extraction module, a second processing module, and an adjustment module, wherein:
    the third extraction module is configured to extract content features of a content image by using sequentially connected multiple layers of first network unit blocks in a first neural network, to obtain content features respectively output by the first network unit blocks of each layer;
    the fourth extraction module is configured to extract style features of a style image;
    the second processing module is configured to correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multiple layers of second network unit blocks in a second neural network, to feed forward the style features into a first-layer second network unit block among the multiple layers of second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respectively input features, and to discriminate the generated image to obtain a discrimination result, wherein the multiple layers of first network unit blocks correspond to the multiple layers of second network unit blocks; and
    the adjustment module is configured to adjust network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
  37. The apparatus according to claim 36, wherein the second processing module is configured to, in response to i taking values from 1 to T in sequence, feed forward the content features output by the i-th layer first network unit block into the (T-i+1)-th layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  38. The apparatus according to claim 36 or 37, wherein the first-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the last-layer first network unit block by the style features to obtain an intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain an output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into a second-layer second network unit block.
  39. The apparatus according to claim 38, wherein the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying the content feature from the last-layer first network unit block by the style features.
  40. The apparatus according to any one of claims 36 to 39, wherein an intermediate-layer second network unit block among the second network unit blocks is configured to multiply the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the intermediate-layer second network unit block; add the input content feature to the intermediate feature of the intermediate-layer second network unit block to obtain an output feature of the intermediate-layer second network unit block; and input the output feature of the intermediate-layer second network unit block into the next-layer second network unit block.
  41. The apparatus according to claim 40, wherein the intermediate-layer second network unit block is further configured to perform a convolution operation on the input content feature before multiplying the input content feature by the output feature of the previous-layer second network unit block.
  42. The apparatus according to any one of claims 36 to 41, wherein the last-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  43. The apparatus according to claim 42, wherein the last-layer second network unit block is further configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block.
  44. The apparatus according to any one of claims 38 to 43, wherein the adjustment module is configured to adjust the multiplication operation parameters and/or the addition operation parameters.
  45. The apparatus according to any one of claims 36 to 44, wherein the adjustment module is configured to determine a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and, in response to the generative adversarial network loss not satisfying a first predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss, wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image and the style feature difference between the generated image and the style image.
  46. The apparatus according to claim 45, wherein the adjustment module is further configured to determine a style loss according to the generated image and the style image; and, in response to the style loss not satisfying a second predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the style loss, wherein the style loss is used to characterize the difference between the style features of the generated image and those of the style image.
  47. The apparatus according to claim 45 or 46, wherein the adjustment module is further configured to determine a content loss according to the generated image and the content image; and, in response to the content loss not satisfying a third predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the content loss, wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
  48. The apparatus according to any one of claims 45 to 47, wherein the adjustment module is further configured to determine a feature matching loss according to the output features of the intermediate-layer second network unit blocks among the second network unit blocks and the style image; and,
    in response to the feature matching loss not satisfying a fourth predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss, wherein the feature matching loss is used to characterize the difference between the output features of the intermediate-layer second network unit blocks and the style features of the style image.
  49. The apparatus according to any one of claims 36 to 48, wherein the fourth extraction module is configured to extract features of the style image distribution and to sample the features of the style image distribution to obtain the style features, the style features including the mean and standard deviation of the features of the style image distribution.
  50. The apparatus according to any one of claims 36 to 49, wherein the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or,
    the second network unit block is configured to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
  51. An electronic device, comprising a processor and a memory configured to store a computer program executable on the processor, wherein
    the processor is configured to, when running the computer program, execute the image generation method according to any one of claims 1 to 10 or the neural network training method according to any one of claims 11 to 25.
  52. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the image generation method according to any one of claims 1 to 10 or the neural network training method according to any one of claims 11 to 25.
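For orientation, the sketch below consolidates the generator of claims 1 to 8 under stated assumptions: it assumes PyTorch, uses illustrative channel widths, and omits the projection of the final feature back to an RGB image. The i-th layer first network unit block feeds the (T-i+1)-th layer second network unit block, the style feature enters the first-layer second network unit block, and each second network unit block applies the convolve-multiply-add processing.

```python
# A minimal end-to-end sketch of the claimed generator; all names, the layer
# count T, and channel widths are illustrative assumptions.
import torch
import torch.nn as nn

T = 4  # number of layers in each network (illustrative)

class UnitBlock(nn.Module):
    """Convolve-multiply-add processing of one second network unit block."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, content: torch.Tensor, modulator: torch.Tensor) -> torch.Tensor:
        intermediate = self.conv(content) * modulator  # multiplication operation
        return content + intermediate                  # addition operation

first_net = nn.ModuleList(
    nn.Conv2d(3 if i == 0 else 8, 8, 3, padding=1) for i in range(T)
)
second_net = nn.ModuleList(UnitBlock(8) for _ in range(T))

def generate(content_img: torch.Tensor, style_feat: torch.Tensor) -> torch.Tensor:
    # First network: collect the content feature output by each layer.
    feats, x = [], content_img
    for block in first_net:
        x = block(x)
        feats.append(x)
    # Second network: layer i of the first network feeds layer T-i+1 here,
    # so the last-layer content feature meets the style feature first.
    out = style_feat
    for i in range(T):
        out = second_net[i](feats[T - 1 - i], out)
    return out  # generated image (a final projection to RGB is omitted)
```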
PCT/CN2020/076835 2019-06-24 2020-02-26 Image generating and neural network training method, apparatus, device, and medium WO2020258902A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020217017354A KR20210088656A (en) 2019-06-24 2020-02-26 Methods, devices, devices and media for image generation and neural network training
JP2021532473A JP2022512340A (en) 2019-06-24 2020-02-26 Image generation and neural network training methods, devices, equipment and media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910551145.3 2019-06-24
CN201910551145.3A CN112132167B (en) 2019-06-24 2019-06-24 Image generation and neural network training method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2020258902A1 true WO2020258902A1 (en) 2020-12-30

Family

ID=73850015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076835 WO2020258902A1 (en) 2019-06-24 2020-02-26 Image generating and neural network training method, apparatus, device, and medium

Country Status (4)

Country Link
JP (1) JP2022512340A (en)
KR (1) KR20210088656A (en)
CN (1) CN112132167B (en)
WO (1) WO2020258902A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733946A (en) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 Training sample generation method and device, electronic equipment and storage medium
CN113255813A (en) * 2021-06-02 2021-08-13 北京理工大学 Multi-style image generation method based on feature fusion

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230137732A (en) * 2022-03-22 2023-10-05 삼성전자주식회사 Method and electronic device generating user-preffered content
KR102490503B1 (en) 2022-07-12 2023-01-19 프로메디우스 주식회사 Method and apparatus for processing image using cycle generative adversarial network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068463A1 (en) * 2016-09-02 2018-03-08 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN108205803A (en) * 2017-07-19 2018-06-26 北京市商汤科技开发有限公司 Image processing method, the training method of neural network model and device
CN108205813A (en) * 2016-12-16 2018-06-26 微软技术许可有限责任公司 Image stylization based on learning network
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 The training method and image Style Transfer method of convolutional neural networks for image Style Transfer
CN109840924A (en) * 2018-12-28 2019-06-04 浙江工业大学 A kind of product image rapid generation based on series connection confrontation network
CN109919829A (en) * 2019-01-17 2019-06-21 北京达佳互联信息技术有限公司 Image Style Transfer method, apparatus and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018132855A (en) * 2017-02-14 2018-08-23 国立大学法人電気通信大学 Image style conversion apparatus, image style conversion method and image style conversion program
GB201800811D0 (en) * 2018-01-18 2018-03-07 Univ Oxford Innovation Ltd Localising a vehicle
CN109919828B (en) * 2019-01-16 2023-01-06 中德(珠海)人工智能研究院有限公司 Method for judging difference between 3D models

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068463A1 (en) * 2016-09-02 2018-03-08 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN108205813A (en) * 2016-12-16 2018-06-26 微软技术许可有限责任公司 Image stylization based on learning network
CN108205803A (en) * 2017-07-19 2018-06-26 北京市商汤科技开发有限公司 Image processing method, the training method of neural network model and device
CN109840924A (en) * 2018-12-28 2019-06-04 浙江工业大学 A kind of product image rapid generation based on series connection confrontation network
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 The training method and image Style Transfer method of convolutional neural networks for image Style Transfer
CN109919829A (en) * 2019-01-17 2019-06-21 北京达佳互联信息技术有限公司 Image Style Transfer method, apparatus and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733946A (en) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 Training sample generation method and device, electronic equipment and storage medium
CN112733946B (en) * 2021-01-14 2023-09-19 北京市商汤科技开发有限公司 Training sample generation method and device, electronic equipment and storage medium
CN113255813A (en) * 2021-06-02 2021-08-13 北京理工大学 Multi-style image generation method based on feature fusion
CN113255813B (en) * 2021-06-02 2022-12-02 北京理工大学 Multi-style image generation method based on feature fusion

Also Published As

Publication number Publication date
JP2022512340A (en) 2022-02-03
CN112132167A (en) 2020-12-25
CN112132167B (en) 2024-04-16
KR20210088656A (en) 2021-07-14

Similar Documents

Publication Publication Date Title
WO2020258902A1 (en) Image generating and neural network training method, apparatus, device, and medium
CN109241880B (en) Image processing method, image processing apparatus, computer-readable storage medium
WO2019100723A1 (en) Method and device for training multi-label classification model
CN106415594B (en) Method and system for face verification
CN112446476A (en) Neural network model compression method, device, storage medium and chip
CN110929622A (en) Video classification method, model training method, device, equipment and storage medium
CN110543841A (en) Pedestrian re-identification method, system, electronic device and medium
CN109508717A (en) A kind of licence plate recognition method, identification device, identification equipment and readable storage medium storing program for executing
WO2015180101A1 (en) Compact face representation
CN109377532B (en) Image processing method and device based on neural network
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN108021908B (en) Face age group identification method and device, computer device and readable storage medium
Chiaroni et al. Learning with a generative adversarial network from a positive unlabeled dataset for image classification
CN114418030B (en) Image classification method, training method and device for image classification model
JP2019508803A (en) Method, apparatus and electronic device for training neural network model
CN111898703A (en) Multi-label video classification method, model training method, device and medium
An et al. Weather classification using convolutional neural networks
CN112446888A (en) Processing method and processing device for image segmentation model
CN114064627A (en) Knowledge graph link completion method and system for multiple relations
CN109492610A (en) A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again
CN112949706B (en) OCR training data generation method, device, computer equipment and storage medium
JP6935868B2 (en) Image recognition device, image recognition method, and program
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN110717401A (en) Age estimation method and device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20832168

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20217017354

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021532473

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20832168

Country of ref document: EP

Kind code of ref document: A1