WO2020258902A1 - Image generating and neural network training method, apparatus, device, and medium - Google Patents
- Publication number
- WO2020258902A1 (PCT/CN2020/076835)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network unit
- unit block
- layer
- network
- content
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
Definitions
- the present disclosure relates to the field of image processing, and in particular to an image generation method, a neural network training method, an apparatus, an electronic device, and a computer storage medium.
- image generation may involve generating one real image from another, after which human vision is used to subjectively judge whether the generated image looks realistic.
- neural network-based image generation methods have emerged in related technologies.
- a neural network can usually be trained on paired data, after which a content image can be given a new style by the trained neural network.
- paired data refers to a content image and a style image used for training that have the same content features, while the style image and the content image have different style features.
- because such paired data is hard to obtain, this method is not easy to implement.
- the embodiments of the present disclosure are expected to provide a technical solution for image generation.
- an embodiment of the present disclosure provides an image generation method, the method including: extracting the content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting the style features of a style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into sequentially connected multi-layer second network unit blocks in a second neural network, and feeding the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks; and obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- the embodiments of the present disclosure also propose a neural network training method.
- the method includes: extracting the content features of a content image by using the sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting the style features of a style image; correspondingly feeding the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in a second neural network, and feeding the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks; obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; discriminating the generated image to obtain a discrimination result; and adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
- an embodiment of the present disclosure also provides an image generation device.
- the device includes a first extraction module, a second extraction module, and a first processing module.
- the first extraction module is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer;
- the second extraction module is configured to extract the style features of the style image;
- the first processing module is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, wherein
- the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- the embodiments of the present disclosure also provide a neural network training device, which includes a third extraction module, a fourth extraction module, a second processing module, and an adjustment module; the third extraction module is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the fourth extraction module is configured to extract the style features of the style image.
- the second processing module is configured to correspondingly feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, obtain the generated image output by the second neural network after each second network unit block processes its respective input features, and discriminate the generated image to obtain a discrimination result, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; the adjustment module is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
- the embodiments of the present disclosure also propose an electronic device, including a processor and a memory for storing a computer program that can run on the processor; wherein the processor, when running the computer program, executes any one of the above image generation methods or any one of the above neural network training methods.
- the embodiments of the present disclosure also propose a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing image generation methods or any of the foregoing neural network training methods is implemented.
- in the embodiments of the present disclosure, the content features of the content image are extracted by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the style features of the style image are extracted; the content features respectively output by the first network unit blocks of each layer are correspondingly fed forward into the sequentially connected multi-layer second network unit blocks in the second neural network, the style features are fed forward from the first-layer second network unit block among the multi-layer second network unit blocks, and the generated image output by the second neural network is obtained after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- both the content image and the style image can be determined according to actual needs, and they do not need to form a pair of images, which makes the method easy to implement; in addition, during image generation, the first network unit block of each layer of the first neural network extracts the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
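At a high level, the pipeline just summarized can be sketched as three stages. This is only an illustrative sketch, not the patent's actual implementation: the function names, the lambda stand-ins, and the call shapes are hypothetical placeholders.

```python
def generate_image(content_image, style_image,
                   content_encoder, style_encoder, generator):
    """Style transfer without paired data: the content encoder
    yields one content feature per layer, the style encoder yields
    a style feature, and the generator fuses them into the image."""
    content_features = content_encoder(content_image)  # one per layer
    style_feature = style_encoder(style_image)
    return generator(content_features, style_feature)

# Toy stand-ins that only demonstrate the call shape:
img = generate_image(
    "content.png", "style.png",
    content_encoder=lambda x: [f"feat{i}({x})" for i in range(1, 8)],
    style_encoder=lambda x: f"style({x})",
    generator=lambda feats, sty: (feats, sty),
)
```

The key design point is that the content and style inputs come from two independent encoders, so the two images never need to be a matched pair.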
- FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure.
- FIG. 2 is a schematic structural diagram of a pre-trained neural network in an embodiment of the disclosure.
- FIG. 3 is an exemplary structural diagram of a content encoder according to an embodiment of the disclosure.
- FIG. 4 is a schematic diagram of an exemplary structure of a CRB in an embodiment of the disclosure.
- FIG. 5 is an exemplary structural diagram of a generator of an embodiment of the disclosure.
- FIG. 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the disclosure.
- FIG. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure.
- FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by an application embodiment of the disclosure.
- FIG. 9a is a schematic structural diagram of a residual block of the content encoder in an application embodiment of the present disclosure.
- FIG. 9b is a schematic structural diagram of a residual block of the generator in an application embodiment of the present disclosure.
- FIG. 9c is a schematic structural diagram of the FADE module of an application embodiment of the disclosure.
- FIG. 10 is a schematic diagram of the composition structure of an image generation device according to an embodiment of the disclosure.
- FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
- FIG. 12 is a schematic diagram of the composition structure of a neural network training device according to an embodiment of the disclosure.
- the terms "comprising", "including", or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or device including a series of elements not only includes the explicitly stated elements, but also includes other elements not explicitly listed, or elements inherent to the implementation of the method or device. Without further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other related elements (such as steps in the method, or units in the device) in the method or device that includes that element.
- a unit in the device may be, for example, part of a circuit, part of a processor, part of a program or software, and so on.
- the image generation method and neural network training method provided by the embodiments of the present disclosure include a series of steps, but they are not limited to the recorded steps.
- similarly, the image generation device and neural network training device provided in the embodiments of the present disclosure include a series of modules, but they are not limited to the explicitly recorded modules, and may also include modules that need to be set for obtaining relevant information or performing processing based on information.
- the embodiments of the present disclosure can be applied to a computer system composed of a terminal and a server, and can operate with many other general-purpose or special-purpose computing system environments or configurations.
- the terminal can be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a network personal computer, a vehicle-mounted device, a small computer system, etc.
- the server can be a server computer system, a small computer system, a large computer system, a distributed cloud computing environment including any of the above systems, etc.
- Electronic devices such as terminals and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
- program modules may include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
- program modules may be located on a storage medium of a local or remote computing system including a storage device.
- an image generation method is proposed.
- the applicable scenarios of the embodiments of the present disclosure include but are not limited to automatic driving, image generation, image synthesis, computer vision, deep learning, machine learning, etc.
- FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure. As shown in FIG. 1, the method may include:
- Step 101: Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer.
- the content image may be an image that requires style conversion; for example, the content image may be obtained from a local storage area or the content image may be obtained from the network.
- the content image may be an image taken by a mobile terminal or a camera.
- the format of the content image can be Joint Photographic Experts Group (JPEG), Bitmap (BMP), Portable Network Graphics (PNG), or other formats; it should be noted that this is only an example of the format and source of the content image, and the embodiment of the present disclosure does not limit the format and source of the content image.
- content features and style features can be extracted.
- the content feature is used to characterize the content information of the image; for example, the content feature represents the object position, object shape, object size, etc. in the image.
- the style feature is used to represent the style information of the image; for example, the style feature represents style information such as weather, daytime, nighttime, and painting style.
- the style conversion may refer to the conversion of the style feature of the content image into another style feature.
- the conversion of the style feature of the content image may be, for example, a conversion from day to night or from night to day.
- the conversion between weather styles can be from sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, cloudy to rainy, rainy to cloudy, sunny to snowy, snowy to sunny, cloudy to snowy, snowy to cloudy, snowy to rainy, or rainy to snowy, etc.
- the conversion between painting styles can be from oil painting to ink painting, ink painting to oil painting, oil painting to sketch, sketch to oil painting, sketch to ink painting, or ink painting to sketch, etc.
- the first neural network is a network for extracting content features of content images, and the embodiment of the present disclosure does not limit the type of the first neural network.
- the first neural network includes sequentially connected multi-layer first network unit blocks.
- the content features of the content image can be extracted starting from the first-layer first network unit block among the multi-layer first network unit blocks, with each block's output fed forward to the next.
- the data processing direction corresponding to the feedforward input runs from the input end of the neural network to the output end, i.e., the forward propagation process; in the feedforward process, the output of the upper-layer network unit block serves as the input of the next-layer network unit block.
- the first network unit block of each layer of the first neural network can extract content features from its input data; that is, the output of the first network unit block of each layer is the content feature of that layer, and the content features output by different first network unit blocks in the first neural network are different.
- the representation mode of the content feature of the content image may be a content feature map or other representation mode, which is not limited in the embodiment of the present disclosure.
- each layer of first network unit block in the first neural network comprises multiple neural network layers organized in a residual structure, so that the content features of the content image can be extracted based on the multiple residual-structured neural network layers in each first network unit block.
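As an illustrative sketch of the residual structure mentioned above (not the patent's actual implementation), a residual unit computes `y = x + F(x)`, where `F` stands for the block's learned transformation. The toy below uses plain Python lists and a hypothetical `transform` callable in place of real convolutional layers:

```python
def residual_block(x, transform):
    """Toy residual unit: output = input + transform(input).

    `transform` stands in for the block's learned conv/BN/ReLU
    stack; here it is any element-wise function on a list of
    floats."""
    fx = transform(x)
    # skip connection: add the input back onto the branch output
    return [xi + fi for xi, fi in zip(x, fx)]

# Example: a trivial "learned" transform that scales features by 0.5
features = [1.0, 2.0, 3.0]
out = residual_block(features, lambda v: [0.5 * vi for vi in v])
# out == [1.5, 3.0, 4.5]
```

The skip connection is what lets each layer's output keep the information of its input, which matches the stated goal of preserving semantic information across layers.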
- Step 102: Extract the style features of the style image.
- the style image is an image with the target style feature.
- the target style feature represents the style feature to which the content image needs to be converted.
- the style image can be set as needed.
- after acquiring the content image, the target style feature to be converted to can be determined, and the style image can then be selected according to that demand.
- the style image can be obtained from the local storage area or the network.
- the style image can be an image taken by a mobile terminal or a camera.
- the format of the style image can be JPEG, BMP, PNG, or other formats; it should be noted that this is only an example of the format and source of the style image, and the embodiment of the present disclosure does not limit the format and source of the style image.
- the style feature of the content image is different from the style feature of the style image.
- the purpose of performing style conversion on the content image may be to make the generated image obtained after the style conversion have the content features of the content image and the style features of the style image.
- the extracting of the style features of the style image includes: extracting features of the distribution of the style image; and sampling the features of the distribution to obtain the style features, where the style features include the mean and standard deviation of the features of the distribution of the style image.
- the style characteristics of the style image can be accurately extracted, which is conducive to accurate style conversion of the content image.
- at least one layer of convolution operation may be performed on the style image to obtain the characteristics of the style image distribution.
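As a rough sketch of the sampling step just described, assuming for illustration that the distribution features produced by the convolution have already been flattened to a list of floats, the mean and standard deviation can be computed with the standard library:

```python
import statistics

def style_statistics(distribution_features):
    """Summarize style-distribution features by their mean and
    (population) standard deviation, forming the style feature
    pair described in the text."""
    mean = statistics.fmean(distribution_features)
    std = statistics.pstdev(distribution_features)
    return mean, std

mean, std = style_statistics([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mean == 5.0, std == 2.0
```

In practice these statistics would be taken over convolutional feature maps rather than a flat list; the choice of population versus sample standard deviation here is an assumption for the sketch.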
- Step 103: Feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style features forward from the first-layer second network unit block among the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features.
- the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- the second neural network includes sequentially connected multi-layer second network unit blocks, and the output of the previous network unit block in the second neural network is the input of the next network unit block.
- optionally, the second network unit block of each layer in the second neural network comprises multiple neural network layers organized in a residual structure, so that the input features can be processed based on the multiple residual-structured neural network layers in each second network unit block.
- step 101 to step 103 can be implemented by a processor in an electronic device.
- the processor can be at least one of an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
- in this way, both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to form a pair of images, which is easy to implement; in addition, in the process of image generation, the first network unit block of each layer of the first neural network extracts the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information of the content image and is therefore more realistic.
- the style of the style image can be determined according to actual needs and is not limited to the style features of the style images used when training the neural network; that is to say, even if training images of a dark-night style were used when training the neural network, when generating images based on the trained neural network one can choose a content image together with a snowy-style, rainy-style, or other style image, and thereby generate images that meet the actually required style rather than only dark-night-style images, which improves the generalization and universality of the image generation method.
- style images with different style features can be set according to user needs, so that generated images with different style features can be obtained for one content image.
- for example, generated images of a dark-night style, a cloudy style, and a rainy style can be obtained for one content image; that is, based on the same content image, generated images of multiple styles can be obtained rather than only one style of image, which improves the applicability of the image generation method.
- the number of layers of first network unit blocks in the first neural network and the number of layers of second network unit blocks in the second neural network may be the same, and the first network unit blocks of each layer of the first neural network form a one-to-one correspondence with the second network unit blocks of each layer of the second neural network.
- the feeding of the content features respectively output by the first network unit blocks of each layer forward into the corresponding sequentially connected multi-layer second network unit blocks in the second neural network includes: for i from 1 to T in turn, feeding the content features output by the i-th layer first network unit block forward into the (T-i+1)-th layer second network unit block, where i is a positive integer and T represents the number of layers of first network unit blocks in the first neural network.
- for example, the content features output by the first network unit block of the last layer are input to the second network unit block of the first layer.
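The cross-wiring just described, in which encoder layer i feeds generator layer T-i+1, amounts to a simple index map. The sketch below is illustrative only; the function name is hypothetical:

```python
def encoder_to_generator_wiring(T):
    """Map each encoder layer i (1-based) to the generator layer
    T - i + 1 that receives its content features."""
    return {i: T - i + 1 for i in range(1, T + 1)}

# With T = 7 layers, encoder layer 1 feeds generator layer 7,
# and encoder layer 7 feeds generator layer 1.
wiring = encoder_to_generator_wiring(7)
# wiring[1] == 7, wiring[7] == 1
```

This reversal means the deepest (most abstract) content features enter the generator first, alongside the style features, while the shallowest (most detailed) content features enter last.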
- the content features received by the second network unit block of each layer in the second neural network are the output features of the first network unit blocks of the corresponding layers of the first neural network, and the content features received by the second network unit block of each layer vary with its position in the second neural network.
- the second neural network takes the style features as input; as the style features deepen from the lower-layer second network unit blocks of the second neural network to the higher-layer ones, more content features can be fused in, so that the semantic information of each layer of the content image is gradually merged on the basis of the style features, and the generated image can thus retain the multi-layer semantic information of the content image together with the style feature information.
- the feature processing of the first-layer second network unit block among the second network unit blocks includes: multiplying the content features from the last-layer first network unit block by the style features to obtain the intermediate features of the first-layer second network unit block; and adding the content features from the last-layer first network unit block to the intermediate features of the first-layer second network unit block to obtain the output features of the first-layer second network unit block, which are then input into the second-layer second network unit block.
- before the multiplication operation, a convolution operation may be performed on the content features from the first network unit block of the last layer; that is, the convolution operation is first performed on those content features, and the result of the convolution is then multiplied by the style features.
- the input feature processing of a middle-layer second network unit block among the second network unit blocks includes: multiplying the input content features by the output features of the upper-layer second network unit block to obtain the intermediate features of the middle-layer second network unit block; adding the input content features to the intermediate features to obtain the output features of the middle-layer second network unit block; and inputting those output features into the next-layer second network unit block. It can be seen that performing the above multiplication and addition operations makes it convenient to fuse the output features of the upper-layer second network unit block with the corresponding content features.
- a middle-layer second network unit block is any second network unit block in the second neural network other than the first-layer and last-layer second network unit blocks.
- the second neural network can have one middle-layer second network unit block or multiple middle-layer second network unit blocks; the above description takes a single middle-layer second network unit block as an example to explain its data processing procedure.
- the middle-layer second network unit block performs a convolution operation on the received content features before multiplying the input content features by the output features of the upper-layer second network unit block.
- the input feature processing of the last-layer second network unit block among the second network unit blocks includes: multiplying the content features from the first-layer first network unit block by the output features of the upper-layer second network unit block to obtain the intermediate features of the last-layer second network unit block; and adding the content features from the first-layer first network unit block to the intermediate features of the last-layer second network unit block to obtain the generated image.
- before the last-layer second network unit block multiplies the content features from the first-layer first network unit block by the output features of the upper-layer second network unit block, a convolution operation is performed on the content features from the first-layer first network unit block.
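The multiply-then-add fusion used by every second network unit block can be sketched element-wise on plain Python lists. This is a simplified sketch: the real blocks operate on feature maps, and the learned convolution that may precede the multiplication is omitted here:

```python
def fuse(content, modulator):
    """One fusion step of a second network unit block:
    intermediate = content * modulator (element-wise),
    output = content + intermediate.

    `modulator` is the style feature (at the first layer) or the
    previous block's output feature (at later layers)."""
    intermediate = [c * m for c, m in zip(content, modulator)]
    return [c + i for c, i in zip(content, intermediate)]

out = fuse([1.0, 2.0], [0.5, 0.25])
# intermediate = [0.5, 0.5]; out = [1.5, 2.5]
```

Because the content features are added back after the modulation, each block's output is guaranteed to carry the content information even when the modulating signal is weak.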
- FIG. 2 is a schematic structural diagram of a neural network pre-trained in an embodiment of the disclosure.
- the pre-trained neural network includes a content encoder, a style encoder, and a generator; the content encoder is used to extract the content features of the content image by using the above first neural network, the style encoder is used to extract the style features of the style image, and the generator is used to fuse, by using the second neural network, the style features with the content features output by the first network unit blocks of each layer.
- the first neural network can be used as the content encoder
- the second neural network can be used as the generator
- the neural network used for style feature extraction on the style image can be used as the style encoder.
- the image to be processed (i.e., the content image) can be input into the content encoder and processed by the multi-layer first network unit blocks of the first neural network, with the first network unit block of each layer outputting content features.
- the style image can also be input into the style encoder, and the style feature of the style image can be extracted from the style encoder.
- the first network unit block is a residual block (Residual Block, RB)
- the content feature output by the first network unit block of each layer is a content feature map.
- Fig. 3 is a schematic diagram of an exemplary structure of the content encoder according to the embodiment of the disclosure.
- the residual block of the content encoder can be marked as CRB, and the content encoder includes seven layers of CRB; in CRB(A, B), A represents the number of input channels and B represents the number of output channels; in Figure 3, the input of CRB(3,64) is the content image, and the first-layer CRB to the seventh-layer CRB are arranged from bottom to top.
- the first layer CRB to the seventh layer CRB can output seven content feature maps respectively.
- FIG. 4 is an exemplary structural diagram of the CRB of an embodiment of the disclosure.
- sync BN represents a synchronous BN layer
- ReLu represents a rectified linear unit (ReLu) layer
- Conv represents a convolutional layer, and the summation symbol represents a summation operation; the CRB structure shown in Figure 4 is that of a standard residual block.
- a standard residual network structure can be used to extract content features, which facilitates the extraction of content features of content images and reduces semantic information loss.
- the multi-layer second network unit block of the second neural network can be used for processing; for example, the second network unit block is RB.
- FIG. 5 is an exemplary structural diagram of the generator of the embodiment of the disclosure.
- the residual block in the generator can be denoted as GB, and the generator can include seven layers of GB; the input of each layer of GB includes the output of one layer of CRB of the content encoder; in the generator, the first-layer GB to the seventh-layer GB, arranged from top to bottom, are GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(512), GB ResBlk(256), GB ResBlk(128), and GB ResBlk(64); in GB ResBlk(C) in Figure 5, C represents the number of channels; the first-layer GB is used to receive the style features, and the first-layer GB to the seventh-layer GB are used to receive the content feature maps output by the seventh-layer CRB to the first-layer CRB, respectively; after each layer of GB processes its input features, the seventh-layer GB output can be used to generate an image.
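The cross-layer pairing described above (the i-th encoder block feeding the (T-i+1)-th generator block) can be sketched as follows; `paired_generator_layer` is a hypothetical helper name used only for illustration.

```python
# The content feature from the i-th encoder block (CRB-i) is fed into
# the (T-i+1)-th generator block (GB-(T-i+1)); T = 7 in the example above.

T = 7

def paired_generator_layer(i, T=T):
    """Return the generator layer index that receives CRB-i's output."""
    return T - i + 1

pairing = {i: paired_generator_layer(i) for i in range(1, T + 1)}
print(pairing)  # {1: 7, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2, 7: 1}
```

The deepest encoder output (CRB-7) thus reaches the first generator block together with the style feature, while the shallowest, most structure-preserving feature (CRB-1) arrives last.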
- the structural information of the content image can be encoded to generate multiple content feature maps at different levels; the content encoder extracts more abstract features in its deep layers and retains more structural information in its shallow layers.
- the image generation method of the embodiment of the present disclosure can be applied to various image generation scenarios, for example, can be applied to scenarios such as image entertainment data generation, automatic driving model training test data generation, and the like.
- Figure 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the present disclosure.
- the first column represents content images
- the second column represents style images
- the third column represents the generated images obtained by the image generation method of an embodiment of the present disclosure; the images in the same row represent a group of content image, style image, and generated image; from the first row to the last row, the style conversions are day to night, night to day, sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, sunny to snowy, and snowy to sunny; as can be seen from FIG. 6, the image generated by the image generation method of the embodiment of the present disclosure retains the content information of the content image and the style information of the style image.
- the training process of the neural network in the embodiments of the present disclosure involves not only the forward propagation process from input to output but also the back propagation process from output to input; the training process of the present disclosure can use the forward process to generate images and use the reverse process to adjust the network parameters of the neural network.
- FIG. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure. As shown in FIG. 7, the process may include:
- Step 701 Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer.
- Step 702 Extract the style features of the style image.
- Step 703 Feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, and feed the style features forward from the first-layer second network unit block of the multi-layer second network unit blocks; the generated image output by the second neural network is obtained after each second network unit block processes its respective input features.
- the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- the implementation of steps 701 to 703 in this embodiment is the same as that of steps 101 to 103, and will not be repeated here.
- Step 704 Discriminate the generated image, and obtain an identification result.
- the output image generated by the generator needs to be identified.
- the purpose of discriminating the generated image is to determine the probability that the generated image is a real image; in practical applications, this step can be implemented using a discriminator or the like.
- Step 705 Adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
- the network parameters of the first neural network and/or the second neural network can be adjusted based on the reverse process according to the content image, style image, generated image, and identification result, and the forward process can then be used to obtain the generated image and identification result again; by repeating the above forward and reverse processes, the neural network is iteratively optimized until the predetermined training completion conditions are met, and a trained neural network for image generation is obtained.
- steps 701 to 705 can be implemented by a processor in an electronic device.
- the aforementioned processor can be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor.
- both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which is easy to implement.
- the first network unit blocks of each layer of the first neural network can be used to extract the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that, compared with the content image, the generated image retains more semantic information; in turn, the trained neural network can better maintain the semantic information of the content image.
- the parameters of the above-mentioned multiplication operation and/or addition operation used in the second network unit block of each layer can be adjusted.
- the adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result includes: determining the generative adversarial network (GAN) loss according to the content image, style image, generated image, and identification result; wherein the GAN loss is used to characterize the content feature difference between the generated image and the content image, and the style feature difference between the generated image and the style image.
- in one example, the generative adversarial network includes a generator and a discriminator; in response to the GAN loss not meeting the first predetermined condition, the network parameters of the first neural network and/or the second neural network are adjusted.
- the network parameters of the first neural network and/or the second neural network can be adjusted based on the GAN loss, and a minimax strategy can be adopted.
- the first predetermined condition may represent a predetermined training completion condition; it is understandable that, according to the meaning of the GAN loss, training the neural network based on the GAN loss enables the generated image obtained from the trained neural network to better maintain the content characteristics of the content image and the style characteristics of the style image.
- the adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: determining a style loss according to the generated image and the style image; and, in response to the style loss not meeting a second predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the style loss; wherein the style loss is used to characterize the style feature difference between the generated image and the style image.
- the adjusting further includes: determining a content loss according to the generated image and the content image; and, in response to the content loss not meeting a third predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
- the adjusting further includes: determining a feature matching loss according to the output features of each intermediate-layer second network unit block and the style image; and, in response to the feature matching loss not meeting a fourth predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the feature matching loss; wherein the feature matching loss is used to characterize the difference between the output features of each intermediate-layer second network unit block and the style features of the style image.
- the aforementioned second predetermined condition, third predetermined condition, and fourth predetermined condition may represent predetermined training completion conditions; it is understandable that, according to the meaning of the style loss, content loss, or feature matching loss, training the neural network based on the style loss, content loss, or feature matching loss enables the generated image obtained from the trained neural network to better maintain the content characteristics of the content image.
- a neural network can be trained based on the foregoing one loss or multiple losses.
- the trained neural network can be obtained when the loss meets the corresponding predetermined condition;
- the accuracy of the style conversion of the trained neural network is higher.
- the GAN loss, style loss, content loss, or feature matching loss can be represented by a loss function.
- the training process of the neural network can be implemented based on the content encoder, style encoder, generator, discriminator, etc., and the image generation process based on the trained neural network can be implemented based on the content encoder, style encoder, generator, etc.
- FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by the application embodiment of the disclosure.
- the input of the content encoder is the image to be processed (that is, the content image), which is used to extract the content characteristics of the content image;
- the style encoder is responsible for extracting the style features of the style image;
- the generator combines the content features output by the first network unit blocks of different layers with the style features to generate high-quality images.
- the discriminator used in the neural network training process is not shown in FIG. 8.
- the content encoder includes multiple layers of residual blocks; CRB-1, CRB-2, ..., CRB-T respectively represent the layer-1 residual block to the layer-T residual block of the content encoder; the generator also includes multiple layers of residual blocks; GB-1, ..., GB-(T-1), GB-T respectively represent the first-layer to the T-th-layer residual blocks of the generator.
- the output result of the i-th layer residual block of the content encoder is input to the (T-i+1)-th layer residual block of the generator; the input of the style encoder is the style image, from which the style feature is extracted, and the style feature is input into the first-layer residual block of the generator.
- the output image is obtained based on the output result of the T-th layer residual block GB-T of the generator.
- f_i is defined as the content feature map output by the i-th layer residual block of the content encoder, and a corresponding symbol (shown in the figures) represents the output feature of the i-th residual block of the generator.
- the i-th residual block of the content encoder corresponds to the (T-i+1)-th layer residual block of the generator; the corresponding generator feature has the same number of channels as f_i; N denotes the batch size, C_i denotes the number of channels, and H_i and W_i denote the height and width, respectively.
- the activation value at position (n, c, h, w), with n ∈ [1, N], c ∈ [1, C_i], h ∈ [1, H_i], and w ∈ [1, W_i], can be expressed as formula (1).
- the two modulation statistics both correspond to the i-th residual block of the generator and respectively represent the mean and standard deviation of the features output by the previous-layer residual block (that is, a residual block of the second neural network); they can be calculated according to formula (2).
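Assuming the statistics in formula (2) are the usual per-channel mean and standard deviation over the batch and spatial dimensions, the computation might look like the following toy sketch (nested lists stand in for N x C x H x W tensors; the exact formula is given in the patent figures, not here):

```python
import math

def channel_stats(x):
    """x: nested list indexed as x[n][c][h][w]; returns (means, stds) per channel,
    computed over the batch (n) and spatial (h, w) dimensions."""
    N, C, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    means, stds = [], []
    for c in range(C):
        vals = [x[n][c][h][w] for n in range(N) for h in range(H) for w in range(W)]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means, stds

x = [[[[1.0, 3.0]], [[2.0, 2.0]]]]  # N=1, C=2, H=1, W=2
print(channel_stats(x))  # ([2.0, 2.0], [1.0, 0.0])
```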
- the image generation method of the application embodiment of the present disclosure is feature-adaptive, that is, the modulation parameters can be calculated directly based on the content features of the content image, whereas in related image generation methods the modulation parameters are fixed.
- Figure 9a is a schematic structural diagram of the residual block of the content encoder in the application embodiment of the disclosure.
- BN represents the BN layer
- ReLu represents the ReLu layer
- Conv represents the convolutional layer, and the summation symbol represents the summation operation;
- the structure of each residual block CRB of the content encoder is that of a standard residual block, and each residual block of the content encoder includes three convolutional layers, one of which is used for the skip connection.
- Fig. 9b is a schematic structural diagram of the residual block of the generator in the application embodiment of the present disclosure; as shown in Fig. 9b, in the standard residual block, the FADE module is used to replace the BN layer to obtain the structure of each layer's residual block GB of the generator.
- F1, F2 and F3 represent the first FADE module, the second FADE module and the third FADE module, respectively.
- in each residual block of the generator, the input of each FADE module includes the corresponding content feature map output by the content encoder; referring to Figure 9b, among the three FADE modules of each residual block of the generator, the inputs of F1 and F2 also include the output features of the previous residual block of the second neural network, and the input of F3 also includes the features obtained after processing by F1, a ReLu layer, and a convolutional layer in turn.
- Fig. 9c is a schematic diagram of the structure of the FADE module of the application embodiment of the present disclosure; in the figure, the multiplication symbol represents the multiplication operation and the addition symbol represents the addition operation; Conv represents a convolutional layer and BN represents a BN layer; γ and β represent the modulation parameters of each residual block of the generator. It can be seen that FADE takes the content feature map as input and derives the denormalization parameters from the convolutional features.
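Under the assumption that FADE normalizes its input and then applies a content-derived scale (γ) and shift (β), a minimal sketch follows. The 1-D features, the function name `fade`, and the scalar weights standing in for the parameter-producing convolutions are all hypothetical.

```python
import math

def fade(x, content, w_gamma=1.0, w_beta=0.1, eps=1e-5):
    """Toy FADE-style denormalization on a flat feature vector x."""
    # normalize x (stand-in for the BN step in Fig. 9c)
    mu = sum(x) / len(x)
    std = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    normed = [(v - mu) / (std + eps) for v in x]
    # derive modulation parameters from the content feature
    # (stand-in for the Conv branches producing gamma and beta)
    gamma = [w_gamma * c for c in content]
    beta = [w_beta * c for c in content]
    # denormalize: multiply then add, the two operations shown in Fig. 9c
    return [n * g + b for n, g, b in zip(normed, gamma, beta)]

out = fade([1.0, 3.0], [2.0, 2.0])
print(len(out))  # 2
```

Because γ and β are recomputed from the content feature map at every call, the modulation is feature-adaptive rather than fixed, matching the contrast drawn above with related methods.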
- the trained neural network is made to adaptively transform the content image under the control of the style image.
- the style encoder is proposed based on a variational auto-encoder (VAE).
- the output of the style encoder is a mean vector and a standard deviation vector; the latent code z is derived by resampling from the encoded style image.
- let ε be a uniformly distributed random vector with the same size as z.
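The resampling of z from the mean and standard deviation vectors might be sketched as below. The uniform draw for ε follows the text's description; the exact reparameterization in the patent's formulas is not reproduced, and `sample_latent` is a hypothetical name.

```python
import random

def sample_latent(mu, sigma, rng=random.Random(0)):
    """Resample a latent code z = mu + eps * sigma, with eps a random
    vector of the same size as z (uniform here, per the description)."""
    eps = [rng.uniform(-1.0, 1.0) for _ in mu]
    return [m + e * s for m, e, s in zip(mu, sigma, eps)]

z = sample_latent([0.0, 1.0], [1.0, 0.0])
print(len(z))  # 2
```

Components with zero standard deviation pass the mean through unchanged, while nonzero standard deviations inject the randomness that lets the generator produce varied stylizations.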
- various parts of the entire neural network can be jointly trained.
- the loss function of the entire network can be calculated by referring to formula (3), based on optimization with a minimax strategy, and the training of the neural network can then be realized.
- G represents the generator
- D represents the discriminator
- L VAE (E s , G) represents the style loss.
- the style loss can be a Kullback-Leibler (KL) divergence loss;
- L VAE (E s , G) can be calculated according to formula (4).
- KL( ⁇ ) represents the KL divergence
- ⁇ 0 represents the hyperparameter in L VAE (E s ,G).
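Formula (4) is not reproduced in this text, but a KL-divergence style loss of the kind described is commonly the closed-form KL between N(μ, σ²) and the standard normal N(0, 1); the sketch below shows that standard VAE term for illustration only, not the patent's exact formula.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions:
    0.5 * sum( mu^2 + sigma^2 - 1 - log(sigma^2) )."""
    return sum(0.5 * (m * m + s * s - 1.0 - math.log(s * s))
               for m, s in zip(mu, sigma))

print(kl_to_standard_normal([0.0], [1.0]))  # 0.0
```

The term is zero exactly when the encoder outputs a standard normal, which is what regularizing the latent code toward a well-behaved prior is meant to achieve.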
- L GAN (E s ,E c ,G,D) represents the GAN loss, which is used in the adversarial training of the generator and discriminator;
- L GAN (E s ,E c ,G,D) can be calculated according to formula (5).
- L VGG (E s , E c , G) represents content loss.
- the content loss may be a VGG (Visual Geometry Group) loss.
- L VGG (E s , E c , G) can be calculated according to formula (6).
- L FM (E s , E c , G) represents the feature matching loss;
- L FM (E s , E c , G) can be calculated according to formula (7).
- the VGG loss has different weights in different layers.
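A layer-weighted loss of the shape described here (whether the VGG loss of formula (6) or the feature matching loss of formula (7)) can be sketched as a weighted per-layer L1 distance; the features, weights, and function name below are illustrative assumptions, not the patent's exact formulas.

```python
def weighted_feature_loss(feats_a, feats_b, weights):
    """Sum over layers of weight * mean absolute difference between
    corresponding feature vectors at that layer."""
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, weights):
        l1 = sum(abs(a - b) for a, b in zip(fa, fb)) / len(fa)
        total += w * l1
    return total

feats_a = [[1.0, 2.0], [0.0, 0.0]]  # per-layer features of one image
feats_b = [[1.0, 4.0], [1.0, 1.0]]  # per-layer features of the other
loss = weighted_feature_loss(feats_a, feats_b, [1.0, 0.5])
print(loss)  # 1.5
```

Giving different layers different weights, as the text notes for the VGG loss, lets training emphasize coarse semantics or fine texture as desired.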
- the first neural network is trained based on multi-scale discriminators, and the discriminators at different scales have exactly the same structure; the discriminator at the coarsest scale has the largest receptive field, while the finer-scale discriminators can distinguish details in higher-resolution images.
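The multi-scale inputs such discriminators typically receive can be sketched as an image pyramid built by repeated 2x average pooling (a toy 1-D "image" here; the patent does not specify the downsampling operator, so this choice is an assumption):

```python
def downsample(img):
    """2x average pooling on a 1-D toy image."""
    return [(img[i] + img[i + 1]) / 2.0 for i in range(0, len(img) - 1, 2)]

def pyramid(img, scales=3):
    """Return the image at `scales` resolutions, finest first.
    Each level would be fed to a structurally identical discriminator;
    the coarsest level gives the largest effective receptive field."""
    out = [img]
    for _ in range(scales - 1):
        img = downsample(img)
        out.append(img)
    return out

p = pyramid([1.0, 3.0, 5.0, 7.0], scales=3)
print([len(level) for level in p])  # [4, 2, 1]
```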
- the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- FIG. 10 is a schematic diagram of the composition structure of an image generation device according to an embodiment of the disclosure. As shown in FIG. 10, the device includes: a first extraction module 1001, a second extraction module 1002, and a first processing module 1003, wherein:
- the first extraction module 1001 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
- the second extraction module 1002 is used to extract style features of style images
- the first processing module 1003 is configured to feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, and combine the The style features are fed forward from the first layer second network unit block in the multi-layer second network unit block, and the second neural network output is obtained after each second network unit block processes the respective input features An image is generated, wherein the multi-layer first network unit block corresponds to the multi-layer second network unit block.
- the first processing module 1003 is configured to, as i takes 1 to T in sequence, feed the content features output by the first network unit block of the i-th layer forward into the second network unit block of the (T-i+1)-th layer.
- i is a positive integer
- T represents the number of layers of the first network unit block of the first neural network and the second network unit block of the second neural network.
- the first-layer second network unit block in the second network unit blocks is used to multiply the content feature from the last-layer first network unit block with the style feature to obtain the intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block and the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
- the first-layer second network unit block is also used to perform a convolution operation on the content feature from the last-layer first network unit block before that content feature and the style feature are multiplied.
- the middle layer second network unit block in each of the second network unit blocks is used to multiply the input content feature and the output feature of the second network unit block of the upper layer to obtain the Intermediate features of the second network unit block of the middle layer; add the content features of the input and the intermediate features of the second network unit block of the middle layer to obtain the output features of the second network unit block of the middle layer; The output characteristics of the second network unit block of the middle layer are input to the second network unit block of the next layer.
- the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature with the output feature of the upper-layer second network unit block.
- the last-layer second network unit block in the second network unit blocks is used to multiply the content feature from the first-layer first network unit block with the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and to add the content feature from the first-layer first network unit block and the intermediate feature of the last-layer second network unit block to obtain the generated image.
- the last-layer second network unit block is also used to perform a convolution operation on the content feature from the first-layer first network unit block before that content feature and the output feature of the upper-layer second network unit block are multiplied.
- the second extraction module 1002 is configured to extract features of the style image distribution; sampling the features of the style image distribution to obtain the style feature, the style feature includes the style image distribution The mean and standard deviation of the features.
- the first network unit block is configured to extract content features of content images based on multiple neural network layers organized in a residual structure in the first network unit block; and/or, the second network The unit block is used to process the features input to the second network unit based on multiple neural network layers organized in a residual structure in the second network unit block.
- the first extraction module 1001, the second extraction module 1002, and the first processing module 1003 can all be implemented by processors.
- the aforementioned processors can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
- the functional modules in this embodiment can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or software function module.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
- the technical solution of this embodiment, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage media include: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
- the computer program instructions corresponding to an image generation method or neural network training method in this embodiment can be stored on storage media such as optical disks, hard disks, and USB flash drives.
- FIG. 11 shows an electronic device 11 provided by an embodiment of the present disclosure.
- the electronic device 11 includes: a memory 111 and a processor 112; wherein, the memory 111 is used for A computer program is stored; the processor 112 is configured to execute the computer program stored in the memory to implement any image generation method or any neural network training method in the foregoing embodiments.
- the various components in the electronic device 11 may be coupled together through a bus system. It can be understood that the bus system is used to realize the connection and communication between these components.
- the bus system also includes a power bus, a control bus, and a status signal bus.
- various buses are marked as bus systems in FIG. 11.
- the aforementioned memory 111 may be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the foregoing types of memories, and provides instructions and data to the processor 112.
- the aforementioned processor 112 may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that, for different devices, the electronic devices used to implement the above-mentioned processor functions may also be other, which is not specifically limited in the embodiment of the present disclosure.
- FIG. 12 is a schematic diagram of the composition structure of a neural network training device according to an embodiment of the disclosure. As shown in FIG. 12, the device includes: a third extraction module 1201, a fourth extraction module 1202, a second processing module 1203, and an adjustment module 1204; among them,
- the third extraction module 1201 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
- the fourth extraction module 1202 is used to extract style features of the style image
- the second processing module 1203 is configured to feed the content features respectively output by the first network unit blocks of each layer forward into the sequentially connected multi-layer second network unit blocks in the second neural network, feed the style feature forward from the first-layer second network unit block of the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features; and to discriminate the generated image to obtain an identification result; wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks;
- the adjustment module 1204 is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
- the second processing module 1203 is configured to, as i takes 1 to T in sequence, feed the content features output by the first network unit block of the i-th layer forward into the second network unit block of the (T-i+1)-th layer.
- i is a positive integer
- T represents the number of layers of the first network unit block of the first neural network and the second network unit block of the second neural network.
- the first-layer second network unit block in the second network unit blocks is used to multiply the content feature from the last-layer first network unit block with the style feature to obtain the intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block and the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
- the first-layer second network unit block is also used to perform a convolution operation on the content feature from the last-layer first network unit block before that content feature and the style feature are multiplied.
- the middle layer second network unit block in each of the second network unit blocks is used to multiply the input content feature and the output feature of the second network unit block of the upper layer to obtain the Intermediate features of the second network unit block of the middle layer; add the content features of the input and the intermediate features of the second network unit block of the middle layer to obtain the output features of the second network unit block of the middle layer; The output characteristics of the second network unit block of the middle layer are input to the second network unit block of the next layer.
- the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature with the output feature of the upper-layer second network unit block.
- the last-layer second network unit block in each of the second network unit blocks is used to multiply the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
- the last-layer second network unit block is also used to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying it by the output feature of the upper-layer second network unit block.
- the adjustment module 1204 is configured to adjust the multiplication operation parameters and/or the addition operation parameters.
- the adjustment module 1204 is configured to determine a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and, in response to the generative adversarial network loss not satisfying a first predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss; wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image, and the style feature difference between the generated image and the style image.
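The disclosure does not give the exact form of the generative adversarial network loss; as a non-authoritative illustration, a common non-saturating GAN formulation looks like:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Standard non-saturating GAN losses (illustrative only): the
    discriminator is pushed to score real images high and generated
    images low, while the generator is pushed the other way."""
    eps = 1e-7  # numerical guard against log(0)
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# Toy discriminator scores in (0, 1)
d_loss, g_loss = gan_losses(np.array([0.9]), np.array([0.1]))
```

Training alternates minimizing `d_loss` over the discriminator and `g_loss` over the generator; the "first predetermined condition" above would then be a convergence check on such a loss.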
- the adjustment module 1204 is further configured to determine a style loss according to the generated image and the style image; and, in response to the style loss not satisfying a second predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the style loss; wherein the style loss is used to characterize the difference between the style features of the generated image and the style image.
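Since the style feature in this disclosure includes a mean and standard deviation, one plausible (but assumed) style loss matches those statistics between the generated and style images:

```python
import numpy as np

def style_loss(gen_feat, style_feat):
    """Hypothetical style loss: squared difference of the mean and
    standard deviation of the two feature sets (one possible way to
    compare style statistics; not the patent's exact definition)."""
    mu_g, sd_g = gen_feat.mean(), gen_feat.std()
    mu_s, sd_s = style_feat.mean(), style_feat.std()
    return (mu_g - mu_s) ** 2 + (sd_g - sd_s) ** 2

loss = style_loss(np.array([0.0, 0.0]), np.array([1.0, 3.0]))
# mean difference 2, std difference 1, so the loss is 4 + 1 = 5
```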
- the adjustment module 1204 is further configured to determine a content loss according to the generated image and the content image; and, in response to the content loss not satisfying a third predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
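A content loss of this kind is often a simple distance between feature maps; a minimal sketch, with the exact metric chosen here as an assumption:

```python
import numpy as np

def content_loss(gen_feat, content_feat):
    """Hypothetical content loss: mean squared error between the
    content features of the generated image and the content image."""
    return float(np.mean((gen_feat - content_feat) ** 2))

loss = content_loss(np.array([0.0, 2.0]), np.array([0.0, 0.0]))
# mean of [0, 4] = 2.0
```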
- the adjustment module 1204 is further configured to determine a feature matching loss according to the output feature of each middle-layer second network unit block in each of the second network unit blocks and the style image; and, in response to the feature matching loss not satisfying a fourth predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss; wherein the feature matching loss is used to characterize the difference between the output feature of each middle-layer second network unit block and the style features of the style image.
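A feature matching loss over the middle-layer outputs can be sketched as a sum of per-layer distances to the corresponding style features; the layer pairing and the L1 metric are assumptions:

```python
import numpy as np

def feature_matching_loss(mid_outputs, style_feats):
    """Hypothetical feature-matching loss: average absolute difference
    between each middle-layer block's output feature and a matching
    style feature, summed over layers."""
    return float(sum(np.mean(np.abs(o - s))
                     for o, s in zip(mid_outputs, style_feats)))

loss = feature_matching_loss([np.array([1.0, 2.0])], [np.array([0.0, 0.0])])
# mean(|[1, 2]|) = 1.5
```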
- the fourth extraction module 1202 is configured to extract features of the style image distribution, and sample the features of the style image distribution to obtain the style feature, the style feature including the mean and standard deviation of the features of the style image distribution.
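The reduction of the style distribution features to a mean and standard deviation can be illustrated as follows; the per-channel layout is an assumption about how the features are organized:

```python
import numpy as np

def extract_style_feature(style_dist_feat):
    """Per the description, the style feature consists of the mean and
    standard deviation of the style-image distribution features;
    here they are computed per channel over the second axis."""
    return style_dist_feat.mean(axis=1), style_dist_feat.std(axis=1)

feat = np.array([[1.0, 3.0],   # channel 0 samples
                 [2.0, 2.0]])  # channel 1 samples
mean, std = extract_style_feature(feat)
# mean = [2, 2], std = [1, 0]
```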
- the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or the second network unit block is used to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
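A residual organization of layers, as referenced here, means each block adds its input back to the transformed output. A minimal sketch with linear stand-ins for the block's layers:

```python
import numpy as np

def residual_block(x, w1, w2):
    """Minimal residual structure: two linear layers (stand-ins for the
    block's neural network layers) plus a skip connection that adds
    the input back to the transformation's output."""
    h = np.maximum(w1 @ x, 0.0)  # first layer with ReLU
    return x + w2 @ h            # skip connection preserves the input signal

x = np.ones(3)
out = residual_block(x, np.zeros((3, 3)), np.zeros((3, 3)))
# With zero weights the block reduces to the identity: out == x
```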
- the third extraction module 1201, the fourth extraction module 1202, the second processing module 1203, and the adjustment module 1204 can all be implemented by a processor, and the processor can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- the embodiments of the present disclosure further provide a computer storage medium, such as the memory 111 including a computer program, which can be executed by the processor 112 of the electronic device 11 to complete the steps described in the foregoing methods.
- the computer-readable storage medium can be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it can also be any of various devices including one or any combination of the foregoing memories, such as a mobile phone, computer, tablet device, or personal digital assistant.
- the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the image generation methods or neural network training methods in the foregoing embodiments is implemented.
- the technical solution of the present invention, in essence or in the part contributing to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of the present invention.
Abstract
Description
Claims (52)
- An image generation method, the method comprising: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting style features of a style image; correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multi-layer second network unit blocks in a second neural network, and feeding forward the style features into the first-layer second network unit block among the multi-layer second network unit blocks; and obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- The method according to claim 1, wherein correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network comprises: in response to i taking values from 1 to T in sequence, feeding forward the content feature output by the i-th-layer first network unit block into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
- The method according to claim 1 or 2, wherein the processing of input features by the first-layer second network unit block among the second network unit blocks comprises: multiplying the content feature from the last-layer first network unit block by the style features to obtain the intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into the second-layer second network unit block.
- The method according to claim 3, further comprising: before multiplying the content feature from the last-layer first network unit block by the style features, performing a convolution operation on the content feature from the last-layer first network unit block.
- The method according to any one of claims 1 to 4, wherein the processing of input features by a middle-layer second network unit block among the second network unit blocks comprises: multiplying the input content feature by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; adding the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and inputting the output feature of the middle-layer second network unit block into the next-layer second network unit block.
- The method according to claim 5, further comprising: before multiplying the input content feature by the output feature of the upper-layer second network unit block, performing a convolution operation on the input content feature.
- The method according to any one of claims 1 to 6, wherein the processing of input features by the last-layer second network unit block among the second network unit blocks comprises: multiplying the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
- The method according to claim 7, further comprising: before multiplying the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block, performing a convolution operation on the content feature from the first-layer first network unit block.
- The method according to any one of claims 1 to 8, wherein extracting the style features of the style image comprises: extracting features of the style image distribution; and sampling the features of the style image distribution to obtain the style features, the style features comprising the mean and standard deviation of the features of the style image distribution.
- The method according to any one of claims 1 to 9, wherein the first network unit block extracting content features of the content image comprises: extracting content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or a second network unit block processing its input features comprises: processing the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
- A neural network training method, the method comprising: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting style features of a style image; correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multi-layer second network unit blocks in a second neural network, and feeding forward the style features into the first-layer second network unit block among the multi-layer second network unit blocks; obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; discriminating the generated image to obtain a discrimination result; and adjusting network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
- The method according to claim 11, wherein correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network comprises: in response to i taking values from 1 to T in sequence, feeding forward the content feature output by the i-th-layer first network unit block into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
- The method according to claim 11 or 12, wherein the processing of input features by the first-layer second network unit block among the second network unit blocks comprises: multiplying the content feature from the last-layer first network unit block by the style features to obtain the intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into the second-layer second network unit block.
- The method according to claim 13, further comprising: before multiplying the content feature from the last-layer first network unit block by the style features, performing a convolution operation on the content feature from the last-layer first network unit block.
- The method according to any one of claims 11 to 14, wherein the processing of input features by a middle-layer second network unit block among the second network unit blocks comprises: multiplying the input content feature by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; adding the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and inputting the output feature of the middle-layer second network unit block into the next-layer second network unit block.
- The method according to claim 15, further comprising: before multiplying the input content feature by the output feature of the upper-layer second network unit block, performing a convolution operation on the input content feature.
- The method according to any one of claims 11 to 16, wherein the processing of input features by the last-layer second network unit block among the second network unit blocks comprises: multiplying the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
- The method according to claim 17, further comprising: before multiplying the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block, performing a convolution operation on the content feature from the first-layer first network unit block.
- The method according to any one of claims 13 to 18, wherein adjusting the network parameters of the second neural network comprises: adjusting the multiplication operation parameters and/or the addition operation parameters.
- The method according to any one of claims 11 to 19, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result comprises: determining a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and in response to the generative adversarial network loss not satisfying a first predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss; wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image, and the style feature difference between the generated image and the style image.
- The method according to claim 20, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises: determining a style loss according to the generated image and the style image; and in response to the style loss not satisfying a second predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the style loss; wherein the style loss is used to characterize the difference between the style features of the generated image and the style image.
- The method according to claim 20 or 21, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises: determining a content loss according to the generated image and the content image; and in response to the content loss not satisfying a third predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
- The method according to any one of claims 20 to 22, wherein adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result further comprises: determining a feature matching loss according to the output feature of each middle-layer second network unit block among the second network unit blocks and the style image; and in response to the feature matching loss not satisfying a fourth predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network according to the feature matching loss; wherein the feature matching loss is used to characterize the difference between the output feature of each middle-layer second network unit block and the style features of the style image.
- The method according to any one of claims 11 to 23, wherein extracting the style features of the style image comprises: extracting features of the style image distribution; and sampling the features of the style image distribution to obtain the style features, the style features comprising the mean and standard deviation of the features of the style image distribution.
- The method according to any one of claims 11 to 24, wherein the first network unit block extracting content features of the content image comprises: extracting content features of the content image based on multiple neural network layers organized in a residual structure in the first network unit block; and/or a second network unit block processing its input features comprises: processing the features input to the second network unit block based on multiple neural network layers organized in a residual structure in the second network unit block.
- An image generation apparatus, the apparatus comprising a first extraction module, a second extraction module, and a first processing module, wherein: the first extraction module is configured to extract content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the second extraction module is configured to extract style features of a style image; and the first processing module is configured to correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multi-layer second network unit blocks in a second neural network, feed forward the style features into the first-layer second network unit block among the multi-layer second network unit blocks, and obtain a generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
- The apparatus according to claim 26, wherein the first processing module is configured to, in response to i taking values from 1 to T in sequence, feed forward the content feature output by the i-th-layer first network unit block into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
- The apparatus according to claim 26 or 27, wherein the first-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the last-layer first network unit block by the style features to obtain the intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
- The apparatus according to claim 28, wherein the first-layer second network unit block is further configured to, before multiplying the content feature from the last-layer first network unit block by the style features, perform a convolution operation on the content feature from the last-layer first network unit block.
- The apparatus according to any one of claims 26 to 29, wherein a middle-layer second network unit block among the second network unit blocks is configured to multiply the input content feature by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; add the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and input the output feature of the middle-layer second network unit block into the next-layer second network unit block.
- The apparatus according to claim 30, wherein the middle-layer second network unit block is further configured to, before multiplying the input content feature by the output feature of the upper-layer second network unit block, perform a convolution operation on the input content feature.
- The apparatus according to any one of claims 26 to 31, wherein the last-layer second network unit block among the second network unit blocks is configured to multiply the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
- The apparatus according to claim 32, wherein the last-layer second network unit block is configured to, before multiplying the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block, perform a convolution operation on the content feature from the first-layer first network unit block.
- The apparatus according to any one of claims 26 to 33, wherein the second extraction module is configured to extract features of the style image distribution, and sample the features of the style image distribution to obtain the style features, the style features comprising the mean and standard deviation of the features of the style image distribution.
- The apparatus according to any one of claims 26 to 34, wherein the first network unit block is configured to extract the content features of the content image based on a plurality of neural network layers organized in a residual structure within the first network unit block; and/or the second network unit block is configured to process the features input to the second network unit block based on a plurality of neural network layers organized in a residual structure within the second network unit block.
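The "residual structure" of claim 35 is the standard output = input + f(input) pattern. A minimal sketch, with the residual branch f reduced to a linear map for illustration (the name `residual_layer` is an assumption, not from the patent):

```python
import numpy as np

def residual_layer(x, weight):
    """A layer organized in a residual structure: output = x + f(x).

    Here f is simplified to a linear map; in practice it would be a
    small stack of convolution/normalization/activation layers.
    """
    return x + weight @ x

# a zero residual branch passes the input through unchanged
x = np.array([1.0, 2.0])
out = residual_layer(x, np.zeros((2, 2)))
assert np.allclose(out, x)
```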
- A neural network training apparatus, comprising a third extraction module, a fourth extraction module, a second processing module, and an adjustment module; wherein the third extraction module is configured to extract content features of a content image using sequentially connected multi-layer first network unit blocks in a first neural network, obtaining the content features respectively output by the first network unit blocks of each layer; the fourth extraction module is configured to extract style features of a style image; the second processing module is configured to feed the content features respectively output by the first network unit blocks of each layer forward into correspondingly, sequentially connected multi-layer second network unit blocks in a second neural network, to feed the style features forward from the first-layer second network unit block of the multi-layer second network unit blocks, to obtain a generated image output by the second neural network after each second network unit block processes its respective input features, and to discriminate the generated image to obtain a discrimination result, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; and the adjustment module is configured to adjust network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the discrimination result.
- The apparatus according to claim 36, wherein the second processing module is configured to, as i takes the values 1 to T in sequence, feed the content feature output by the i-th-layer first network unit block forward into the (T-i+1)-th-layer second network unit block, where i is a positive integer and T denotes the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
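The index mapping in claim 37 wires the two networks in mirror order (the deepest encoder features reach the decoder first), similar to a U-Net skip pattern. A quick check of the 1-indexed mapping:

```python
# For T layers, encoder layer i feeds decoder layer T - i + 1 (1-indexed),
# so layer 1 pairs with layer T, layer 2 with layer T-1, and so on.
T = 4
pairs = [(i, T - i + 1) for i in range(1, T + 1)]
assert pairs == [(1, 4), (2, 3), (3, 2), (4, 1)]
```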
- The apparatus according to claim 36 or 37, wherein the first-layer second network unit block among the second network unit blocks is configured to: multiply the content feature from the last-layer first network unit block by the style features to obtain an intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
- The apparatus according to claim 38, wherein the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying the content feature from the last-layer first network unit block by the style features.
- The apparatus according to any one of claims 36 to 39, wherein the intermediate-layer second network unit block among the second network unit blocks is configured to: multiply the input content feature by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the intermediate-layer second network unit block; add the input content feature to the intermediate feature of the intermediate-layer second network unit block to obtain the output feature of the intermediate-layer second network unit block; and input the output feature of the intermediate-layer second network unit block into the next-layer second network unit block.
- The apparatus according to claim 40, wherein the intermediate-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature by the output feature of the previous-layer second network unit block.
- The apparatus according to any one of claims 36 to 41, wherein the last-layer second network unit block among the second network unit blocks is configured to: multiply the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block to obtain an intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
- The apparatus according to claim 42, wherein the last-layer second network unit block is further configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying the content feature from the first-layer first network unit block by the output feature of the previous-layer second network unit block.
- The apparatus according to any one of claims 38 to 43, wherein the adjustment module is configured to adjust the multiplication operation parameters and/or the addition operation parameters.
- The apparatus according to any one of claims 36 to 44, wherein the adjustment module is configured to: determine a generative adversarial network loss according to the content image, the style image, the generated image, and the discrimination result; and in response to the generative adversarial network loss not satisfying a first predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss; wherein the generative adversarial network loss characterizes the difference in content features between the generated image and the content image, and the difference in style features between the generated image and the style image.
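The "adjust while the loss does not satisfy a predetermined condition" pattern of claims 45 to 48 is an ordinary iterative update loop. The claims do not specify the condition or the optimizer; the sketch below assumes (hypothetically) that the condition is the loss dropping below a threshold and that the adjustment is gradient descent, and demonstrates the pattern on a toy scalar loss:

```python
def train_until_condition(loss_fn, params, lr=0.1, threshold=1e-3, max_steps=100):
    """Adjust `params` while the loss does not satisfy the stopping condition.

    loss_fn: callable returning (loss, gradient) for the current params.
    The threshold-based condition and gradient-descent update are assumptions
    standing in for the claims' unspecified "predetermined condition".
    """
    for _ in range(max_steps):
        loss, grad = loss_fn(params)
        if loss < threshold:           # predetermined condition satisfied -> stop
            break
        params = params - lr * grad    # adjust the network parameters
    return params

# toy quadratic loss minimised at params == 2.0
quad = lambda p: ((p - 2.0) ** 2, 2.0 * (p - 2.0))
p = train_until_condition(quad, 10.0)
assert abs(p - 2.0) < 0.1
```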
- The apparatus according to claim 45, wherein the adjustment module is further configured to: determine a style loss according to the generated image and the style image; and in response to the style loss not satisfying a second predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the style loss; wherein the style loss characterizes the difference in style features between the generated image and the style image.
- The apparatus according to claim 45 or 46, wherein the adjustment module is further configured to: determine a content loss according to the generated image and the content image; and in response to the content loss not satisfying a third predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss characterizes the difference in content features between the generated image and the content image.
- The apparatus according to any one of claims 45 to 47, wherein the adjustment module is further configured to: determine a feature matching loss according to the output features of each intermediate-layer second network unit block among the second network unit blocks and the style image; and in response to the feature matching loss not satisfying a fourth predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss; wherein the feature matching loss characterizes the difference between the output features of each intermediate-layer second network unit block and the style features of the style image.
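For illustration, a feature matching loss of the kind recited in claim 48 can be sketched by comparing per-channel statistics of each intermediate block's output against the style statistics. The claims fix neither the statistic nor the norm; this sketch assumes per-channel mean/std and an L1 distance, and the name `feature_matching_loss` is hypothetical:

```python
import numpy as np

def feature_matching_loss(block_outputs, style_stats):
    """L1 distance between per-channel statistics of intermediate block
    outputs and the corresponding style statistics (a hedged sketch).

    block_outputs: list of (C, H, W) arrays, one per intermediate-layer block
    style_stats:   list of (mean, std) pairs, each array of shape (C,)
    """
    loss = 0.0
    for feat, (s_mean, s_std) in zip(block_outputs, style_stats):
        flat = feat.reshape(feat.shape[0], -1)
        loss += np.abs(flat.mean(axis=1) - s_mean).sum()
        loss += np.abs(flat.std(axis=1) - s_std).sum()
    return loss

# the loss vanishes when the statistics match exactly
f = np.ones((2, 3, 3))
assert feature_matching_loss([f], [(np.ones(2), np.zeros(2))]) == 0.0
```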
- The apparatus according to any one of claims 36 to 48, wherein the fourth extraction module is configured to: extract features of the style image distribution; and sample the features of the style image distribution to obtain the style features, the style features comprising the mean and standard deviation of the features of the style image distribution.
- The apparatus according to any one of claims 36 to 49, wherein the first network unit block is configured to extract the content features of the content image based on a plurality of neural network layers organized in a residual structure within the first network unit block; and/or the second network unit block is configured to process the features input to the second network unit block based on a plurality of neural network layers organized in a residual structure within the second network unit block.
- An electronic device, comprising a processor and a memory configured to store a computer program executable on the processor; wherein the processor, when running the computer program, performs the image generation method according to any one of claims 1 to 10 or the neural network training method according to any one of claims 11 to 25.
- A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the image generation method according to any one of claims 1 to 10 or the neural network training method according to any one of claims 11 to 25.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217017354A KR20210088656A (en) | 2019-06-24 | 2020-02-26 | Methods, devices, devices and media for image generation and neural network training |
JP2021532473A JP2022512340A (en) | 2019-06-24 | 2020-02-26 | Image generation and neural network training methods, devices, equipment and media |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910551145.3 | 2019-06-24 | ||
CN201910551145.3A CN112132167B (en) | 2019-06-24 | 2019-06-24 | Image generation and neural network training method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020258902A1 (en) | 2020-12-30 |
Family
ID=73850015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/076835 WO2020258902A1 (en) | 2019-06-24 | 2020-02-26 | Image generating and neural network training method, apparatus, device, and medium |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022512340A (en) |
KR (1) | KR20210088656A (en) |
CN (1) | CN112132167B (en) |
WO (1) | WO2020258902A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20230137732A (en) * | 2022-03-22 | 2023-10-05 | 삼성전자주식회사 | Method and electronic device generating user-preffered content |
KR102490503B1 (en) | 2022-07-12 | 2023-01-19 | 프로메디우스 주식회사 | Method and apparatus for processing image using cycle generative adversarial network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180068463A1 (en) * | 2016-09-02 | 2018-03-08 | Artomatix Ltd. | Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures |
CN108205803A (en) * | 2017-07-19 | 2018-06-26 | 北京市商汤科技开发有限公司 | Image processing method, the training method of neural network model and device |
CN108205813A (en) * | 2016-12-16 | 2018-06-26 | 微软技术许可有限责任公司 | Image stylization based on learning network |
CN109766895A (en) * | 2019-01-03 | 2019-05-17 | 京东方科技集团股份有限公司 | The training method and image Style Transfer method of convolutional neural networks for image Style Transfer |
CN109840924A (en) * | 2018-12-28 | 2019-06-04 | 浙江工业大学 | A kind of product image rapid generation based on series connection confrontation network |
CN109919829A (en) * | 2019-01-17 | 2019-06-21 | 北京达佳互联信息技术有限公司 | Image Style Transfer method, apparatus and computer readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018132855A (en) * | 2017-02-14 | 2018-08-23 | 国立大学法人電気通信大学 | Image style conversion apparatus, image style conversion method and image style conversion program |
GB201800811D0 (en) * | 2018-01-18 | 2018-03-07 | Univ Oxford Innovation Ltd | Localising a vehicle |
CN109919828B (en) * | 2019-01-16 | 2023-01-06 | 中德(珠海)人工智能研究院有限公司 | Method for judging difference between 3D models |
- 2019-06-24 CN CN201910551145.3A patent/CN112132167B/en active Active
- 2020-02-26 KR KR1020217017354A patent/KR20210088656A/en active Search and Examination
- 2020-02-26 JP JP2021532473A patent/JP2022512340A/en active Pending
- 2020-02-26 WO PCT/CN2020/076835 patent/WO2020258902A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180068463A1 (en) * | 2016-09-02 | 2018-03-08 | Artomatix Ltd. | Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures |
CN108205813A (en) * | 2016-12-16 | 2018-06-26 | 微软技术许可有限责任公司 | Image stylization based on learning network |
CN108205803A (en) * | 2017-07-19 | 2018-06-26 | 北京市商汤科技开发有限公司 | Image processing method, the training method of neural network model and device |
CN109840924A (en) * | 2018-12-28 | 2019-06-04 | 浙江工业大学 | A kind of product image rapid generation based on series connection confrontation network |
CN109766895A (en) * | 2019-01-03 | 2019-05-17 | 京东方科技集团股份有限公司 | The training method and image Style Transfer method of convolutional neural networks for image Style Transfer |
CN109919829A (en) * | 2019-01-17 | 2019-06-21 | 北京达佳互联信息技术有限公司 | Image Style Transfer method, apparatus and computer readable storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733946A (en) * | 2021-01-14 | 2021-04-30 | 北京市商汤科技开发有限公司 | Training sample generation method and device, electronic equipment and storage medium |
CN112733946B (en) * | 2021-01-14 | 2023-09-19 | 北京市商汤科技开发有限公司 | Training sample generation method and device, electronic equipment and storage medium |
CN113255813A (en) * | 2021-06-02 | 2021-08-13 | 北京理工大学 | Multi-style image generation method based on feature fusion |
CN113255813B (en) * | 2021-06-02 | 2022-12-02 | 北京理工大学 | Multi-style image generation method based on feature fusion |
Also Published As
Publication number | Publication date |
---|---|
JP2022512340A (en) | 2022-02-03 |
CN112132167A (en) | 2020-12-25 |
CN112132167B (en) | 2024-04-16 |
KR20210088656A (en) | 2021-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020258902A1 (en) | Image generating and neural network training method, apparatus, device, and medium | |
CN109241880B (en) | Image processing method, image processing apparatus, computer-readable storage medium | |
WO2019100723A1 (en) | Method and device for training multi-label classification model | |
CN106415594B (en) | Method and system for face verification | |
CN112446476A (en) | Neural network model compression method, device, storage medium and chip | |
CN110929622A (en) | Video classification method, model training method, device, equipment and storage medium | |
CN110543841A (en) | Pedestrian re-identification method, system, electronic device and medium | |
CN109508717A (en) | A kind of licence plate recognition method, identification device, identification equipment and readable storage medium storing program for executing | |
WO2015180101A1 (en) | Compact face representation | |
CN109377532B (en) | Image processing method and device based on neural network | |
CN111340077B (en) | Attention mechanism-based disparity map acquisition method and device | |
CN114549913B (en) | Semantic segmentation method and device, computer equipment and storage medium | |
CN108021908B (en) | Face age group identification method and device, computer device and readable storage medium | |
Chiaroni et al. | Learning with a generative adversarial network from a positive unlabeled dataset for image classification | |
CN114418030B (en) | Image classification method, training method and device for image classification model | |
JP2019508803A (en) | Method, apparatus and electronic device for training neural network model | |
CN111898703A (en) | Multi-label video classification method, model training method, device and medium | |
An et al. | Weather classification using convolutional neural networks | |
CN112446888A (en) | Processing method and processing device for image segmentation model | |
CN114064627A (en) | Knowledge graph link completion method and system for multiple relations | |
CN109492610A (en) | A kind of pedestrian recognition methods, device and readable storage medium storing program for executing again | |
CN112949706B (en) | OCR training data generation method, device, computer equipment and storage medium | |
JP6935868B2 (en) | Image recognition device, image recognition method, and program | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN110717401A (en) | Age estimation method and device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20832168; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 20217017354; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2021532473; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.03.2022) |
122 | Ep: pct application non-entry in european phase | Ref document number: 20832168; Country of ref document: EP; Kind code of ref document: A1 |