WO2020258902A1 - Image generation and neural network training method, apparatus, device and medium

Image generation and neural network training method, apparatus, device and medium

Info

Publication number
WO2020258902A1
Authority
WO
WIPO (PCT)
Prior art keywords
network unit
unit block
layer
network
content
Prior art date
Application number
PCT/CN2020/076835
Other languages
English (en)
Chinese (zh)
Inventor
黄明杨
张昶旭
刘春晓
石建萍
Original Assignee
商汤集团有限公司
Application filed by 商汤集团有限公司
Priority to JP2021532473A (published as JP2022512340A)
Priority to KR1020217017354A (published as KR20210088656A)
Publication of WO2020258902A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation

Definitions

  • the present disclosure relates to the field of image processing, and in particular to an image generation method and neural network training method, device, electronic equipment, and computer storage medium.
  • image generation may involve generating one real-looking image from another real image, with the realism of the generated image then judged subjectively by human vision.
  • neural network-based image generation methods have emerged in related technologies.
  • the neural network can usually be trained based on paired data, and the content image can then be stylized through the trained neural network.
  • paired data refers to a content image and a style image used for training that have the same content characteristics but different style characteristics.
  • this method is not easy to implement.
  • the embodiments of the present disclosure are expected to provide a technical solution for image generation.
  • an embodiment of the present disclosure provides an image generation method. The method includes: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting style features of a style image; correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into sequentially connected multi-layer second network unit blocks in a second neural network, feeding forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, and obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • the embodiments of the present disclosure also propose a neural network training method.
  • the method includes: extracting the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; extracting the style features of the style image; correspondingly feeding forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, feeding forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, and obtaining the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; identifying the generated image to obtain an identification result; and adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • an embodiment of the present disclosure also provides an image generation device.
  • the device includes a first extraction module, a second extraction module, and a first processing module.
  • the first extraction module is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the second extraction module is configured to extract the style features of the style image; and the first processing module is configured to correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, feed forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • the embodiments of the present disclosure also provide a neural network training device, which includes a third extraction module, a fourth extraction module, a second processing module, and an adjustment module; wherein the third extraction module is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer, and the fourth extraction module is configured to extract the style features of the style image.
  • the second processing module is configured to correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, feed forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, obtain the generated image output by the second neural network after each second network unit block processes its respective input features, and identify the generated image to obtain an identification result, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks; the adjustment module is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • the embodiments of the present disclosure also propose an electronic device, including a processor and a memory for storing a computer program that can run on the processor, wherein the processor is configured to run the computer program to execute any one of the above image generation methods or any one of the above neural network training methods.
  • the embodiments of the present disclosure also propose a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing image generation methods or any of the foregoing neural network training methods is implemented.
  • in the embodiments of the present disclosure, the content features of the content image are extracted by using the sequentially connected multi-layer first network unit blocks in the first neural network, to obtain the content features respectively output by the first network unit blocks of each layer; the style features of the style image are extracted; the content features respectively output by the first network unit blocks of each layer are correspondingly fed forward into the sequentially connected multi-layer second network unit blocks in the second neural network, the style features are fed forward from the first-layer second network unit block in the multi-layer second network unit blocks, and the generated image output by the second neural network is obtained after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which is easy to implement; in addition, in the process of image generation, the first network unit blocks of each layer of the first neural network can be used to extract the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information compared with the content image, and the generated image is therefore more realistic.
  • FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of the structure of a neural network pre-trained in an embodiment of the disclosure
  • FIG. 3 is an exemplary structural diagram of a content encoder according to an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of an exemplary structure of a CRB in an embodiment of the disclosure.
  • FIG. 5 is an exemplary structural diagram of a generator of an embodiment of the disclosure.
  • Fig. 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the disclosure
  • Fig. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure.
  • FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by the application embodiment of the disclosure.
  • Fig. 9a is a schematic structural diagram of a residual block of a content encoder in an application embodiment of the present disclosure.
  • Fig. 9b is a schematic structural diagram of a residual block of the generator in an application embodiment of the present disclosure.
  • FIG. 9c is a schematic structural diagram of the FADE module of the application embodiment of the disclosure.
  • FIG. 10 is a schematic diagram of the composition structure of an image generating device according to an embodiment of the disclosure.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
  • FIG. 12 is a schematic diagram of the composition structure of a neural network training device according to an embodiment of the disclosure.
  • the terms "including”, “including” or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or device including a series of elements not only includes what is clearly stated Elements, but also include other elements not explicitly listed, or elements inherent to the implementation of the method or device. Without more restrictions, the element defined by the sentence “including a" does not exclude the existence of other related elements (such as steps or steps in the method) in the method or device that includes the element.
  • the unit in the device for example, the unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • the image generation method and neural network training method provided by the embodiments of the present disclosure include a series of steps, but are not limited to the recorded steps.
  • similarly, the image generation device and neural network training device provided in the embodiments of the present disclosure include a series of modules, but are not limited to the explicitly recorded modules, and may also include modules that need to be set for obtaining relevant information or for performing processing based on such information.
  • the embodiments of the present disclosure can be applied to a computer system composed of a terminal and a server, and can operate with many other general-purpose or special-purpose computing system environments or configurations.
  • the terminal can be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a network personal computer, a vehicle-mounted device, a small computer system, etc.
  • the server can be a server computer system, a small computer system, a large computer system, or a distributed cloud computing technology environment including any of the above systems, etc.
  • Electronic devices such as terminals and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
  • program modules may be located on a storage medium of a local or remote computing system including a storage device.
  • an image generation method is proposed.
  • the applicable scenarios of the embodiments of the present disclosure include but are not limited to automatic driving, image generation, image synthesis, computer vision, deep learning, machine learning, etc.
  • FIG. 1 is a flowchart of an image generation method according to an embodiment of the disclosure. As shown in FIG. 1, the method may include:
  • Step 101 Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer.
  • the content image may be an image that requires style conversion; for example, the content image may be obtained from a local storage area or the content image may be obtained from the network.
  • the content image may be an image taken by a mobile terminal or a camera.
  • the format of the content image can be Joint Photographic Experts Group (JPEG), Bitmap (BMP), Portable Network Graphics (PNG), or other formats; it should be noted that the format and source of the content image are only exemplified here, and the embodiment of the present disclosure does not limit the format and source of the content image.
  • content features and style features can be extracted.
  • the content feature is used to characterize the content information of the image, for example, the content feature represents the object position, object shape, object size, etc. in the image
  • the style feature is used to represent the style information of the content image; for example, the style feature is used to represent style information such as weather, day, night, and painting style.
  • the style conversion may refer to the conversion of the style feature of the content image into another style feature.
  • the conversion of the style feature of the content image may be the conversion from day to night, and from night to day.
  • the conversion between styles can be from sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, cloudy to rainy, rainy to cloudy, sunny to snowy, snowy to sunny, cloudy to snowy, snowy to cloudy, snowy to rainy, or rainy to snowy, etc.
  • the conversion between different painting styles can be from oil painting to ink painting, ink painting to oil painting, oil painting to sketch, sketch to oil painting, sketch to ink painting, or ink painting to sketch, etc.
  • the first neural network is a network for extracting content features of content images, and the embodiment of the present disclosure does not limit the type of the first neural network.
  • the first neural network includes sequentially connected multi-layer first network unit blocks.
  • the content features of the content image can be fed forward from the first-layer first network unit block of the multi-layer first network unit blocks. The data processing direction corresponding to the feedforward input is the direction from the input end of the neural network to the output end, corresponding to the forward propagation process; in the feedforward input process, the output result of the upper-layer network unit block is used as the input of the next-layer network unit block.
  • the first network unit block of each layer of the first neural network can extract content features from its input data; that is, the output result of the first network unit block of each layer of the first neural network is the content feature of that layer, and the content features output by different first network unit blocks in the first neural network are different.
  • the representation mode of the content feature of the content image may be a content feature map or other representation mode, which is not limited in the embodiment of the present disclosure.
  • each first network unit block in the first neural network consists of multiple neural network layers organized in a residual structure, so that the content features of the content image can be extracted based on the multiple neural network layers organized in a residual structure in each first network unit block.
  • Step 102 Extract the style features of the style image.
  • the style image is an image with the target style feature
  • the target style feature represents the style feature to which the content image needs to be converted
  • the style image can be set as needed.
  • after acquiring the content image, the target style feature to be converted can be determined, and the style image can then be selected according to that demand.
  • the style image can be obtained from the local storage area or the network.
  • the style image can be an image taken through a mobile terminal or camera;
  • the format of the style image can be JPEG, BMP, PNG, or other formats; it should be noted that the format and source of the style image are only exemplified here, and the embodiment of the present disclosure does not limit the format and source of the style image.
  • the style feature of the content image is different from the style feature of the style image
  • the purpose of performing style conversion on the content image may be to make the generated image obtained after the style conversion have the content features of the content image and the style features of the style image.
  • the extracting of the style features of the style image includes: extracting features of the style image distribution; and sampling the features of the style image distribution to obtain the style features, where the style features include the mean and standard deviation of the features of the style image distribution.
  • the style characteristics of the style image can be accurately extracted, which is conducive to accurate style conversion of the content image.
  • at least one layer of convolution operation may be performed on the style image to obtain the characteristics of the style image distribution.
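  • As a rough illustration only (the text above does not give the exact layer configuration of the style encoder), the following sketch shows a style encoder that applies a few convolution layers to the style image and then produces the mean and standard deviation that make up the style features; the use of PyTorch and all layer sizes and names are assumptions.

```python
import torch.nn as nn

class StyleEncoder(nn.Module):
    def __init__(self, in_channels=3, latent_dim=256):
        super().__init__()
        # "at least one layer of convolution" extracts features of the style image distribution
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # two heads produce the mean and standard deviation of the distribution features
        self.to_mean = nn.Linear(128, latent_dim)
        self.to_std = nn.Linear(128, latent_dim)

    def forward(self, style_image):
        h = self.conv(style_image).flatten(1)
        return self.to_mean(h), self.to_std(h)  # the style features: mean and standard deviation
```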
  • Step 103 Correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, feed forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • the second neural network includes sequentially connected multi-layer second network unit blocks, and the output result of the previous network unit block in the second neural network is the input of the next network unit block; optionally, each second network unit block in the second neural network consists of multiple neural network layers organized in a residual structure, so that the input features can be processed based on the multiple neural network layers organized in a residual structure in each second network unit block.
  • step 101 to step 103 can be implemented by a processor in an electronic device.
  • the processor can be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
  • both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which is easy to implement; in addition, in the process of image generation, The first network unit block of each layer of the first neural network is used to extract the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more than the content image. Semantic information, therefore, the generated image is more realistic.
  • the style of the style image can be determined according to actual needs, and is not limited by the style characteristics of the style images used when training the neural network. That is to say, even if training images of a dark-night style are used when the neural network is trained, when images are generated based on the trained neural network, a content image together with a snowy-style, rainy-style, or other-style image can be chosen, so that generated images meeting the actually required style are produced rather than only dark-night-style images, which improves the generalization and universality of the image generation method.
  • style images with different style characteristics can be set according to user needs, so that generated images with different style characteristics, for example a dark-night style, a cloudy style, and a rainy style, can be obtained for one content image; that is, based on the same content image, generated images of multiple styles can be obtained rather than images of only one style, which improves the applicability of the image generation method.
  • the number of layers of the first network unit block of the first neural network and the number of layers of the second network unit block of the second neural network may be the same, and the first network unit block of each layer of the first neural network It forms a one-to-one correspondence with the second network unit blocks of each layer of the second neural network.
  • the corresponding feedforward input of the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network includes: in response to i sequentially taking 1 to T, feeding forward the content features output by the first network unit block of the i-th layer into the second network unit block of the (T-i+1)-th layer, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  • the content features output by the first network unit block of the last layer are input to the second network unit block of the first layer.
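  • For illustration, the routing just described can be sketched as follows; the function and variable names are hypothetical, while the i → (T-i+1) indexing itself comes from the text above.

```python
def route_content_features(content_features):
    """content_features[i-1] is the output of the i-th first network unit block, i = 1..T.
    For T = 7, the 7th (last) content feature is routed to the 1st second network unit block."""
    T = len(content_features)
    # the i-th content feature is fed forward into the (T-i+1)-th second network unit block
    return {T - i + 1: content_features[i - 1] for i in range(1, T + 1)}
```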
  • the content feature received by each second network unit block in the second neural network is the output feature of a corresponding first network unit block of the first neural network, and the content features received by the second network unit blocks vary with their positions in the second neural network.
  • the second neural network uses the style features as input. As the style features deepen from the lower-layer second network unit blocks of the second neural network to the higher-layer second network unit blocks, more content features can be integrated, so that the semantic information of each layer of the content image is gradually merged on the basis of the style features, and the resulting image can retain the multi-layer semantic information of the content image and the style feature information.
  • the feature processing of the first-layer second network unit block among the second network unit blocks includes: multiplying the content feature from the last-layer first network unit block and the style feature to obtain the intermediate feature of the first-layer second network unit block; adding the content feature from the last-layer first network unit block and the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and inputting the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • before the multiplication operation, a convolution operation may be performed on the content feature from the last-layer first network unit block; that is, a convolution operation may first be performed on the content feature from the last-layer first network unit block, and the result of the convolution operation and the style feature may then be multiplied.
  • the processing of the input features by a middle-layer second network unit block among the second network unit blocks includes: multiplying the input content feature and the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; adding the input content feature and the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and inputting the output feature of the middle-layer second network unit block into the next-layer second network unit block. It can be seen that performing the above multiplication and addition operations facilitates the fusion of the output feature of the upper-layer second network unit block with the corresponding content feature.
  • a middle-layer second network unit block is any second network unit block in the second neural network other than the first-layer and last-layer second network unit blocks; there can be one middle-layer second network unit block or multiple middle-layer second network unit blocks, and the above description takes a single middle-layer second network unit block as an example to explain its data processing procedure.
  • the intermediate layer second network unit block performs a convolution operation on the received content feature before multiplying the input content feature and the output feature of the upper layer second network unit block.
  • the processing of the input features by the last-layer second network unit block among the second network unit blocks includes: multiplying the content feature from the first-layer first network unit block and the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and adding the content feature from the first-layer first network unit block and the intermediate feature of the last-layer second network unit block to obtain the generated image.
  • before the last-layer second network unit block multiplies the content feature from the first-layer first network unit block and the output feature of the upper-layer second network unit block, a convolution operation is performed on the content feature from the first-layer first network unit block.
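  • A hedged sketch of the multiply-then-add fusion described in the preceding bullets is given below; it assumes feature maps of matching shape, includes the optional convolution applied to the content feature before the multiplication, and uses illustrative names that are not taken from the patent.

```python
import torch.nn as nn

class FusionStep(nn.Module):
    """Fuses a routed content feature with the incoming feature: the style feature for the
    first-layer second network unit block, or the previous block's output for later blocks."""
    def __init__(self, channels):
        super().__init__()
        # convolution optionally applied to the content feature before the multiplication
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content_feat, incoming_feat):
        content_feat = self.conv(content_feat)
        intermediate = content_feat * incoming_feat   # multiplication -> intermediate feature
        return content_feat + intermediate            # addition -> output feature of the block
```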
  • FIG. 2 is a schematic structural diagram of a neural network pre-trained in an embodiment of the disclosure.
  • the pre-trained neural network includes a content encoder, a style encoder, and a generator, wherein the content encoder is used to extract the content features of the content image by using the first neural network described above, the style encoder is used to extract the style features of the style image, and the generator is used to use the second neural network to realize the fusion of the style features and the content features output by the first network unit blocks of each layer.
  • the first neural network can be used as the content encoder
  • the second neural network can be used as the generator
  • the neural network used for style feature extraction on the style image can be used as the style encoder.
  • the image to be processed (i.e., the content image) can be input into the content encoder and processed by the multi-layer first network unit blocks of the first neural network, with each layer of first network unit block outputting a content feature; the style image can also be input into the style encoder, and the style feature of the style image can be extracted by the style encoder.
  • the first network unit block is a residual block (Residual Block, RB)
  • the content feature output by the first network unit block of each layer is a content feature map.
  • Fig. 3 is a schematic diagram of an exemplary structure of the content encoder according to the embodiment of the disclosure.
  • the residual blocks of the content encoder can be marked as CRB, and the content encoder includes seven layers of CRB; in CRB(A, B), A represents the number of input channels and B represents the number of output channels; in Figure 3, the input of CRB(3,64) is the content image, and the first-layer CRB to the seventh-layer CRB are arranged from bottom to top.
  • the first layer CRB to the seventh layer CRB can output seven content feature maps respectively.
  • FIG. 4 is an exemplary structural diagram of the CRB of an embodiment of the disclosure.
  • sync BN represents a synchronous BN layer
  • a rectified linear unit (ReLu) represents a ReLu layer
  • Conv represents a convolutional layer, and ⊕ represents a summation operation; the structure of the CRB shown in Figure 4 is that of a standard residual block.
  • a standard residual network structure can be used to extract content features, which facilitates the extraction of content features of content images and reduces semantic information loss.
  • the multi-layer second network unit block of the second neural network can be used for processing; for example, the second network unit block is RB.
  • FIG. 5 is an exemplary structural diagram of the generator of the embodiment of the disclosure.
  • the residual blocks in the generator can be denoted as GB, and the generator can include seven layers of GB, where the input of each layer of GB includes the output of one layer of CRB of the content encoder; in the generator, the first-layer GB to the seventh-layer GB, arranged from top to bottom, are GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(1024), GB ResBlk(512), GB ResBlk(256), GB ResBlk(128), and GB ResBlk(64); in GB ResBlk(C) in Figure 5, C represents the number of channels; the first-layer GB is used to receive the style features, and the first-layer GB to the seventh-layer GB are used to receive the content feature maps output by the seventh-layer CRB to the first-layer CRB respectively; after each layer of GB processes its input features, the output of the seventh-layer GB can be used to obtain the generated image.
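  • The wiring and channel counts above can be summarized in the hedged sketch below; the GB blocks are passed in as generic callables because their internal structure is only described later, and all function and variable names here are assumptions.

```python
# Channel counts of the seven generator blocks GB-1 .. GB-7, as listed above.
GB_CHANNELS = [1024, 1024, 1024, 512, 256, 128, 64]

def generator_forward(gb_blocks, style_feature, content_feature_maps):
    """gb_blocks: seven callables GB-1..GB-7; content_feature_maps: outputs of CRB-1..CRB-7.
    GB-i receives the content feature map of CRB-(T-i+1); GB-1 also receives the style feature."""
    T = len(gb_blocks)                                  # T = 7 here
    feature = style_feature                             # the style feature enters at GB-1
    for i in range(1, T + 1):
        content_feat = content_feature_maps[T - i]      # output of CRB-(T-i+1), 0-indexed list
        feature = gb_blocks[i - 1](feature, content_feat)
    return feature                                      # GB-7 output, used to obtain the generated image
```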
  • the structural information of the content image can be encoded to generate multiple content feature maps at different levels; the content encoder extracts more abstract features in the deep layers, while a lot of structural information is retained in the shallow layers.
  • the image generation method of the embodiment of the present disclosure can be applied to various image generation scenarios, for example, can be applied to scenarios such as image entertainment data generation, automatic driving model training test data generation, and the like.
  • Figure 6 shows several exemplary sets of content images, style images, and generated images in the embodiments of the present disclosure.
  • in FIG. 6, the first column represents content images, the second column represents style images, and the third column represents the generated images obtained by the image generation method of the embodiments of the present disclosure; the images in the same row form a group of content image, style image, and generated image. The style conversions from the first row to the last row are day to night, night to day, sunny to rainy, rainy to sunny, sunny to cloudy, cloudy to sunny, sunny to snowy, and snowy to sunny. As can be seen from FIG. 6, the images generated by the image generation method of the embodiments of the present disclosure retain the content information of the content image and the style information of the style image.
  • in the training process of the neural network in the embodiments of the present disclosure, not only the forward propagation process from input to output is involved, but also the back propagation process from output to input; the training process can use the forward process to generate images and use the reverse process to adjust the network parameters of the neural network.
  • FIG. 7 is a flowchart of a neural network training method according to an embodiment of the disclosure. As shown in FIG. 7, the process may include:
  • Step 701 Extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer.
  • Step 702 Extract the style features of the style image.
  • Step 703 Correspondingly feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, feed forward the style features from the first-layer second network unit block in the multi-layer second network unit blocks, and obtain the generated image output by the second neural network after each second network unit block processes its respective input features, where the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks.
  • the implementation of steps 701 to 703 in this embodiment is the same as that of steps 101 to 103 and will not be repeated here.
  • Step 704 Discriminate the generated image, and obtain an identification result.
  • the output image generated by the generator needs to be identified.
  • the purpose of discriminating the generated image is to determine the probability that the generated image is a real image; in practical applications, this step can be implemented using a discriminator or the like.
  • Step 705 Adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
  • the network parameters of the first neural network and/or the second neural network can be adjusted in the reverse process according to the content image, style image, generated image, and identification result, and the forward process can then be used to obtain the generated image and the identification result again; by repeating the above forward and reverse processes, the neural network is iteratively optimized until the predetermined training completion conditions are met, and the trained neural network for image generation is obtained.
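  • A hedged sketch of one such forward/reverse training step is shown below; the patent does not specify the exact adversarial formulation, so a standard binary cross-entropy GAN term is used purely as a placeholder, treating the style image as the "real" sample is an assumption, and the style, content, and feature matching losses described later would be added to the generator objective.

```python
import torch
import torch.nn.functional as F

def train_step(content_encoder, style_encoder, generator, discriminator,
               content_img, style_img, opt_g, opt_d):
    # forward process: extract features and synthesize the generated image
    content_feats = content_encoder(content_img)        # per-layer content features
    style_feat = style_encoder(style_img)
    generated = generator(style_feat, content_feats)

    # discriminator update: identify the generated image against a real sample
    d_real = discriminator(style_img)
    d_fake = discriminator(generated.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # reverse process for the generator/encoders: adjust parameters from the adversarial term
    d_fake_for_g = discriminator(generated)
    g_loss = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```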
  • steps 701 to 705 can be implemented by a processor in an electronic device.
  • the aforementioned processor can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor.
  • both the content image and the style image can be determined according to actual needs, and the content image and the style image do not need to be a pair of images, which is easy to implement.
  • the first network unit blocks of each layer of the first neural network can be used to extract the content features of the content image multiple times, thereby retaining more semantic information of the content image, so that the generated image retains more semantic information compared with the content image; in turn, the trained neural network can better maintain the semantic information of the content image.
  • the parameters of the above-mentioned multiplication operation and/or addition operation used in the second network unit block of each layer can be adjusted.
  • the adjusting of the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result includes: determining a Generative Adversarial Network (GAN) loss according to the content image, the style image, the generated image, and the identification result, wherein the GAN loss is used to characterize the difference in content characteristics between the generated image and the content image and the difference in style characteristics between the generated image and the style image; in one example, the generative adversarial network includes a generator and a discriminator; and, in response to the GAN loss not meeting a first predetermined condition, adjusting the network parameters of the first neural network and/or the second neural network.
  • the network parameters of the first neural network and/or the second neural network can be adjusted based on the GAN loss, and a minimax strategy can be adopted.
  • the first predetermined condition may represent a predetermined training completion condition; it is understandable that, according to the meaning of the GAN loss, training the neural network based on the GAN loss can make the generated image obtained from the trained neural network have a high performance of maintaining the content characteristics of the content image and the style characteristics of the style image.
  • the adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: According to the generated image and the style image, determine the style loss; in response to the situation that the style loss does not meet the second predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the style loss; wherein, the style loss It is used to characterize the difference between the style characteristics of the generated image and the style image.
  • the adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: Determine the content loss according to the generated image and the content image; in response to the content loss not meeting the third predetermined condition, adjust the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss is used for Characterize the difference in content characteristics between the generated image and the content image.
  • the adjusting the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result further includes: Determine the feature matching loss according to the output features of each intermediate layer second network unit block in each second network unit block and the style image; in response to the feature matching loss not satisfying the fourth predetermined condition, adjust according to the feature matching loss The network parameters of the first neural network and/or the second neural network; wherein, the feature matching loss is used to characterize the difference between the output feature of the second network unit block of each intermediate layer and the style feature of the style image.
  • the aforementioned second predetermined condition, third predetermined condition, and fourth predetermined condition may represent predetermined training completion conditions; it is understandable that, according to the meaning of the style loss, content loss, or feature matching loss, training the neural network based on the style loss, content loss, or feature matching loss can make the generated image obtained from the trained neural network have a higher performance of maintaining the content characteristics of the content image.
  • a neural network can be trained based on the foregoing one loss or multiple losses.
  • the trained neural network can be obtained when the loss meets the corresponding predetermined condition;
  • the accuracy of the style conversion of the trained neural network is higher.
  • the GAN loss, style loss, content loss, or feature matching loss can be represented by a loss function.
  • the training process of the neural network can be implemented based on the content encoder, style encoder, generator, discriminator, etc., and the image generation process based on the trained neural network can be implemented based on the content encoder, style encoder, generator, etc.
  • FIG. 8 is a schematic structural diagram of the framework of the image generation method proposed by the application embodiment of the disclosure.
  • the input of the content encoder is the image to be processed (that is, the content image), which is used to extract the content characteristics of the content image;
  • the style encoder is responsible for extracting the style features of the style image;
  • the generator combines the content features from the first network unit blocks of different layers with the style features to generate high-quality images.
  • the discriminator used in the neural network training process is not shown in FIG. 8.
  • the content encoder includes multiple layers of residual blocks, where CRB-1, CRB-2, ..., CRB-T respectively represent the first-layer to T-th-layer residual blocks of the content encoder; the generator also includes multiple layers of residual blocks, where GB-1, ..., GB-T-1, GB-T respectively represent the first-layer to T-th-layer residual blocks of the generator.
  • the output result of the i-th layer residual block of the content encoder is input into the (T-i+1)-th layer residual block of the generator; the input of the style encoder is the style image, from which the style feature is extracted, and the style feature is input into the first-layer residual block of the generator.
  • the output image is obtained based on the output result of the T-th layer residual block GB-T of the generator.
  • f_i is defined as the content feature map output by the i-th layer residual block of the content encoder, and the output feature of the i-th residual block of the generator is denoted correspondingly; the i-th residual block of the content encoder corresponds to the (T-i+1)-th layer residual block of the generator, and the corresponding feature maps have the same number of channels. N denotes the batch size, C_i the number of channels, and H_i and W_i the height and width, respectively.
  • the activation value at position (n, c, h, w), with n∈[1,N], c∈[1,C_i], h∈[1,H_i], w∈[1,W_i], can be expressed as formula (1); the mean and standard deviation in formula (1) correspond to the i-th residual block of the generator, respectively represent the mean and standard deviation of the features output by the previous-layer residual block of the generator (that is, of the second neural network), and can be calculated according to formula (2).
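  • Formulas (1) and (2) themselves are not reproduced in this text; as a hedged illustration only, a feature-adaptive denormalization of the kind described here typically takes the following form, where a denotes the feature output by the previous-layer residual block of the generator and γ and β are modulation parameters computed from the content feature map f_i by convolutions.

```latex
\[
  \hat{a}_{n,c,h,w} \;=\; \gamma_{c,h,w}(f_i)\,\frac{a_{n,c,h,w} - \mu_c}{\sigma_c} \;+\; \beta_{c,h,w}(f_i)
\]
\[
  \mu_c = \frac{1}{N H_i W_i}\sum_{n,h,w} a_{n,c,h,w}, \qquad
  \sigma_c = \sqrt{\frac{1}{N H_i W_i}\sum_{n,h,w} \left(a_{n,c,h,w} - \mu_c\right)^2}
\]
```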
  • the image generation method of the application embodiment of the present disclosure is feature-adaptive, that is, the modulation parameters can be calculated directly based on the content features of the content image, whereas in related image generation methods the modulation parameters are fixed.
  • Figure 9a is a schematic structural diagram of the residual block of the content encoder in the application embodiment of the disclosure.
  • BN represents the BN layer
  • ReLu represents the ReLu layer
  • Conv represents the convolutional layer, and ⊕ represents the summation operation;
  • the structure of each residual block CRB of the content encoder is that of a standard residual block, and each residual block of the content encoder includes three convolutional layers, one of which is used for the skip connection.
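  • A hedged sketch of such a content-encoder residual block is given below; kernel sizes, channel handling, and naming are assumptions, while the BN → ReLu → Conv ordering, the three convolutions, and the convolution on the skip connection follow the description above.

```python
import torch.nn as nn

class CRB(nn.Module):
    """Content-encoder residual block: a standard residual block with three convolutions,
    one of which sits on the skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # convolution on the skip connection

    def forward(self, x):
        return self.branch(x) + self.skip(x)  # summation operation of the residual block
```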
  • Fig. 9b is a schematic structural diagram of the residual block of the generator in an application embodiment of the present disclosure. As shown in Fig. 9b, the structure of each residual block GB of the generator is obtained by replacing the BN layers in a standard residual block with FADE modules; F1, F2, and F3 represent the first, second, and third FADE modules, respectively.
  • in each residual block of the generator, the input of each FADE module includes the corresponding content feature map output by the content encoder; referring to Fig. 9b, among the three FADE modules of each residual block of the generator, the inputs of F1 and F2 also include the output feature of the previous residual block of the second neural network, and the input of F3 also includes the feature obtained after processing by F1, the ReLu layer, and the convolutional layer in turn.
  • Fig. 9c is a schematic diagram of the structure of the FADE module of the application embodiment of the present disclosure, in which ⊗ represents the multiplication operation and ⊕ represents the addition operation, Conv represents the convolutional layer, BN represents the BN layer, and γ and β represent the modulation parameters of each residual block of the generator. It can be seen that FADE takes the content feature map as input and derives the denormalization parameters from convolutional features.
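  • A hedged sketch of such a FADE module is shown below; the hidden width, kernel sizes, and the assumption that the content feature map already matches the spatial size of the incoming feature are illustrative choices, while the overall flow (BN on the incoming feature, γ and β derived from the content feature map by convolutions, then multiplication and addition) follows the description above.

```python
import torch.nn as nn

class FADE(nn.Module):
    """Feature-adaptive denormalization: normalizes the incoming feature and re-modulates it
    with gamma/beta derived from the content feature map by convolutions."""
    def __init__(self, feature_ch, content_ch, hidden_ch=128):
        super().__init__()
        self.bn = nn.BatchNorm2d(feature_ch, affine=False)  # BN layer of Fig. 9c
        self.shared = nn.Sequential(
            nn.Conv2d(content_ch, hidden_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden_ch, feature_ch, kernel_size=3, padding=1)  # gamma
        self.to_beta = nn.Conv2d(hidden_ch, feature_ch, kernel_size=3, padding=1)   # beta

    def forward(self, feature, content_map):
        # content_map is assumed to be resized to the spatial resolution of feature
        h = self.shared(content_map)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        return self.bn(feature) * gamma + beta   # multiplication then addition, as in Fig. 9c
```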
  • the trained neural network is made to adaptively transform the content image under the control of the style image.
  • the style encoder is proposed based on the Variational Auto-Encoder (VAE).
  • the output of the style encoder is a mean vector and a standard deviation vector, and the latent code z is derived by re-sampling from the style image after encoding, using a uniformly distributed random vector η with the same size as z.
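  • The re-sampling step itself is not reproduced above; a hedged sketch of the usual form, assuming the mean vector μ, standard deviation vector σ, and the random vector η are combined element-wise, is:

```latex
\[
  z \;=\; \mu \;+\; \sigma \odot \eta
\]
```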
  • various parts of the entire neural network can be jointly trained.
  • the loss function of the entire first neural network can be calculated by referring to formula (3) based on the optimization of the minimax strategy, and then the training of the first neural network can be realized.
  • G represents the generator
  • D represents the discriminator
  • L_VAE(E_s, G) represents the style loss.
  • the style loss can be a Kullback-Leibler (KL) divergence loss;
  • L_VAE(E_s, G) can be calculated according to formula (4).
  • KL(·) represents the KL divergence
  • λ_0 represents the hyperparameter in L_VAE(E_s, G).
  • L_GAN(E_s, E_c, G, D) represents the generative adversarial network loss, which is used in the adversarial training of the generator and discriminator;
  • L_GAN(E_s, E_c, G, D) can be calculated according to formula (5).
  • L_VGG(E_s, E_c, G) represents the content loss.
  • the content loss may be a VGG (Visual Geometry Group) loss.
  • L_VGG(E_s, E_c, G) can be calculated according to formula (6).
  • L_FM(E_s, E_c, G) represents the feature matching loss;
  • L_FM(E_s, E_c, G) can be calculated according to formula (7).
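  • Formulas (3) to (7) are not reproduced in this text; as a hedged illustration only, a minimax objective combining the four loss terms named above would typically look like the following, where λ_1 and λ_2 are hypothetical weighting hyperparameters (only λ_0 is named in the text).

```latex
\[
  \min_{E_s, E_c, G}\;\max_{D}\;
  L_{GAN}(E_s, E_c, G, D)
  \;+\; \lambda_0\, L_{VAE}(E_s, G)
  \;+\; \lambda_1\, L_{VGG}(E_s, E_c, G)
  \;+\; \lambda_2\, L_{FM}(E_s, E_c, G)
\]
```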
  • the VGG loss has different weights in different layers.
  • the first neural network is trained based on multi-scale discriminators, and the discriminators at different scales have exactly the same structure; the discriminator at the coarsest scale has the largest receptive field, while the discriminators at finer scales can discriminate higher-resolution images.
  • the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • FIG. 10 is a schematic diagram of the composition structure of an image generation device according to an embodiment of the disclosure. As shown in FIG. 10, the device includes: a first extraction module 1001, a second extraction module 1002, and a first processing module 1003, wherein:
  • the first extraction module 1001 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
  • the second extraction module 1002 is used to extract style features of style images
  • the first processing module 1003 is configured to feed forward the content features respectively output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, and combine the The style features are fed forward from the first layer second network unit block in the multi-layer second network unit block, and the second neural network output is obtained after each second network unit block processes the respective input features An image is generated, wherein the multi-layer first network unit block corresponds to the multi-layer second network unit block.
  • the first processing module 1003 is configured to, in response to i sequentially taking 1 to T, feed forward the content features output by the first network unit block of the i-th layer into the second network unit block of the (T-i+1)-th layer, where i is a positive integer and T represents the number of layers of the first network unit blocks of the first neural network and of the second network unit blocks of the second neural network.
  • the first-layer second network unit block among the second network unit blocks is used to: multiply the content feature from the last-layer first network unit block and the style feature to obtain the intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block and the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • the first-layer second network unit block is also used to perform a convolution operation on the content feature from the last-layer first network unit block before multiplying that content feature and the style feature.
  • the middle-layer second network unit block among the second network unit blocks is used to: multiply the input content feature and the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; add the input content feature and the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and input the output feature of the middle-layer second network unit block into the next-layer second network unit block.
  • the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature and the output feature of the upper-layer second network unit block.
  • the last-level second network unit block in each of the second network unit blocks is used to combine the content characteristics from the first-level first network unit block and the output characteristics of the upper-level second network unit block Perform a multiplication operation to obtain the intermediate feature of the second network unit block of the last layer; add the content feature from the first network unit block of the first layer and the intermediate feature of the last second network unit block to obtain The generated image.
  • the last-layer second network unit block is further configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying it by the output feature of the upper-layer second network unit block.
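Chaining the three kinds of blocks gives a sketch of the full second-neural-network forward pass. As before, the block classes and channel handling are illustrative assumptions, and the last block is taken to be structurally identical to a middle block, with its sum interpreted as the generated image.

```python
import torch.nn as nn

class LastDecoderBlock(nn.Module):
    """Sketch: same convolve-multiply-add pattern, whose sum is the generated image."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, content_feat, prev_output):
        return content_feat + self.conv(content_feat) * prev_output


def decode(blocks, routed_content_feats, style_feat):
    """blocks[0] is the first-layer block, blocks[-1] the last-layer block;
    routed_content_feats follows the i -> T-i+1 ordering sketched earlier."""
    out = blocks[0](routed_content_feats[0], style_feat)
    for block, feat in zip(blocks[1:], routed_content_feats[1:]):
        out = block(feat, out)
    return out  # the generated image
```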
  • the second extraction module 1002 is configured to extract features of the style image distribution and to sample the features of the style image distribution to obtain the style feature, where the style feature includes the mean and standard deviation of the features of the style image distribution.
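One common way to realize such a mean-and-standard-deviation style feature is to take per-channel statistics of the style encoder's feature map; the per-channel granularity is an assumption, and how the mean/standard-deviation pair is combined into the multiplicative style feature used by the decoder blocks is left open here.

```python
import torch

def style_statistics(style_feature_map, eps=1e-5):
    """Sketch: per-channel mean and standard deviation of a (N, C, H, W)
    style feature map, usable as the style feature fed to the first-layer
    second network unit block."""
    mean = style_feature_map.mean(dim=(2, 3), keepdim=True)
    std = (style_feature_map.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std
```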
  • the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure within the first network unit block; and/or, the second network unit block is configured to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure within the second network unit block.
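A generic residual organization of the layers inside a unit block might look as follows; the two-convolution body and the ReLU activation are assumptions chosen for brevity, not details stated in the disclosure.

```python
import torch.nn as nn

class ResidualUnitBlock(nn.Module):
    """Sketch of a unit block whose internal layers form a residual structure:
    the block input is added to the output of its stacked convolution layers."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)
```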
  • the first extraction module 1001, the second extraction module 1002, and the first processing module 1003 can all be implemented by processors; the aforementioned processors can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
  • the functional modules in this embodiment can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software function module.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program codes.
  • the computer program instructions corresponding to an image generation method or neural network training method in this embodiment can be stored on storage media such as optical disks, hard disks, and USB flash drives.
  • FIG. 11 shows an electronic device 11 provided by an embodiment of the present disclosure.
  • the electronic device 11 includes: a memory 111 and a processor 112; wherein the memory 111 is configured to store a computer program, and the processor 112 is configured to execute the computer program stored in the memory to implement any image generation method or any neural network training method in the foregoing embodiments.
  • the various components in the electronic device 11 may be coupled together through a bus system. It can be understood that the bus system is used to realize the connection and communication between these components.
  • the bus system also includes a power bus, a control bus, and a status signal bus.
  • various buses are marked as bus systems in FIG. 11.
  • the aforementioned memory 111 may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the foregoing types of memories, and it provides instructions and data to the processor 112.
  • the aforementioned processor 112 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, for different devices, the electronic devices used to implement the above-mentioned processor functions may also be other devices, which is not specifically limited in the embodiments of the present disclosure.
  • FIG. 12 is a schematic diagram of the composition structure of a neural network training device according to an embodiment of the disclosure. As shown in FIG. 12, the device includes: a third extraction module 1201, a fourth extraction module 1202, a second processing module 1203, and an adjustment module 1204; wherein:
  • the third extraction module 1201 is configured to extract the content features of the content image by using the sequentially connected multi-layer first network unit blocks in the first neural network to obtain the content features respectively output by the first network unit blocks of each layer;
  • the fourth extraction module 1202 is used to extract style features of the style image
  • the second processing module 1203 is configured to forward the content features output by the first network unit blocks of each layer into the sequentially connected multi-layer second network unit blocks in the second neural network, and to forward the style feature into the first-layer second network unit block of the multi-layer second network unit blocks; after each second network unit block processes its respective input features, the generated image output by the second neural network is obtained; and to identify the generated image to obtain an identification result; wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks;
  • the adjustment module 1204 is configured to adjust the network parameters of the first neural network and/or the second neural network according to the content image, the style image, the generated image, and the identification result.
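Taken together, these modules describe one training iteration. A minimal sketch of such a step is shown below; the module interfaces (a content_encoder returning a per-layer list, a style_encoder returning the style feature, and a single combined loss_fn) are assumptions made for illustration, not the disclosure's exact interfaces.

```python
def training_step(content_encoder, style_encoder, decoder, discriminator,
                  optimizer, content_img, style_img, loss_fn):
    """Sketch of one adjustment step: extract features, generate an image,
    identify it, compute the loss, and back-propagate to update parameters."""
    content_feats = content_encoder(content_img)    # per-layer content features
    style_feat = style_encoder(style_img)           # e.g. per-channel mean/std
    generated = decoder(content_feats, style_feat)  # output of the second neural network
    score = discriminator(generated)                # identification result
    loss = loss_fn(content_img, style_img, generated, score)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return generated, loss.item()
```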
  • the second processing module 1203 is configured to, for i taking the values 1 to T in sequence, forward the content features output by the i-th layer first network unit block into the (T-i+1)-th layer second network unit block, where i is a positive integer and T denotes the number of layers of first network unit blocks in the first neural network and of second network unit blocks in the second neural network.
  • the first-layer second network unit block among the second network unit blocks is used to: multiply the content feature from the last-layer first network unit block by the style feature to obtain the intermediate feature of the first-layer second network unit block; add the content feature from the last-layer first network unit block to the intermediate feature of the first-layer second network unit block to obtain the output feature of the first-layer second network unit block; and input the output feature of the first-layer second network unit block into the second-layer second network unit block.
  • the first-layer second network unit block is further configured to perform a convolution operation on the content feature from the last-layer first network unit block before that content feature is multiplied by the style feature.
  • the middle-layer second network unit block among the second network unit blocks is used to: multiply the input content feature by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the middle-layer second network unit block; add the input content feature to the intermediate feature of the middle-layer second network unit block to obtain the output feature of the middle-layer second network unit block; and input the output feature of the middle-layer second network unit block into the next-layer second network unit block.
  • the middle-layer second network unit block is further configured to perform a convolution operation on the received content feature before multiplying the input content feature by the output feature of the upper-layer second network unit block.
  • the last-layer second network unit block among the second network unit blocks is used to: multiply the content feature from the first-layer first network unit block by the output feature of the upper-layer second network unit block to obtain the intermediate feature of the last-layer second network unit block; and add the content feature from the first-layer first network unit block to the intermediate feature of the last-layer second network unit block to obtain the generated image.
  • the last-layer second network unit block is further configured to perform a convolution operation on the content feature from the first-layer first network unit block before multiplying it by the output feature of the upper-layer second network unit block.
  • the adjustment module 1204 is configured to adjust the multiplication operation parameter and/or the addition operation parameter.
  • the adjustment module 1204 is configured to determine a generative adversarial network loss according to the content image, the style image, the generated image, and the identification result; and, in response to the generative adversarial network loss not meeting a first predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the generative adversarial network loss; wherein the generative adversarial network loss is used to characterize the content feature difference between the generated image and the content image, and the style feature difference between the generated image and the style image.
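One standard way to instantiate an adversarial loss of this kind is the non-saturating GAN objective sketched below; the choice of binary cross-entropy and the real/fake labeling are assumptions, not the specific formulation claimed in the disclosure.

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, generated, real_style):
    """Sketch: discriminator loss on real style images vs. generated images,
    plus a generator loss that rewards fooling the discriminator."""
    d_real = discriminator(real_style)
    d_fake = discriminator(generated.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    g_fake = discriminator(generated)
    g_loss = F.binary_cross_entropy_with_logits(g_fake, torch.ones_like(g_fake))
    return d_loss, g_loss
```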
  • the adjustment module 1204 is further configured to determine a style loss according to the generated image and the style image; and, in response to the style loss not meeting a second predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the style loss; wherein the style loss is used to characterize the style feature difference between the generated image and the style image.
  • the adjustment module 1204 is further configured to determine a content loss according to the generated image and the content image; and, in response to the content loss not meeting a third predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the content loss; wherein the content loss is used to characterize the content feature difference between the generated image and the content image.
  • the adjustment module 1204 is further configured to determine a feature matching loss according to the output features of the middle-layer second network unit blocks among the second network unit blocks and the style image; and, in response to the feature matching loss not meeting a fourth predetermined condition, to adjust the network parameters of the first neural network and/or the second neural network according to the feature matching loss; wherein the feature matching loss is used to characterize the difference between the output features of the middle-layer second network unit blocks and the style features of the style image.
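The three auxiliary losses can be sketched with simple L1 distances; the helper names (content_encoder, style_stats), the use of L1 rather than another norm, and the way the style-derived targets for feature matching are obtained are all assumptions made for illustration.

```python
import torch.nn.functional as F

def auxiliary_losses(generated, content_img, style_img,
                     content_encoder, style_stats,
                     decoder_mid_feats, style_mid_targets):
    """Sketch of the content, style, and feature matching losses."""
    # Content loss: content-feature difference between generated and content image.
    content_loss = F.l1_loss(content_encoder(generated)[-1],
                             content_encoder(content_img)[-1])

    # Style loss: difference of style statistics (mean/std) of generated vs. style image.
    g_mean, g_std = style_stats(generated)
    s_mean, s_std = style_stats(style_img)
    style_loss = F.l1_loss(g_mean, s_mean) + F.l1_loss(g_std, s_std)

    # Feature matching loss: middle-layer decoder outputs vs. style-derived targets.
    fm_loss = sum(F.l1_loss(f, t) for f, t in zip(decoder_mid_feats, style_mid_targets))
    return content_loss, style_loss, fm_loss
```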
  • the fourth extraction module 1202 is configured to extract features of the style image distribution and to sample the features of the style image distribution to obtain the style feature, where the style feature includes the mean and standard deviation of the features of the style image distribution.
  • the first network unit block is configured to extract content features of the content image based on multiple neural network layers organized in a residual structure within the first network unit block; and/or, the second network unit block is configured to process the features input to the second network unit block based on multiple neural network layers organized in a residual structure within the second network unit block.
  • the third extraction module 1201, the fourth extraction module 1202, the second processing module 1203, and the adjustment module 1204 can all be implemented by a processor, and the processor can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the embodiment of the present disclosure further provides a computer storage medium, for example, the memory 111 storing a computer program, which can be executed by the processor 112 of the electronic device 11 to complete the steps described in the foregoing methods.
  • the computer-readable storage medium can be an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disk, or a CD-ROM, etc.; it can also be any of various devices including one or any combination of the foregoing memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
  • the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any image generation method or any neural network training method in the foregoing embodiments is implemented.
  • the technical solution of the present invention, in essence, or the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an image generation method, a neural network training method, an apparatus, an electronic device, and a computer storage medium. The image generation method comprises: extracting content features of a content image by using sequentially connected multi-layer first network unit blocks in a first neural network to obtain the content features respectively output by the first network unit blocks of each layer (101); extracting a style feature of a style image (102); forwarding the content features respectively output by the first network unit blocks of each layer into sequentially connected multi-layer second network unit blocks in a second neural network, forwarding the style feature into the first-layer second network unit block of the multi-layer second network unit blocks, and obtaining a generated image output by the second neural network after each second network unit block processes its respective input features, wherein the multi-layer first network unit blocks correspond to the multi-layer second network unit blocks (103).
PCT/CN2020/076835 2019-06-24 2020-02-26 Procédé de génération d'images et d'apprentissage de réseau neuronal, appareil, dispositif et support WO2020258902A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021532473A JP2022512340A (ja) 2019-06-24 2020-02-26 画像生成及びニューラルネットワーク訓練方法、装置、機器並びに媒体
KR1020217017354A KR20210088656A (ko) 2019-06-24 2020-02-26 이미지 생성 및 신경망 트레이닝 방법, 장치, 기기 및 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910551145.3A CN112132167B (zh) 2019-06-24 2019-06-24 图像生成和神经网络训练方法、装置、设备和介质
CN201910551145.3 2019-06-24

Publications (1)

Publication Number Publication Date
WO2020258902A1 true WO2020258902A1 (fr) 2020-12-30

Family

ID=73850015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076835 WO2020258902A1 (fr) 2019-06-24 2020-02-26 Procédé de génération d'images et d'apprentissage de réseau neuronal, appareil, dispositif et support

Country Status (4)

Country Link
JP (1) JP2022512340A (fr)
KR (1) KR20210088656A (fr)
CN (1) CN112132167B (fr)
WO (1) WO2020258902A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733946A (zh) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 一种训练样本的生成方法、装置、电子设备及存储介质
CN113255813A (zh) * 2021-06-02 2021-08-13 北京理工大学 一种基于特征融合的多风格图像生成方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230137732A (ko) * 2022-03-22 2023-10-05 삼성전자주식회사 사용자 선호 콘텐트를 생성하는 전자 장치 및 그 동작 방법
KR102490503B1 (ko) 2022-07-12 2023-01-19 프로메디우스 주식회사 순환형 적대적 생성 신경망을 이용한 이미지 처리 장치 및 방법

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068463A1 (en) * 2016-09-02 2018-03-08 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN108205803A (zh) * 2017-07-19 2018-06-26 北京市商汤科技开发有限公司 图像处理方法、神经网络模型的训练方法及装置
CN108205813A (zh) * 2016-12-16 2018-06-26 微软技术许可有限责任公司 基于学习网络的图像风格化
CN109766895A (zh) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 用于图像风格迁移的卷积神经网络的训练方法和图像风格迁移方法
CN109840924A (zh) * 2018-12-28 2019-06-04 浙江工业大学 一种基于串联对抗网络的产品图像快速生成方法
CN109919829A (zh) * 2019-01-17 2019-06-21 北京达佳互联信息技术有限公司 图像风格迁移方法、装置和计算机可读存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018132855A (ja) * 2017-02-14 2018-08-23 国立大学法人電気通信大学 画像スタイル変換装置、画像スタイル変換方法および画像スタイル変換プログラム
GB201800811D0 (en) * 2018-01-18 2018-03-07 Univ Oxford Innovation Ltd Localising a vehicle
CN109919828B (zh) * 2019-01-16 2023-01-06 中德(珠海)人工智能研究院有限公司 一种判断3d模型之间差异的方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068463A1 (en) * 2016-09-02 2018-03-08 Artomatix Ltd. Systems and Methods for Providing Convolutional Neural Network Based Image Synthesis Using Stable and Controllable Parametric Models, a Multiscale Synthesis Framework and Novel Network Architectures
CN108205813A (zh) * 2016-12-16 2018-06-26 微软技术许可有限责任公司 基于学习网络的图像风格化
CN108205803A (zh) * 2017-07-19 2018-06-26 北京市商汤科技开发有限公司 图像处理方法、神经网络模型的训练方法及装置
CN109840924A (zh) * 2018-12-28 2019-06-04 浙江工业大学 一种基于串联对抗网络的产品图像快速生成方法
CN109766895A (zh) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 用于图像风格迁移的卷积神经网络的训练方法和图像风格迁移方法
CN109919829A (zh) * 2019-01-17 2019-06-21 北京达佳互联信息技术有限公司 图像风格迁移方法、装置和计算机可读存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733946A (zh) * 2021-01-14 2021-04-30 北京市商汤科技开发有限公司 一种训练样本的生成方法、装置、电子设备及存储介质
CN112733946B (zh) * 2021-01-14 2023-09-19 北京市商汤科技开发有限公司 一种训练样本的生成方法、装置、电子设备及存储介质
CN113255813A (zh) * 2021-06-02 2021-08-13 北京理工大学 一种基于特征融合的多风格图像生成方法
CN113255813B (zh) * 2021-06-02 2022-12-02 北京理工大学 一种基于特征融合的多风格图像生成方法

Also Published As

Publication number Publication date
CN112132167A (zh) 2020-12-25
CN112132167B (zh) 2024-04-16
JP2022512340A (ja) 2022-02-03
KR20210088656A (ko) 2021-07-14

Similar Documents

Publication Publication Date Title
WO2020258902A1 (fr) Procédé de génération d'images et d'apprentissage de réseau neuronal, appareil, dispositif et support
CN109241880B (zh) 图像处理方法、图像处理装置、计算机可读存储介质
WO2019100723A1 (fr) Procédé et dispositif destinés à l'apprentissage d'un modèle de classification à étiquettes multiples
CN106415594B (zh) 用于面部验证的方法和系统
CN112446476A (zh) 神经网络模型压缩的方法、装置、存储介质和芯片
CN110929622A (zh) 视频分类方法、模型训练方法、装置、设备及存储介质
CN110543841A (zh) 行人重识别方法、系统、电子设备及介质
US9710697B2 (en) Method and system for exacting face features from data of face images
CN109508717A (zh) 一种车牌识别方法、识别装置、识别设备及可读存储介质
WO2015180101A1 (fr) Représentation compacte de visage
CN109377532B (zh) 基于神经网络的图像处理方法及装置
CN111340077B (zh) 基于注意力机制的视差图获取方法和装置
CN114549913B (zh) 一种语义分割方法、装置、计算机设备和存储介质
CN108021908B (zh) 人脸年龄段识别方法及装置、计算机装置及可读存储介质
Chiaroni et al. Learning with a generative adversarial network from a positive unlabeled dataset for image classification
JP2019508803A (ja) ニューラルネットワークモデルの訓練方法、装置及び電子機器
CN114418030A (zh) 图像分类方法、图像分类模型的训练方法及装置
CN111898703A (zh) 多标签视频分类方法、模型训练方法、装置及介质
An et al. Weather classification using convolutional neural networks
CN112446888A (zh) 图像分割模型的处理方法和处理装置
CN114064627A (zh) 一种针对多重关系的知识图谱链接补全方法及系统
CN109492610A (zh) 一种行人重识别方法、装置及可读存储介质
CN112949706B (zh) Ocr训练数据生成方法、装置、计算机设备及存储介质
JP6935868B2 (ja) 画像認識装置、画像認識方法、およびプログラム
CN114492634A (zh) 一种细粒度装备图片分类识别方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20832168

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20217017354

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021532473

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20832168

Country of ref document: EP

Kind code of ref document: A1