WO2023143118A1 - Image processing method, apparatus, device and medium - Google Patents

Image processing method, apparatus, device and medium

Info

Publication number
WO2023143118A1
WO2023143118A1 (PCT/CN2023/072054)
Authority
WO
WIPO (PCT)
Prior art keywords
image
style
sample
generate
image generator
Prior art date
Application number
PCT/CN2023/072054
Other languages
English (en)
French (fr)
Inventor
陈朗
周思宇
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023143118A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Definitions

  • The present disclosure relates to the technical field of image processing, and in particular, to an image processing method, apparatus, device, and medium.
  • Image processing applications provide editing controls for various image parameters, such as controls for brightness processing, sticker addition, and makeup transformation. Users can adjust image parameters through editing operations on these controls, thereby achieving image style transfer.
  • An embodiment of the present disclosure provides an image processing method, the method comprising: training to generate a first image generator, wherein the first image generator is used to process an input random feature vector to generate a target object image of a first style, and training to generate a second image generator, wherein the second image generator is used to process an input random feature vector to generate a target object image of a second style; processing input sample feature vectors with the first image generator and the second image generator respectively to generate a sample image of the first style and a sample image of the second style as paired sample data; and training a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is configured to process an input image of the first style to generate an output image of the second style.
  • An embodiment of the present disclosure also provides an image processing device, the device including: a training module, used to train and generate a first image generator, wherein the first image generator is used to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, wherein the second image generator is used to process an input random feature vector to generate a target object image of a second style; a sample generation module, used to process input sample feature vectors with the first image generator and the second image generator respectively to generate a sample image of the first style and a sample image of the second style as paired sample data; and an image generator generation module, configured to train a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of the first style to generate an output image of the second style.
  • An embodiment of the present disclosure also provides an electronic device, which includes: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the image processing method provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the image processing method provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure also provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the image processing method provided by the embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an image processing scene provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The term “comprise” and its variants are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • the inventors of the present disclosure have found that, in the related art, the method of implementing image style conversion by adjusting the editing controls by the user depends on the manual operation of the user, the conversion effect is difficult to guarantee, and the conversion efficiency is low.
  • the present disclosure provides an image processing method to try to solve the problems in the related art that the conversion effect is difficult to guarantee and the conversion efficiency is low when performing style conversion on an image.
  • An embodiment of the present disclosure provides an image processing method in which an image generator for generating images of a first style and an image generator for generating images of a second style are trained, and paired sample data are generated based on the two image generators, thereby obtaining training sample data of better quality; a target image generator for style conversion is then trained on these training sample data.
  • Thus, even in scenarios where it is difficult to obtain object images of the first style and corresponding images of the second style as training data, paired sample images with the second style can still be obtained from the above image generators. This mitigates, as far as possible, the problem that samples are difficult to obtain during the training of a style conversion model, and ensures the effect and efficiency of the style transfer.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • The method can be executed by an image processing device, where the device can be implemented by software and/or hardware and can generally be integrated into an electronic device. As shown in FIG. 1, the method includes steps 101 to 103.
  • Step 101: train and generate a first image generator, wherein the first image generator is used to process an input random feature vector to generate a target object image of a first style, and train and generate a second image generator, wherein the second image generator is used to process an input random feature vector to generate a target object image of a second style.
  • the target object may be a human, an animal, etc., which is not limited here.
  • The input random feature vector includes, but is not limited to, contour features, pixel color features, and the like.
  • the input random feature vector includes: at least one of contour features and pixel color features.
  • The first style and the second style can be any two different styles; for example, the first style can be a “plain-style human face” and the second style a “Hong Kong-style human face”.
  • In some embodiments, the first object image data of the first style are randomly collected according to a plurality of first preset indicators. The multiple first preset indicators correspond to multiple feature dimensions of the target object and are used to ensure robustness when training the first image generator. For example, when the target object is a human face, the corresponding first preset indicators include: a face angle indicator type, a face age indicator type, a face temperament indicator type, a face contour indicator type, a face brightness indicator type, and so on. Each indicator type can include multiple indicator values, so that the face sample data acquired based on the multiple first preset indicators not only cover multiple indicator types but also take different indicator values under the same indicator type (for example, covering various face angles).
  • In some embodiments, the parameters of a generative adversarial network (GAN) are trained on the first object image data to obtain the first image generator.
  • The first image generator can then produce a corresponding target object image of the first style from an input random feature vector.
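As a concrete illustration of this training step, the following is a minimal PyTorch sketch of training one style-specific generator with a generative adversarial network. The architectures, the 64*64 image size, and all hyperparameters are illustrative assumptions, not the networks of this disclosure; the same routine, run on second-style data from the first generator's weights, would correspond to the fine-tuning described in Step 202 below.

```python
# Hypothetical sketch: training a style-specific generator with a GAN.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a random feature vector to a 64x64 target-object image."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),         # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),          # 8x8 -> 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),           # 16x16 -> 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),            # 32x32 -> 64x64
        )
    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),     # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),   # 32 -> 16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),  # 16 -> 8
            nn.Conv2d(256, 1, 8, 1, 0),                       # 8 -> 1 (real/fake logit)
        )
    def forward(self, x):
        return self.net(x).view(-1)

def train_style_generator(real_images, steps=1000, latent_dim=128):
    """real_images: tensor of style-specific samples, shape (N, 3, 64, 64)."""
    G, D = Generator(latent_dim), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        idx = torch.randint(0, real_images.size(0), (16,))
        real = real_images[idx]
        z = torch.randn(16, latent_dim)
        fake = G(z)
        # Discriminator update: real -> 1, fake -> 0.
        d_loss = bce(D(real), torch.ones(16)) + bce(D(fake.detach()), torch.zeros(16))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator update: try to fool the discriminator.
        g_loss = bce(D(G(z)), torch.ones(16))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G
```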
  • In some embodiments, a second image generator, which generates a target object image of the second style from an input random feature vector, may also be trained. The random feature vector includes, but is not limited to, contour features, pixel color features, and the like.
  • In an embodiment of the present disclosure, as shown in FIG. 2, training and generating the second image generator includes steps 201 to 205.
  • Step 201: collect second object image data of the second style according to a plurality of second preset indicators.
  • To ensure the robustness of the second image generator, the second object image data with the second style are collected according to a plurality of second preset indicators, which may correspond to multiple different dimensions of the second object image data. For example, when the second object image data correspond to human faces, the second preset indicators may include a face angle indicator type, a face age indicator type, a face temperament indicator type, a face contour indicator type, a face brightness indicator type, and so on. Each indicator type can include multiple indicator values, so that the second object image data acquired based on the multiple second preset indicators not only cover multiple indicator types but also take different indicator values under the same indicator type (for example, covering various face angles).
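To illustrate this indicator-driven collection, the sketch below draws a collection plan that covers every indicator type and multiple values per type. The indicator names follow the examples in the text, while the concrete values are our own placeholders, not indicator values specified by this disclosure.

```python
import random

# Hypothetical indicator types and values; names follow the text's examples.
PRESET_INDICATORS = {
    "face_angle": ["front", "left_30", "right_30", "up_15", "down_15"],
    "face_age": ["child", "young", "middle_aged", "senior"],
    "face_temperament": ["gentle", "cool", "lively"],
    "face_brightness": ["dark", "normal", "bright"],
}

def sample_collection_plan(n_images=300, seed=0):
    """Returns one {indicator_type: indicator_value} requirement per image,
    guaranteeing that every value of every indicator type is requested at
    least once, so the collected data cover all types and multiple values."""
    rng = random.Random(seed)
    # Start with one slot per (type, value) so no value is missed ...
    plan = [{t: v} for t, values in PRESET_INDICATORS.items() for v in values]
    # ... then fill the remaining slots with random combinations of all types.
    while len(plan) < n_images:
        plan.append({t: rng.choice(v) for t, v in PRESET_INDICATORS.items()})
    rng.shuffle(plan)
    return plan
```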
  • It should be noted that the input random feature vector mentioned above may be randomly generated; for example, it may be a feature of any dimension, with the first style or the second style, generated by a random algorithm. The input random feature vector may also be extracted from a target object image having the first style or the second style; a random feature vector extracted from a real target object image helps ensure that the images output by the trained first and second image generators are more realistic.
  • It should also be noted that the above second object image data may be randomly generated, for example, face features of any dimension generated by a random algorithm; the second object image data may also be extracted from real target object images. Second object image data extracted from real target object images help ensure that the images output by the trained image generators are more realistic.
  • Step 202: train the network parameters of the first image generator with the second object image data to obtain a third image generator.
  • In some embodiments, the first image generator, obtained as described above and capable of producing a first-style target object image from an input random feature vector, is further trained on the second object image data; the resulting third image generator can output a target object image of the second style from an input feature vector.
  • Step 203: during the upsampling of the input feature information in the first image generator, determine a first network in the first image generator at resolutions smaller than or equal to a target image resolution, and a second network at resolutions larger than the target image resolution.
  • In this embodiment, it can be understood that the first image generator obtains the first-style object image by upsampling the input feature information layer by layer. For example, as shown in FIG. 3, if the input is feature information at 1*1 resolution, the first network layer of the first image generator upsamples it to feature information at 2*2 resolution, the second network layer further upsamples it to 4*4 resolution, and so on level by level until the corresponding upsampled target object image is obtained (512*512 resolution in the figure).
  • Similarly, the third image generator obtains the second-style target object image layer by layer from the input feature information. For example, as shown in FIG. 4, if the input is feature information at 1*1 resolution, the first network layer of the third image generator upsamples it to 2*2 resolution, the second layer further upsamples it to 4*4 resolution, and so on level by level until the corresponding upsampled target object image of the second style is obtained (512*512 resolution in the figure).
  • To ensure that the converted target object image of the second style remains similar to the input target object image of the first style (for example, if the input is a face image of user A, the output should also be a second-style face image resembling A's facial features), the first image generator and the third image generator are fused to obtain the second image generator.
  • In some embodiments, the target image resolution corresponding to the fusion boundary layer is calibrated from experimental data; fusing the image generators at this target image resolution yields a second image generator with a better image conversion effect.
  • During the upsampling of the input feature information in the first image generator, the first network at resolutions smaller than or equal to the target image resolution and the second network at resolutions larger than the target image resolution are determined. For example, if the target image resolution is 16*16, the network layers at resolutions less than or equal to 16*16 form the first network, and the network layers at resolutions greater than 16*16 form the second network.
  • Step 204: during the upsampling of the feature information in the third image generator, determine a third network in the third image generator at resolutions smaller than or equal to the target image resolution, and a fourth network at resolutions larger than the target image resolution.
  • Similarly, during the upsampling of the input feature information in the third image generator, the third network at resolutions smaller than or equal to the target image resolution and the fourth network at resolutions larger than the target image resolution are determined. For example, if the target image resolution is 16*16, the network layers at resolutions less than or equal to 16*16 form the third network, and the network layers at resolutions greater than 16*16 form the fourth network.
  • Step 205: fuse the first network with the third network according to a preset first fusion parameter, and fuse the second network with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
  • In some embodiments, the first network is fused with the third network according to the preset first fusion parameter, and the second network is fused with the fourth network according to the preset second fusion parameter. The first fusion parameter and the second fusion parameter may be any fusion parameters that balance the first style and the second style for the input feature vector; different fusion parameters correspond to different degrees of fusion, and the first and second fusion parameters may differ in fusion parameter type and weight value.
  • In some embodiments, the network outputs at corresponding resolutions may be fused according to the corresponding fusion parameters. Alternatively, the weights of the network parameters in the corresponding networks may first be modified according to the fusion parameters, the outputs at each resolution obtained with the modified parameters, and the outputs of the different image generators at the same resolution then fused.
  • Continuing the example above, the network parameters of the first image generator and the third image generator at resolutions less than or equal to 16*16 are fused, and their network parameters at resolutions greater than 16*16 are fused. The fused network layers can both perform the conversion of a real face based on the input feature information and produce second-style feature information from the input; the fusion of the two ensures that the corresponding network layers can output feature information reflecting both the first style and the second style.
  • Thus, the first image generator and the third image generator are fused to obtain a second image generator that takes both the first style and the second style into account. The second image generator can output a corresponding target object image combining both styles directly from the relevant feature vectors, without the need to actually collect or construct real object images of the second style.
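The following sketch makes Steps 203 to 205 concrete under some assumptions: both generators share one architecture and expose their per-resolution layers as a `.blocks` list aligned with `block_resolutions`, and the two preset fusion parameters are modeled as simple interpolation weights `alpha_low`/`alpha_high` applied below and above the target resolution. None of these names or the linear-interpolation choice come from this disclosure.

```python
# Hypothetical sketch of Steps 203-205: fusing two same-architecture
# generators at a target-resolution boundary by per-parameter interpolation.
import copy
import torch

@torch.no_grad()
def fuse_generators(first_gen, third_gen, block_resolutions,
                    target_res=16, alpha_low=0.5, alpha_high=0.9):
    """first_gen/third_gen: generators whose upsampling blocks are held in
    .blocks, one block per entry of block_resolutions (e.g. [4, 8, ..., 512]).
    Blocks at resolutions <= target_res (the "first"/"third" networks) are
    blended with alpha_low; blocks above it (the "second"/"fourth" networks)
    with alpha_high. Returns the fused second image generator."""
    fused = copy.deepcopy(first_gen)
    for block_f, block_t, block_o, res in zip(
            first_gen.blocks, third_gen.blocks, fused.blocks, block_resolutions):
        alpha = alpha_low if res <= target_res else alpha_high
        for p_f, p_t, p_o in zip(block_f.parameters(),
                                 block_t.parameters(),
                                 block_o.parameters()):
            # Linear interpolation between the two generators' parameters.
            p_o.copy_((1.0 - alpha) * p_f + alpha * p_t)
    return fused
```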
  • The relevant feature vectors can be obtained by feature extraction from real object images using an image encoder. Since the relevant feature vectors are derived from real object images, the output of the target image generator during training is more natural.
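A hypothetical sketch of this encoder idea follows: the encoder is trained so that feature vectors extracted from real object images reconstruct those images through the frozen first image generator (reusing the illustrative `Generator` from the earlier sketch). The architecture and the L1 reconstruction loss are assumptions, not the disclosure's training scheme.

```python
# Hypothetical sketch: training an image encoder against a frozen generator.
import torch
import torch.nn as nn

def train_encoder(first_gen, real_images, latent_dim=128, steps=500):
    """first_gen: frozen first image generator (latent -> 3x64x64 image).
    real_images: first-style real samples, shape (N, 3, 64, 64)."""
    encoder = nn.Sequential(
        nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64 -> 32
        nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32 -> 16
        nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16 -> 8
        nn.Flatten(),
        nn.Linear(128 * 8 * 8, latent_dim),
    )
    first_gen.requires_grad_(False)
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    for _ in range(steps):
        idx = torch.randint(0, real_images.size(0), (16,))
        real = real_images[idx]
        z = encoder(real)                            # extract feature vector
        recon = first_gen(z)                         # regenerate via frozen generator
        loss = nn.functional.l1_loss(recon, real)    # reconstruction supervision
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder
```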
  • Step 102: process the input sample feature vectors with the first image generator and the second image generator respectively, generating a sample image of the first style and a sample image of the second style as paired sample data.
  • Step 103: train a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of the first style to generate an output image of the second style.
  • The target image generator is generated by training the preset model on the paired sample data; it processes the input image to generate an output image matching the second style, which ensures the naturalness and efficiency of target-style image conversion.
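As an illustration of Step 103, the sketch below trains a lightweight image-to-image "preset model" on the paired sample data. The plain encoder-decoder and the L1 reconstruction loss are stand-ins for the unspecified preset model and for the supervised adversarial training described next; only the paired-supervision structure is taken from the text.

```python
# Hypothetical sketch of Step 103: supervised training on paired sample data.
import torch
import torch.nn as nn

class TargetImageGenerator(nn.Module):
    """First-style image in, second-style image out (same 3x64x64 shape)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),             # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),           # 32 -> 16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),    # 32 -> 64
        )
    def forward(self, x):
        return self.net(x)

def train_target_generator(pairs, steps=1000):
    """pairs: list of (first_style_img, second_style_img) tensors, 3x64x64."""
    model = TargetImageGenerator()
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)
    for step in range(steps):
        src, dst = pairs[step % len(pairs)]
        pred = model(src.unsqueeze(0))
        loss = nn.functional.l1_loss(pred, dst.unsqueeze(0))  # paired supervision
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```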
  • In some embodiments, the parameters of a generative adversarial network can be trained under the supervision of the paired sample data to generate the target image generator. Moreover, to further ensure the similarity between the output second-style image and the input first-style image during training (for example, when the target object is a face, so that the second-style output image and the first-style input image look more like the same person), the image textures of the paired sample data are weighted and fused according to preset weights during training to adjust the texture of the output image.
  • The preset weights can be calibrated from experimental data to ensure that the texture of the output image is closer to the texture of the first-style input image while still reflecting the texture of the second style.
  • To further improve image similarity, the background of the first-style input image and the texture information of other entities such as hair, glasses, clothes, and hats can also be obtained and mapped onto the corresponding second-style image.
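The weighted texture fusion can be sketched as follows, under the assumption that "texture" is approximated by the high-frequency residual left after Gaussian blurring; the disclosure does not specify this decomposition, so the function below is illustrative only.

```python
# Hypothetical sketch: blending the output image's texture toward the input
# image's texture by a preset weight.
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_texture(input_img, output_img, weight=0.3, sigma=3.0):
    """input_img/output_img: float arrays in [0, 1], same shape (H, W, 3).
    weight: preset fusion weight given to the input image's texture."""
    # Low-frequency base of the output, and high-frequency "texture" residuals.
    base = gaussian_filter(output_img, sigma=(sigma, sigma, 0))
    tex_out = output_img - base
    tex_in = input_img - gaussian_filter(input_img, sigma=(sigma, sigma, 0))
    # Keep the output's base, mix the two textures by the preset weight.
    fused = base + (1.0 - weight) * tex_out + weight * tex_in
    return np.clip(fused, 0.0, 1.0)
```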
  • The paired sample data generated in this way are relatively realistic; on the basis of improving the efficiency of paired sample data acquisition, the quality of the paired sample data is guaranteed. The target image generator obtained by training the preset model can be deployed on smart terminals and the like, which overcomes the poor style conversion effects caused by training sample data being difficult to construct, and realizes lightweight image style conversion.
  • In summary, the image processing method of the embodiments of the present disclosure trains a first image generator that can obtain a first-style target object image from an input random feature vector and a second image generator that can generate a second-style target object image from an input random feature vector; processes the input sample feature vectors with the first and second image generators respectively to generate a first-style sample image and a second-style sample image as paired sample data, guaranteeing the quality of the paired sample data while improving the efficiency of their acquisition; and trains the preset model on the paired sample data to generate the target image generator, which processes the input image to generate an output image matching the second style.
  • the production of paired sample data is of great significance to the effect of the final style conversion, therefore, in the embodiments of the present disclosure, the quality of the constructed paired sample data is also guaranteed.
  • In some embodiments, as shown in FIG. 7, generating a sample image of the first style and a sample image of the second style as paired sample data includes steps 701 to 704.
  • Step 701: input the real image of the object in the first style to a pre-trained image encoder for processing, and extract the first sample feature vector.
  • Step 702: input the first sample feature vector into the first image generator to generate the object reference image of the first style.
  • The real object image of the first style is input to the pre-trained image encoder for processing, and the first sample feature vector is extracted; the first sample feature vector is then input to the first image generator to generate the first-style object reference image. Since the object reference image is generated from the feature vector of a real first-style object image, it is closely related to the real object and can subsequently be used as sample data.
  • Step 703: input the first sample feature vector to the second image generator to generate the first target image of the second style.
  • The first sample feature vector is input to the second image generator to generate the first target image of the second style. Since the first target image is also obtained from the first sample feature vector, it is likewise closely related to the real object and can subsequently be used as sample data.
  • Step 704: use the object reference image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, generating paired sample data of the first type at a preset first ratio.
  • In some embodiments, the first-style object reference image is used as the first-style sample image and the second-style first target image as the second-style sample image to generate the first type of paired sample data at the preset first ratio, where the first ratio can be calibrated from experimental data.
  • For example, the first sample feature vector A is input to the second image generator to generate the first target image S2 of the second style corresponding to the real object image S1, and the first sample feature vector A is input to the first image generator to generate the reference face image S3 corresponding to the real object image, so that the generated face pairing data set includes: the real object image S1 paired with the corresponding second-style first target image S2, and the reference face image S3 paired with the second-style first target image S2.
  • In some embodiments, the method further includes: using the real object image of the first style as the first-style sample image and the first target image of the second style as the second-style sample image, generating paired sample data of the second type at a preset second ratio. That is, the real object image S1 and the corresponding second-style first target image S2 are used as paired sample data.
  • In some embodiments, the method further includes: inputting a randomly generated second sample feature vector into the first image generator to generate a random image of the object in the first style, and inputting the second sample feature vector into the second image generator to generate a second target image of the second style.
  • The random object image and the corresponding second-style second target image can also be used as paired sample data.
  • In some embodiments, the method further includes: using the random object image of the first style as the first-style sample image and the second target image of the second style as the second-style sample image, generating paired sample data of the third type at a preset third ratio.
  • The sum of the first ratio, the second ratio, and the third ratio is 1, and the ratio values can be calibrated according to the needs of the scene.
  • For example, the first ratio, the second ratio, and the third ratio may be 30%, 50%, and 20%, respectively.
  • For example, the first sample feature vector A is input to the second image generator to generate the first target image S2 of the second style corresponding to the real object image S1; the first sample feature vector A is input to the first image generator to generate the first-style object reference image S3; a randomly generated second sample feature vector B is also input to the first image generator to generate the random object image S4; and the second sample feature vector B is input to the second image generator to generate the second target image S5 of the second style corresponding to the second sample feature vector. The generated paired sample data then include: the object reference image S3 and the corresponding second-style first target image S2 collected at the preset first ratio, the real object image S1 and the corresponding second-style first target image S2 collected at the preset second ratio, and the random object image S4 and the corresponding second target image S5 collected at the preset third ratio.
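Putting the three pair types together, the following hypothetical sketch builds a paired dataset at the 30%/50%/20% example ratios. Here `encoder`, `first_gen`, and `second_gen` stand for the pre-trained image encoder and the two generators from the sketches above; the variable names S1 to S5 mirror the text.

```python
# Hypothetical sketch: constructing the three types of paired sample data.
import random
import torch

def build_paired_dataset(real_images, encoder, first_gen, second_gen,
                         n_pairs=1000, ratios=(0.3, 0.5, 0.2), latent_dim=128):
    """real_images: list of first-style real image tensors, shape (3, H, W).
    ratios: preset first/second/third ratios (must sum to 1)."""
    pairs = []
    with torch.no_grad():
        for _ in range(n_pairs):
            r = random.random()
            if r < ratios[0] + ratios[1]:
                s1 = random.choice(real_images)   # real object image S1
                a = encoder(s1.unsqueeze(0))      # first sample feature vector A
                s2 = second_gen(a).squeeze(0)     # second-style first target image S2
                if r < ratios[0]:
                    s3 = first_gen(a).squeeze(0)  # first-style object reference image S3
                    pairs.append((s3, s2))        # first type: (S3, S2)
                else:
                    pairs.append((s1, s2))        # second type: (S1, S2)
            else:
                b = torch.randn(1, latent_dim)    # random second sample feature vector B
                s4 = first_gen(b).squeeze(0)      # first-style random object image S4
                s5 = second_gen(b).squeeze(0)     # second-style second target image S5
                pairs.append((s4, s5))            # third type: (S4, S5)
    return pairs
```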
  • In some embodiments, to ensure the quality of the paired sample data, deformation compensation processing is also applied to the differing parts of the facial key points between the paired images in the paired sample data. For example: identify the key points of the human face; generate a scaling value and a rotation angle based on the angle and distance between corresponding key points; and adjust the second-style sample image based on the scaling value and the rotation angle, so that the second-style sample image in the paired sample data is closer to the corresponding first-style sample image.
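A minimal sketch of this deformation compensation follows, assuming two matching facial key points (for example, the eye centers) have already been detected in each image; OpenCV's similarity transform is used here for illustration and is not specified by this disclosure.

```python
# Hypothetical sketch: scale/rotation compensation from two facial key points.
import math
import cv2
import numpy as np

def compensate_deformation(style2_img, kpts1, kpts2):
    """style2_img: second-style sample image, uint8 array (H, W, 3).
    kpts1/kpts2: two matching key points in the first-/second-style images,
    arrays of shape (2, 2) holding (x, y) coordinates."""
    def angle_and_dist(k):
        d = k[1] - k[0]
        return math.atan2(d[1], d[0]), math.hypot(d[0], d[1])
    k1 = np.asarray(kpts1, dtype=np.float64)
    k2 = np.asarray(kpts2, dtype=np.float64)
    ang1, dist1 = angle_and_dist(k1)
    ang2, dist2 = angle_and_dist(k2)
    scale = dist1 / dist2                      # scaling value
    rotation = math.degrees(ang2 - ang1)       # rotation angle (CCW, OpenCV convention)
    center = (float(k2[:, 0].mean()), float(k2[:, 1].mean()))
    M = cv2.getRotationMatrix2D(center, rotation, scale)
    h, w = style2_img.shape[:2]
    # Warp the second-style sample so its key points align with the first-style ones.
    return cv2.warpAffine(style2_img, M, (w, h))
```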
  • In some embodiments, texture compensation processing can also be performed on the differing parts of non-facial key points between the paired images. Non-facial key points include, but are not limited to, facial decorations such as glasses and beards, and the image background. Based on contour recognition and other methods, the image regions of the differing parts can be matted out and then mapped onto the corresponding second-style sample images.
  • Thus, the image processing method of the embodiments of the present disclosure can generate the corresponding paired sample data from the first image generator and the second image generator, without having to shoot or manually create corresponding paired images; it realizes automatic acquisition of paired sample data and provides technical support for improving the effect and efficiency of image style conversion.
  • The present disclosure also provides an image processing device.
  • FIG. 10 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • The device can be implemented by software and/or hardware, and can generally be integrated into an electronic device to perform face image processing. As shown in FIG. 10, the device includes: a training module 1010, a sample generation module 1020, and an image generator generation module 1030.
  • the training module 1010 is used to train and generate a first image generator, wherein the first image generator is used to process the input random feature vector to generate a first-style target object image, and train to generate a second image generator, the second The image generator is used to process the input random feature vector to generate the target object image of the second style.
  • the sample generation module 1020 is configured to process the input sample feature vectors based on the first image generator and the second image generator respectively, and generate a sample image of the first style and a sample image of the second style as paired sample data.
  • the image generator generating module 1030 is configured to train a preset model based on paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of a first style to generate an output image of a second style.
  • The image processing device provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
  • the present disclosure also proposes a computer program product, including computer programs/instructions, which implement the image processing methods in the above embodiments when the computer programs/instructions are executed by a processor.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Referring to FIG. 11, it shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
  • The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 11 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • As shown in FIG. 11, the electronic device may include a processor (such as a central processing unit, a graphics processing unit, etc.) 1101, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or loaded from a memory 1108 into a random access memory (RAM) 1103.
  • The RAM 1103 also stores various programs and data necessary for the operation of the electronic device.
  • the processor 1101, ROM 1102, and RAM 1103 are connected to each other through a bus 1104.
  • An input/output (I/O) interface 1105 is also connected to the bus 1104 .
  • The following devices can be connected to the I/O interface 1105: an input device 1106 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1107 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a memory 1108 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1109.
  • the communication means 1109 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While FIG. 11 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 1109, or from memory 1108, or from ROM 1102.
  • When the computer program is executed by the processor 1101, the above-mentioned functions defined in the image processing method of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • In some embodiments, the client and the server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: train a first image generator that can obtain a first-style target object image from an input random feature vector, and train a second image generator that can generate a second-style target object image from an input random feature vector; process input sample feature vectors with the first image generator and the second image generator respectively to generate a first-style sample image and a second-style sample image as paired sample data, guaranteeing the quality of the paired sample data while improving the efficiency of paired sample acquisition; and train a preset model on the paired sample data to generate a target image generator, where the target image generator is used to process an input image to generate an output image matching the second style.
  • Computer program code for performing the operations of the present disclosure can be written in one or more programming languages or combinations thereof, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
  • For example, and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • The present disclosure provides an image processing method, including: training to generate a first image generator, wherein the first image generator is used to process an input random feature vector to generate a target object image of a first style, and training to generate a second image generator, wherein the second image generator is used to process an input random feature vector to generate a target object image of a second style; processing input sample feature vectors with the first image generator and the second image generator respectively to generate the sample image of the first style and the sample image of the second style as paired sample data; and training a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of the first style to generate an output image of the second style.
  • In some embodiments, training to generate the first image generator includes: randomly collecting first object image data of the first style according to a plurality of first preset indicators; and training the parameters of a generative adversarial network on the first object image data to obtain the first image generator.
  • In some embodiments, the multiple first preset indicators correspond to multiple feature dimensions of the target object.
  • In some embodiments, training to generate the second image generator includes: collecting second object image data of the second style according to a plurality of second preset indicators; training the network parameters of the first image generator with the second object image data to obtain a third image generator; during the upsampling of the input feature information in the first image generator, determining a first network in the first image generator at resolutions smaller than or equal to a target image resolution, and a second network at resolutions larger than the target image resolution; during the upsampling of the feature information in the third image generator, determining a third network in the third image generator at resolutions smaller than or equal to the target image resolution, and a fourth network at resolutions larger than the target image resolution; and fusing the first network with the third network according to a preset first fusion parameter and the second network with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
  • the multiple second preset indicators correspond to multiple different dimensions of the second object image data.
  • In some embodiments, processing the input sample feature vectors with the first image generator and the second image generator respectively to generate the first-style sample image and the second-style sample image as paired sample data includes: inputting the real object image of the first style to a pre-trained image encoder for processing and extracting the first sample feature vector; inputting the first sample feature vector to the first image generator to generate the object reference image of the first style; inputting the first sample feature vector to the second image generator to generate the first target image of the second style; and using the object reference image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, generating paired sample data of the first type at a preset first ratio.
  • In some embodiments, the image processing method provided by the present disclosure further includes: using the real object image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, generating paired sample data of the second type at a preset second ratio.
  • In some embodiments, the image processing method provided by the present disclosure further includes: inputting a randomly generated second sample feature vector into the first image generator to generate a random object image of the first style; inputting the second sample feature vector into the second image generator to generate a second target image of the second style; and using the random object image of the first style as the sample image of the first style and the second target image of the second style as the sample image of the second style, generating paired sample data of the third type at a preset third ratio.
  • the sum of the first ratio, the second ratio and the third ratio is 1.
  • In some embodiments, the image processing method provided by the present disclosure further includes: training the parameters of an image encoder according to the object image data of the first style and the first image generator, so that corresponding feature vectors can be extracted from input real images by the trained image encoder.
  • In some embodiments, the image processing method provided by the present disclosure further includes: performing deformation compensation processing on the differing parts of facial key points between the first-style sample image and the second-style sample image in the paired sample data; and/or performing texture compensation processing on the differing parts of non-facial key points between the first-style sample image and the second-style sample image in the paired sample data.
  • In some embodiments, training the preset model based on the paired sample data to generate the target image generator includes: supervised training of the parameters of a generative adversarial network based on the paired sample data to generate the target image generator, wherein, during training, the image textures of the first-style sample images and the second-style sample images are weighted and fused according to preset weights to adjust the image texture of the output image.
  • the input random feature vector includes: at least one of contour features and pixel color features.
  • The present disclosure provides an image processing device, including: a training module, used to train and generate a first image generator, wherein the first image generator is used to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, wherein the second image generator is used to process an input random feature vector to generate a target object image of a second style;
  • a sample generation module, used to process input sample feature vectors with the first image generator and the second image generator respectively, generating the sample image of the first style and the sample image of the second style as paired sample data; and
  • an image generator generation module, configured to train a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of the first style to generate an output image of the second style.
  • In some embodiments, the training module is specifically configured to: randomly collect the first object image data of the first style according to a plurality of first preset indicators; and train the parameters of a generative adversarial network on the first object image data to obtain the first image generator.
  • In some embodiments, the training module is specifically configured to: collect the second object image data of the second style according to a plurality of second preset indicators; train the network parameters of the first image generator with the second object image data to obtain a third image generator; during the upsampling of the input feature information in the first image generator, determine the first network in the first image generator at resolutions smaller than or equal to the target image resolution, and the second network at resolutions larger than the target image resolution; during the upsampling of the feature information in the third image generator, determine the third network in the third image generator at resolutions smaller than or equal to the target image resolution, and the fourth network at resolutions larger than the target image resolution; and fuse the first network with the third network according to the preset first fusion parameter and the second network with the fourth network according to the preset second fusion parameter, to obtain the second image generator.
  • In some embodiments, the sample generation module is specifically configured to: input the real object image of the first style to a pre-trained image encoder for processing and extract the first sample feature vector; input the first sample feature vector to the first image generator to generate the object reference image of the first style; input the first sample feature vector to the second image generator to generate the first target image of the second style; and use the object reference image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, generating paired sample data of the first type at a preset first ratio.
  • In some embodiments, the sample generation module is specifically configured to: use the real object image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, generating paired sample data of the second type at a preset second ratio.
  • In some embodiments, the sample generation module is specifically configured to: input a randomly generated second sample feature vector into the first image generator to generate a random object image of the first style; input the second sample feature vector into the second image generator to generate a second target image of the second style; and use the random object image of the first style as the sample image of the first style and the second target image of the second style as the sample image of the second style, generating paired sample data of the third type at a preset third ratio.
  • In some embodiments, the image generator generation module is specifically configured to: train the parameters of the image encoder according to the object image data of the first style and the first image generator, so as to extract corresponding feature vectors from input real images with the trained image encoder.
  • In some embodiments, the image processing device provided by the present disclosure further includes: a compensation processing module, configured to perform deformation compensation processing on the differing parts of facial key points between the first-style sample images and the second-style sample images in the paired sample data; and/or to perform texture compensation processing on the differing parts of non-facial key points between the first-style sample images and the second-style sample images in the paired sample data.
  • In some embodiments, the image generator generation module is specifically configured to: perform supervised training of the parameters of a generative adversarial network based on the paired sample data to generate the target image generator, wherein, during training, the image textures of the first-style sample images and the second-style sample images are weighted and fused according to preset weights to adjust the image texture of the output image.
  • The present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the image processing method described in any one of the above embodiments.
  • The present disclosure also provides a computer-readable storage medium (for example, a non-transitory computer-readable storage medium), the storage medium storing a computer program used to execute the image processing method described in any one of the above embodiments.
  • The present disclosure also provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the image processing method described in any one of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure relate to an image processing method, apparatus, device and medium. The method includes: training to generate a first image generator, and training to generate a second image generator, the second image generator being used to process an input random feature vector to generate a target object image of a second style; processing input sample feature vectors with the first image generator and the second image generator respectively to generate a sample image of the first style and a sample image of the second style as paired sample data; and training a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is used to process an input image of the first style to generate an output image of the second style.

Description

Image processing method, apparatus, device and medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on Chinese Application No. 202210089956.8, filed on January 25, 2022, and claims priority thereto; the disclosure of that Chinese application is incorporated into this application in its entirety.
技术领域
本公开涉及图像处理技术领域,尤其涉及一种图像处理方法、装置、设备及介质。
背景技术
对目标对象拍摄的图像进行风格转换,成为图像处理用户的一种热门需求。相关技术中,图像处理应用提供了多种图像参数的编辑控件,比如,包括亮度处理、贴纸添加处理、妆容转换处理等编辑控件,用户通过对编辑控件的编辑操作对图像参数进行调整,从而实现图像的风格转换。
Summary
An embodiment of the present disclosure provides an image processing method. The method includes: training to generate a first image generator, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and training to generate a second image generator, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style; processing an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and training a preset model based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
An embodiment of the present disclosure further provides an image processing apparatus. The apparatus includes: a training module configured to train and generate a first image generator, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style; a sample generation module configured to process an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and an image generator generating module configured to train a preset model based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the image processing method provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the image processing method provided by the embodiments of the present disclosure.
Brief Description of the Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an image processing scenario provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 9 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure; and
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The inventors of the present disclosure have found that, in the related art, realizing image style conversion by having users adjust editing controls depends on manual operation: the conversion effect is hard to guarantee, and the conversion efficiency is low.
In view of this, the present disclosure provides an image processing method to solve, as far as possible, the problems in the related art that the conversion effect is hard to guarantee and the conversion efficiency is low when performing style conversion on an image.
An embodiment of the present disclosure provides an image processing method. In this method, an image generator for generating images of a first style and an image generator for generating images of a second style are trained, and paired sample data are generated based on the two image generators, thereby obtaining training sample data of good quality; a target image generator for style conversion is then trained on the training sample data.
Therefore, even in scenarios where it is difficult to obtain object images of the first style and corresponding images of the second style as training data, sample image pairs with the second style can still be obtained from the above image generators. This overcomes, as far as possible, the problem that samples are difficult to obtain during the training of a style conversion model, and ensures the effect and efficiency of style conversion.
The method is described below with reference to specific embodiments.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. The method may be executed by an image processing apparatus, where the apparatus may be implemented in software and/or hardware and may generally be integrated in an electronic device. As shown in FIG. 1, the method includes steps 101 to 103.
Step 101: a first image generator is trained and generated, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style; and a second image generator is trained and generated, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style.
For example, the target object may be a person or an animal, which is not limited here. The input random feature vector includes, but is not limited to, contour features and pixel color features; for example, the input random feature vector includes at least one of a contour feature and a pixel color feature. The first style and the second style may be any two different styles; for example, the first style may be a "plain-style face" and the second style may be a "Hong Kong-style face".
In some embodiments, to ensure the robustness of the first image generator, first object image data of the first style are randomly collected according to multiple first preset indicators. The multiple first preset indicators correspond to multiple feature dimensions of the target object, so as to ensure the robustness of training the first image generator. For example, when the target object is a face, the corresponding first preset indicators include a face angle indicator type, a face age indicator type, a face temperament indicator type, a face contour indicator type, a face brightness indicator type, and the like, where each indicator type may include multiple indicator values. This ensures that the face sample data collected according to the multiple first preset indicators not only cover multiple indicator types but also take different indicator values within the same indicator type (for example, covering various face angles).
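As a toy illustration of such a specification, the indicator types above can be organized as a sampling plan. The following sketch is purely hypothetical — the indicator names and values are illustrative placeholders, not identifiers from the disclosure:

```python
import random

# Hypothetical first preset indicators: each indicator type maps to the
# indicator values the collected data should cover.
FIRST_PRESET_INDICATORS = {
    "face_angle_deg": [-45, -30, -15, 0, 15, 30, 45],
    "face_age_group": ["child", "youth", "middle_aged", "senior"],
    "face_temperament": ["plain", "gentle", "sharp"],
    "face_contour": ["round", "oval", "square"],
    "face_brightness": ["dark", "normal", "bright"],
}

def sample_collection_plan(n_images):
    """Draw one value per indicator type for each image to be collected,
    so the sample set spans every type and varies within each type."""
    return [
        {name: random.choice(values) for name, values in FIRST_PRESET_INDICATORS.items()}
        for _ in range(n_images)
    ]
```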
In some embodiments, the parameters of a generative adversarial network are trained according to the first object image data to obtain the first image generator. The first image generator can produce a corresponding target object image of the first style from an input random feature vector.
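The disclosure does not pin down a specific adversarial architecture or loss. As a minimal sketch only, assuming a StyleGAN-like generator `G` and discriminator `D` (hypothetical PyTorch modules, not names from the disclosure) and a loader of first-style object images, the training described here could look like:

```python
import itertools
import torch
import torch.nn.functional as F

def train_generator(G, D, loader, latent_dim=512, steps=10_000, device="cuda"):
    """Unconditional non-saturating GAN training on style-specific image data."""
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-3, betas=(0.0, 0.99))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-3, betas=(0.0, 0.99))
    batches = itertools.cycle(loader)
    for _ in range(steps):
        real = next(batches).to(device)
        z = torch.randn(real.size(0), latent_dim, device=device)  # random feature vector

        # Discriminator step: push scores for real images up, generated images down.
        d_loss = F.softplus(D(G(z).detach())).mean() + F.softplus(-D(real)).mean()
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: fool the discriminator (non-saturating loss).
        g_loss = F.softplus(-D(G(z))).mean()
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G
```

The same routine could plausibly be reused in step 202 below, initializing from the first generator's weights and feeding second-style data, which is what yields the third image generator.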
In some embodiments, a second image generator that processes an input random feature vector to generate a target object image of the second style can also be trained. The random feature vector includes, but is not limited to, contour features, pixel color features, and the like.
In an embodiment of the present disclosure, as shown in FIG. 2, training to generate the second image generator includes steps 201 to 205.
Step 201: second object image data of the second style are collected according to multiple second preset indicators.
To ensure the robustness of the second image generator, second object image data with the second style are collected according to multiple second preset indicators. The multiple second preset indicators may correspond to multiple different dimensions of the second object image data. For example, when the second object image data correspond to faces, the corresponding second preset indicators may include a face angle indicator type, a face age indicator type, a face temperament indicator type, a face contour indicator type, a face brightness indicator type, and the like, where each indicator type may include multiple indicator values. This ensures that the second object image data collected according to the multiple second preset indicators not only cover multiple indicator types but also take different indicator values within the same indicator type (for example, covering various face angles).
It should be noted that the above input random feature vector may be randomly generated; for example, it may be a feature of any dimension with the first style or the second style generated by a random algorithm. The input random feature vector may also be extracted from a target object image with the first style or the second style; a random feature vector extracted from a real target object image ensures that the images output by the trained first image generator and second image generator are more realistic.
It should also be noted that the above second object image data may be randomly generated; for example, they may be face features of any dimension generated by a random algorithm. The second object image data may also be extracted from real target object images; second object image data extracted from real target object images ensure that the images output by the trained image generators are more realistic.
Step 202: the network parameters of the first image generator are trained with the second object image data to obtain a third image generator.
In some embodiments, the network parameters of the first image generator are further trained with the second object image data so as to obtain the third image generator, where the third image generator can output a target object image of the second style according to an input feature vector.
Step 203: in the process of upsampling the input feature information through the first image generator, a first network in the first image generator at resolutions less than or equal to a target image resolution, and a second network at resolutions greater than the target image resolution, are determined.
In this embodiment, it can be understood that the first image generator obtains the object image of the first style by upsampling the input feature information layer by layer. For example, as shown in FIG. 3, if the input is feature information at a resolution of 1×1, the first network layer of the first image generator upsamples it to feature information at 2×2 resolution, the second network layer further upsamples it to feature information at 4×4 resolution, and so on, layer by layer, until the corresponding upsampled target object image is obtained (at 512×512 resolution in the figure).
Similarly, the third image generator obtains the target object image of the second style by upsampling the input feature information layer by layer. For example, as shown in FIG. 4, if the input is feature information at a resolution of 1×1, the first network layer of the third image generator upsamples it to feature information at 2×2 resolution, the second network layer further upsamples it to feature information at 4×4 resolution, and so on, layer by layer, until the corresponding upsampled target object image of the second style is obtained (at 512×512 resolution in the figure).
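To make the layer-per-resolution structure concrete, here is a minimal sketch of such a progressive upsampling generator. A plain convolutional block per resolution is an assumption for illustration; StyleGAN-style generators add style modulation, noise injection, and skip paths on top of this:

```python
import torch
import torch.nn as nn

class ProgressiveGenerator(nn.Module):
    """Upsamples a 1x1 feature input to a 512x512 image, one block per resolution."""
    def __init__(self, feat_dim=512, img_channels=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = feat_dim
        # 1x1 -> 2x2 -> 4x4 -> ... -> 512x512 (nine doublings of resolution).
        for _ in range(9):
            out_ch = max(ch // 2, 32)
            self.blocks.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(ch, out_ch, 3, padding=1),
                nn.LeakyReLU(0.2),
            ))
            ch = out_ch
        self.to_rgb = nn.Conv2d(ch, img_channels, 1)

    def forward(self, feat):           # feat: (N, feat_dim, 1, 1)
        for block in self.blocks:
            feat = block(feat)
        return self.to_rgb(feat)       # (N, 3, 512, 512)
```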
In this embodiment, in order to ensure the similarity between the output target object image of the second style and the output target object image of the first style, and thereby ensure the experience of image style conversion — that is, to ensure that, in subsequent style conversion, the converted second-style target object image more closely resembles the input first-style target object image (for example, if the input is a face image of user A, the output is a second-style face image whose facial features resemble A's) — the first image generator and the third image generator are fused to obtain the second image generator.
However, in the fusion process, different fusion parameters at different resolutions lead to different image generation effects of the second image generator. In this embodiment, the target image resolution corresponding to the fusion boundary layer is determined from experimental data; fusing the image generators based on this target image resolution yields a second image generator with a better image conversion effect.
In this embodiment, in the process of upsampling the input feature information through the first image generator, the first network in the first image generator at resolutions less than or equal to the target image resolution, and the second network at resolutions greater than the target image resolution, are determined. For example, if the target image resolution is 16×16, the first network corresponds to resolutions less than or equal to 16×16, and the second network corresponds to resolutions greater than 16×16.
Step 204: in the process of upsampling the feature information through the third image generator, a third network in the third image generator at resolutions less than or equal to the target image resolution, and a fourth network at resolutions greater than the target image resolution, are determined.
In this embodiment, in the process of upsampling the input feature information through the third image generator, the third network in the third image generator at resolutions less than or equal to the target image resolution, and the fourth network at resolutions greater than the target image resolution, are determined. For example, if the target image resolution is 16×16, the third network corresponds to resolutions less than or equal to 16×16, and the fourth network corresponds to resolutions greater than 16×16.
Step 205: the first network is fused with the third network according to a preset first fusion parameter, and the second network is fused with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
In this embodiment, the first network is fused with the third network according to the preset first fusion parameter, and the second network is fused with the fourth network according to the preset second fusion parameter, where fusing the networks at different resolutions with different fusion parameters gives the fused second image generator a better style conversion effect. The first fusion parameter and the second fusion parameter may be any fusion parameters that, for an input feature vector, take both the first style and the second style into account. Different fusion parameters correspond to different degrees of fusion; for example, the first fusion parameter and the second fusion parameter may differ in the parameter types fused and in the weight values.
In different application scenarios, networks are fused according to fusion parameters in different ways. In some possible embodiments, the network outputs at the corresponding resolution are fused, using the corresponding fusion parameters. In other possible embodiments, the network parameter weights of the corresponding networks are first modified according to the fusion parameters, output results at the corresponding resolution are obtained from the modified network parameters, and the outputs originating from the different image generators at the same resolution are then fused.
For example, as shown in FIG. 5, continuing with a target resolution of 16×16, every network parameter of the first and third image generators at resolutions less than or equal to 16×16 is fused, and every network parameter of the first and third image generators at resolutions greater than 16×16 is fused. The fused network layers can both recover the conversion of a real face from the input feature information and produce an image of the second style from it; their fusion ensures that the corresponding network layers can output feature information that takes both the first style and the second style into account.
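As a sketch of the weight-level fusion variant, assume both generators share the layer layout above and that a helper can map each parameter name to the resolution of its layer; the naming scheme, the helper, and the interpolation weights are assumptions, not details fixed by the disclosure:

```python
import copy
import torch

def fuse_generators(g_first, g_third, resolution_of_layer, boundary_res=16,
                    w_low=0.3, w_high=0.7):
    """Blend two generators' weights, with different fusion parameters below
    and above the target image resolution (the fusion boundary layer).

    w_low is the preset first fusion parameter (layers at resolution <= boundary_res,
    i.e. the first/third network pair); w_high is the preset second fusion parameter
    (the second/fourth network pair). Both would be calibrated experimentally.
    """
    fused = copy.deepcopy(g_first)
    state_first, state_third = g_first.state_dict(), g_third.state_dict()
    fused_state = {}
    for name, param in state_first.items():
        w = w_low if resolution_of_layer(name) <= boundary_res else w_high
        # Linear interpolation: w = 0 keeps the first generator, w = 1 the third.
        fused_state[name] = torch.lerp(param, state_third[name], w)
    fused.load_state_dict(fused_state)
    return fused
```

Since low-resolution layers mostly govern overall geometry while high-resolution layers mostly govern texture, weighting the two bands differently is one way to keep the input identity while taking on the second style's look, consistent with the similarity goal stated above.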
In an embodiment of the present disclosure, as shown in FIG. 6, after the first image generator and the third image generator are trained, they are fused to obtain a second image generator that takes both the first style and the second style into account. The second image generator can output a corresponding target object image combining the first and second styles from the relevant feature vector alone, without having to actually construct real second-style object images in a real scenario. In this embodiment, to further improve the realism of the paired sample data, the relevant feature vector may be obtained by an image encoder performing feature extraction on a real object image; since the feature vector then originates from a real object image, the output of the trained target image generator is ensured to be more natural.
Step 102: an input sample feature vector is processed with the first image generator and with the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data.
In this embodiment, the input sample feature vector is processed with the first image generator and the second image generator respectively, and the resulting sample image of the first style and sample image of the second style are used as paired sample data.
Step 103: a preset model is trained based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
In this embodiment, the preset model is trained with the paired sample data to generate the target image generator, where the target image generator is configured to process an input image to generate an output image matching the second style, which ensures the naturalness and efficiency of image conversion to the target style.
In an embodiment of the present disclosure, if the target image generator is a GAN (Generative Adversarial Network), the parameters of the generative adversarial network may be trained under supervision with the paired sample data to generate the target image generator. Moreover, to further ensure during training that the output second-style image is similar to the input first-style image (for example, when the target object is a face, so that the second-style output image and the first-style input image look more like the same person), the image textures of the paired sample data are weighted and fused according to preset weights during training to adjust the texture of the output image. The preset weights may be calibrated from experimental data, so that the texture of the output image is closer to that of the first-style input image while preserving the texture quality of the second style.
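The disclosure leaves the texture-fusion step abstract. One plausible reading — offered as an assumption, not the patented procedure — is to blend the high-frequency (texture) components of the two paired sample images with a preset weight when constructing the supervision target:

```python
import torch
import torch.nn.functional as F

def blend_texture_target(first_style, second_style, texture_weight=0.4, blur_ks=21):
    """Build a training target whose texture mixes both styles.

    Splits each image into a low-frequency base (box blur) and a high-frequency
    texture residual, keeps the second style's base, and blends the texture
    residuals with `texture_weight` (a preset weight that would be calibrated
    experimentally, per the disclosure).
    """
    def low_freq(img):
        return F.avg_pool2d(img, blur_ks, stride=1, padding=blur_ks // 2)

    tex_first = first_style - low_freq(first_style)
    tex_second = second_style - low_freq(second_style)
    mixed_texture = texture_weight * tex_first + (1 - texture_weight) * tex_second
    return low_freq(second_style) + mixed_texture
```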
In an embodiment of the present disclosure, to further ensure the similarity between the output second-style image and the input first-style image, the background image of the input first-style image, as well as the texture information of other entities such as hair, glasses, and clothing or hats, may also be obtained, and this texture information may be mapped onto the corresponding second-style image.
Since the face feature vectors used for training originate from real object images, the paired sample data generated in this way are rather realistic; on the basis of improving the acquisition efficiency of paired sample data, their quality is guaranteed. The target image generator obtained by training the preset model with the paired sample data can be used on smart terminals and the like, overcoming the problem that training sample data are difficult to construct, which would otherwise lead to a poor style conversion effect, and realizing lightweight image style conversion.
In summary, the image processing method of the embodiments of the present disclosure trains a first image generator that can produce a target object image of the first style from an input random feature vector, and a second image generator that can generate a target object image of the second style from an input random feature vector; the input sample feature vector is processed with the first image generator and the second image generator respectively to generate the sample image of the first style and the sample image of the second style as paired sample data. On the basis of improving the acquisition efficiency of the paired sample data set, the quality of the paired sample data is guaranteed, and the preset model is trained with the paired sample data to generate a target image generator, where the target image generator is configured to process an input image to generate an output image matching the second style. In this way, high-quality paired sample data are constructed for the style conversion scenario, the problem that sample data are difficult to obtain is overcome, the effect of performing style conversion on images is ensured as far as possible, and the efficiency of performing style conversion on images is improved.
As mentioned in the above embodiments, the production of the paired sample data is of great significance for the final style conversion effect; therefore, the embodiments of the present disclosure also ensure the quality of the constructed paired sample data.
In an embodiment of the present disclosure, as shown in FIG. 7, generating the sample image of the first style and the sample image of the second style as paired sample data includes steps 701 to 704.
Step 701: a real object image of the first style is input to a pre-trained image encoder for processing, and a first sample feature vector is extracted.
Step 702: the first sample feature vector is input to the first image generator to generate an object reference image of the first style.
In this embodiment, the real object image of the first style is input to the pre-trained image encoder for processing, the first sample feature vector is extracted, and the first sample feature vector is then input to the first image generator to generate the object reference image of the first style. Since the object reference image is generated from the feature vector of the real first-style object image, it is closely related to the real object and can subsequently be used as sample data.
Step 703: the first sample feature vector is input to the second image generator to generate a first target image of the second style.
In this embodiment, the first sample feature vector is input to the second image generator to generate the first target image of the second style. Since the first target image is also obtained from the first sample feature vector, it is closely related to the real object and can subsequently be used as sample data.
Step 704: the object reference image of the first style is taken as the sample image of the first style, and the first target image of the second style is taken as the sample image of the second style, to generate a first class of paired sample data at a preset first ratio.
In this embodiment, the object reference image of the first style is taken as the sample image of the first style, and the first target image of the second style is taken as the sample image of the second style, to generate the first class of paired sample data at the preset first ratio, where the first ratio may be calibrated from experimental data.
That is, as shown in FIG. 8, the first sample feature vector A is input to the second image generator to generate the first target image S2 of the second style corresponding to the real object image S1, and the first sample feature vector A is input to the first image generator to generate the reference face image S3 corresponding to the real object image. The generated face pairing data set thus includes: the real object image S1 with the corresponding first target image S2 of the second style, and the reference face image S3 with the corresponding first target image S2 of the second style.
In some embodiments of the present disclosure, the method further includes: taking the real object image of the first style as the sample image of the first style, and taking the first target image of the second style as the sample image of the second style, to generate a second class of paired sample data at a preset second ratio. That is, the real object image S1 of the first style and the corresponding first target image S2 of the second style are taken as a sample pair.
In some embodiments of the present disclosure, the method further includes: inputting a randomly generated second sample feature vector to the first image generator to generate a random object image of the first style, and inputting the second sample feature vector to the second image generator to generate a second target image of the second style. In this embodiment, the random object image and the corresponding second target image of the second style can also serve as paired sample data. The method further includes: taking the random object image of the first style as the sample image of the first style, and taking the second target image of the second style as the sample image of the second style, to generate a third class of paired sample data at a preset third ratio. The sum of the first ratio, the second ratio, and the third ratio is 1, and the ratio values may be calibrated according to the needs of the scenario; in some possible embodiments, the first ratio, the second ratio, and the third ratio may be 30%, 50%, and 20%, respectively.
Continuing with the scenario shown in FIG. 8, as shown in FIG. 9, the first sample feature vector A is input to the second image generator to generate the first target image S2 of the second style corresponding to the real object image S1; the first sample feature vector A is input to the first image generator to generate the object reference image S3 of the first style; the randomly generated second sample feature vector B may also be input to the first image generator to generate the random object image S4, and the second sample feature vector B is input to the second image generator to generate the second target image S5 of the second style corresponding to the second sample feature vector. The generated paired sample data include: the random object image S4 with the corresponding second target image S5, collected at the preset first ratio; the real object image S1 with the corresponding first target image S2 of the second style, collected at the preset second ratio; and the object reference image S3 with the corresponding first target image S2 of the second style, collected at the preset third ratio.
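Putting steps 701 to 704 and the two additional pair classes together, the pairing pipeline can be sketched as follows, reusing the hypothetical `encoder`, `g_first`, and `g_second` modules from the earlier sketches and the illustrative 30%/50%/20% split:

```python
import torch

@torch.no_grad()
def build_paired_dataset(encoder, g_first, g_second, real_images,
                         n_total=10_000, ratios=(0.3, 0.5, 0.2), latent_dim=512):
    """Generate the three classes of (first-style, second-style) training pairs."""
    n_ref = int(n_total * ratios[0])    # class 1: reference image <-> first target image
    n_real = int(n_total * ratios[1])   # class 2: real image <-> first target image
    n_rand = n_total - n_ref - n_real   # class 3: random image <-> second target image
    pairs = []

    for i in range(max(n_ref, n_real)):
        real = real_images[i % len(real_images)].unsqueeze(0)
        latent = encoder(real)                      # first sample feature vector
        reference = g_first(latent)                 # first-style object reference image
        target = g_second(latent)                   # second-style first target image
        if i < n_ref:
            pairs.append((reference, target))
        if i < n_real:
            pairs.append((real, target))

    for _ in range(n_rand):
        z = torch.randn(1, latent_dim)              # second sample feature vector
        pairs.append((g_first(z), g_second(z)))     # random image and second target image
    return pairs
```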
Similarly, to ensure the similarity between the output second-style image and the first-style input image or the object reference image, in this embodiment deformation compensation processing is also performed on the regions where the facial key points of the paired images in the paired sample data differ. For example: key points of the face are recognized; a scaling ratio value and a rotation angle are generated based on the angles and distances between the facial key points of the paired images (that is, between the sample image of the first style and the sample image of the second style in the paired sample data); and the sample image of the second style is adjusted based on the scaling ratio value and the rotation angle, so that the second-style sample image in the paired sample data more closely resembles the corresponding first-style sample image.
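A minimal sketch of that deformation compensation, assuming 2D facial key points have already been detected for both images (the similarity-transform estimate via OpenCV is an implementation assumption, not something mandated by the disclosure):

```python
import numpy as np
import cv2

def compensate_deformation(second_style_img, kps_first, kps_second):
    """Warp the second-style sample so its facial key points align with the
    first-style sample's, deriving a scaling ratio and rotation angle from the
    angles and distances between corresponding key points.

    kps_first, kps_second: (N, 2) float arrays of corresponding facial key points.
    """
    m, _ = cv2.estimateAffinePartial2D(
        kps_second.astype(np.float32), kps_first.astype(np.float32)
    )
    scale = float(np.hypot(m[0, 0], m[1, 0]))                 # derived scaling ratio
    angle = float(np.degrees(np.arctan2(m[1, 0], m[0, 0])))   # derived rotation angle
    h, w = second_style_img.shape[:2]
    aligned = cv2.warpAffine(second_style_img, m, (w, h))
    return aligned, scale, angle
```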
Map compensation processing may also be performed on the regions where non-facial key points of the paired images differ (that is, between the sample image of the first style and the sample image of the second style in the paired sample data). Non-facial key points include, but are not limited to, facial decorations such as glasses and beards on the face, as well as the image background. Accordingly, the image region of the differing part can be cut out based on contour recognition or the like and then pasted onto the corresponding sample image of the second style.
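And a sketch of the map (cut-and-paste) compensation, assuming a binary mask of the differing non-facial region is available from contour recognition (how the mask is produced is outside the level of detail given here):

```python
import numpy as np

def compensate_texture(first_style_img, second_style_img, region_mask):
    """Paste differing non-facial regions (glasses, beard, background) from the
    first-style sample onto the second-style sample.

    region_mask: (H, W) bool array marking the differing non-facial region,
    e.g. obtained from contour recognition.
    """
    out = second_style_img.copy()
    out[region_mask] = first_style_img[region_mask]
    return out
```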
In summary, the image processing method of the embodiments of the present disclosure can generate the corresponding paired sample data from the relevant image generator and the second image generator, without manually shooting and creating corresponding paired images, thereby realizing automatic acquisition of paired sample data and providing technical support for improving the effect and efficiency of image style conversion.
To implement the above embodiments, the present disclosure further proposes an image processing apparatus.
FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may generally be integrated in an electronic device to implement face image processing. As shown in FIG. 10, the apparatus includes: a training module 1010, a sample generation module 1020, and an image generator generating module 1030.
The training module 1010 is configured to train and generate a first image generator, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style.
The sample generation module 1020 is configured to process an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data.
The image generator generating module 1030 is configured to train a preset model based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
The image processing apparatus provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
To implement the above embodiments, the present disclosure further proposes a computer program product, including a computer program/instructions which, when executed by a processor, implement the image processing method in the above embodiments.
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Referring specifically to FIG. 11 below, it shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 11, the electronic device may include a processor (for example, a central processing unit or a graphics processing unit) 1101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a memory 1108 into a random access memory (RAM) 1103. Various programs and data required for the operation of the electronic device are also stored in the RAM 1103. The processor 1101, the ROM 1102, and the RAM 1103 are connected to each other through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Generally, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 1107 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a memory 1108 including, for example, a magnetic tape or a hard disk; and a communication device 1109. The communication device 1109 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 11 shows an electronic device with various devices, it should be understood that it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 1109, or installed from the memory 1108, or installed from the ROM 1102. When the computer program is executed by the processor 1101, the above functions defined in the image processing method of the embodiments of the present disclosure are executed.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), and the like, or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: train a first image generator that can produce a target object image of the first style from an input random feature vector, and train a second image generator that can generate a target object image of the second style from an input random feature vector; process the input sample feature vector with the first image generator and the second image generator respectively, to generate the sample image of the first style and the sample image of the second style as paired sample data, which, on the basis of improving the acquisition efficiency of the paired sample data set, guarantees the quality of the paired sample data; and train the preset model with the paired sample data to generate a target image generator, where the target image generator is configured to process an input image to generate an output image matching the second style. In this way, high-quality paired sample data are constructed for the style conversion scenario, the problem that sample data are difficult to obtain is overcome, the effect of performing style conversion on images is ensured, and the efficiency of performing style conversion on images is improved.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, where the name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, the present disclosure provides an image processing method, including: training to generate a first image generator, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and training to generate a second image generator, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style; processing an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and training a preset model based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, training to generate the first image generator includes: randomly collecting first object image data of the first style according to multiple first preset indicators; and training parameters of a generative adversarial network according to the first object image data to obtain the first image generator.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the multiple first preset indicators correspond to multiple feature dimensions of the target object.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, training to generate the second image generator includes: collecting second object image data of the second style according to multiple second preset indicators; training the network parameters of the first image generator with the second object image data to obtain a third image generator; in the process of upsampling the input feature information through the first image generator, determining a first network in the first image generator at resolutions less than or equal to a target image resolution, and a second network at resolutions greater than the target image resolution; in the process of upsampling the feature information through the third image generator, determining a third network in the third image generator at resolutions less than or equal to the target image resolution, and a fourth network at resolutions greater than the target image resolution; and fusing the first network with the third network according to a preset first fusion parameter, and fusing the second network with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the multiple second preset indicators correspond to multiple different dimensions of the second object image data.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, processing the input sample feature vector with the first image generator and the second image generator respectively to generate the sample image of the first style and the sample image of the second style as paired sample data includes: inputting a real object image of the first style to a pre-trained image encoder for processing, and extracting a first sample feature vector; inputting the first sample feature vector to the first image generator to generate an object reference image of the first style; inputting the first sample feature vector to the second image generator to generate a first target image of the second style; and taking the object reference image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, to generate a first class of paired sample data at a preset first ratio.
According to one or more embodiments of the present disclosure, the image processing method provided by the present disclosure further includes: taking the real object image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, to generate a second class of paired sample data at a preset second ratio.
According to one or more embodiments of the present disclosure, the image processing method provided by the present disclosure further includes: inputting a randomly generated second sample feature vector to the first image generator to generate a random object image of the first style; inputting the second sample feature vector to the second image generator to generate a second target image of the second style; and taking the random object image of the first style as the sample image of the first style and the second target image of the second style as the sample image of the second style, to generate a third class of paired sample data at a preset third ratio.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the sum of the first ratio, the second ratio, and the third ratio is 1.
According to one or more embodiments of the present disclosure, the image processing method provided by the present disclosure further includes: training parameters of an image encoder according to the object image data of the first style and the first image generator, so as to extract a corresponding feature vector from an input real image with the trained image encoder.
According to one or more embodiments of the present disclosure, the image processing method provided by the present disclosure further includes: performing deformation compensation processing on the regions where facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data; and/or performing map compensation processing on the regions where non-facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, training the preset model based on the paired sample data to generate the target image generator includes: training parameters of a generative adversarial network under supervision based on the paired sample data to generate the target image generator, where, during training, the image textures of the sample image of the first style and the sample image of the second style are weighted and fused according to preset weights to adjust the image texture of the output image.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the input random feature vector includes at least one of a contour feature and a pixel color feature.
According to one or more embodiments of the present disclosure, the present disclosure provides an image processing apparatus, including: a training module configured to train and generate a first image generator, where the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, where the second image generator is configured to process an input random feature vector to generate a target object image of a second style; a sample generation module configured to process an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and an image generator generating module configured to train a preset model based on the paired sample data to generate a target image generator, where the target image generator is configured to process an input image of the first style to generate an output image of the second style.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the training module is specifically configured to: randomly collect first object image data of the first style according to multiple first preset indicators; and train parameters of a generative adversarial network according to the first object image data to obtain the first image generator.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the training module is specifically configured to: collect second object image data of the second style according to multiple second preset indicators; train the network parameters of the first image generator with the second object image data to obtain a third image generator; in the process of upsampling the input feature information through the first image generator, determine a first network in the first image generator at resolutions less than or equal to a target image resolution, and a second network at resolutions greater than the target image resolution; in the process of upsampling the feature information through the third image generator, determine a third network in the third image generator at resolutions less than or equal to the target image resolution, and a fourth network at resolutions greater than the target image resolution; and fuse the first network with the third network according to a preset first fusion parameter, and fuse the second network with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the sample generation module is specifically configured to: input a real object image of the first style to a pre-trained image encoder for processing, and extract a first sample feature vector; input the first sample feature vector to the first image generator to generate an object reference image of the first style; input the first sample feature vector to the second image generator to generate a first target image of the second style; and take the object reference image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, to generate a first class of paired sample data at a preset first ratio.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the sample generation module is specifically configured to: take the real object image of the first style as the sample image of the first style and the first target image of the second style as the sample image of the second style, to generate a second class of paired sample data at a preset second ratio.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the sample generation module is specifically configured to: input a randomly generated second sample feature vector to the first image generator to generate a random object image of the first style; input the second sample feature vector to the second image generator to generate a second target image of the second style; and take the random object image of the first style as the sample image of the first style and the second target image of the second style as the sample image of the second style, to generate a third class of paired sample data at a preset third ratio.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the image generator generating module is specifically configured to: train parameters of an image encoder according to the object image data of the first style and the first image generator, so as to extract a corresponding feature vector from an input real image with the trained image encoder.
According to one or more embodiments of the present disclosure, the image processing apparatus provided by the present disclosure further includes: a compensation processing module configured to perform deformation compensation processing on the regions where facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data; and/or to perform map compensation processing on the regions where non-facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the image generator generating module is specifically configured to: train parameters of a generative adversarial network under supervision based on the paired sample data to generate the target image generator, where, during training, the image textures of the sample image of the first style and the sample image of the second style are weighted and fused according to preset weights to adjust the image texture of the output image.
According to one or more embodiments of the present disclosure, the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor, where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method described in any one of the above embodiments.
According to one or more embodiments of the present disclosure, the present disclosure further provides a computer-readable storage medium (for example, a non-transitory computer-readable storage medium), where the storage medium stores a computer program, and the computer program is used to execute the image processing method described in any one of the above embodiments.
According to one or more embodiments of the present disclosure, the present disclosure further provides a computer program, including instructions which, when executed by a processor, cause the processor to execute the image processing method described in any one of the above embodiments.
The above description is only a preferred embodiment of the present disclosure and an illustration of the applied technical principles. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (17)

  1. An image processing method, comprising:
    training to generate a first image generator, wherein the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and training to generate a second image generator, wherein the second image generator is configured to process an input random feature vector to generate a target object image of a second style;
    processing an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and
    training a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is configured to process an input image of the first style to generate an output image of the second style.
  2. The method according to claim 1, wherein the training to generate a first image generator comprises:
    randomly collecting first object image data of the first style according to a plurality of first preset indicators; and
    training parameters of a generative adversarial network according to the first object image data to obtain the first image generator.
  3. The method according to claim 2, wherein the plurality of first preset indicators correspond to a plurality of feature dimensions of a target object.
  4. The method according to claim 1, wherein the training to generate a second image generator comprises:
    collecting second object image data of the second style according to a plurality of second preset indicators;
    training network parameters of the first image generator with the second object image data to obtain a third image generator;
    in a process of upsampling input feature information through the first image generator, determining a first network in the first image generator at resolutions less than or equal to a target image resolution, and a second network at resolutions greater than the target image resolution;
    in a process of upsampling the feature information through the third image generator, determining a third network in the third image generator at resolutions less than or equal to the target image resolution, and a fourth network at resolutions greater than the target image resolution; and
    fusing the first network with the third network according to a preset first fusion parameter, and fusing the second network with the fourth network according to a preset second fusion parameter, thereby obtaining the second image generator.
  5. The method according to claim 4, wherein the plurality of second preset indicators correspond to a plurality of different dimensions of the second object image data.
  6. The method according to any one of claims 1 to 5, wherein the processing an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data, comprises:
    inputting a real object image of the first style to a pre-trained image encoder for processing, and extracting a first sample feature vector;
    inputting the first sample feature vector to the first image generator to generate an object reference image of the first style;
    inputting the first sample feature vector to the second image generator to generate a first target image of the second style; and
    taking the object reference image of the first style as the sample image of the first style, and the first target image of the second style as the sample image of the second style, to generate a first class of paired sample data at a preset first ratio.
  7. The method according to claim 6, further comprising:
    taking the real object image of the first style as the sample image of the first style, and the first target image of the second style as the sample image of the second style, to generate a second class of paired sample data at a preset second ratio.
  8. The method according to claim 7, further comprising:
    inputting a randomly generated second sample feature vector to the first image generator to generate a random object image of the first style;
    inputting the second sample feature vector to the second image generator to generate a second target image of the second style; and
    taking the random object image of the first style as the sample image of the first style, and the second target image of the second style as the sample image of the second style, to generate a third class of paired sample data at a preset third ratio.
  9. The method according to claim 8, wherein a sum of the first ratio, the second ratio, and the third ratio is 1.
  10. The method according to claim 4 or 5, further comprising:
    training parameters of an image encoder according to the object image data of the first style and the first image generator, so as to extract a corresponding feature vector from an input real image with the trained image encoder.
  11. The method according to any one of claims 1 to 10, further comprising:
    performing deformation compensation processing on regions where facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data; and/or performing map compensation processing on regions where non-facial key points differ between the sample image of the first style and the sample image of the second style in the paired sample data.
  12. The method according to any one of claims 1 to 11, wherein the training a preset model based on the paired sample data to generate a target image generator comprises:
    training parameters of a generative adversarial network under supervision based on the paired sample data to generate the target image generator, wherein, during training, image textures of the sample image of the first style and the sample image of the second style are weighted and fused according to preset weights to adjust an image texture of the output image.
  13. The method according to any one of claims 1 to 12, wherein
    the input random feature vector comprises at least one of a contour feature and a pixel color feature.
  14. An image processing apparatus, comprising:
    a training module configured to train and generate a first image generator, wherein the first image generator is configured to process an input random feature vector to generate a target object image of a first style, and to train and generate a second image generator, wherein the second image generator is configured to process an input random feature vector to generate a target object image of a second style;
    a sample generation module configured to process an input sample feature vector with the first image generator and the second image generator respectively, to generate a sample image of the first style and a sample image of the second style as paired sample data; and
    an image generator generating module configured to train a preset model based on the paired sample data to generate a target image generator, wherein the target image generator is configured to process an input image of the first style to generate an output image of the second style.
  15. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method according to any one of claims 1 to 13.
  16. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is used to execute the image processing method according to any one of claims 1 to 13.
  17. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to execute the image processing method according to any one of claims 1 to 13.
PCT/CN2023/072054 2022-01-25 2023-01-13 Image processing method, apparatus, device, and medium WO2023143118A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210089956.8 2022-01-25
CN202210089956.8A CN114418835A (zh) 2022-01-25 2022-01-25 Image processing method, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
WO2023143118A1 true WO2023143118A1 (zh) 2023-08-03

Family

ID=81277611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072054 WO2023143118A1 (zh) 2022-01-25 2023-01-13 图像处理方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN114418835A (zh)
WO (1) WO2023143118A1 (zh)

Also Published As

Publication number Publication date
CN114418835A (zh) 2022-04-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23746001

Country of ref document: EP

Kind code of ref document: A1