WO2023284416A1 - Data processing method and device

Data processing method and device

Info

Publication number: WO2023284416A1
Application number: PCT/CN2022/094556
Authority: WO (WIPO PCT)
Prior art keywords: generator, loss, output, teacher, image
Other languages: English (en), French (fr)
Inventors: 吴捷, 任玉羲, 肖学锋
Original assignee: 北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Priority: EP22841056.9A (EP4354343A1)
Publication: WO2023284416A1

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology > G06N 3/045 Combinations of networks
    • G06N 3/04 Architecture, e.g. interconnection topology > G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • G06N 3/08 Learning methods > G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/08 Learning methods > G06N 3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • Embodiments of the present disclosure relate to the technical field of computer and network communication, and in particular, to a data processing method and device.
  • In the related art, model compression of a deep learning network is formulated as a multi-stage task that includes multiple operations such as network architecture search, distillation, pruning, and quantization.
  • Embodiments of the present disclosure provide a data processing method and device to improve the model compression efficiency of a generative adversarial network, and realize image processing through a generative adversarial network on a lightweight device.
  • In a first aspect, an embodiment of the present disclosure provides a data processing method applicable to a generative adversarial network obtained through model distillation, the data processing method including:
  • the generative adversarial network includes the first generator, the second generator, and a discriminator;
  • the model distillation is a process of alternately training the first generator and the second generator;
  • the model size of the first generator is smaller than the model size of the second generator.
  • In a second aspect, an embodiment of the present disclosure provides a data processing device applicable to a generative adversarial network obtained through model distillation, the data processing device including:
  • an acquiring module, configured to acquire the image to be processed;
  • a processing module, configured to process the image through the first generator to obtain a processed image;
  • the generative adversarial network includes the first generator, the second generator, and a discriminator;
  • the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than the model size of the second generator.
  • an embodiment of the present disclosure provides an electronic device, including: at least one processor and a memory;
  • the memory stores computer-executable instructions
  • the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the data processing method described in the first aspect and the various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the data processing method described in the first aspect and the various possible designs of the first aspect is implemented.
  • the embodiments of the present disclosure provide a computer program product including computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method described in the first aspect and the various possible designs of the first aspect is implemented.
  • the embodiments of the present disclosure provide a computer program including computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method described in the first aspect and the various possible designs of the first aspect is implemented.
  • the generative adversarial network includes a first generator, a second generator, and a discriminator.
  • the model size of the first generator is smaller than the model size of the second generator.
  • the first generator and the second generator in the generative adversarial network are alternately trained, and in each training round the training of the first generator is guided by the optimized second generator.
  • the first generator obtained through model distillation processes the image to be processed.
  • On the one hand, the multi-stage model compression process is abandoned in favor of model compression consisting only of the model distillation stage, which reduces the complexity of model compression and improves its efficiency; on the other hand, the online distillation method in which the first generator and the second generator are alternately trained during model distillation improves the model training effect of the first generator and the quality of images processed by the first generator.
  • In this way, the finally obtained first generator is adapted, in terms of model scale, to lightweight devices with weak computing power, while the quality of images processed by the first generator remains high.
  • FIG. 1 is an example diagram of an application scenario provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flow diagram of a training process of a generative adversarial network in a data processing method provided by an embodiment of the present disclosure;
  • FIG. 4 is a second schematic flow diagram of a training process of the generative adversarial network in the data processing method provided by the embodiment of the present disclosure;
  • FIG. 5 is an example diagram of a model structure of a generative adversarial network provided by an embodiment of the present disclosure;
  • FIG. 6 is a structural block diagram of a data processing device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 1 is an example diagram of an application scenario provided by an embodiment of the present disclosure.
  • the application scenario shown in Figure 1 is an image processing scenario.
  • the devices involved include a terminal 101 and a server 102; the terminal 101 communicates with the server 102 through a network, for example.
  • the server 102 is used to train the deep learning model, and deploy the trained deep learning model to the terminal 101 .
  • the terminal 101 performs image processing through a deep learning model.
  • the deep learning model is a Generative Adversarial Network (GAN).
  • the server 102 deploys the generator in the trained generative adversarial network to the terminal.
  • the terminal 101 is a lightweight device with relatively weak computing power (such as a camera, a mobile phone, or a smart home appliance) and is suited to deploying small-scale deep learning models. Therefore, how to obtain a smaller generator suitable for deployment on a lightweight device, and how to improve the image processing effect of such a smaller generator, are problems that urgently need to be solved.
  • Model compression is one of the ways to train deep learning models with small model sizes.
  • the current model compression methods for generative adversarial networks still have the following shortcomings: 1) mature model compression techniques in the deep learning field are not customized for generative adversarial networks and lack exploration of the complex characteristics and structure of generative adversarial networks; 2) the model compression process includes multiple stages such as network architecture search, distillation, pruning, and quantization, which demand substantial time and computing resources; 3) the compressed generative adversarial network still consumes considerable computing resources and is difficult to apply to lightweight devices.
  • an embodiment of the present disclosure provides a data processing method.
  • a model compression method suitable for generative adversarial networks is designed.
  • one-step compression of generative adversarial networks is realized through model distillation, which reduces the complexity of model compression and improves the efficiency of model compression.
  • the training effect of the smaller generator is improved through the online distillation method in which the smaller generator and the larger generator are alternately trained.
  • the resulting generator is smaller in size, suitable for lightweight devices, and the processed image quality is better.
  • the data processing method provided by the embodiments of the present disclosure may be applied in a terminal or a server.
  • When the method is applied to a terminal, real-time processing of images collected by the terminal can be realized.
  • When the method is applied to a server, images sent by the terminal can be processed.
  • the terminal device may be a personal digital assistant (PDA), a handheld device with wireless communication capability (such as a smartphone or a tablet computer), a computing device (such as a personal computer (PC)), a vehicle-mounted device, a wearable device (such as a smart watch or a smart bracelet), or a smart home device (such as a smart display device).
  • FIG. 2 is a first schematic flowchart of a data processing method provided by an embodiment of the present disclosure. As shown in Figure 2, the data processing method includes:
  • the image to be processed may be an image captured by the terminal in real time, or one or more frames of images acquired from a video captured by the terminal in real time.
  • the image to be processed may be an image input by the user or selected by the user.
  • the user inputs the image to be processed on the display interface of the terminal, or selects the image to be processed there.
  • the server receives the image that was input or selected by the user and sent by the terminal.
  • the image to be processed may be an image played on the terminal in real time.
  • the terminal acquires the image or video frame being played. Thereby, the processing of the image played on the terminal in real time is realized.
  • the image to be processed is an image in a database pre-stored on the terminal and/or the server.
  • a database storing a plurality of images to be processed is pre-established at the terminal, and images to be processed are obtained from the database when image processing is performed.
  • the model size of the first generator is smaller than the model size of the second generator; therefore, compared with the first generator, the second generator has a stronger image processing capability: it can extract more detailed image features and produce higher-quality images, but its image processing consumes more computing resources.
  • the model distillation is used to train the generative adversarial network.
  • In the model distillation of the present disclosure, the first generator and the second generator are alternately trained, that is, online distillation is performed on the first generator and the second generator, so that the optimized second generator guides the optimization of the first generator and the first generator, whose model size is smaller than that of the second generator, approaches the second generator in image processing quality.
  • the training process of the generative adversarial network can be carried out on the server. Considering that the computing power of the terminal is weak, the first generator with a smaller model size after model distillation can be deployed on the terminal.
  • the image to be processed is directly input into the first generator, or is input into the first generator after preprocessing operations such as cropping, denoising, and enhancement, and the processed image is obtained.
  • In the data processing method provided by this embodiment, model compression of the generative adversarial network is realized, the efficiency and effect of model compression are improved, and a first generator with a small model size and high processed-image quality is obtained, which is especially suitable for deployment on lightweight devices for image processing, improving the efficiency and quality of image processing on lightweight devices.
  • the training process of the generative adversarial network is carried out separately from the process of applying the generative adversarial network to image processing. For example, after the generative adversarial network is trained on the server, the trained student generator is deployed on the terminal, and the terminal performs image processing through the student generator. Each time the server updates the generative adversarial network, the student generator can be redeployed on the terminal.
  • FIG. 3 is a schematic flow diagram of a training process of the generative adversarial network in the data processing method provided by the embodiment of the present disclosure, that is, a schematic diagram of an alternate training process of the first generator and the second generator in the generative adversarial network.
  • an alternate training process of the first generator and the second generator in the generative adversarial network includes:
  • the sample data includes a sample image and a reference image corresponding to the sample image.
  • For example, for image depth estimation, the sample data includes the sample image and the real depth map of the sample image; for image face recognition, the sample data includes the sample image and the real face label map of the sample image, in which, for example, the location of each face can be manually marked.
  • the sample image is processed by the second generator to obtain the processed sample image output by the second generator, which is referred to below as the output image of the second generator for brevity.
  • Through the discriminator, the authenticity of the reference image corresponding to the sample image and of the output image of the second generator is discriminated, and the adversarial loss of the second generator is determined.
  • the second generator tries to make its own output image close to the reference image corresponding to the sample image, while the discriminator tries to distinguish the output image of the second generator from the reference image corresponding to the sample image.
  • the adversarial loss reflects the loss value of the authenticity discrimination performed by the discriminator on the output image of the second generator and the reference image corresponding to the sample image.
  • In the process of discriminating the reference image corresponding to the sample image and the output image of the second generator through the discriminator, the reference image corresponding to the sample image and the output image of the second generator are each input to the discriminator, and the discriminator judges whether each input comes from the sample data. Finally, the adversarial loss of the second generator is calculated according to the output of the discriminator when the reference image corresponding to the sample image is input, the output of the discriminator when the output image of the second generator is input, and the adversarial loss function.
  • the output of the discriminator is 1, which means that the input data of the discriminator comes from sample data
  • the output of the discriminator is 0, which means that the input data of the discriminator does not come from sample data
  • For example, determine the expected value of the output of the discriminator when the reference image corresponding to the sample image is input to the discriminator, determine the expected value of the quantity obtained by subtracting from 1 the output of the discriminator when the output image of the second generator is input to the discriminator, and sum the two expected values to obtain the adversarial loss of the second generator.
  • the adversarial loss function used to calculate the adversarial loss of the second generator can be expressed, for example in the standard form, as
  • L_GAN(G_T, D) = E_{x,y}[log D(y)] + E_{x}[log(1 - D(G_T(x)))]
  • where L_GAN(G_T, D) is the adversarial loss function, G_T represents the second generator, D represents the discriminator, x represents the sample image, y represents the reference image corresponding to the sample image, G_T(x) represents the output image obtained by processing the sample image x through the second generator, E_{x,y}[·] represents the expectation under the sample data {x, y}, and E_{x}[·] represents the expectation under the sample data x.
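  • As an illustration only (not code from the patent), a minimal PyTorch sketch of this adversarial loss might look as follows; the `discriminator` and `teacher_gen` modules, mapping image batches to per-sample probabilities and to images respectively, are hypothetical stand-ins:

      import torch

      def adversarial_loss(discriminator, teacher_gen, x, y, eps=1e-8):
          # E_{x,y}[log D(y)]: discriminator output on the reference image
          real_term = torch.log(discriminator(y) + eps).mean()
          # E_x[log(1 - D(G_T(x)))]: discriminator output on the teacher's image
          fake_term = torch.log(1.0 - discriminator(teacher_gen(x)) + eps).mean()
          return real_term + fake_term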
  • In one implementation, the loss value of the second generator is the adversarial loss of the second generator; that is, the adversarial loss obtained by the above calculation is directly used as the loss value for training the second generator.
  • the second generator's loss value includes the second generator's reconstruction loss in addition to the adversarial loss.
  • In this case, a possible implementation of S301 includes: processing the sample image through the second generator to obtain the output image of the second generator; discriminating, through the discriminator, the authenticity of the reference image corresponding to the sample image and of the output image of the second generator to determine the adversarial loss of the second generator; and determining the reconstruction loss of the second generator according to the difference between the reference image corresponding to the sample image and the output image of the second generator. Therefore, the loss value of the second generator considers both the adversarial loss from the discriminator's image discrimination and the reconstruction loss reflecting the difference between the reference image corresponding to the sample image and the output image of the second generator, which improves the comprehensiveness and accuracy of the loss value of the second generator and thereby the training effect of the second generator.
  • the difference between the reference image corresponding to the sample image and the output image of the second generator is determined, and the reconstruction loss of the second generator is calculated according to the difference.
  • the reconstruction loss function used to calculate the reconstruction loss of the second generator can be expressed, for example with an L1 norm, as
  • L_recon(G_T, D) = E_{x,y}[||y - G_T(x)||_1]
  • where L_recon(G_T, D) is the reconstruction loss function of the second generator, and y - G_T(x) is the difference between the reference image corresponding to the sample image and the output image of the second generator.
  • After the loss value of the second generator is obtained, the second generator can be adjusted according to the optimization objective function to complete one training round of the second generator.
  • the optimization objective function is, for example, a function that maximizes the loss value or a function that minimizes the loss value; the optimization algorithm used when adjusting the second generator, such as gradient descent, is not limited here.
  • the optimization objective function includes maximizing the adversarial loss with respect to the discriminator and minimizing the adversarial loss with respect to the second generator.
  • That is, the optimization direction of the discriminator is to maximize the adversarial loss, so as to improve the discrimination ability of the discriminator; the optimization goal of the second generator is to minimize the adversarial loss, so that the output image of the second generator approaches the reference image corresponding to the sample image and the discriminator judges the output image of the second generator as coming from the sample data.
  • When the loss value includes the reconstruction loss, the optimization objective function also includes minimizing the reconstruction loss with respect to the second generator; that is, by adjusting the second generator to minimize the reconstruction loss, the output image of the second generator approaches the reference image corresponding to the sample image, improving the image quality of the output image of the second generator.
  • Accordingly, the optimization objective function of the second generator can be expressed, for example, as: G_T* = arg min_{G_T} max_{D} [L_GAN(G_T, D) + L_recon(G_T, D)].
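  • Purely as a sketch of this min-max objective (not the patent's own code), one alternating training step could be written as below; the optimizers, the module names, and the use of an L1 reconstruction term are assumptions for illustration:

      import torch
      import torch.nn.functional as F

      def teacher_training_step(teacher_gen, discriminator, opt_g, opt_d, x, y, eps=1e-8):
          # Discriminator step: maximize L_GAN, i.e. minimize its negative.
          with torch.no_grad():
              fake = teacher_gen(x)  # no generator gradients in this step
          d_loss = -(torch.log(discriminator(y) + eps).mean()
                     + torch.log(1.0 - discriminator(fake) + eps).mean())
          opt_d.zero_grad()
          d_loss.backward()
          opt_d.step()

          # Teacher generator step: minimize L_GAN + L_recon.
          fake = teacher_gen(x)
          g_loss = (torch.log(1.0 - discriminator(fake) + eps).mean()
                    + F.l1_loss(fake, y))
          opt_g.zero_grad()
          g_loss.backward()
          opt_g.step()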
  • S303 Determine a distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator, and the first generator.
  • the sample image is processed by the adjusted second generator and by the first generator. Since the model scale of the second generator is larger than that of the first generator, differences exist between the data obtained by processing the sample image through the adjusted second generator and the data obtained by processing the sample image through the first generator, and from these differences the distillation loss between the adjusted second generator and the first generator is determined.
  • Examples of distillation losses and procedures for determining them are provided below.
  • the distillation loss between the adjusted second generator and the first generator includes an output distillation loss between the adjusted second generator and the first generator.
  • Among them, the network layers of a generator include an input layer, intermediate layers, and an output layer; the output distillation loss is the distillation loss between the output layer of the second generator and the output layer of the first generator, reflecting the difference between the output image of the second generator and the output image of the first generator.
  • a possible implementation of S303 includes: using the first generator and the adjusted second generator to process the sample image respectively to obtain the output image of the first generator and the output image of the second generator; and determining the output distillation loss according to the difference between the output image of the first generator and the output image of the second generator.
  • the difference between the output image of the first generator and the output image of the second generator can be obtained by comparing the two output images, for example, by comparing each pixel in the output image of the first generator with the pixel at the corresponding position in the output image of the second generator, or by comparing features extracted from the output image of the first generator with features extracted from the output image of the second generator.
  • In this way, the optimization of the first generator is guided by the output distillation loss reflecting the difference between the output image of the first generator and the output image of the second generator, so that the output image of the first generator gradually approaches the output image of the adjusted second generator, which is beneficial to improving the image quality of images processed by the first generator.
  • the output distillation loss includes a structural similarity loss and/or a perceptual loss between the output image of the first generator and the output image of the second generator.
  • the structural similarity loss is akin to how the Human Visual System (HVS) observes images, focusing on local structural differences between the output image of the first generator and the output image of the second generator, including differences in image brightness, contrast, and the like.
  • the perceptual loss focuses on the difference in feature representation between the output image of the first generator and the output image of the second generator.
  • In the process of determining the structural similarity loss, the output image of the second generator and the output image of the first generator are compared, and the structural similarity loss is obtained from the differences between them.
  • In the process of determining the perceptual loss, feature extraction is performed on the output image of the first generator and the output image of the second generator through a feature extraction network, and the perceptual loss between the output image of the first generator and the output image of the second generator is determined, for example, by comparing the extracted features of the output image of the first generator with the extracted features of the output image of the second generator.
  • The output distillation loss is determined based on the structural similarity loss and/or the perceptual loss: for example, the output distillation loss is determined as the structural similarity loss, or as the perceptual loss, or as a weighted sum of the structural similarity loss and the perceptual loss.
  • Through the structural similarity loss and/or the perceptual loss, the difference between the output image of the first generator and the output image of the second generator is determined from one or more aspects such as human vision and feature representation, which improves the comprehensiveness and accuracy of the output distillation loss and thereby the training effect of the first generator.
  • the process of determining the structural similarity loss includes: determining the brightness estimate of the output image of the second generator, the brightness estimate of the output image of the first generator, the contrast estimate of the output image of the second generator, the contrast estimate of the output image of the first generator, and the structural similarity estimate between the output image of the second generator and the output image of the first generator; and, according to these parameters, determining the structural similarity loss between the output image of the first generator and the output image of the second generator.
  • For example, calculate the pixel mean and pixel standard deviation of the output image of the second generator, calculate the pixel mean and pixel standard deviation of the output image of the first generator, and calculate the covariance between the pixels of the output image of the second generator and the pixels of the output image of the first generator.
  • the brightness estimate and contrast estimate of each output image are determined as the pixel mean and pixel standard deviation of that output image, respectively.
  • a structural similarity estimate between the output image of the second generator and the output image of the first generator is determined as a covariance between pixels of the output image of the second generator and pixels of the output image of the first generator.
  • the structural similarity loss function can be expressed, for example in the standard SSIM form, as
  • L_SSIM(p_t, p_s) = 1 - [(2·μ_t·μ_s + C_1)(2·σ_ts + C_2)] / [(μ_t² + μ_s² + C_1)(σ_t² + σ_s² + C_2)]
  • where L_SSIM(p_t, p_s) represents the structural similarity loss function; p_t and p_s represent the output image of the second generator and the output image of the first generator, respectively; μ_t and μ_s represent the brightness estimates of the output image of the second generator and the output image of the first generator; σ_t and σ_s represent their contrast estimates; σ_ts denotes the structural similarity estimate between the output image of the second generator and the output image of the first generator; and C_1 and C_2 are the small stabilizing constants of the standard SSIM formulation.
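  • A minimal sketch of this loss follows (illustrative only, using whole-image statistics as the text describes; windowed SSIM variants follow the same pattern, and the constants c1/c2 are the conventional SSIM defaults, not values given by the patent):

      import torch

      def ssim_loss(p_t, p_s, c1=0.01 ** 2, c2=0.03 ** 2):
          mu_t, mu_s = p_t.mean(), p_s.mean()              # brightness estimates
          sigma_t, sigma_s = p_t.std(), p_s.std()          # contrast estimates
          sigma_ts = ((p_t - mu_t) * (p_s - mu_s)).mean()  # structure estimate
          ssim = ((2 * mu_t * mu_s + c1) * (2 * sigma_ts + c2)) / (
              (mu_t ** 2 + mu_s ** 2 + c1) * (sigma_t ** 2 + sigma_s ** 2 + c2))
          return 1 - ssim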
  • the process of determining the perceptual loss includes: inputting the output image of the first generator and the output image of the second generator into the feature extraction network respectively, and obtaining, from a preset network layer of the feature extraction network, the features of the output image of the first generator and the features of the output image of the second generator; and determining a feature reconstruction loss and/or a style reconstruction loss based on the difference between the features of the output image of the first generator and the features of the output image of the second generator.
  • the perceptual loss includes a feature reconstruction loss and/or a style reconstruction loss: the feature reconstruction loss reflects the difference between the lower-level (more concrete) feature representation of the output image of the first generator and that of the output image of the second generator, and is used to encourage the output image of the first generator to have a feature representation similar to that of the output image of the second generator; the style reconstruction loss reflects the difference between the more abstract style features (e.g. color, texture, pattern) of the output image of the first generator and those of the output image of the second generator, and is used to encourage the output image of the first generator to have style features similar to those of the output image of the second generator.
  • In one implementation, based on the fact that different network layers of the same feature extraction network extract features of different levels of abstraction: the features of the output image of the first generator and of the output image of the second generator extracted by a network layer that extracts lower-level features are obtained, and the feature reconstruction loss is determined according to the difference between them; the features of the output image of the first generator and of the output image of the second generator extracted by a network layer that extracts abstract features are obtained, and the style reconstruction loss is determined according to the difference between them.
  • In another implementation, image features are extracted through different feature extraction networks, where one feature extraction network is good at extracting lower-level feature representations and the other is good at extracting abstract style features, and the feature reconstruction loss and the style reconstruction loss are determined respectively.
  • For example, the feature extraction network is a Visual Geometry Group (VGG) network, which is a deep convolutional neural network that can be used to extract the features of the output image of the first generator and the features of the output image of the second generator. Features of different abstraction levels in the output image of the first generator and the output image of the second generator can be obtained from different network layers of the same VGG network or from different network layers of different VGG networks.
  • the feature reconstruction loss function used to calculate the feature reconstruction loss can be expressed, for example, as
  • L_fea(p_t, p_s) = (1 / (C_j·H_j·W_j)) · ||φ_j(p_t) - φ_j(p_s)||₂²
  • where L_fea(p_t, p_s) represents the feature loss function used to calculate the feature reconstruction loss between the output image p_t of the second generator and the output image p_s of the first generator; φ_j(p_t) represents the feature activation values (i.e. features) of the output image of the second generator extracted by the j-th layer of the VGG network φ; φ_j(p_s) represents the feature activation values of the output image of the first generator extracted by the j-th layer of the VGG network φ; and C_j × H_j × W_j represents the dimensionality of the feature activation values output by the j-th layer of the VGG network φ.
  • the style reconstruction loss function used to calculate the style reconstruction loss can be expressed, for example in the common Gram-matrix form, as
  • L_style(p_t, p_s) = ||G_j^φ(p_t) - G_j^φ(p_s)||_F²
  • where L_style(p_t, p_s) represents the style loss function used to calculate the style reconstruction loss between the output image p_t of the second generator and the output image p_s of the first generator, and G_j^φ(·) denotes the Gram matrix of the feature activation values output by the j-th layer of the VGG network φ.
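  • An illustrative sketch of both perceptual terms using a pretrained torchvision VGG-16 follows; the layer index and the reductions over the batch are arbitrary choices for the example, not values given by the patent:

      import torch
      import torchvision.models as models

      vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
      for p in vgg.parameters():
          p.requires_grad_(False)

      def _features(img, layer_idx=8):
          # Activations phi_j from a preset VGG layer (index 8 is just an example).
          out = img
          for i, block in enumerate(vgg):
              out = block(out)
              if i == layer_idx:
                  return out

      def feature_reconstruction_loss(p_t, p_s):
          f_t, f_s = _features(p_t), _features(p_s)
          c, h, w = f_t.shape[1:]
          # (1/(C_j*H_j*W_j)) * ||phi_j(p_t) - phi_j(p_s)||^2
          return ((f_t - f_s) ** 2).sum() / (c * h * w)

      def style_reconstruction_loss(p_t, p_s):
          def gram(f):
              b, c, h, w = f.shape
              f = f.reshape(b, c, h * w)
              return f @ f.transpose(1, 2) / (c * h * w)  # Gram matrix
          return ((gram(_features(p_t)) - gram(_features(p_s))) ** 2).sum()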
  • After the distillation loss is obtained, the distillation loss is backpropagated, and the model parameters of the first generator are adjusted during backpropagation, so that the first generator is optimized in the direction of minimizing the distillation loss.
  • For example, the distillation loss includes the output distillation loss; the output distillation loss is backpropagated, and the model parameters of the first generator are adjusted during backpropagation so that the first generator is optimized in the direction of minimizing the output distillation loss.
  • the concept and determination process of the output distillation loss can refer to the description of the preceding steps, and will not be repeated here.
  • the online loss of the first generator relative to the second generator also includes a total variation loss of the output image of the first generator.
  • the total variation loss of the output image of the first generator reflects the spatial smoothness of the output image of the first generator; optimizing the first generator through the total variation loss can improve the spatial smoothness of its output images and thereby the image quality.
  • In this case, a possible implementation of S304 includes: weighting and summing the distillation loss and the total variation loss to obtain the online loss of the first generator; and adjusting the first generator according to the online loss of the first generator.
  • the weights corresponding to the distillation loss and the total variation loss can be determined by professionals based on experience and experimental processes.
  • In this way, both the difference between the image processing of the first generator and that of the second generator and the noise in the image output by the first generator are taken into account, and the weighted combination of the distillation loss and the total variation loss balances the two, which is beneficial to improving the training effect of the first generator.
  • For example, the distillation loss includes the output distillation loss; the output distillation loss includes the structural similarity loss and the perceptual loss between the output image of the first generator and the output image of the second generator; and the perceptual loss includes the feature reconstruction loss and the style reconstruction loss between the output image of the first generator and the output image of the second generator.
  • the online distillation loss function used to calculate the online loss of the first generator is expressed as:
  • L_kd(p_t, p_s) = λ_ssim·L_ssim + λ_fea·L_fea + λ_style·L_style + λ_tv·L_tv
  • where L_kd(p_t, p_s) represents the online loss function of the first generator, and λ_ssim, λ_fea, λ_style, and λ_tv represent the weight corresponding to the structural similarity loss L_ssim, the weight corresponding to the feature reconstruction loss L_fea, the weight corresponding to the style reconstruction loss L_style, and the weight corresponding to the total variation loss L_tv, respectively.
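  • Combining the sketches above, the student's online loss might read as follows (the TV-loss form and all weight values are placeholders for illustration; the patent leaves the weights to be chosen empirically):

      def total_variation_loss(img):
          # Penalize differences between neighboring pixels for smoothness.
          return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
                  + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())

      def online_distillation_loss(p_t, p_s, w_ssim=1.0, w_fea=1.0,
                                   w_style=1.0, w_tv=1e-4):
          # L_kd = λ_ssim·L_ssim + λ_fea·L_fea + λ_style·L_style + λ_tv·L_tv
          return (w_ssim * ssim_loss(p_t, p_s)
                  + w_fea * feature_reconstruction_loss(p_t, p_s)
                  + w_style * style_reconstruction_loss(p_t, p_s)
                  + w_tv * total_variation_loss(p_s))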
  • In the data processing method provided by this embodiment, the second generator and the first generator are distilled online, that is, the second generator and the first generator are trained synchronously.
  • the first generator is optimized with only the adjusted second generator in the current training epoch.
  • On the one hand, although the first generator is trained in an environment with a discriminator, the first generator does not need to be tightly bound to the discriminator, so the first generator can be trained more flexibly and compressed further; on the other hand, the optimization of the first generator does not require real labels, and the first generator only learns the output of the second generator, which has a similar structure and a larger model size, which effectively reduces the difficulty for the first generator of fitting real labels.
  • the first generator is a student generator and the second generator is a teacher generator.
  • the model structure of the student generator is similar to that of the teacher generator.
  • the scale and complexity of the model of the teacher generator are larger than those of the student generator, so the teacher generator has a stronger learning ability and can better guide the training of the student generator during the distillation process.
  • In this embodiment, the teacher generator includes a first teacher generator and a second teacher generator, wherein the model capacity of the first teacher generator is greater than the model capacity of the student generator, and the model depth of the second teacher generator is greater than the model depth of the student generator.
  • Distilling the student generator with two different teacher generators along two complementary dimensions can provide a complementary, comprehensive distillation loss for the student generator during model distillation, as follows: the first teacher generator compensates the student generator in terms of model capacity (that is, model width, also known as the number of channels of the model), capturing more detailed image information that the student generator cannot capture; the second teacher generator compensates the student generator in terms of model depth, achieving better image quality.
  • the student generator is similar to the first teacher generator and the second teacher generator in terms of model structure; for example, they are deep learning models including four network layers.
  • the number of channels of the middle layer of the first teacher generator is a multiple of the number of channels of the middle layer of the student generator, wherein the multiple is greater than 1. Therefore, the relationship between the first teacher generator and the student generator is established succinctly through the multiple relationship, which is more conducive to the calculation of the channel distillation loss in the subsequent embodiments.
  • the number of network layers of the second teacher generator is greater than the number of network layers of the student generator.
  • one or more network layers are added before each upsampling network layer and each downsampling network layer of the student generator to obtain the second teacher generator.
  • For example, a deep residual network (Resnet) module is added before each upsampling network layer and each downsampling network layer of the student generator to obtain the second teacher generator.
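  • As a purely hypothetical construction helper (the patent gives no code or exact layer lists), deepening a student's layer list into the second teacher could look like this:

      import torch.nn as nn

      class ResBlock(nn.Module):
          # A plain residual block standing in for the inserted Resnet modules.
          def __init__(self, ch):
              super().__init__()
              self.body = nn.Sequential(
                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(ch, ch, 3, padding=1))

          def forward(self, x):
              return x + self.body(x)

      def deepen_generator(layers, channels):
          # Insert a ResBlock before every up-/down-sampling layer of the student.
          deeper = []
          for layer in layers:
              if isinstance(layer, (nn.Upsample, nn.MaxPool2d)):
                  deeper.append(ResBlock(channels))
              deeper.append(layer)
          return nn.Sequential(*deeper)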
  • In the process of training the generative adversarial network, the loss value of the first teacher generator can be determined according to the sample data and the discriminator, the sample data including the sample image and the reference image of the sample image; the first teacher generator is adjusted according to the loss value of the first teacher generator; the loss value of the second teacher generator is determined according to the sample data and the discriminator; the second teacher generator is adjusted according to the loss value of the second teacher generator; and the student generator is adjusted according to the sample image, the adjusted first teacher generator, and the adjusted second teacher generator.
  • the adjustment of the first teacher generator and the adjustment of the second teacher generator can refer to the adjustment of the second generator in the previous embodiment.
  • The difference from the previous embodiment is that, when adjusting the student generator, it is necessary to determine both the distillation loss between the first teacher generator and the student generator and the distillation loss between the second teacher generator and the student generator; both determination processes can refer to the process of determining the distillation loss between the second generator and the first generator in the previous embodiment, and will not be repeated here.
  • the discriminator includes a first discriminator and a second discriminator, and there is a shared convolutional layer between the first discriminator and the second discriminator.
  • During training, the first teacher generator uses the first discriminator, and the second teacher generator uses the second discriminator. Therefore, fully considering that the model structures of the first teacher generator and the second teacher generator are similar but not identical, the first discriminator and the second discriminator, which share convolutional layers, are used to train the first teacher generator and the second teacher generator respectively, improving the effect and efficiency of model training.
  • FIG. 4 is a second schematic flow diagram of a training process of the generative adversarial network in the data processing method provided by the embodiment of the present disclosure, that is, a schematic flowchart of an alternate training process of the student generator, the first teacher generator, and the second teacher generator in the generative adversarial network.
  • an alternate training process of the student generator, the first teacher generator, and the second teacher generator in the generative adversarial network includes:
  • the loss value of the first teacher generator includes an adversarial loss of the first teacher generator.
  • by analogy with the second generator above, the adversarial loss function used to calculate the adversarial loss of the first teacher generator can be expressed as: L_GAN(G_T1, D_1) = E_{x,y}[log D_1(y)] + E_{x}[log(1 - D_1(G_T1(x)))], where G_T1 represents the first teacher generator and D_1 represents the first discriminator.
  • the loss value of the first teacher generator also includes the reconstruction loss of the first teacher generator.
  • likewise, the reconstruction loss function used to calculate the reconstruction loss of the first teacher generator can be expressed as: L_recon(G_T1, D_1) = E_{x,y}[||y - G_T1(x)||_1].
  • the optimization objective function of the first teacher generator can accordingly be expressed as: arg min_{G_T1} max_{D_1} [L_GAN(G_T1, D_1) + L_recon(G_T1, D_1)].
  • the loss value of the second teacher generator includes an adversarial loss of the second teacher generator.
  • the adversarial loss function used to calculate the adversarial loss of the second teacher generator can be expressed as: L_GAN(G_T2, D_2) = E_{x,y}[log D_2(y)] + E_{x}[log(1 - D_2(G_T2(x)))], where G_T2 represents the second teacher generator and D_2 represents the second discriminator.
  • the loss value of the second teacher generator also includes the reconstruction loss of the second teacher generator.
  • the reconstruction loss function used to calculate the reconstruction loss of the second teacher generator can be expressed as: L_recon(G_T2, D_2) = E_{x,y}[||y - G_T2(x)||_1].
  • the optimization objective function of the second teacher generator can be expressed as: arg min_{G_T2} max_{D_2} [L_GAN(G_T2, D_2) + L_recon(G_T2, D_2)].
  • the sample image is processed by the adjusted first teacher generator, the adjusted second teacher generator, and the student generator respectively.
  • Because the model capacity of the first teacher generator is larger than the model capacity of the student generator and the model depth of the second teacher generator is larger than that of the student generator, differences exist among the data obtained by processing the sample image through the adjusted first teacher generator, through the adjusted second teacher generator, and through the student generator; from these differences, the distillation losses between the adjusted first teacher generator and the student generator and between the adjusted second teacher generator and the student generator can be determined respectively.
  • Based on these distillation losses, the student generator is adjusted; in each training round, the optimization of the student generator is guided by both the optimized first teacher generator and the optimized second teacher generator, integrating the first teacher generator and the second teacher generator to improve the training effect of the student generator.
  • the distillation loss between the first teacher generator and the student generator includes an output distillation loss between the first teacher generator and the student generator, that is, the distillation loss between the output layer of the first teacher generator and the output layer of the student generator.
  • the output distillation loss may include: a structural similarity loss and/or a perceptual loss between the output image of the first teacher generator and the output image of the student generator.
  • the perceptual loss may include: a feature reconstruction loss and/or a style reconstruction loss between the output image of the first teacher generator and the output image of the student generator.
  • the distillation loss between the second teacher generator and the student generator includes an output distillation loss between the second teacher generator and the student generator, that is, the distillation loss between the output layer of the second teacher generator and the output layer of the student generator.
  • the output distillation loss may include: a structural similarity loss and/or a perceptual loss between the output image of the second teacher generator and the output image of the student generator.
  • the perceptual loss may include: a feature reconstruction loss and/or a style reconstruction loss between the output image of the second teacher generator and the output image of the student generator.
  • the model depth of the first teacher generator is the same as the model depth of the student generator.
  • Compared with the student generator, the model capacity of the first teacher generator is larger, that is, the number of channels of its convolutional layers is greater, enabling it to capture details that the student generator cannot.
  • In the foregoing embodiments, the distillation loss between the teacher generator and the student generator includes only the output distillation loss; that is, in the model distillation process, only the information of the output layer of the teacher generator is distilled, or in other words, only the difference between the output image of the teacher generator and the output image of the student generator is considered, without considering the information of the intermediate layers of the teacher generator.
  • In this embodiment, the information of the middle layer of the first teacher generator, that is, channel-granularity information, can be used as one of the supervision signals for the optimization of the student generator to further improve the training effect of the student generator.
  • Specifically, the distillation loss between the first teacher generator and the student generator includes the output distillation loss and the channel distillation loss between the first teacher generator and the student generator, where the channel distillation loss is the distillation loss between the middle layer of the first teacher generator and the middle layer of the student generator, reflecting the difference between the features of the sample image extracted by the middle layer of the first teacher generator and the features of the sample image extracted by the middle layer of the student generator. Therefore, combining the output distillation loss and the channel distillation loss as the supervision information for optimizing the student generator realizes multi-granularity model distillation and improves the effect of model distillation.
  • a possible implementation of S405 includes: using the student generator, the adjusted first teacher generator, and the adjusted second teacher generator to process the sample image respectively, obtaining the output image of the student generator, the output image of the first teacher generator, and the output image of the second teacher generator; determining, according to the output image of the student generator and the output image of the first teacher generator, the first output distillation loss, which is the distillation loss between the output layer of the first teacher generator and the output layer of the student generator; determining the channel distillation loss according to the feature map output by the middle layer of the student generator and the feature map output by the middle layer of the first teacher generator; determining, according to the output image of the student generator and the output image of the second teacher generator, the second output distillation loss, which is the distillation loss between the output layer of the second teacher generator and the output layer of the student generator; and adjusting the student generator according to the first output distillation loss, the channel distillation loss, and the second output distillation loss.
  • the feature map output by the middle layer of the student generator refers to the feature of the sample image extracted by the middle layer of the student generator, including the feature map value output by each channel in the middle layer of the student generator.
  • the feature map output by the middle layer of the first teacher generator refers to the features of the sample image extracted by the middle layer of the first teacher generator, including the features output by each channel in the middle layer of the first teacher generator map value.
  • Since the model depth of the student generator is the same as that of the first teacher generator, for each intermediate layer, the channel distillation loss is determined according to the difference between the feature map output by the intermediate layer of the student generator and the feature map output by the corresponding intermediate layer of the first teacher generator.
  • the student generator is adjusted according to the first output distillation loss, the channel distillation loss, and the second output distillation loss.
  • In some embodiments, a channel convolution layer is connected between the middle layer of the first teacher generator and the middle layer of the student generator, and the channel convolution layer is used to establish the mapping relationship between the channels of the middle layer of the first teacher generator and the channels of the middle layer of the student generator. Therefore, based on the channel convolution layer, each channel of the middle layer of the student generator has a corresponding channel among the channels of the middle layer of the first teacher generator; on the premise of not changing the number of channels of the middle layer of the student generator, the channel convolution layer realizes channel-number expansion of the middle layer of the student generator in the process of determining the channel distillation loss.
  • a possible implementation of determining the channel distillation loss includes: determining, according to the feature map output by each channel of the middle layer of the student generator, the attention weight of each channel of the middle layer of the student generator; determining, according to the feature map output by each channel of the middle layer of the first teacher generator, the attention weight of each channel of the middle layer of the first teacher generator; and determining the channel distillation loss according to the difference between the attention weights of the channels mapped to each other in the middle layer of the student generator and the middle layer of the first teacher generator. Among them, the attention weight of a channel is used to measure the importance of the channel.
  • the attention weight of a channel can be calculated based on the pixels of the feature map output by the channel; for example, the sum or mean of all pixels of the feature map output by the channel is determined as the attention weight of the channel.
  • Based on the channel convolution layer, the channels that are mapped to each other are determined, the attention weights of the mutually mapped channels are compared, and the difference between these attention weights is determined, which in turn determines the channel distillation loss.
  • For example, the channel convolution layer is a 1×1 learnable convolution layer, and the channels of the intermediate layer of the student generator are mapped, through the 1×1 learnable convolution layer, to the channels of the corresponding intermediate layer of the first teacher generator, so that the number of channels of the intermediate layer of the student generator is upscaled to be consistent with the number of channels of the corresponding intermediate layer of the first teacher generator.
  • the attention weight of the channel can be calculated according to each pixel on the feature map output by the channel and the size of the feature map output by the channel.
  • For example, the pixels of each row of the feature map output by the channel can be added to obtain the pixel sum of each row; the pixel sums of the rows are added to obtain the pixel sum of the feature map; and, according to the size of the feature map, the pixel sum is averaged to obtain the attention weight of the channel.
  • the calculation formula of the attention weight of the channel can be expressed as:
  • w_c = (1 / (H·W)) · Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)
  • where w_c represents the attention weight of channel c, H is the height of the feature map output by channel c, W is the width of the feature map output by channel c, and u_c(i, j) is the pixel at position (i, j) of the feature map.
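  • A one-function sketch of this per-channel averaging (illustrative only; the (C, H, W) tensor layout is an assumption):

      def channel_attention_weights(feature_map):
          # w_c = (1/(H*W)) * sum over all pixels of channel c; expects a torch
          # tensor of shape (C, H, W) and returns a length-C weight vector.
          c, h, w = feature_map.shape
          return feature_map.sum(dim=(1, 2)) / (h * w)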
  • the determination process of the attention weights of each channel in the middle layer of the first teacher generator can refer to the relevant description of the student generator, and will not be repeated here.
  • For each intermediate layer of the student generator and the corresponding intermediate layer of the first teacher generator, the difference between the attention weights of each pair of mutually mapped channels is determined, and the channel distillation loss is determined according to the differences between the attention weights of the mutually mapped channel pairs, the number of sampled feature maps in the middle layer of the student generator and the middle layer of the first teacher generator, and the number of channels of the feature maps.
  • In this way, the accuracy of the channel distillation loss is improved by considering not only the attention weights of each pair of mutually mapped channels but also the number of channels of the intermediate layer and the number of feature maps sampled in the intermediate layer.
  • the channel distillation loss function used to calculate the channel distillation loss can be expressed, for example as a mean squared difference of attention weights, as
  • L_CD = (1 / (n·c)) · Σ_{k=1..n} Σ_{m=1..c} (w^t_{k,m} - w^s_{k,m})²
  • where w^t_{k,m} and w^s_{k,m} are the attention weights of the m-th pair of mutually mapped channels of the first teacher generator and the student generator for the k-th sampled feature map, n represents the number of samples of the feature map, and c represents the number of channels mapped by the feature map.
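  • A sketch of the whole channel-distillation computation, with the 1×1 learnable mapping included (the squared-difference form matches the reading above and is an assumption, as are the module and parameter names):

      import torch
      import torch.nn as nn

      class ChannelDistillationLoss(nn.Module):
          def __init__(self, student_ch, teacher_ch):
              super().__init__()
              # 1x1 learnable convolution mapping student channels up to the
              # teacher's channel count.
              self.mapper = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

          def forward(self, student_feat, teacher_feat):
              # student_feat: (N, C_s, H, W); teacher_feat: (N, C_t, H, W)
              mapped = self.mapper(student_feat)   # (N, C_t, H, W)
              w_s = mapped.mean(dim=(2, 3))        # student attention weights
              w_t = teacher_feat.mean(dim=(2, 3))  # teacher attention weights
              n, c = w_t.shape
              return ((w_t - w_s) ** 2).sum() / (n * c)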
  • In some embodiments, the channel distillation loss is weighted according to a channel loss weight factor to obtain a weighted result, and the student generator is adjusted according to the weighted result, the first output distillation loss, and the second output distillation loss. Therefore, by adjusting the channel loss weight factor, the degree of influence of the channel distillation loss on the optimization of the student generator can be adjusted, improving the flexibility of training the student generator.
  • the process of adjusting the student generator according to the weighted result of the channel distillation loss and the channel loss weighting factor, the first output distillation loss and the second output distillation loss it can be based on the first output distillation loss, the second output distillation loss , respectively determine the online loss of the student generator with respect to the first teacher generator and the online loss of the student generator with respect to the second teacher generator.
  • the online loss of the first teacher generator, the online loss of the student generator relative to the second teacher generator, and the weighted results of the channel distillation loss and the channel loss weighting factor are weighted and summed to balance these loss values by weighting , to get the final loss value of the multi-grain online distillation of the student generator.
  • the model parameters of the student generator are adjusted to realize the optimization of the student generator.
  • the online loss of the student generator relative to the first teacher generator includes: the output distillation loss between the student generator and the first teacher generator.
  • the online loss of the student generator with respect to the first teacher generator contains: the output distillation loss between the student generator and the first teacher generator and the total variation loss of the output image of the student generator.
  • the online loss of the student generator with respect to the second teacher generator consists of: the output distillation loss between the student generator and the second teacher generator.
  • the online loss of the student generator with respect to the second teacher generator contains: the output distillation loss between the student generator and the second teacher generator and the total variation loss of the output image of the student generator.
  • the objective loss function used to calculate the final loss value of the multi-granularity online distillation of the student generator can be expressed as:
    $L_{total} = L_{kd}^{W} + L_{kd}^{D} + \lambda_{CD} \cdot L_{CD}$
  • where $L_{kd}^{W}$ and $L_{kd}^{D}$ are the online loss functions of the student generator with respect to the first (wider) and second (deeper) teacher generators, and $\lambda_{CD}$ represents the channel loss weight factor.
  • FIG. 5 is an example diagram of a model structure of a generative confrontation network provided by an embodiment of the present disclosure.
  • the generative adversarial network includes two teacher generators, a student generator G_S and two partially shared discriminators.
  • the two teacher generators include a wider teacher generator and a deeper teacher generator
  • the two discriminators share the first several convolutional layers.
  • the wider teacher generator is the first teacher generator in the foregoing embodiments; compared with the student generator, the deeper teacher generator is equivalent to inserting multiple Resnet modules before and after the sampling layers of the student generator, so its model depth is greater than that of the student generator, and this teacher generator is the second teacher generator in the foregoing embodiments.
  • the middle layer of the wider teacher generator and the middle layer of the student generator are connected with a channel convolution layer (not marked in Figure 5).
  • the channel convolution layer here is used to establish the mapping relationship between the channels of the intermediate layers of the wider teacher generator and the channels of the intermediate layers of the student generator, which facilitates calculating the channel distillation loss between the wider teacher generator and the student generator.
  • the sample image (the contour map of the high-heeled shoes in Fig. 5) is fed into the wider teacher generator, the student generator, and the deeper teacher generator respectively.
  • the GAN loss of the wider teacher generator, i.e., the adversarial loss of the first teacher generator in the foregoing embodiments, is determined from the partially shared discriminator, the output image of the wider teacher generator, and the ground-truth label (the reference image in the foregoing embodiments); this adversarial loss is used to adjust the wider teacher generator.
  • the GAN loss of the deeper teacher generator, i.e., the adversarial loss of the second teacher generator in the foregoing embodiments, is determined likewise and is used to adjust the deeper teacher generator.
  • the channel distillation loss is calculated based on the differences between the wider teacher generator and the student generator at the intermediate layers. Based on the output images of the wider teacher generator, the student generator and the deeper teacher generator, the output distillation loss between the wider teacher generator and the student generator and the output distillation loss between the deeper teacher generator and the student generator are calculated.
  • the channel distillation loss and these two output distillation losses are used to adjust the student generator.
  • FIG. 6 is a structural block diagram of a data processing device provided in an embodiment of the present disclosure. For ease of description, only the parts related to the embodiments of the present disclosure are shown.
  • the data processing device includes: an acquisition module 601 and a processing module 602 .
  • the processing module 602 is configured to process the image through the first generator to obtain a processed image.
  • the data processing device is applicable to the generative adversarial network obtained by model distillation
  • the generative adversarial network includes the first generator, the second generator and the discriminator
  • the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than the model size of the second generator.
  • an alternate training process of the first generator and the second generator in the generative adversarial network includes:
  • the first generator is adjusted.
  • determining the loss value of the second generator includes:
  • the loss value of the second generator is determined.
  • the network layer includes an input layer, an intermediate layer and an output layer, and determining the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator includes:
  • the output distillation loss is determined, and the output distillation loss is the distillation loss between the output layer of the second generator and the output layer of the first generator .
  • the output distillation loss is determined according to the difference between the output image of the first generator and the output image of the second generator, including:
  • the output distillation loss is determined.
  • the perceptual loss includes a feature reconstruction loss and/or a style reconstruction loss, and performing feature extraction on the output image of the first generator and the output image of the second generator through a feature extraction network to determine the perceptual loss between the output image of the first generator and the output image of the second generator includes:
  • determining the feature reconstruction loss and/or the style reconstruction loss based on the difference between the features of the output image of the first generator and the features of the output image of the second generator.
  • adjusting the first generator according to the distillation loss includes:
  • the first generator is adjusted.
  • adjusting the first generator according to the distillation loss and the total variation loss includes:
  • the first generator is adjusted according to the online loss of the first generator.
  • the first generator is a student generator and the second generator is a teacher generator.
  • the teacher generator includes a first teacher generator and a second teacher generator, the model capacity of the first teacher generator is larger than that of the student generator, and the model depth of the second teacher generator is larger than the model depth of the student generator.
  • the discriminator includes a first discriminator and a second discriminator, there is a shared convolutional layer between the first discriminator and the second discriminator, and an alternate training process of the first generator and the second generator in the generative adversarial network includes:
  • the sample data includes the sample image and the reference image of the sample image
  • the network layer includes an input layer, an intermediate layer, and an output layer.
  • adjusting the student generator according to the sample image, the adjusted first teacher generator and the adjusted second teacher generator includes:
  • the sample image is processed by the student generator, the adjusted first teacher generator and the adjusted second teacher generator respectively, to obtain the output image of the student generator, the output image of the first teacher generator and the output image of the second teacher generator
  • the first output distillation loss is determined, the first output distillation loss being the distillation loss between the output layer of the first teacher generator and the output layer of the student generator;
  • the channel distillation loss is determined, the channel distillation loss being the distillation loss between the intermediate layer of the first teacher generator and the intermediate layer of the student generator;
  • the second output distillation loss is determined, the second output distillation loss being the distillation loss between the output layer of the second teacher generator and the output layer of the student generator;
  • a channel convolution layer is connected between the intermediate layer of the first teacher generator and the intermediate layer of the student generator, the channel convolution layer being used to establish the mapping relationship between the channels of the intermediate layer of the first teacher generator and the channels of the intermediate layer of the student generator; determining the channel distillation loss according to the feature map output by the intermediate layer of the student generator and the feature map output by the intermediate layer of the first teacher generator includes:
  • determining the channel distillation loss according to the differences between the attention weights of the mutually mapped channels.
  • adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss includes:
  • the channel distillation loss is weighted to obtain the weighted result
  • the device provided in this embodiment can be used to implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, so this embodiment will not repeat them here.
  • the electronic device 700 may be a terminal device or a server.
  • the terminal equipment may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs) and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • the electronic device 700 may include a processing device (such as a central processing unit or a graphics processing unit) 701, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703.
  • various programs and data necessary for the operation of the electronic device 700 are also stored.
  • the processing device 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704 .
  • the following devices may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.
  • a storage device 708 including, for example, a magnetic tape, a hard disk, etc.
  • the communication device 709 may allow the electronic device 700 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 7 shows the electronic device 700 having various devices, it should be understood that implementing or having all of the devices shown is not a requirement; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 709, or from storage means 708, or from ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the methods shown in the above-mentioned embodiments.
  • computer program code for carrying out the operations of the present disclosure can be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • a data processing method is provided, which is applicable to a generative adversarial network obtained through model distillation, and the data processing method includes: acquiring an image to be processed; and processing the image through a first generator to obtain a processed image; wherein the generative adversarial network includes the first generator, a second generator and a discriminator, the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than the model size of the second generator.
  • an alternate training process of the first generator and the second generator in the generative adversarial network includes: determining the loss value of the second generator according to the sample data and the discriminator, the sample data including the sample image and the reference image corresponding to the sample image; adjusting the second generator according to the loss value of the second generator; determining the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator; and adjusting the first generator according to the distillation loss.
  • the determining the loss value of the second generator according to the sample data and the discriminator includes: processing the sample image by the second generator to obtain the output image of the second generator; performing true-false discrimination on the reference image corresponding to the sample image and the output image of the second generator through the discriminator, to determine the adversarial loss of the second generator; determining the reconstruction loss of the second generator according to the difference between the reference image corresponding to the sample image and the output image of the second generator; and determining the loss value of the second generator according to the adversarial loss and the reconstruction loss.
  • in the generative adversarial network, the network layer includes an input layer, an intermediate layer and an output layer, and determining the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator includes: processing the sample image through the first generator and the adjusted second generator respectively, to obtain the output image of the first generator and the output image of the second generator; and determining the output distillation loss according to the difference between the output image of the first generator and the output image of the second generator, the output distillation loss being the distillation loss between the output layer of the second generator and the output layer of the first generator.
  • the determining the output distillation loss according to the difference between the output image of the first generator and the output image of the second generator includes: determining the structural similarity loss between the two output images according to their luminance and contrast; performing feature extraction on the two output images through a feature extraction network to determine the perceptual loss between them; and determining the output distillation loss according to the structural similarity loss and the perceptual loss.
  • the perceptual loss includes a feature reconstruction loss and/or a style reconstruction loss, and performing feature extraction on the output image of the first generator and the output image of the second generator through a feature extraction network to determine the perceptual loss between them includes: inputting the two output images into the feature extraction network respectively to obtain the features output by a preset network layer of the feature extraction network, and determining the feature reconstruction loss and/or the style reconstruction loss according to the difference between the features of the output image of the first generator and the features of the output image of the second generator.
  • the adjusting the first generator according to the distillation loss includes: determining the total variation loss of the output image of the first generator; and adjusting the first generator according to the distillation loss and the total variation loss.
  • the adjusting the first generator according to the distillation loss and the total variation loss includes: performing a weighted summation of the distillation loss and the total variation loss to obtain the online loss of the first generator; and adjusting the first generator according to the online loss of the first generator.
  • the first generator is a student generator
  • the second generator is a teacher generator
  • the teacher generator includes a first teacher generator and a second teacher generator, the model capacity of the first teacher generator is larger than the model capacity of the student generator, and the model depth of the second teacher generator is greater than the model depth of the student generator.
  • the discriminator includes a first discriminator and a second discriminator with a shared convolutional layer between them, and an alternate training process of the first generator and the second generator in the generative adversarial network includes: determining the loss value of the first teacher generator according to the sample data and the first discriminator, the sample data including the sample image and the reference image of the sample image; adjusting the first teacher generator according to its loss value; determining the loss value of the second teacher generator according to the sample data and the second discriminator; adjusting the second teacher generator according to its loss value; and adjusting the student generator according to the sample image, the adjusted first teacher generator and the adjusted second teacher generator.
  • in the generative adversarial network, the network layer includes an input layer, an intermediate layer and an output layer, and adjusting the student generator includes: processing the sample image through the student generator, the adjusted first teacher generator and the adjusted second teacher generator respectively, to obtain the output image of the student generator, the output image of the first teacher generator and the output image of the second teacher generator; determining the first output distillation loss according to the output image of the student generator and the output image of the first teacher generator, the first output distillation loss being the distillation loss between the output layer of the first teacher generator and the output layer of the student generator; determining the channel distillation loss according to the feature map output by the intermediate layer of the student generator and the feature map output by the intermediate layer of the first teacher generator, the channel distillation loss being the distillation loss between the intermediate layer of the first teacher generator and the intermediate layer of the student generator; determining the second output distillation loss according to the output image of the student generator and the output image of the second teacher generator, the second output distillation loss being the distillation loss between the output layer of the second teacher generator and the output layer of the student generator; and adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss.
  • a channel convolution layer is connected between the intermediate layer of the first teacher generator and the intermediate layer of the student generator, and is used to establish the mapping relationship between the channels of the intermediate layer of the first teacher generator and the channels of the intermediate layer of the student generator; determining the channel distillation loss according to the feature map output by the intermediate layer of the student generator and the feature map output by the intermediate layer of the first teacher generator includes: determining the attention weight of each channel of the intermediate layer of the student generator according to the feature map output by each such channel; determining the attention weight of each channel of the intermediate layer of the first teacher generator according to the feature map output by each such channel; and determining the channel distillation loss according to the differences between the attention weights of the mutually mapped channels in the intermediate layers of the student generator and the first teacher generator.
  • the adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss includes: according to the channel loss weighting factor , weighting the channel distillation loss to obtain a weighted result; adjusting the student generator according to the weighted result, the first output distillation loss, and the second output distillation loss.
  • a data processing device is provided, which is applicable to a generative adversarial network obtained through model distillation, and includes: an acquisition module, configured to acquire an image to be processed; and a processing module, configured to process the image through a first generator to obtain a processed image; wherein the generative adversarial network includes the first generator, a second generator and a discriminator,
  • the model distillation is a process of alternately training the first generator and the second generator, the model size of the first generator being smaller than the model size of the second generator.
  • an alternate training process of the first generator and the second generator in the generative adversarial network includes: determining the loss value of the second generator according to the sample data and the discriminator, the sample data including the sample image and the reference image corresponding to the sample image; adjusting the second generator according to the loss value of the second generator; determining the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator; and adjusting the first generator according to the distillation loss.
  • the determining the loss value of the second generator according to the sample data and the discriminator includes: processing the sample image by the second generator to obtain the output image of the second generator; performing true-false discrimination on the reference image corresponding to the sample image and the output image of the second generator through the discriminator, to determine the adversarial loss of the second generator; determining the reconstruction loss of the second generator according to the difference between the reference image corresponding to the sample image and the output image of the second generator; and determining the loss value of the second generator according to the adversarial loss and the reconstruction loss.
  • in the generative adversarial network, the network layer includes an input layer, an intermediate layer and an output layer, and determining the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator includes: processing the sample image through the first generator and the adjusted second generator respectively, to obtain the output image of the first generator and the output image of the second generator; and determining the output distillation loss according to the difference between the output image of the first generator and the output image of the second generator, the output distillation loss being the distillation loss between the output layer of the second generator and the output layer of the first generator.
  • the determining the output distillation loss according to the difference between the output image of the first generator and the output image of the second generator includes: determining the structural similarity loss between the two output images according to their luminance and contrast; performing feature extraction on the two output images through a feature extraction network to determine the perceptual loss between them; and determining the output distillation loss according to the structural similarity loss and the perceptual loss.
  • the perceptual loss includes a feature reconstruction loss and/or a style reconstruction loss, and performing feature extraction on the output image of the first generator and the output image of the second generator through a feature extraction network to determine the perceptual loss between them includes: inputting the two output images into the feature extraction network respectively to obtain the features output by a preset network layer of the feature extraction network, and determining the feature reconstruction loss and/or the style reconstruction loss according to the difference between the features of the output image of the first generator and the features of the output image of the second generator.
  • the adjusting the first generator according to the distillation loss includes: determining the total variation loss of the output image of the first generator; and adjusting the first generator according to the distillation loss and the total variation loss.
  • the adjusting the first generator according to the distillation loss and the total variation loss includes: performing a weighted summation of the distillation loss and the total variation loss to obtain the online loss of the first generator; and adjusting the first generator according to the online loss of the first generator.
  • the first generator is a student generator
  • the second generator is a teacher generator
  • the teacher generator includes a first teacher generator and a second teacher generator, the model capacity of the first teacher generator is larger than the model capacity of the student generator, and the model depth of the second teacher generator is greater than the model depth of the student generator.
  • the discriminator includes a first discriminator and a second discriminator with a shared convolutional layer between them, and an alternate training process of the first generator and the second generator in the generative adversarial network includes: determining the loss value of the first teacher generator according to the sample data and the first discriminator, the sample data including the sample image and the reference image of the sample image; adjusting the first teacher generator according to its loss value; determining the loss value of the second teacher generator according to the sample data and the second discriminator; adjusting the second teacher generator according to its loss value; and adjusting the student generator according to the sample image, the adjusted first teacher generator and the adjusted second teacher generator.
  • in the generative adversarial network, the network layer includes an input layer, an intermediate layer and an output layer, and adjusting the student generator includes: processing the sample image through the student generator, the adjusted first teacher generator and the adjusted second teacher generator respectively, to obtain the output image of the student generator, the output image of the first teacher generator and the output image of the second teacher generator; determining the first output distillation loss according to the output image of the student generator and the output image of the first teacher generator, the first output distillation loss being the distillation loss between the output layer of the first teacher generator and the output layer of the student generator; determining the channel distillation loss according to the feature map output by the intermediate layer of the student generator and the feature map output by the intermediate layer of the first teacher generator, the channel distillation loss being the distillation loss between the intermediate layer of the first teacher generator and the intermediate layer of the student generator; determining the second output distillation loss according to the output image of the student generator and the output image of the second teacher generator, the second output distillation loss being the distillation loss between the output layer of the second teacher generator and the output layer of the student generator; and adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss.
  • a channel convolution layer is connected between the intermediate layer of the first teacher generator and the intermediate layer of the student generator, and is used to establish the mapping relationship between the channels of the intermediate layer of the first teacher generator and the channels of the intermediate layer of the student generator; determining the channel distillation loss according to the feature map output by the intermediate layer of the student generator and the feature map output by the intermediate layer of the first teacher generator includes: determining the attention weight of each channel of the intermediate layer of the student generator according to the feature map output by each such channel; determining the attention weight of each channel of the intermediate layer of the first teacher generator according to the feature map output by each such channel; and determining the channel distillation loss according to the differences between the attention weights of the mutually mapped channels in the intermediate layers of the student generator and the first teacher generator.
  • the adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss includes: according to the channel loss weighting factor , weighting the channel distillation loss to obtain a weighted result; adjusting the student generator according to the weighted result, the first output distillation loss, and the second output distillation loss.
  • an electronic device is provided, including: at least one processor and a memory;
  • the memory stores computer-executable instructions;
  • the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the data processing method described in the above first aspect and the various possible designs of the first aspect.
  • a computer-readable storage medium is provided, which stores computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method described in the above first aspect and the various possible designs of the first aspect is implemented.
  • a computer program product is provided, which includes computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method described in the above first aspect and the various possible designs of the first aspect is implemented.
  • the embodiments of the present disclosure provide a computer program, which includes computer-executable instructions; when a processor executes the computer-executable instructions, the data processing method described in the above first aspect and the various possible designs of the first aspect is implemented.


Abstract

Embodiments of the present disclosure provide a data processing method and device. The method is applicable to a generative adversarial network obtained through model distillation; the generative adversarial network includes a first generator, a second generator and a discriminator, the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than that of the second generator. The method includes: acquiring an image to be processed; and processing the image through the first generator to obtain a processed image. The complex multi-stage model compression process is thereby abandoned in favor of one-step compression through model distillation, improving the model compression efficiency of the generative adversarial network; the alternating training of the first generator and the second generator during model distillation ensures the image processing quality of the compressed generator.

Description

Data processing method and device
Cross-Reference to Related Application
This application claims priority to the Chinese patent application No. 202110802048.4, entitled "Data processing method and device" and filed with the China Patent Office on July 15, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of computer and network communication, and in particular to a data processing method and device.
Background
At present, with advances in hardware technology and business development, deploying deep learning networks on lightweight devices (such as mobile phones and smart wearable devices) is gradually becoming one of the trends in the development of deep learning networks. Since the network structure of a deep learning network is usually complex and computationally expensive, the deep learning network needs to be compressed before being deployed on a lightweight device.
At present, model compression of a deep learning network is formulated as a multi-stage task that includes multiple operations such as network architecture search, distillation, pruning and quantization. When deploying a Generative Adversarial Network (GAN) on a lightweight device, compressing the GAN with the above model compression process incurs a high time cost and high computing resource requirements.
Summary
Embodiments of the present disclosure provide a data processing method and device, so as to improve the model compression efficiency of a generative adversarial network and enable image processing through a generative adversarial network on a lightweight device.
In a first aspect, an embodiment of the present disclosure provides a data processing method, applicable to a generative adversarial network obtained through model distillation, the data processing method including:
acquiring an image to be processed;
processing the image through a first generator to obtain a processed image;
wherein the generative adversarial network includes the first generator, a second generator and a discriminator, the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than the model size of the second generator.
In a second aspect, an embodiment of the present disclosure provides a data processing device, applicable to a generative adversarial network obtained through model distillation, the data processing device including:
an acquisition module, configured to acquire an image to be processed;
a processing module, configured to process the image through a first generator to obtain a processed image;
wherein the generative adversarial network includes the first generator, the second generator and a discriminator, the model distillation is a process of alternately training the first generator and the second generator, and the model size of the first generator is smaller than the model size of the second generator.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the data processing method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product containing computer-executable instructions which, when executed by a processor, implement the data processing method described in the first aspect and the various possible designs of the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer program containing computer-executable instructions which, when executed by a processor, implement the data processing method described in the first aspect and the various possible designs of the first aspect.
In the data processing method and device provided by these embodiments, the generative adversarial network includes a first generator, a second generator and a discriminator, and the model size of the first generator is smaller than that of the second generator. During model distillation, the first generator and the second generator in the generative adversarial network are trained alternately, and in each training pass the optimized second generator guides the training of the first generator. The first generator obtained through model distillation then processes the image to be processed.
Thus, in these embodiments: on the one hand, the multi-stage model compression process is abandoned and model compression is achieved in the single stage of model distillation, which reduces the complexity and improves the efficiency of model compression; on the other hand, the online distillation scheme of alternately training the first generator and the second generator during model distillation improves the training effect of the first generator and the quality of the images it processes. The resulting first generator is suitable, in model size, for lightweight devices with limited computing power, while ensuring good quality of the processed images.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present disclosure, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is an example diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present disclosure;
FIG. 3 is a first schematic flowchart of one training pass of the generative adversarial network in the data processing method provided by an embodiment of the present disclosure;
FIG. 4 is a second schematic flowchart of one training pass of the generative adversarial network in the data processing method provided by an embodiment of the present disclosure;
FIG. 5 is an example diagram of a model structure of the generative adversarial network provided by an embodiment of the present disclosure;
FIG. 6 is a structural block diagram of a data processing device provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are part of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Referring to FIG. 1, FIG. 1 is an example diagram of an application scenario provided by an embodiment of the present disclosure.
The application scenario shown in FIG. 1 is an image processing scenario. It involves a terminal 101 and a server 102, which communicate, for example, over a network. The server 102 is used to train a deep learning model and deploy the trained model to the terminal 101. The terminal 101 performs image processing through the deep learning model.
The deep learning model is a Generative Adversarial Network (GAN). The server 102 deploys the generator of the trained generative adversarial network to the terminal.
The terminal 101 is a lightweight device (such as a camera, a mobile phone or a smart home appliance) with limited computing power, suitable for deploying small deep learning models. Therefore, how to obtain a small generator suitable for deployment on lightweight devices while improving its image processing quality is one of the problems that urgently needs to be solved.
Model compression is one way to obtain a deep learning model of small size. However, current model compression approaches for generative adversarial networks still have the following shortcomings:
1) mature model compression techniques in the deep learning field are not customized for generative adversarial networks, and lack exploration of their complex characteristics and structures; 2) the model compression process includes multiple stages such as network architecture search, distillation, pruning and quantization, with high time and computing resource requirements; 3) the compressed generative adversarial network still consumes substantial computing resources and is difficult to apply to lightweight devices.
To solve the above problems, embodiments of the present disclosure provide a data processing method in which a model compression scheme suited to generative adversarial networks is designed. In this scheme, one-step compression of the generative adversarial network is achieved through model distillation, which reduces the complexity and improves the efficiency of model compression; during model distillation, the training effect of the smaller generator is improved through online distillation in which the smaller generator and the larger generator are trained alternately. The resulting smaller generator is suitable for lightweight devices and produces images of good quality.
Exemplarily, the data processing method provided by the embodiments of the present disclosure can be applied in a terminal or a server. When applied in a terminal, it enables real-time processing of images captured by the terminal; when applied in a server, it enables processing of images sent by the terminal. The terminal device may be a personal digital assistant (PDA) device, a handheld device with wireless communication capability (such as a smartphone or a tablet), a computing device (such as a personal computer (PC)), a vehicle-mounted device, a wearable device (such as a smart watch or a smart bracelet), a smart home device (such as a smart display device), and so on.
Referring to FIG. 2, FIG. 2 is a first schematic flowchart of the data processing method provided by an embodiment of the present disclosure. As shown in FIG. 2, the data processing method includes:
S201: Acquire an image to be processed.
In one example, the image to be processed may be an image captured by the terminal in real time, or one or more frames taken from a video captured by the terminal in real time.
In another example, the image to be processed may be an image input or selected by the user. For example, the user inputs the image to be processed on the display interface of the terminal, or selects the image to be processed. Alternatively, the server receives the user-input or user-selected image sent by the terminal.
In another example, the image to be processed may be an image being played on the terminal in real time. For example, when the terminal detects that an image or video is being played, it acquires the image or video frame being played, thereby enabling processing of images played on the terminal in real time.
In another example, the image to be processed is an image from a database stored in advance on the terminal and/or the server. For example, a database storing multiple images to be processed is established in advance on the terminal, and the image to be processed is acquired from this database when image processing is performed.
S202: Process the image through the first generator to obtain a processed image.
The generative adversarial network includes a first generator, a second generator and a discriminator, and the model size of the first generator is smaller than that of the second generator. Therefore, compared with the first generator, the second generator has a stronger image processing capability: it can extract finer image features and produce higher-quality images, but its image processing also requires more computing resources.
During the training of the generative adversarial network, the network is trained by means of model distillation. In the model distillation process, the first generator and the second generator are trained alternately, i.e., online distillation is performed on them, so that the optimized second generator guides the optimization of the first generator, enabling the first generator, whose model size is smaller than the second generator's, to approach the second generator in image processing quality.
The training of the generative adversarial network can be carried out on a server. Considering the limited computing power of the terminal, the first generator, which has the smaller model size after distillation, can be deployed on the terminal.
In this step, during image processing, the image to be processed is input directly into the first generator, or is input into the first generator after preprocessing operations such as cropping, denoising and enhancement, to obtain the processed image output by the first generator.
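By way of illustration only, the deployment-time path can be sketched as follows in Python/PyTorch. The artifact name `student_generator.pt`, the 256x256 resolution and the [-1, 1] normalization are assumptions for the sketch, not details fixed by the disclosure:

```python
import torch
from torchvision import transforms
from PIL import Image

# Hypothetical exported student generator (TorchScript); names are illustrative.
student = torch.jit.load("student_generator.pt").eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),                    # assumed input resolution
    transforms.ToTensor(),                            # HWC [0,255] -> CHW [0,1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

def process_image(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)  # add batch dim
    with torch.no_grad():                             # inference only on the device
        y = student(x)
    return (y.clamp(-1, 1) + 1) / 2                   # map back to [0,1] for saving
```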
In the embodiments of the present disclosure, model compression of the generative adversarial network is achieved through online distillation of the first generator and the second generator, which improves the efficiency and effect of model compression. The resulting first generator has a small model size and produces high-quality images, making it especially suitable for deployment on lightweight devices for image processing and improving the processing efficiency and quality of images on such devices.
It should be noted that the training of the generative adversarial network and its application to image processing are carried out separately. For example, after the generative adversarial network is trained on the server, the trained student generator is deployed on the terminal, and image processing is performed through the student generator. Each time the server updates the generative adversarial network, the student generator can be redeployed on the terminal.
The training process of the generative adversarial network is described below through embodiments.
Referring to FIG. 3, FIG. 3 is a first schematic flowchart of one training pass of the generative adversarial network in the data processing method provided by an embodiment of the present disclosure, i.e., a flowchart of one alternating training pass of the first generator and the second generator. As shown in FIG. 3, one alternating training pass of the first generator and the second generator in the generative adversarial network includes:
S301: Determine the loss value of the second generator according to the sample data and the discriminator.
The sample data includes a sample image and a reference image corresponding to the sample image. For example, in image depth estimation, the sample data includes the sample image and its ground-truth depth map; in face recognition, the sample data includes the sample image and its ground-truth face annotation map, in which, for example, the position of each face may be marked manually.
In this step, the sample image is processed by the second generator to obtain the processed sample image output by the second generator, hereinafter referred to as the output image of the second generator for brevity. The discriminator performs true-false discrimination on the reference image corresponding to the sample image and the output image of the second generator, determining the adversarial loss of the second generator. During training, the second generator drives its output image to approach the reference image corresponding to the sample image, while the discriminator tries to distinguish the output image of the second generator from that reference image; the adversarial loss of the second generator reflects the loss value of this true-false discrimination.
In the process of discriminating the reference image corresponding to the sample image and the output image of the second generator through the discriminator, the reference image is input into the discriminator, the output image of the second generator is input into the discriminator, and the discriminator judges whether each of them comes from the sample data. Finally, the adversarial loss of the teacher generator is calculated according to the discriminator's output when the reference image is input, the discriminator's output when the output image of the second generator is input, and the adversarial loss function.
Optionally, in the case where a discriminator output of 1 indicates that the input data comes from the sample data and an output of 0 indicates that it does not, the expected value of the discriminator's output when the reference image corresponding to the sample image is input is determined, the expected value of 1 minus the discriminator's output when the output image of the second generator is input is determined, and the two expected values are summed to obtain the adversarial loss of the second generator.
Further, the adversarial loss function used to calculate the adversarial loss of the second generator is expressed as:

$L_{GAN}(G_T, D) = \mathbb{E}_{\{x,y\}}[\log D(x, y)] + \mathbb{E}_{\{x\}}[\log(1 - D(x, G_T(x)))]$

where $L_{GAN}(G_T, D)$ is the adversarial loss function, $G_T$ denotes the second generator, $D$ denotes the discriminator, $x$ denotes the sample image, $y$ denotes the reference image corresponding to the sample image, $G_T(x)$ denotes the output image of the second generator when the sample image $x$ is input, and $\mathbb{E}_{\{x,y\}}[\cdot]$ and $\mathbb{E}_{\{x\}}[\cdot]$ denote expectations over the sample data $\{x,y\}$ and $x$, respectively.
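As an illustration only, this adversarial loss can be computed as follows. The sketch assumes a conditional discriminator `D` that takes the sample image and a candidate output concatenated along the channel axis and returns probabilities in (0, 1); the small epsilon for numerical stability is likewise an assumption:

```python
import torch

def adversarial_loss(D, x, y, fake, eps=1e-8):
    # E_{x,y}[log D(x, y)]: discriminator score on the reference image y
    real_term = torch.log(D(torch.cat([x, y], dim=1)) + eps).mean()
    # E_x[log(1 - D(x, G_T(x)))]: score on the generator output `fake`
    fake_term = torch.log(1.0 - D(torch.cat([x, fake], dim=1)) + eps).mean()
    return real_term + fake_term
```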
In some embodiments, the loss value of the second generator is determined to be the adversarial loss of the second generator; that is, the adversarial loss calculated above is used directly as the loss value of the second generator.
In some embodiments, in addition to the adversarial loss, the loss value of the second generator also includes the reconstruction loss of the second generator. In this case, one possible implementation of S301 includes: processing the sample image by the second generator to obtain the output image of the second generator; performing true-false discrimination on the reference image corresponding to the sample image and the output image of the second generator through the discriminator to determine the adversarial loss of the second generator; and determining the loss value of the second generator according to the difference between the reference image corresponding to the sample image and the output image of the second generator. Thus, the loss value of the second generator takes into account both the adversarial loss of the discriminator's image discrimination and the reconstruction loss reflecting the difference between the reference image and the output image of the second generator, improving the comprehensiveness and accuracy of the loss value and hence the training effect of the second generator.
Optionally, in the reconstruction loss function, the difference between the reference image corresponding to the sample image and the output image of the second generator is determined, and the reconstruction loss of the second generator is calculated from this difference.
Further, the reconstruction loss function used to calculate the reconstruction loss of the second generator is expressed as:

$L_{recon}(G_T, D) = \mathbb{E}_{\{x,y\}}[y - G_T(x)]$

where $L_{recon}(G_T, D)$ is the reconstruction loss function of the second generator, and $y - G_T(x)$ is the difference between the reference image corresponding to the sample image and the output image of the second generator.
S302: Adjust the second generator according to its loss value.
In this step, after the loss value of the second generator is obtained, the second generator can be adjusted according to the optimization objective function, completing one training pass of the second generator. The optimization objective function is, for example, a function that maximizes the loss value or a function that minimizes it. The optimization algorithm used in adjusting the second generator, such as gradient descent, is not limited here.
In some embodiments, when the loss value of the second generator includes its adversarial loss, the optimization objective function includes maximizing the adversarial loss on the discriminator side and minimizing it on the teacher generator side. In other words, in this objective, the optimization direction of the discriminator is to maximize the adversarial loss, so as to improve its discrimination capability; the optimization objective of the second generator is to minimize the adversarial loss, so that the output image of the second generator approaches the reference image corresponding to the sample image and the discriminator judges the output image of the second generator to come from the sample data.
In some embodiments, when the loss value of the second generator includes its reconstruction loss, the optimization objective function includes minimizing the reconstruction loss on the second generator side, i.e., minimizing the reconstruction loss by adjusting the second generator, driving the output image of the second generator to approach the reference image corresponding to the sample image and improving the image quality of the output image.
Optionally, when the loss value of the second generator includes both the adversarial loss and the reconstruction loss of the second generator, the optimization objective function of the second generator is expressed as:

$G_T^{*} = \arg\min_{G_T} \max_{D} \; L_{GAN}(G_T, D) + L_{recon}(G_T, D)$

where $G_T^{*}$ denotes the optimization objective in the training of the second generator, and $\max_{D}$ denotes maximizing the discrimination loss by adjusting the discriminator.
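A minimal sketch of this min-max step, reusing the `adversarial_loss` sketch above. It assumes `g_t` and `d` are the second generator and discriminator, `opt_g` and `opt_d` their optimizers, and uses an L1 term as a stand-in for $L_{recon}$; these choices are illustrative, not mandated by the disclosure:

```python
import torch.nn.functional as F

def train_teacher_step(g_t, d, opt_g, opt_d, x, y):
    fake = g_t(x)

    # Discriminator update: maximize L_GAN, i.e. minimize its negative.
    opt_d.zero_grad()
    d_loss = -adversarial_loss(d, x, y, fake.detach())
    d_loss.backward()
    opt_d.step()

    # Generator update: minimize L_GAN + L_recon.
    opt_g.zero_grad()
    g_loss = adversarial_loss(d, x, y, fake) + F.l1_loss(fake, y)
    g_loss.backward()
    opt_g.step()
```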
S303: Determine the distillation loss between the adjusted second generator and the first generator according to the sample image, the adjusted second generator and the first generator.
In this step, the sample image is processed by the adjusted second generator and by the first generator. Since the model size of the second generator is larger than that of the first generator, there are differences between the data obtained by processing the sample image with the adjusted second generator and the data obtained with the first generator; from these differences, the distillation loss between the adjusted second generator and the first generator is determined.
Embodiments of the distillation loss and its determination process are provided below.
In some embodiments, the distillation loss between the adjusted second generator and the first generator includes the output distillation loss between them. In the generative adversarial network, the network layers include an input layer, intermediate layers and an output layer; the output distillation loss is the distillation loss between the output layer of the second generator and the output layer of the first generator, reflecting the difference between their output images. In this case, one possible implementation of S303 includes: processing the sample image through the first generator and the adjusted second generator respectively, obtaining the output image of the first generator and the output image of the second generator; and determining the output distillation loss according to the difference between these two output images.
The difference between the output image of the first generator and that of the second generator can be obtained by comparing the two. For example, each pixel of the output image of the first generator is compared with the pixel at the corresponding position of the output image of the second generator; or the two output images are compared in terms of luminance, contrast and other aspects.
Thus, the output distillation loss, which reflects the difference between the output images of the first and second generators, guides the optimization of the first generator; as the first generator is optimized, its output image gradually approaches that of the adjusted second generator, which helps improve the image quality of images processed by the first generator.
Regarding the output distillation loss:
In one example, the output distillation loss includes a structural similarity loss and/or a perceptual loss between the output image of the first generator and the output image of the second generator.
The structural similarity loss is akin to how the Human Visual System (HVS) observes images, focusing on local structural differences between the two output images, including differences in luminance, contrast and other aspects. The perceptual loss focuses on differences between the two output images in terms of feature representations.
Specifically, the structural similarity loss between the output image of the second generator and the output image of the first generator can be determined according to the luminance and contrast of the two output images; for example, the two output images are compared in luminance and in contrast to obtain the structural similarity loss. And/or, feature extraction is performed on the two output images through a feature extraction network to determine the perceptual loss between them; for example, the extracted features of the output image of the first generator are compared with those of the output image of the second generator to obtain the perceptual loss.
The output distillation loss is determined according to the structural similarity loss and/or the perceptual loss; for example, the output distillation loss is determined to be the structural similarity loss, or the perceptual loss, or a weighted sum of the two.
Thus, through the structural similarity loss and/or the perceptual loss, the difference between the output images of the first and second generators is determined from one or more aspects such as human vision and feature representation, improving the comprehensiveness and accuracy of the output distillation loss and the training effect of the first generator.
Optionally, the process of determining the structural similarity loss includes: determining the luminance estimate of the output image of the second generator, the luminance estimate of the output image of the first generator, the contrast estimate of the output image of the second generator, the contrast estimate of the output image of the first generator, and the structural similarity estimate between the two output images; and determining the structural similarity loss between the output images of the first and second generators from these parameters.
Specifically, the pixel mean and pixel standard deviation of the output image of the second generator are calculated, as are those of the output image of the first generator, together with the covariance between the pixels of the two output images. For the second generator, the luminance estimate and contrast estimate of its output image are determined to be the pixel mean and pixel standard deviation of that image, respectively; likewise for the first generator. The structural similarity estimate between the two output images is determined to be the covariance between their pixels.
Further, the structural similarity loss function used to calculate the structural similarity is expressed as:

$L_{SSIM}(p_t, p_s) = \frac{(2\mu_t\mu_s + C_1)(2\sigma_{ts} + C_2)}{(\mu_t^2 + \mu_s^2 + C_1)(\sigma_t^2 + \sigma_s^2 + C_2)}$

where $L_{SSIM}(p_t, p_s)$ denotes the structural similarity loss function, $p_t$ and $p_s$ denote the output images of the second generator and the first generator respectively, $\mu_t$ and $\mu_s$ denote the luminance estimates of the two output images, $\sigma_t^2$ and $\sigma_s^2$ denote their contrast estimates, $\sigma_{ts}$ denotes the structural similarity estimate between them, and $C_1$ and $C_2$ are small stabilizing constants.
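A sketch of this computation with per-image (global) statistics, as described above. Windowed SSIM variants also exist; the stabilizing constants and the use of 1 minus the similarity as the quantity to minimize are assumptions of the sketch:

```python
def ssim_loss(p_t, p_s, c1=0.01 ** 2, c2=0.03 ** 2):
    dims = (1, 2, 3)                               # statistics over C, H, W (NCHW)
    mu_t, mu_s = p_t.mean(dims), p_s.mean(dims)    # luminance estimates
    var_t = p_t.var(dims, unbiased=False)          # contrast estimates (variances)
    var_s = p_s.var(dims, unbiased=False)
    cov = ((p_t - mu_t.view(-1, 1, 1, 1)) *
           (p_s - mu_s.view(-1, 1, 1, 1))).mean(dims)   # structural similarity estimate
    ssim = ((2 * mu_t * mu_s + c1) * (2 * cov + c2)) / (
            (mu_t ** 2 + mu_s ** 2 + c1) * (var_t + var_s + c2))
    return (1.0 - ssim).mean()                     # turn similarity into a loss
```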
Optionally, the process of determining the perceptual loss includes: inputting the output image of the first generator and the output image of the second generator into the feature extraction network respectively, obtaining the features of the two output images produced by a preset network layer of the feature extraction network; and determining the feature reconstruction loss and/or the style reconstruction loss according to the difference between the features of the output image of the first generator and those of the output image of the second generator.
The perceptual loss includes a feature reconstruction loss and/or a style reconstruction loss. The feature reconstruction loss reflects the difference between the relatively low-level (i.e., relatively concrete) feature representations of the output images of the first and second generators, and encourages the two output images to have similar feature representations. The style reconstruction loss reflects the difference between the relatively abstract style features (such as color, texture and pattern) of the two output images, and encourages them to have similar style features.
Specifically, features extracted from different layers of the same feature extraction network differ in their level of abstraction: the features of the two output images extracted by a network layer used for low-level features are obtained, and the feature reconstruction loss is determined from their difference; the features of the two output images extracted by a network layer used for abstract features are obtained, and the style reconstruction loss is determined from their difference.
Alternatively, image features are extracted through different feature extraction networks, one of which is good at extracting low-level feature representations while the other is good at extracting abstract style features. The feature reconstruction loss and the style reconstruction loss are determined respectively based on the features of the output images of the first and second generators extracted by the different networks.
Optionally, the feature extraction network is a Visual Geometry Group (VGG) network, a deep convolutional neural network that can be used to extract the features of the output images of the first and second generators. Features of different levels of abstraction can thus be obtained from different layers of the same VGG network, or from different layers of different VGG networks.
Further, the feature reconstruction loss function used to calculate the feature reconstruction loss is expressed as:

$L_{fea}(p_t, p_s) = \frac{1}{C_j H_j W_j} \left\| \phi_j(p_t) - \phi_j(p_s) \right\|_2^2$

where $L_{fea}(p_t, p_s)$ denotes the feature loss function used to calculate the feature reconstruction loss between the output image $p_t$ of the second generator and the output image $p_s$ of the first generator, $\phi_j(p_t)$ denotes the feature activations of the output image of the second generator extracted by the $j$-th layer of the VGG network $\phi$, $\phi_j(p_s)$ denotes the feature activations of the output image of the first generator extracted by the $j$-th layer, and $C_j \times H_j \times W_j$ is the dimensionality of the feature activations output by the $j$-th layer of $\phi$.
Further, the style reconstruction loss function used to calculate the style reconstruction loss is expressed as:

$L_{style}(p_t, p_s) = \left\| G_j^{\phi}(p_t) - G_j^{\phi}(p_s) \right\|_F^2$

where $L_{style}(p_t, p_s)$ denotes the style loss function used to calculate the style reconstruction loss between the output image $p_t$ of the second generator and the output image $p_s$ of the first generator, and $G_j^{\phi}(p_t)$ and $G_j^{\phi}(p_s)$ denote the Gram matrices of the feature activations of the respective output images extracted by the $j$-th layer of the VGG network $\phi$.
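A sketch of these two losses with a VGG-16 feature extractor. The choice of layer (index 15, roughly relu3_3) is an assumption, as the disclosure does not fix which layer $j$ is used, and the weights string assumes a recent torchvision:

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG-16 truncated at an assumed layer j.
_vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def _gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)     # normalized Gram matrix

def feature_reconstruction_loss(p_t, p_s):
    # (1 / C_j H_j W_j) * ||phi_j(p_t) - phi_j(p_s)||_2^2, via an elementwise mean
    return F.mse_loss(_vgg(p_s), _vgg(p_t))

def style_reconstruction_loss(p_t, p_s):
    # squared Frobenius distance between the Gram matrices
    diff = _gram(_vgg(p_t)) - _gram(_vgg(p_s))
    return (diff ** 2).sum(dim=(1, 2)).mean()
```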
S304: Adjust the first generator according to the distillation loss.
In this step, after the distillation loss between the first generator and the second generator is obtained, the distillation loss is back-propagated, and the model parameters of the first generator are adjusted during back-propagation, so that the student generator is optimized in the direction of minimizing the distillation loss.
In one example, the distillation loss includes the output distillation loss; the output distillation loss is back-propagated and the model parameters of the first generator are adjusted accordingly, so that the student generator is optimized in the direction of minimizing the output distillation loss. For the concept and determination of the output distillation loss, refer to the description of the preceding steps, which is not repeated here.
In another example, in addition to the distillation loss, the online loss of the first generator with respect to the second generator also includes the total variation loss of the output image of the first generator. The total variation loss reflects the spatial smoothness of the output image of the first generator; optimizing the first generator with the total variation loss can improve the spatial smoothness, and hence the quality, of its output image.
In the case where the online loss of the first generator includes the distillation loss between the first and second generators and the total variation loss of the output image of the first generator, one possible implementation of S304 includes: performing a weighted summation of the distillation loss and the total variation loss to obtain the online loss of the first generator; and adjusting the first generator according to its online loss. The respective weights of the distillation loss and the total variation loss can be determined by practitioners based on experience and experiments.
Thus, in the training of the first generator, combining the distillation loss and the total variation loss takes into account both the data differences in image processing between the first and second generators and the image noise of the first generator's output, and weighting the two losses balances them, which helps improve the training effect of the first generator.
Further, in the case where the distillation loss includes the output distillation loss, the output distillation loss includes the structural similarity loss and perceptual loss between the output images of the first and second generators, and the perceptual loss includes the feature reconstruction loss and style reconstruction loss between the two output images, the online distillation loss function used to calculate the online loss of the first generator is expressed as:

$L_{kd}(p_t, p_s) = \lambda_{ssim} L_{ssim} + \lambda_{fea} L_{fea} + \lambda_{style} L_{style} + \lambda_{tv} L_{tv}$

where $L_{kd}(p_t, p_s)$ denotes the loss function of the first generator, and $\lambda_{ssim}$, $\lambda_{fea}$, $\lambda_{style}$ and $\lambda_{tv}$ denote the weights corresponding to the structural similarity loss $L_{ssim}$, the feature reconstruction loss $L_{fea}$, the style reconstruction loss $L_{style}$ and the total variation loss $L_{tv}$, respectively.
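A sketch of the total variation term and the weighted combination above, reusing the loss sketches from earlier. The default weight values are placeholders to be tuned, as the text notes:

```python
def tv_loss(img):
    # total variation: penalizes differences between neighboring pixels,
    # encouraging spatially smooth student outputs (NCHW layout)
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def online_kd_loss(p_t, p_s, l_ssim=1.0, l_fea=1.0, l_style=1.0, l_tv=1.0):
    return (l_ssim * ssim_loss(p_t, p_s)
            + l_fea * feature_reconstruction_loss(p_t, p_s)
            + l_style * style_reconstruction_loss(p_t, p_s)
            + l_tv * tv_loss(p_s))
```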
In summary, in this embodiment, the second generator and the first generator are distilled online, i.e., trained synchronously. In each training pass, the first generator is optimized using only the second generator as adjusted in the current pass. On the one hand, the first generator is trained in an environment with a discriminator, yet without being tightly bound to the discriminator, so it can be trained more flexibly and compressed further; on the other hand, the optimization of the first generator does not require ground-truth labels: the first generator only learns the output of the second generator, which has a similar structure and a larger model size, effectively reducing the difficulty of fitting ground-truth labels.
In some embodiments, the first generator is a student generator and the second generator is a teacher generator. The student generator and the teacher generator have similar model structures, while the model size and complexity of the teacher generator are greater; compared with the student generator, the teacher generator has a stronger learning capability and can better guide the training of the student generator during distillation.
In some embodiments, the teacher generator includes a first teacher generator and a second teacher generator, where the model capacity of the first teacher generator is greater than that of the student generator, and the model depth of the second teacher generator is greater than that of the student generator.
Providing the student generator with two different teacher generators along two complementary dimensions can supply complementary, comprehensive distillation losses for the student model during model distillation, as follows: the first teacher generator complements the student generator in model capacity (i.e., model width, also known as the number of channels), capturing finer image details that the student generator cannot; the second teacher generator complements the student generator in model depth, achieving better image quality. Apart from these differences, the student generator, the first teacher generator and the second teacher generator are overall similar in model structure, each being a deep learning model composed of multiple network layers.
Optionally, the number of channels of each intermediate layer of the first teacher generator is a multiple (greater than 1) of the number of channels of the corresponding intermediate layer of the student generator. This multiplicative relationship concisely establishes the relationship between the first teacher generator and the student generator, which facilitates the calculation of the channel distillation loss in the subsequent embodiments.
Optionally, the number of network layers of the second teacher generator is greater than that of the student generator.
Optionally, when constructing the second teacher generator, one or more network layers are added before each up-sampling layer and each down-sampling layer of the student generator to obtain the second teacher generator.
Optionally, when constructing the second teacher generator, deep residual network (Resnet) modules are added before each up-sampling layer and each down-sampling layer of the student generator to obtain the second teacher generator. Using the mature Resnet improves the training efficiency of the second teacher generator and reduces the training difficulty brought by its greater model depth.
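As a sketch of this construction under stated assumptions (the InstanceNorm layers, 3x3 kernels, and the stride-based heuristic for spotting sampling layers are all illustrative choices; the disclosure only specifies inserting residual modules around the sampling layers):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """A standard residual module inserted around the sampling layers."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)        # identity shortcut keeps shapes unchanged

def deepen(layers):
    """Wrap each (down/up)-sampling conv of a student layer list with ResBlocks."""
    out = []
    for layer in layers:
        ch = getattr(layer, "in_channels", None)
        if ch is not None and getattr(layer, "stride", (1,))[0] != 1:  # sampling layer
            out += [ResBlock(ch), layer, ResBlock(layer.out_channels)]
        else:
            out.append(layer)
    return nn.Sequential(*out)
```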
In one example, in one training pass of the generative adversarial network, the loss value of the first teacher generator can be determined according to the sample data and the discriminator, the sample data including the sample image and its reference image; the first teacher generator is adjusted according to its loss value; the loss value of the second teacher generator is determined according to the sample data and the discriminator; the second teacher generator is adjusted according to its loss value; and the student generator is adjusted according to the sample image, the adjusted first teacher generator and the adjusted second teacher generator.
The adjustment of the first and second teacher generators can refer to the adjustment of the second generator in the foregoing embodiments. Unlike the foregoing embodiments, when adjusting the student generator, the distillation loss between the first teacher generator and the student generator and the distillation loss between the second teacher generator and the student generator need to be determined, and the student generator is adjusted according to both. Likewise, the determination of these two distillation losses can refer to the determination of the distillation loss between the second generator and the first generator in the foregoing embodiments, and is not repeated here.
In some embodiments, the discriminator includes a first discriminator and a second discriminator with a shared convolutional layer between them; in the model training of the generative adversarial network, the first teacher generator uses the first discriminator and the second teacher generator uses the second discriminator. This takes full account of the fact that the model structures of the two teacher generators are similar but not identical; training them with first and second discriminators whose convolutional layers are partially shared improves the training effect and efficiency.
Referring to FIG. 4, FIG. 4 is a second schematic flowchart of one training pass of the generative adversarial network in the data processing method provided by an embodiment of the present disclosure, i.e., a flowchart of one alternating training pass of the student generator, the first teacher generator and the second teacher generator.
As shown in FIG. 4, one alternating training pass of the student generator with the first and second teacher generators in the generative adversarial network includes:
S401: Determine the loss value of the first teacher generator according to the sample data and the first discriminator.
For the implementation principle and technical effect of S401, refer to the description of determining the loss value of the second generator according to the sample image and the discriminator in the foregoing embodiments, which is not repeated here.
Optionally, the loss value of the first teacher generator includes the adversarial loss of the first teacher generator. In this case, the adversarial loss function used to calculate the adversarial loss of the first teacher generator can be expressed as:

$L_{GAN}(G_T^{W}, D_1) = \mathbb{E}_{\{x,y\}}[\log D_1(x, y)] + \mathbb{E}_{\{x\}}[\log(1 - D_1(x, G_T^{W}(x)))]$

where $L_{GAN}(G_T^{W}, D_1)$ is the adversarial loss function of the first teacher generator, $G_T^{W}$ denotes the first (wider) teacher generator, $G_T^{W}(x)$ denotes the output image of the first teacher generator when the sample image $x$ is input, and $D_1$ denotes the first discriminator.
Optionally, the loss value of the first teacher generator also includes the reconstruction loss of the first teacher generator. In this case, the reconstruction loss function used to calculate the reconstruction loss of the first teacher generator can be expressed as:

$L_{recon}(G_T^{W}, D_1) = \mathbb{E}_{\{x,y\}}[y - G_T^{W}(x)]$

where $L_{recon}(G_T^{W}, D_1)$ is the reconstruction loss function of the first teacher generator, and $y - G_T^{W}(x)$ is the difference between the reference image corresponding to the sample image and the output image of the first teacher generator.
S402: Adjust the first teacher generator according to its loss value.
For the implementation principle and technical effect of S402, refer to the description of adjusting the second generator according to its loss value in the foregoing embodiments, which is not repeated here.
Optionally, when the loss value of the first teacher generator includes its adversarial loss and its reconstruction loss, the optimization objective function of the first teacher generator is expressed as:

$G_T^{W*} = \arg\min_{G_T^{W}} \max_{D_1} \; L_{GAN}(G_T^{W}, D_1) + L_{recon}(G_T^{W}, D_1)$

where $G_T^{W*}$ denotes the optimization objective in the training of the first teacher generator, and $\max_{D_1}$ denotes maximizing the discrimination loss by adjusting the first discriminator.
S403: Determine the loss value of the second teacher generator according to the sample data and the second discriminator.
For the implementation principle and technical effect of S403, refer to the description of determining the loss value of the second generator according to the sample image and the discriminator in the foregoing embodiments, which is not repeated here.
Optionally, the loss value of the second teacher generator includes the adversarial loss of the second teacher generator. In this case, the adversarial loss function used to calculate the adversarial loss of the second teacher generator can be expressed as:

$L_{GAN}(G_T^{D}, D_2) = \mathbb{E}_{\{x,y\}}[\log D_2(x, y)] + \mathbb{E}_{\{x\}}[\log(1 - D_2(x, G_T^{D}(x)))]$

where $L_{GAN}(G_T^{D}, D_2)$ is the adversarial loss function of the second teacher generator, $G_T^{D}$ denotes the second (deeper) teacher generator, $G_T^{D}(x)$ denotes the output image of the second teacher generator when the sample image $x$ is input, and $D_2$ denotes the second discriminator.
Optionally, the loss value of the second teacher generator also includes the reconstruction loss of the second teacher generator. In this case, the reconstruction loss function used to calculate the reconstruction loss of the second teacher generator can be expressed as:

$L_{recon}(G_T^{D}, D_2) = \mathbb{E}_{\{x,y\}}[y - G_T^{D}(x)]$

where $L_{recon}(G_T^{D}, D_2)$ is the reconstruction loss function of the second teacher generator, and $y - G_T^{D}(x)$ is the difference between the reference image corresponding to the sample image and the output image of the second teacher generator.
S404: Adjust the second teacher generator according to its loss value.
For the implementation principle and technical effect of S404, refer to the description of adjusting the second generator according to its loss value in the foregoing embodiments, which is not repeated here.
Optionally, when the loss value of the second teacher generator includes its adversarial loss and its reconstruction loss, the optimization objective function of the second teacher generator is expressed as:

$G_T^{D*} = \arg\min_{G_T^{D}} \max_{D_2} \; L_{GAN}(G_T^{D}, D_2) + L_{recon}(G_T^{D}, D_2)$

where $G_T^{D*}$ denotes the optimization objective in the training of the second teacher generator, and $\max_{D_2}$ denotes maximizing the discrimination loss by adjusting the second discriminator.
S405: Adjust the student generator according to the sample image, the adjusted first teacher generator and the adjusted second teacher generator.
In this step, the sample image is processed by the adjusted first teacher generator, the adjusted second teacher generator and the student generator respectively. The model capacity of the first teacher generator is greater than that of the student generator, and the model depth of the second teacher generator is greater than that of the student generator; therefore, the data obtained by processing the sample image with the adjusted first teacher generator, and the data obtained with the adjusted second teacher generator, each differ from the data obtained with the student generator. From these differences, the distillation losses between each adjusted teacher generator and the student generator can be determined, and the student generator is adjusted accordingly. In each training pass, the optimization of the student generator is guided by the optimized first teacher generator and the optimized second teacher generator; combining the two teacher generators improves the training effect of the student generator.
In some embodiments, the distillation loss between the first teacher generator and the student generator includes the output distillation loss between them, i.e., the distillation loss between the output layer of the first teacher generator and the output layer of the student generator. Further, this output distillation loss may include the structural similarity loss and/or perceptual loss between the output image of the first teacher generator and the output image of the student generator; the perceptual loss may include the feature reconstruction loss and/or style reconstruction loss between these output images.
In some embodiments, the distillation loss between the second teacher generator and the student generator includes the output distillation loss between them, i.e., the distillation loss between the output layer of the second teacher generator and the output layer of the student generator. Further, this output distillation loss may include the structural similarity loss and/or perceptual loss between the output image of the second teacher generator and the output image of the student generator; the perceptual loss may include the feature reconstruction loss and/or style reconstruction loss between these output images.
For details of the distillation losses between each teacher generator and the student generator, refer to the detailed description of the distillation loss between the second generator and the first generator in the foregoing embodiments, which is not repeated here.
In some embodiments, the model depth of the first teacher generator is the same as that of the student generator; compared with the student generator, the first teacher generator has a greater model capacity, i.e., more channels in its convolutional layers, and can capture details that the student generator cannot. If the distillation loss between the teacher generator and the student generator only includes the output distillation loss, then in the model distillation process only the information of the teacher generator's output layer is distilled; in other words, only the difference between the output images of the teacher and the student is considered, while the information of the teacher generator's intermediate layers is not. Therefore, based on the structural characteristics of the first teacher generator, the information of its intermediate layers, i.e., channel-granularity information, can be used as one of the supervision signals in the optimization of the student generator, further improving its training effect.
Optionally, the distillation loss between the first teacher generator and the student generator includes the output distillation loss and the channel distillation loss between them, where the channel distillation loss is the distillation loss between the intermediate layers of the first teacher generator and the intermediate layers of the student generator, reflecting the difference between the features of the sample image extracted by the intermediate layers of the first teacher generator and those extracted by the intermediate layers of the student generator. Combining the output distillation loss and the channel distillation loss as supervision information for optimizing the student generator achieves multi-granularity model distillation and improves the distillation effect.
In this case, one possible implementation of S405 includes: processing the sample image through the student generator, the adjusted first teacher generator and the adjusted second teacher generator respectively, obtaining the output image of the student generator, the output image of the first teacher generator and the output image of the second teacher generator; determining the first output distillation loss, which is the distillation loss between the output layer of the first teacher generator and the output layer of the student generator; determining the channel distillation loss according to the feature maps output by the intermediate layers of the student generator and the feature maps output by the intermediate layers of the first teacher generator; determining the second output distillation loss according to the output image of the student generator and the output image of the second teacher generator, which is the distillation loss between the output layer of the second teacher generator and the output layer of the student generator; and adjusting the student generator according to the first output distillation loss, the channel distillation loss and the second output distillation loss.
The feature maps output by the intermediate layers of the student generator refer to the features of the sample image extracted by those layers, including the feature map output by each channel of the intermediate layers of the student generator. Likewise, the feature maps output by the intermediate layers of the first teacher generator refer to the features of the sample image extracted by those layers, including the feature map output by each channel of the intermediate layers of the first teacher generator.
Specifically, since the model depth of the student generator is the same as that of the first teacher generator, for each intermediate layer, the difference between the feature maps output by the student generator's intermediate layer and by the first teacher generator's intermediate layer is determined, and the channel distillation loss is determined from this difference. After the first output distillation loss, the channel distillation loss and the second output distillation loss are obtained, the student generator is adjusted according to them.
Considering that the model capacity of the first teacher generator is greater than that of the student generator, and the number of channels of the first teacher generator's intermediate layers is greater than that of the student generator's, in some embodiments a channel convolution layer is connected between the intermediate layers of the first teacher generator and of the student generator. The channel convolution layer is used to establish the mapping relationship between the channels of the first teacher generator's intermediate layers and the channels of the student generator's intermediate layers. Based on the channel convolution layer, for each channel of a student intermediate layer there is a corresponding channel in the first teacher generator's intermediate layer; without changing the number of channels of the student generator's intermediate layers, the channel convolution layer expands the student's channel dimension during the determination of the channel distillation loss.
In this case, one possible implementation of determining the channel distillation loss according to the feature maps output by the intermediate layers of the student generator and of the first teacher generator includes: determining the attention weight of each channel of the student generator's intermediate layers according to the feature map output by that channel; determining the attention weight of each channel of the first teacher generator's intermediate layers according to the feature map output by that channel; and determining the channel distillation loss according to the differences between the attention weights of the mutually mapped channels in the intermediate layers of the student generator and of the first teacher generator. The attention weight of a channel measures the importance of that channel.
Specifically, in the student generator, for each channel of an intermediate layer, the attention weight of the channel can be calculated based on the pixels of the feature map output by that channel; for example, the sum or the mean of all pixels of the feature map output by the channel is determined as its attention weight. In the first teacher generator, the attention weight of each channel of an intermediate layer is likewise calculated from the pixels of the feature map output by that channel. Among the channels of the student generator's intermediate layers and those of the first teacher generator's intermediate layers, the mutually mapped channels are determined, their attention weights are compared, the differences between the attention weights of the mutually mapped channels are determined, and the channel distillation loss is determined from these differences.
Optionally, the channel convolution layer is a 1*1 learnable convolution layer, through which the channels of the student generator's intermediate layers are mapped to the channels of the corresponding intermediate layers of the first teacher generator, so that the number of channels of the student generator's intermediate layers is upscaled to match the number of channels of the corresponding intermediate layers of the first teacher generator.
Optionally, in the student generator, for each channel of an intermediate layer, the attention weight of the channel can be calculated according to each pixel of the feature map output by the channel and the size of that feature map.
In this calculation, the pixels of each row of the feature map output by the channel can be added to obtain a per-row pixel sum; the per-row sums are added to obtain the total pixel sum of the feature map; and the total is averaged according to the size of the feature map to obtain the attention weight of the channel.
Further, the calculation formula of the attention weight of a channel can be expressed as:

$w_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$

where $w_c$ denotes the attention weight of channel $c$, $H$ and $W$ are the height and width of the feature map output by channel $c$, and $u_c(i, j)$ is the pixel at position $(i, j)$ of that feature map.
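In code, this per-channel global averaging is a one-line sketch (`feat` is an intermediate feature map in NCHW layout):

```python
def channel_attention_weights(feat):
    # w_c = (1 / (H * W)) * sum_{i,j} u_c(i, j), computed per sample and channel
    return feat.mean(dim=(2, 3))        # shape: (n, c)
```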
The determination of the attention weights of the channels of the first teacher generator's intermediate layers can refer to the corresponding description for the student generator and is not repeated here.
Optionally, for each intermediate layer of the student generator and the corresponding intermediate layer of the teacher generator, the difference between the attention weights of each pair of mutually mapped channels is determined, and the channel distillation loss is determined according to these differences, the number of sampled feature maps in the intermediate layers of the student generator and of the first teacher generator, and the number of channels of the feature maps. Thus, not only the attention weights of each pair of mutually mapped channels are considered, but also the number of channels of the intermediate layers and the number of feature maps sampled from them, improving the accuracy of the channel distillation loss.
The channel distillation loss function used to calculate the channel distillation loss can be expressed as:

$L_{CD} = \frac{1}{n \cdot c} \sum_{i=1}^{n} \sum_{j=1}^{c} \left( w_{ij}^{T} - w_{ij}^{S} \right)^{2}$

where $L_{CD}$ denotes the channel distillation loss function, $n$ denotes the number of sampled feature maps, $c$ denotes the number of channels of the feature maps, $w_{ij}^{T}$ denotes the attention weight of the $j$-th channel of the $i$-th feature map in the first teacher generator, and $w_{ij}^{S}$ denotes the attention weight of the $j$-th channel of the $i$-th feature map in the student generator.
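A sketch combining the 1*1 learnable mapping with the loss above, reusing `channel_attention_weights`. It assumes the teacher layer has more channels than the student's, as the text describes; everything else follows the formula:

```python
import torch.nn as nn

class ChannelDistillation(nn.Module):
    def __init__(self, student_ch, teacher_ch):
        super().__init__()
        # 1*1 learnable convolution mapping student channels to teacher channels
        self.mapper = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, f_s, f_t):
        w_s = channel_attention_weights(self.mapper(f_s))   # (n, c) after upscaling
        w_t = channel_attention_weights(f_t)                # (n, c)
        return ((w_t - w_s) ** 2).mean()    # 1/(n*c) * sum of squared differences
```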
Optionally, when adjusting the student generator based on the channel distillation loss, the first output distillation loss and the second output distillation loss, the channel distillation loss is weighted according to a channel loss weight factor to obtain a weighted result, and the student generator is adjusted according to the weighted result, the first output distillation loss and the second output distillation loss. By adjusting the channel loss weight factor, the degree to which the channel distillation loss influences the optimization of the student generator can be tuned, improving the flexibility of training the student generator.
Further, in adjusting the student generator according to the weighted result of the channel distillation loss and the channel loss weight factor, the first output distillation loss and the second output distillation loss, the online loss of the student generator with respect to the first teacher generator and the online loss of the student generator with respect to the second teacher generator can be determined from the first output distillation loss and the second output distillation loss, respectively. The online loss with respect to the first teacher generator, the online loss with respect to the second teacher generator, and the weighted result of the channel distillation loss and the channel loss weight factor are summed with weights, so as to balance these loss values and obtain the final loss value of the multi-granularity online distillation of the student generator. Based on this final loss value, the model parameters of the student generator are adjusted to optimize the student generator.
The online loss of the student generator with respect to the first teacher generator includes the output distillation loss between the student generator and the first teacher generator, or that output distillation loss together with the total variation loss of the output image of the student generator. The online loss of the student generator with respect to the second teacher generator includes the output distillation loss between the student generator and the second teacher generator, or that output distillation loss together with the total variation loss of the output image of the student generator. For details, refer to the online loss of the student generator with respect to the teacher generator in the foregoing embodiments, which is not repeated here.
Further, the objective loss function used to calculate the final loss value of the multi-granularity online distillation of the student generator can be expressed as:

$L_{total} = L_{kd}^{W} + L_{kd}^{D} + \lambda_{CD} \cdot L_{CD}$

where $L_{total}$ denotes the objective loss function, $\lambda_{CD}$ denotes the channel loss weight factor, $L_{kd}^{W}$ denotes the online loss function of the student generator with respect to the first teacher generator, and $L_{kd}^{D}$ denotes the online loss function of the student generator with respect to the second teacher generator.
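Putting the pieces together, the final multi-granularity objective is a weighted sum; a sketch reusing `online_kd_loss` above, where the default value of `lambda_cd` is a placeholder to be tuned rather than a value from the disclosure:

```python
def total_student_loss(p_w, p_d, p_s, cd_losses, lambda_cd=1.0):
    # p_w, p_d: outputs of the wider/deeper teachers; p_s: student output.
    # cd_losses: channel distillation losses collected over intermediate layers.
    l_kd_w = online_kd_loss(p_w, p_s)   # online loss w.r.t. the first teacher
    l_kd_d = online_kd_loss(p_d, p_s)   # online loss w.r.t. the second teacher
    return l_kd_w + l_kd_d + lambda_cd * sum(cd_losses)
```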
参照图5，图5为本公开实施例提供的生成式对抗网络的模型结构示例图。如图5所示，生成式对抗网络包括两个教师生成器、一个学生生成器G_S以及部分共享的两个判别器，两个教师生成器包括更宽的教师生成器和更深的教师生成器，两个判别器共享前面多个卷积层。
其中，更宽的教师生成器的中间层的通道数为η×c_1、η×c_2、……、η×c_{k-1}、η×c_k，学生生成器的中间层的通道数为c_1、c_2、……、c_{k-1}、c_k，可见，该教师生成器即前述实施例中的第一教师生成器；更深的教师生成器相较于学生生成器，相当于在学生生成器的采样层的前后插入多个Resnet模块，模型深度大于学生生成器的模型深度，可见，该教师生成器即前述实施例中的第二教师生成器。
其中，更宽的教师生成器的中间层与学生生成器的中间层连接有通道卷积层（图5中未标注），根据前述实施例的描述，可知该处的通道卷积层用于建立更宽的教师生成器的中间层中的通道与学生生成器的中间层中的通道的映射关系，便于计算更宽的教师生成器与学生生成器之间的通道蒸馏损失。
如图5所示,在训练过程中,将样本图像(图5中高跟鞋的轮廓图)分别输入更宽的教师生成器、学生生成器、更深的教师生成器。通过部分共享的判别器、更宽的教师生成器的输出图像和真实标签(前述实施例中的参考图像),确定更宽的教师生成器的GAN损失,即前述实施例中第一教师生成器的对抗损失,该对抗损失用于调整更宽的教师生成器。通过部分共享的判别器、更深的教师生成器的输出图像和真实标签,确定更深的教师生成器的GAN损失,即前述实施例中第二教师生成器的对抗损失,该对抗损失用于调整更深的教师生成器。基于更宽的教师生成器与学生生成器在中间层的差异,计算得到通道蒸馏损失。基于更宽的教师生成器的输出图像、学生生成器的输出图像、更深的教师生成器的输出图像,分别计算得到更宽的教师生成器与学生生成器之间的输出蒸馏损失、更深的教师生成器与学生生成器之间的输出蒸馏损失。通道蒸馏损失和该两类输出蒸馏损失,用于调整学生生成器。
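结合图5，一次交替训练的整体流程可用如下PyTorch草图示意（假设判别器输出未经Sigmoid的logits、重建损失取L1距离，判别器自身的更新以及蒸馏损失的具体实现通过 distill_loss 回调传入，以上均为示意性假设）：

```python
import torch
import torch.nn.functional as F

def train_step(x, y, G_t1, G_t2, G_s, D1, D2, opt_t1, opt_t2, opt_s, distill_loss):
    # 1) 以对抗损失+重建损失更新更宽的教师生成器（第一教师生成器）
    out_t1 = G_t1(x)
    logits_1 = D1(out_t1)
    loss_t1 = F.binary_cross_entropy_with_logits(
        logits_1, torch.ones_like(logits_1)) + F.l1_loss(out_t1, y)
    opt_t1.zero_grad(); loss_t1.backward(); opt_t1.step()

    # 2) 同理更新更深的教师生成器（第二教师生成器）
    out_t2 = G_t2(x)
    logits_2 = D2(out_t2)
    loss_t2 = F.binary_cross_entropy_with_logits(
        logits_2, torch.ones_like(logits_2)) + F.l1_loss(out_t2, y)
    opt_t2.zero_grad(); loss_t2.backward(); opt_t2.step()

    # 3) 用调整后的两个教师生成器蒸馏学生生成器（教师侧不回传梯度）
    with torch.no_grad():
        ref_t1, ref_t2 = G_t1(x), G_t2(x)
    loss_s = distill_loss(G_s(x), ref_t1, ref_t2)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return loss_t1.item(), loss_t2.item(), loss_s.item()
```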
对应于上文实施例的数据处理方法,图6为本公开实施例提供的数据处理设备的结构框图。为了便于说明,仅示出了与本公开实施例相关的部分。参照图6,数据处理设备包括:获取模块601和处理模块602。
获取模块601,用于获取待处理的图像;
处理模块602,用于通过第一生成器对图像进行处理,得到处理后的图像。
其中,该数据处理设备适用于通过模型蒸馏得到的生成式对抗网络,生成式对抗网络包括第一生成器、第二生成器和判别器,模型蒸馏为交替训练第一生成器和第二生成器的过程,第一生成器的模型规模小于第二生成器的模型规模。
在本公开的一个实施例中,生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:
根据样本数据和判别器,确定第二生成器的损失值,样本数据中包括样本图像和样本图像对应的参考图像;
根据第二生成器的损失值,调整第二生成器;
根据样本图像、调整后的第二生成器和第一生成器,确定调整后的第二生成器与第一生成器之间的蒸馏损失;
根据蒸馏损失,调整第一生成器。
在本公开的一个实施例中,根据样本数据和判别器,确定第二生成器的损失值,包括:
通过第二生成器对样本图像进行处理,得到第二生成器的输出图像;
通过判别器对样本图像对应的参考图像和第二生成器的输出图像进行真假判别，确定第二生成器的对抗损失；
根据样本图像对应的参考图像与第二生成器的输出图像的差异,确定第二生成器的重建损失;
根据对抗损失和重建损失,确定第二生成器的损失值。
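作为上述损失值计算的一个示意（对抗损失取二分类交叉熵、重建损失取L1距离，具体形式均为假设，本公开并未限定），可参考如下草图：

```python
import torch
import torch.nn.functional as F

def second_generator_loss(G2, D, x, y):
    """第二生成器的损失值 = 对抗损失 + 重建损失（示意）。
    假设判别器 D 输出未经 Sigmoid 的 logits，y 为样本图像对应的参考图像。"""
    out = G2(x)                                # 第二生成器的输出图像
    logits_fake = D(out)                       # 判别器对输出图像的真假判别
    adv = F.binary_cross_entropy_with_logits(  # 生成器希望输出被判为“真”
        logits_fake, torch.ones_like(logits_fake))
    rec = F.l1_loss(out, y)                    # 输出图像与参考图像的差异
    return adv + rec
```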
在本公开的一个实施例中,在生成式对抗网络中,网络层包括输入层、中间层和输出层,根据样本图像、调整后的第二生成器和第一生成器,确定调整后的第二生成器与第一生成器之间的蒸馏损失,包括:
通过第一生成器、调整后的第二生成器,分别处理样本图像,得到第一生成器的输出图像和第二生成器的输出图像;
根据第一生成器的输出图像与第二生成器的输出图像之间的差异,确定输出蒸馏损失,输出蒸馏损失为第二生成器的输出层与第一生成器的输出层之间的蒸馏损失。
在本公开的一个实施例中,根据第一生成器的输出图像与第二生成器的输出图像之间的差异,确定输出蒸馏损失,包括:
根据第二生成器的输出图像的亮度和对比度以及第一生成器的输出图像的亮度和对比度,确定第二生成器的输出图像与第一生成器的输出图像之间的结构相似化损失;
通过特征提取网络对第一生成器的输出图像和第二生成器的输出图像分别进行特征提取,确定第一生成器的输出图像与第二生成器的输出图像之间的感知损失;
根据结构相似化损失和感知损失,确定输出蒸馏损失。
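结构相似化损失可基于SSIM指标构造，下面给出一个以全图统计量计算的简化示意（实际实现通常在滑动窗口内统计亮度均值与对比度方差，此处仅作理解用途）：

```python
import torch

def ssim_loss(img_s, img_t, c1=0.01 ** 2, c2=0.03 ** 2):
    """基于两幅输出图像的亮度（均值）与对比度（方差、协方差）
    计算SSIM，并取 1 - SSIM 作为结构相似化损失（全局简化版本）。"""
    mu_s, mu_t = img_s.mean(), img_t.mean()
    var_s, var_t = img_s.var(), img_t.var()
    cov = ((img_s - mu_s) * (img_t - mu_t)).mean()
    ssim = ((2 * mu_s * mu_t + c1) * (2 * cov + c2)) / (
        (mu_s ** 2 + mu_t ** 2 + c1) * (var_s + var_t + c2))
    return 1 - ssim
```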
在本公开的一个实施例中,感知损失包括特征重建损失和/或样式重建损失,通过特征提取网络对第一生成器的输出图像和第二生成器的输出图像分别进行特征提取,确定第一生成器的输出图像与第二生成器的输出图像之间的感知损失,包括:
将第一生成器的输出图像和第二生成器的输出图像分别输入特征提取网络,得到特征提取网络的预设网络层输出的第一生成器的输出图像的特征和第二生成器的输出图像的特征;
根据第一生成器的输出图像的特征与第二生成器的输出图像的特征之间的差异,确定特征重建损失和/或样式重建损失。
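感知损失的计算可示意如下（此处假设以torchvision提供的VGG16前若干卷积层作为特征提取网络的预设网络层，并以Gram矩阵构造样式重建损失；网络选择与层数均为示例性假设）：

```python
import torch
import torch.nn.functional as F
import torchvision

def perceptual_losses(img_s, img_t, feat_net=None):
    """将两个生成器的输出图像分别输入特征提取网络，
    根据特征差异计算特征重建损失与样式重建损失（示意）。"""
    if feat_net is None:
        feat_net = torchvision.models.vgg16(weights="DEFAULT").features[:16].eval()
    with torch.no_grad():
        f_t = feat_net(img_t)          # 第二生成器（教师）输出图像的特征
    f_s = feat_net(img_s)              # 第一生成器（学生）输出图像的特征
    feat_rec = F.mse_loss(f_s, f_t)    # 特征重建损失

    def gram(f):                       # Gram矩阵刻画特征间相关性，用于样式重建损失
        n, c, h, w = f.shape
        f = f.reshape(n, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    style_rec = F.mse_loss(gram(f_s), gram(f_t))
    return feat_rec, style_rec
```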
在本公开的一个实施例中,根据蒸馏损失,调整第一生成器,包括:
确定第一生成器的输出图像的总变差损失;
根据蒸馏损失和总变差损失,调整第一生成器。
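总变差损失的一个常见形式是相邻像素差值的绝对值之均值，可示意如下：

```python
import torch

def total_variation_loss(img: torch.Tensor) -> torch.Tensor:
    """输出图像的总变差损失：约束相邻像素间的差异，使图像更平滑。
    img 形状为 (N, C, H, W)。"""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw
```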
在本公开的一个实施例中，根据蒸馏损失和总变差损失，调整第一生成器，包括：
对蒸馏损失和总变差损失进行加权求和,得到第一生成器的在线损失;
根据第一生成器的在线损失,对第一生成器进行调整。
在本公开的一个实施例中,第一生成器为学生生成器,第二生成器为教师生成器。
在本公开的一个实施例中,教师生成器包括第一教师生成器和第二教师生成器,第一教师生成器的模型容量大于学生生成器的模型容量,第二教师生成器的模型深度大于学生生成器的模型深度。
在本公开的一个实施例中,判别器包括第一判别器和第二判别器,第一判别器和第二判别器之间存在共享卷积层,生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:
根据样本数据和第一判别器，确定第一教师生成器的损失值，样本数据中包括样本图像和样本图像的参考图像；
根据第一教师生成器的损失值,调整第一教师生成器;
根据样本数据和第二判别器,确定第二教师生成器的损失值;
根据第二教师生成器的损失值,调整第二教师生成器;
根据样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整学生生成器。
在本公开的一个实施例中,在生成式对抗网络中,网络层包括输入层、中间层和输出层,根据样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整学生生成器,包括:
通过学生生成器、调整后的第一教师生成器、调整后的第二教师生成器,分别处理样本图像,得到学生生成器的输出图像、第一教师生成器的输出图像以及第二教师生成器的输出图像;
根据学生生成器的输出图像和第一教师生成器的输出图像,确定第一输出蒸馏损失,第一输出蒸馏损失为第一教师生成器的输出层与学生生成器的输出层之间的蒸馏损失;
根据学生生成器的中间层输出的特征图和第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,通道蒸馏损失为第一教师生成器的中间层与学生生成器的中间层之间的蒸馏损失;
根据学生生成器的输出图像和第二教师生成器的输出图像,确定第二输出蒸馏损失,第二输出蒸馏损失为第二教师生成器的输出层与学生生成器的输出层之间的蒸馏损失;
根据第一输出蒸馏损失、通道蒸馏损失以及第二输出蒸馏损失,调整学生生成器。
在本公开的一个实施例中,第一教师生成器的中间层与学生生成器的中间层之间连接有通道卷积层,通道卷积层用于建立第一教师生成器的中间层中的通道与学生生成器的中间层的通道之间的映射关系,根据学生生成器的中间层输出的特征图和第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,包括:
根据学生生成器的中间层中各通道输出的特征图,确定学生生成器的中间层中各通道的注意力权重;
根据第一教师生成器的中间层中各通道输出的特征图,确定第一教师生成器的中间层中各通道的注意力权重;
在学生生成器的中间层和第一教师生成器的中间层中,根据相互映射的通道的注意力权重之间的差异,确定通道蒸馏损失。
在本公开的一个实施例中,根据第一输出蒸馏损失、通道蒸馏损失以及第二输出蒸馏损失,调整学生生成器,包括:
根据通道损失权重因子,对通道蒸馏损失进行加权,得到加权结果;
根据加权结果、第一输出蒸馏损失以及第二输出蒸馏损失,调整学生生成器。
本实施例提供的设备,可用于执行上述方法实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。
参考图7,其示出了适于用来实现本公开实施例的电子设备700的结构示意图,该电子设备700可以为终端设备或服务器。其中,终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(Personal Digital Assistant,简称PDA)、平板电脑(Portable  Android Device,简称PAD)、便携式多媒体播放器(Portable Media Player,简称PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图7示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图7所示,电子设备700可以包括处理装置(例如中央处理器、图形处理器等)701,其可以根据存储在只读存储器(Read Only Memory,简称ROM)702中的程序或者从存储装置708加载到随机访问存储器(Random Access Memory,简称RAM)703中的程序而执行各种适当的动作和处理。在RAM 703中,还存储有电子设备700操作所需的各种程序和数据。处理装置701、ROM 702以及RAM 703通过总线704彼此相连。输入/输出(I/O)接口705也连接至总线704。
通常,以下装置可以连接至I/O接口705:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置706;包括例如液晶显示器(Liquid Crystal Display,简称LCD)、扬声器、振动器等的输出装置707;包括例如磁带、硬盘等的存储装置708;以及通信装置709。通信装置709可以允许电子设备700与其他设备进行无线或有线通信以交换数据。虽然图7示出了具有各种装置的电子设备700,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置709从网络上被下载和安装,或者从存储装置708被安装,或者从ROM 702被安装。在该计算机程序被处理装置701执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序，当上述一个或者多个程序被该电子设备执行时，使得该电子设备执行上述实施例所示的方法。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(Local Area Network,简称LAN)或广域网(Wide Area Network,简称WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定,例如,第一获取单元还可以被描述为“获取至少两个网际协议地址的单元”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
第一方面,根据本公开的一个或多个实施例,提供了一种数据处理方法,适用于通过模型蒸馏得到的生成式对抗网络,所述数据处理方法包括:获取待处理的图像;通过第一生成器对所述图像进行处理,得到处理后的图像;其中,所述生成式对抗网络包括所述第一生成器、第二生成器和判别器,所述模型蒸馏为交替训练所述第一生成器和所述第二生成器的过程,所述第一生成器的模型规模小于所述第二生成器的模型规模。
根据本公开的一个或多个实施例，所述生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括：根据样本数据和所述判别器，确定所述第二生成器的损失值，所述样本数据中包括样本图像和所述样本图像对应的参考图像；根据所述第二生成器的损失值，调整所述第二生成器；根据所述样本图像、调整后的第二生成器和所述第一生成器，确定调整后的第二生成器与所述第一生成器之间的蒸馏损失；根据所述蒸馏损失，调整所述第一生成器。
根据本公开的一个或多个实施例,所述根据样本数据和所述判别器,确定所述第二生成器的损失值,包括:通过所述第二生成器对所述样本图像进行处理,得到所述第二生成器的输出图像;通过所述判别器对所述样本图像对应的参考图像和所述第二生成器的输出图像进行真假判别,确定所述第二生成器的对抗损失;根据所述样本图像对应的参考图像与所述第二生成器的输出图像的差异,确定所述第二生成器的重建损失;根据所述对抗损失和所述重建损失,确定所述第二生成器的损失值。
根据本公开的一个或多个实施例,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第二生成器和所述第一生成器,确定调整后的第二生成器与所述第一生成器之间的蒸馏损失,包括:通过所述第一生成器、调整后的第二生成器,分别处理所述样本图像,得到所述第一生成器的输出图像和所述第二生成器的输出图像;根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,所述输出蒸馏损失为所述第二生成器的输出层与所述第一生成器的输出层之间的蒸馏损失。
根据本公开的一个或多个实施例,所述根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,包括:
根据所述第二生成器的输出图像的亮度和对比度以及所述第一生成器的输出图像的亮度和对比度,确定所述第二生成器的输出图像与所述第一生成器的输出图像之间的结构相似化损失;通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失;根据所述结构相似化损失和所述感知损失,确定所述输出蒸馏损失。
根据本公开的一个或多个实施例,所述感知损失包括特征重建损失和/或样式重建损失,所述通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失,包括:
将所述第一生成器的输出图像和所述第二生成器的输出图像分别输入所述特征提取网络,得到所述特征提取网络的预设网络层输出的所述第一生成器的输出图像的特征和所述第二生成器的输出图像的特征;根据所述第一生成器的输出图像的特征与所述第二生成器的输出图像的特征之间的差异,确定所述特征重建损失和/或所述样式重建损失。
根据本公开的一个或多个实施例,所述根据所述蒸馏损失,调整所述第一生成器,包括:确定所述第一生成器的输出图像的总变差损失;根据所述蒸馏损失和所述总变差损失,调整所述第一生成器。
根据本公开的一个或多个实施例,所述根据所述蒸馏损失和所述总变差损失,调整所述第一生成器,包括:对所述蒸馏损失和所述总变差损失进行加权求和,得到所述第一生成器的在线损失;根据所述第一生成器的在线损失,对所述第一生成器进行调整。
根据本公开的一个或多个实施例，所述第一生成器为学生生成器，所述第二生成器为教师生成器。
根据本公开的一个或多个实施例,所述教师生成器包括第一教师生成器和第二教师生成器,所述第一教师生成器的模型容量大于所述学生生成器的模型容量,所述第二教师生成器的模型深度大于所述学生生成器的模型深度。
根据本公开的一个或多个实施例,所述判别器包括第一判别器和第二判别器,所述第一判别器和所述第二判别器之间存在共享卷积层,所述生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:根据样本数据和所述第一判别器,确定所述第一教师生成器的损失值,所述样本数据中包括样本图像和所述样本图像的参考图像;根据所述第一教师生成器的损失值,调整所述第一教师生成器;根据样本数据和所述第二判别器,确定所述第二教师生成器的损失值;根据所述第二教师生成器的损失值,调整所述第二教师生成器;根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器。
根据本公开的一个或多个实施例,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器,包括:通过所述学生生成器、调整后的第一教师生成器、调整后的第二教师生成器,分别处理所述样本图像,得到所述学生生成器的输出图像、所述第一教师生成器的输出图像以及所述第二教师生成器的输出图像;根据所述学生生成器的输出图像和所述第一教师生成器的输出图像,确定第一输出蒸馏损失,所述第一输出蒸馏损失为所述第一教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,所述通道蒸馏损失为所述第一教师生成器的中间层与所述学生生成器的中间层之间的蒸馏损失;根据所述学生生成器的输出图像和所述第二教师生成器的输出图像,确定第二输出蒸馏损失,所述第二输出蒸馏损失为所述第二教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
根据本公开的一个或多个实施例,所述第一教师生成器的中间层与所述学生生成器的中间层之间连接有通道卷积层,通道卷积层用于建立所述第一教师生成器的中间层中的通道与所述学生生成器的中间层的通道之间的映射关系,所述根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,包括:根据所述学生生成器的中间层中各通道输出的特征图,确定所述学生生成器的中间层中各通道的注意力权重;根据所述第一教师生成器的中间层中各通道输出的特征图,确定所述第一教师生成器的中间层中各通道的注意力权重;在所述学生生成器的中间层和所述第一教师生成器的中间层中,根据相互映射的通道的注意力权重之间的差异,确定所述通道蒸馏损失。
根据本公开的一个或多个实施例,所述根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器,包括:根据通道损失权重因子,对所述通道蒸馏损失进行加权,得到加权结果;根据所述加权结果、所述第一输出蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
第二方面,根据本公开的一个或多个实施例,提供了一种数据处理设备,适用于通过模型蒸馏得到的生成式对抗网络,所述数据处理设备包括:获取模块,用于获取待处理的图像; 处理模块,用于通过第一生成器对所述图像进行处理,得到处理后的图像;其中,所述生成式对抗网络包括所述第一生成器、第二生成器和判别器,所述模型蒸馏为交替训练所述第一生成器和所述第二生成器的过程,所述第一生成器的模型规模小于所述第二生成器的模型规模。
根据本公开的一个或多个实施例,所述生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:根据样本数据和所述判别器,确定所述第二生成器的损失值,所述样本数据中包括样本图像和所述样本图像对应的参考图像;根据所述第二生成器的损失值,调整所述第二生成器;根据所述样本图像、调整后的第二生成器和所述第一生成器,确定调整后的第二生成器与所述第一生成器之间的蒸馏损失;根据所述蒸馏损失,调整所述第一生成器。
根据本公开的一个或多个实施例,所述根据样本数据和所述判别器,确定所述第二生成器的损失值,包括:通过所述第二生成器对所述样本图像进行处理,得到所述第二生成器的输出图像;通过所述判别器对所述样本图像对应的参考图像和所述第二生成器的输出图像进行真假判别,确定所述第二生成器的对抗损失;根据所述样本图像对应的参考图像与所述第二生成器的输出图像的差异,确定所述第二生成器的重建损失;根据所述对抗损失和所述重建损失,确定所述第二生成器的损失值。
根据本公开的一个或多个实施例,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第二生成器和所述第一生成器,确定调整后的第二生成器与所述第一生成器之间的蒸馏损失,包括:通过所述第一生成器、调整后的第二生成器,分别处理所述样本图像,得到所述第一生成器的输出图像和所述第二生成器的输出图像;根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,所述输出蒸馏损失为所述第二生成器的输出层与所述第一生成器的输出层之间的蒸馏损失。
根据本公开的一个或多个实施例,所述根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,包括:
根据所述第二生成器的输出图像的亮度和对比度以及所述第一生成器的输出图像的亮度和对比度,确定所述第二生成器的输出图像与所述第一生成器的输出图像之间的结构相似化损失;通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失;根据所述结构相似化损失和所述感知损失,确定所述输出蒸馏损失。
根据本公开的一个或多个实施例,所述感知损失包括特征重建损失和/或样式重建损失,所述通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失,包括:
将所述第一生成器的输出图像和所述第二生成器的输出图像分别输入所述特征提取网络,得到所述特征提取网络的预设网络层输出的所述第一生成器的输出图像的特征和所述第二生成器的输出图像的特征;根据所述第一生成器的输出图像的特征与所述第二生成器的输出图像的特征之间的差异,确定所述特征重建损失和/或所述样式重建损失。
根据本公开的一个或多个实施例,所述根据所述蒸馏损失,调整所述第一生成器,包括: 确定所述第一生成器的输出图像的总变差损失;根据所述蒸馏损失和所述总变差损失,调整所述第一生成器。
根据本公开的一个或多个实施例,所述根据所述蒸馏损失和所述总变差损失,调整所述第一生成器,包括:对所述蒸馏损失和所述总变差损失进行加权求和,得到所述第一生成器的在线损失;根据所述第一生成器的在线损失,对所述第一生成器进行调整。
根据本公开的一个或多个实施例,所述第一生成器为学生生成器,所述第二生成器为教师生成器。
根据本公开的一个或多个实施例,所述教师生成器包括第一教师生成器和第二教师生成器,所述第一教师生成器的模型容量大于所述学生生成器的模型容量,所述第二教师生成器的模型深度大于所述学生生成器的模型深度。
根据本公开的一个或多个实施例,所述判别器包括第一判别器和第二判别器,所述第一判别器和所述第二判别器之间存在共享卷积层,所述生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:根据样本数据和所述第一判别器,确定所述第一教师生成器的损失值,所述样本数据中包括样本图像和所述样本图像的参考图像;根据所述第一教师生成器的损失值,调整所述第一教师生成器;根据样本数据和所述第二判别器,确定所述第二教师生成器的损失值;根据所述第二教师生成器的损失值,调整所述第二教师生成器;根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器。
根据本公开的一个或多个实施例,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器,包括:通过所述学生生成器、调整后的第一教师生成器、调整后的第二教师生成器,分别处理所述样本图像,得到所述学生生成器的输出图像、所述第一教师生成器的输出图像以及所述第二教师生成器的输出图像;根据所述学生生成器的输出图像和所述第一教师生成器的输出图像,确定第一输出蒸馏损失,所述第一输出蒸馏损失为所述第一教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,所述通道蒸馏损失为所述第一教师生成器的中间层与所述学生生成器的中间层之间的蒸馏损失;根据所述学生生成器的输出图像和所述第二教师生成器的输出图像,确定第二输出蒸馏损失,所述第二输出蒸馏损失为所述第二教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
根据本公开的一个或多个实施例,所述第一教师生成器的中间层与所述学生生成器的中间层之间连接有通道卷积层,通道卷积层用于建立所述第一教师生成器的中间层中的通道与所述学生生成器的中间层的通道之间的映射关系,所述根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,包括:根据所述学生生成器的中间层中各通道输出的特征图,确定所述学生生成器的中间层中各通道的注意力权重;根据所述第一教师生成器的中间层中各通道输出的特征图,确定所述第一教师生成器的中间层中各通道的注意力权重;在所述学生生成器的中间层和所述第一教师生成器的中间层中,根据相互映射的通道的注意力权重之间的差异,确定所述通道蒸馏损失。
根据本公开的一个或多个实施例,所述根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器,包括:根据通道损失权重因子,对所述通道蒸馏损失进行加权,得到加权结果;根据所述加权结果、所述第一输出蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
第三方面,根据本公开的一个或多个实施例,提供了一种电子设备,包括:至少一个处理器和存储器;
所述存储器存储计算机执行指令;
所述至少一个处理器执行所述存储器存储的计算机执行指令,使得所述至少一个处理器执行如上第一方面以及第一方面各种可能的设计所述的数据处理方法。
第四方面,根据本公开的一个或多个实施例,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如上第一方面以及第一方面各种可能的设计所述的数据处理方法。
第五方面,根据本公开的一个或多个实施例,提供了一种计算机程序产品,所述计算机程序产品包含计算机执行指令,当处理器执行所述计算机执行指令时,实现如第一方面以及第一方面各种可能的设计所述的数据处理方法。
第六方面,本公开实施例提供一种计算机程序,所述计算机程序包含计算机执行指令,当处理器执行所述计算机执行指令时,实现如上述第一方面以及第一方面各种可能的设计所述的数据处理方法。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。

Claims (19)

  1. 一种数据处理方法,适用于通过模型蒸馏得到的生成式对抗网络,所述数据处理方法包括:
    获取待处理的图像;
    通过第一生成器对所述图像进行处理,得到处理后的图像;
    其中,所述生成式对抗网络包括所述第一生成器、第二生成器和判别器,所述模型蒸馏为交替训练所述第一生成器和所述第二生成器的过程,所述第一生成器的模型规模小于所述第二生成器的模型规模。
  2. 根据权利要求1所述的数据处理方法,所述生成式对抗网络中所述第一生成器与所述第二生成器的一次交替训练过程包括:
    根据样本数据和所述判别器,确定所述第二生成器的损失值,所述样本数据中包括样本图像和所述样本图像对应的参考图像;
    根据所述第二生成器的损失值,调整所述第二生成器;
    根据所述样本图像、调整后的第二生成器和所述第一生成器,确定调整后的第二生成器与所述第一生成器之间的蒸馏损失;
    根据所述蒸馏损失,调整所述第一生成器。
  3. 根据权利要求2所述的数据处理方法,所述根据样本数据和所述判别器,确定所述第二生成器的损失值,包括:
    通过所述第二生成器对所述样本图像进行处理,得到所述第二生成器的输出图像;
    通过所述判别器对所述样本图像对应的参考图像和所述第二生成器的输出图像进行真假判别,确定所述第二生成器的对抗损失;
    根据所述样本图像对应的参考图像与所述第二生成器的输出图像的差异,确定所述第二生成器的重建损失;
    根据所述对抗损失和所述重建损失,确定所述第二生成器的损失值。
  4. 根据权利要求2所述的数据处理方法,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第二生成器和所述第一生成器,确定调整后的第二生成器与所述第一生成器之间的蒸馏损失,包括:
    通过所述第一生成器、调整后的第二生成器,分别处理所述样本图像,得到所述第一生成器的输出图像和所述第二生成器的输出图像;
    根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,所述输出蒸馏损失为所述第二生成器的输出层与所述第一生成器的输出层之间的蒸馏损失。
  5. 根据权利要求4所述的数据处理方法,所述根据所述第一生成器的输出图像与所述第二生成器的输出图像之间的差异,确定输出蒸馏损失,包括:
    根据所述第二生成器的输出图像的亮度和对比度以及所述第一生成器的输出图像的亮度和对比度,确定所述第二生成器的输出图像与所述第一生成器的输出图像之间的结构相似化损失;
    通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失;
    根据所述结构相似化损失和所述感知损失,确定所述输出蒸馏损失。
  6. 根据权利要求5所述的数据处理方法,所述感知损失包括特征重建损失和/或样式重建损失,所述通过特征提取网络对所述第一生成器的输出图像和所述第二生成器的输出图像分别进行特征提取,确定所述第一生成器的输出图像与所述第二生成器的输出图像之间的感知损失,包括:
    将所述第一生成器的输出图像和所述第二生成器的输出图像分别输入所述特征提取网络,得到所述特征提取网络的预设网络层输出的所述第一生成器的输出图像的特征和所述第二生成器的输出图像的特征;
    根据所述第一生成器的输出图像的特征与所述第二生成器的输出图像的特征之间的差异,确定所述特征重建损失和/或所述样式重建损失。
  7. 根据权利要求2所述的数据处理方法,所述根据所述蒸馏损失,调整所述第一生成器,包括:
    确定所述第一生成器的输出图像的总变差损失;
    根据所述蒸馏损失和所述总变差损失,调整所述第一生成器。
  8. 根据权利要求7所述的数据处理方法,所述根据所述蒸馏损失和所述总变差损失,调整所述第一生成器,包括:
    对所述蒸馏损失和所述总变差损失进行加权求和,得到所述第一生成器的在线损失;
    根据所述第一生成器的在线损失,对所述第一生成器进行调整。
  9. 根据权利要求1至8任一项所述的数据处理方法,所述第一生成器为学生生成器,所述第二生成器为教师生成器。
  10. 根据权利要求9所述的数据处理方法,所述教师生成器包括第一教师生成器和第二教师生成器,所述第一教师生成器的模型容量大于所述学生生成器的模型容量,所述第二教师生成器的模型深度大于所述学生生成器的模型深度。
  11. 根据权利要求10所述的数据处理方法,所述判别器包括第一判别器和第二判别器,所述第一判别器和所述第二判别器之间存在共享卷积层,所述生成式对抗网络中第一生成器与第二生成器的一次交替训练过程包括:
    根据样本数据和所述第一判别器,确定所述第一教师生成器的损失值,所述样本数据中包括样本图像和所述样本图像的参考图像;
    根据所述第一教师生成器的损失值,调整所述第一教师生成器;
    根据样本数据和所述第二判别器,确定所述第二教师生成器的损失值;
    根据所述第二教师生成器的损失值,调整所述第二教师生成器;
    根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器。
  12. 根据权利要求11所述的数据处理方法,在所述生成式对抗网络中,网络层包括输入层、中间层和输出层,所述根据所述样本图像、调整后的第一教师生成器以及调整后的第二教师生成器,调整所述学生生成器,包括:
    通过所述学生生成器、调整后的第一教师生成器、调整后的第二教师生成器,分别处理所述样本图像,得到所述学生生成器的输出图像、所述第一教师生成器的输出图像以及所述第二教师生成器的输出图像;
    根据所述学生生成器的输出图像和所述第一教师生成器的输出图像,确定第一输出蒸馏损失,所述第一输出蒸馏损失为所述第一教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;
    根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,所述通道蒸馏损失为所述第一教师生成器的中间层与所述学生生成器的中间层之间的蒸馏损失;
    根据所述学生生成器的输出图像和所述第二教师生成器的输出图像,确定第二输出蒸馏损失,所述第二输出蒸馏损失为所述第二教师生成器的输出层与所述学生生成器的输出层之间的蒸馏损失;
    根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
  13. 根据权利要求12所述的数据处理方法,所述第一教师生成器的中间层与所述学生生成器的中间层之间连接有通道卷积层,所述通道卷积层用于建立所述第一教师生成器的中间层中的通道与所述学生生成器的中间层的通道之间的映射关系,所述根据所述学生生成器的中间层输出的特征图和所述第一教师生成器的中间层输出的特征图,确定通道蒸馏损失,包括:
    根据所述学生生成器的中间层中各通道输出的特征图,确定所述学生生成器的中间层中各通道的注意力权重;
    根据所述第一教师生成器的中间层中各通道输出的特征图,确定所述第一教师生成器的中间层中各通道的注意力权重;
    在所述学生生成器的中间层和所述第一教师生成器的中间层中,根据相互映射的通道的注意力权重之间的差异,确定所述通道蒸馏损失。
  14. 根据权利要求12所述的数据处理方法,所述根据所述第一输出蒸馏损失、所述通道蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器,包括:
    根据通道损失权重因子,对所述通道蒸馏损失进行加权,得到加权结果;
    根据所述加权结果、所述第一输出蒸馏损失以及所述第二输出蒸馏损失,调整所述学生生成器。
  15. 一种数据处理设备,适用于通过模型蒸馏得到的生成式对抗网络,所述数据处理设备包括:
    获取模块,用于获取待处理的图像;
    处理模块,用于通过第一生成器对所述图像进行处理,得到处理后的图像;
    其中,所述生成式对抗网络包括所述第一生成器、第二生成器和判别器,所述模型蒸馏为交替训练所述第一生成器和所述第二生成器的过程,所述第一生成器的模型规模小于所述第二生成器的模型规模。
  16. 一种电子设备,包括:至少一个处理器和存储器;
    所述存储器存储计算机执行指令;
    所述至少一个处理器执行所述存储器存储的计算机执行指令,使得所述至少一个处理器执行如权利要求1至14任一项所述的数据处理方法。
  17. 一种计算机可读存储介质，所述计算机可读存储介质中存储有计算机执行指令，当处理器执行所述计算机执行指令时，实现如权利要求1至14任一项所述的数据处理方法。
  18. 一种计算机程序产品,所述计算机程序产品包含计算机执行指令,当处理器执行所述计算机执行指令时,实现如权利要求1至14任一项所述的数据处理方法。
  19. 一种计算机程序,所述计算机程序包含计算机执行指令,当处理器执行所述计算机执行指令时,实现如权利要求1至14任一项所述的数据处理方法。