WO2023045870A1 - Network model compression method, image generation method, apparatus, device and medium - Google Patents

Network model compression method, image generation method, apparatus, device and medium

Info

Publication number
WO2023045870A1
Authority
WO
WIPO (PCT)
Prior art keywords
generator
discriminator
function
objective function
loss
Prior art date
Application number
PCT/CN2022/119638
Other languages
English (en)
French (fr)
Inventor
吴捷
李少杰
肖学锋
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Priority to EP22871916.7A (EP4339836A1)
Publication of WO2023045870A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a network model compression method, an image generation method, an apparatus, a device and a medium.
  • GAN: Generative Adversarial Network
  • the present disclosure provides a network model compression method, an image generation method, an apparatus, a device and a medium.
  • the first aspect of the embodiments of the present disclosure provides a method for compressing a network model.
  • the network model to be compressed includes a first generator and a first discriminator.
  • the method includes:
  • the loss difference between the first generator and the first discriminator is a first loss difference
  • the loss difference between the second generator and the second discriminator is a second loss difference
  • the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
  • the states of the convolution kernels in the first discriminator are configured so that a part of the convolution kernels is in an active state and another part of the convolution kernels is in a suppressed state, so as to obtain the second discriminator, including:
  • the first weight parameter includes weight parameters corresponding to other elements in the second discriminator except the retention factor.
  • the second weight parameter includes weight parameters corresponding to each element in the second generator.
  • the determining the first weight parameter of the second discriminator includes:
  • the method also includes:
  • a second weight parameter of the second generator is determined according to an objective function of the second generator.
  • before determining the second weight parameter of the second generator according to the objective function of the second generator, the method further includes:
  • the objective function of the second discriminator is determined according to the loss function of the second discriminator with respect to real pictures and the loss function of the second discriminator with respect to fake pictures.
  • before determining the objective function of the second generator according to the loss function of the second generator, the method further includes:
  • the determining the objective function of the second generator according to the loss function of the second generator includes:
  • the objective function of the second generator is determined according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator.
  • the determining the objective function of the second generator according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network and the loss function of the second generator includes:
  • before the objective function of the second generator is determined according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network and the loss function of the second generator, the method further includes:
  • the distillation objective function is determined according to the first similarity measure function and the second similarity measure function.
  • the determining the first similarity metric function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator includes:
  • the first similarity measure function is determined according to the first sub-similarity measure function corresponding to each layer.
  • the determining a second similarity metric function according to the similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer includes:
  • the second similarity measure function is determined according to the second sub-similarity measure function corresponding to each layer.
  • said determining each of said retention factors comprises:
  • when the retention factor is greater than or equal to the second preset threshold, the retention factor is determined to be 1.
  • before determining each of the retention factors according to the objective function of the retention factors, the method further includes:
  • determining the objective function of the retention factor according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator with respect to fake pictures, the objective function of the first generator, and the loss function of the first discriminator with respect to fake pictures.
  • a second aspect of an embodiment of the present disclosure provides an image generation method, the method comprising:
  • the second generator and the second discriminator are obtained by using the method described in the first aspect.
  • a third aspect of an embodiment of the present disclosure provides a device for compressing a network model.
  • the network model to be compressed includes a first generator and a first discriminator, and the device includes: a pruning module and a configuration module;
  • the pruning module is configured to perform pruning processing on the first generator to obtain a second generator
  • the configuration module is configured to configure the states of the convolution kernels in the first discriminator, so that a part of the convolution kernels is in an active state, and another part of the convolution kernels is in an inhibited state, so as to obtain a second discriminator;
  • the loss difference between the first generator and the first discriminator is a first loss difference
  • the loss difference between the second generator and the second discriminator is a second loss difference
  • the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
  • a fourth aspect of the embodiments of the present disclosure provides a network model compression device, the device includes:
  • a processor configured to execute the computer program, and when the computer program is executed by the processor, the processor executes the method as described in the first aspect.
  • a fifth aspect of the embodiments of the present disclosure provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method described in the first aspect is implemented.
  • a sixth aspect of the embodiments of the present disclosure provides a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program includes program code for executing the method as described in the first aspect.
  • the second generator is obtained by pruning the first generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are active and the other convolution kernels are suppressed, so as to obtain the second discriminator; the loss difference between the first generator and the first discriminator is the first loss difference, the loss difference between the second generator and the second discriminator is the second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. Because the embodiments of the present disclosure compress the generator and the discriminator cooperatively, the compressed generator and the compressed discriminator can maintain Nash equilibrium, which avoids mode collapse.
  • FIG. 1 is a flowchart of a network model compression method provided by an embodiment of the present disclosure
  • Fig. 2 is a flow chart of a network model compression method provided by an embodiment of the present disclosure
  • Fig. 3 is a flowchart of a network model compression method provided by an embodiment of the present disclosure
  • FIG. 4 is a flow chart of an image generation method provided by an embodiment of the present disclosure.
  • Fig. 5 is a schematic structural diagram of a network model compression device provided by an embodiment of the present disclosure.
  • Fig. 6 is a schematic structural diagram of a network model compression device in an embodiment of the present disclosure.
  • the embodiments of the present disclosure provide a network model compression method: the first generator is pruned to obtain the second generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are in an active state and the other convolution kernels are in a suppressed state, to obtain the second discriminator, so that the first loss difference between the first generator and the first discriminator is close to the second loss difference between the second generator and the second discriminator. In this way, Nash equilibrium can be maintained between the second generator and the second discriminator, and mode collapse can be prevented.
  • Fig. 1 is a flowchart of a network model compression method provided by an embodiment of the present disclosure, and the method may be executed by a network model compression device.
  • the network model compression device may be understood, by way of example, as a device with computing capability, such as a tablet computer, a notebook computer or a desktop computer.
  • the method can compress the network model to be compressed including the first generator and the first discriminator. As shown in Figure 1, the method provided in this embodiment includes the following S110-S120:
  • performing pruning on the first generator includes: selectively deleting convolution kernels in the first generator, so that convolution kernels whose importance is less than a preset importance threshold are deleted, and convolution kernels whose importance is greater than or equal to the preset importance threshold are retained.
  • each convolutional layer (Convolutional layer, CL) in the first generator is correspondingly provided with a batch normalization (Batch Normalization, BN) layer, and each CL includes at least one convolutional kernel.
  • Each convolution kernel is correspondingly provided with a scaling factor in the BN layer corresponding to its CL, wherein the scaling factor is used to characterize the importance of its corresponding convolution kernel.
  • each training iteration includes determining the scaling factors according to a corresponding formula.
  • the scaling factors of the convolution kernels are sorted in ascending order, and binary search is used to delete the convolution kernels corresponding to the smaller scaling factors, until the calculation amount of the first generator satisfies the preset calculation budget.
  • each CL in the first generator includes at least one convolution kernel, and each convolution kernel is correspondingly set with a weight parameter, wherein the weight parameter of the convolution kernel is used to characterize the corresponding convolution kernel Importance.
  • the first generator and the first discriminator are trained; specifically, each training iteration includes determining the weight parameters. When the preset number of training iterations is reached, the weight parameters of the convolution kernels are sorted in ascending order, and binary search is used to delete the convolution kernels corresponding to the smaller weight parameters, until the calculation amount of the first generator satisfies the preset calculation budget.
  • the specific number of training times and the specific value of the preset given calculation amount can be set by those skilled in the art according to the actual situation, and are not limited here.
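  • The pruning scheme described above can be illustrated with a short, non-authoritative sketch in PyTorch: convolution kernels are ranked by the absolute value of their BN scaling factors, and a binary search over the sorted factors finds a threshold whose pruned generator meets the preset calculation budget. The helpers flops_of, prune_by_threshold and target_flops are assumptions, not part of the disclosure.

```python
# Illustrative sketch only: rank generator kernels by BatchNorm scaling factors
# (gamma) and binary-search a pruning threshold that meets a FLOPs budget.
import torch
import torch.nn as nn

def collect_scaling_factors(generator: nn.Module) -> torch.Tensor:
    """Gather |gamma| of every BatchNorm2d channel; each gamma scores one kernel."""
    gammas = [m.weight.detach().abs() for m in generator.modules()
              if isinstance(m, nn.BatchNorm2d)]
    return torch.cat(gammas)

def search_prune_threshold(generator, flops_of, prune_by_threshold, target_flops,
                           iters: int = 20) -> float:
    gammas, _ = collect_scaling_factors(generator).sort()   # ascending order
    lo, hi, best = 0, gammas.numel() - 1, float(gammas[0])
    for _ in range(iters):
        if lo > hi:
            break
        mid = (lo + hi) // 2
        thr = float(gammas[mid])                 # prune kernels with |gamma| < thr
        pruned = prune_by_threshold(generator, thr)
        if flops_of(pruned) > target_flops:      # still too heavy: prune more
            lo = mid + 1
        else:                                    # within budget: try keeping more
            best, hi = thr, mid - 1
    return best
```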
  • the convolution kernel in the suppressed state does not work, and the convolution kernel in the activated state works normally.
  • the loss difference between the first generator and the first discriminator is the first loss difference
  • the loss difference between the second generator and the second discriminator is the second loss difference
  • the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold.
  • the first loss difference is used to characterize the difference between the performance of the first generator and the performance of the first discriminator, and can be calculated from the objective function of the first generator and the loss function of the first discriminator with respect to fake pictures.
  • the second loss difference is used to characterize the difference between the performance of the second generator and the performance of the second discriminator, and can be calculated from the objective function of the second generator and the loss function of the second discriminator with respect to fake pictures.
  • the network model to be compressed is a trained GAN
  • the first generator and the first discriminator are in a state of Nash equilibrium.
  • the first loss difference can be close to the second loss difference, that is, Nash equilibrium can be maintained between the second generator and the second discriminator, thereby preventing mode collapse.
  • those skilled in the art may set the specific implementation manner of configuring the state of the convolution kernel in the first discriminator according to the actual situation, which is not limited here.
  • S120 includes: freezing the retention factors corresponding to the convolution kernels in the second discriminator and determining the first weight parameter of the second discriminator; freezing the first weight parameter of the second discriminator and the second weight parameter of the second generator and determining each retention factor; and repeating the above operations of determining the first weight parameter of the second discriminator and determining each retention factor until the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
  • the retention factor is used to characterize the importance of its corresponding convolution kernel.
  • a retention factor is configured for each convolution kernel in the first discriminator to obtain a second discriminator, and the initial value of the retention factor corresponding to each convolution kernel may be 1.
  • the value of the retention factor corresponding to the convolution kernel in the activated state may be 1
  • the value of the retention factor corresponding to the convolution kernel in the suppressed state may be 0.
  • the first weight parameter mentioned here includes weight parameters corresponding to other elements in the second discriminator except the retention factor corresponding to the convolution kernel.
  • the second weight parameter includes weight parameters corresponding to each element (such as a convolution kernel) in the second generator.
  • determining the first weight parameter of the second discriminator includes: determining the first weight parameter of the second discriminator according to an objective function of the second discriminator. In a possible implementation, before the first weight parameter of the second discriminator is determined according to the objective function of the second discriminator, the method may further include determining the second weight parameter of the second generator according to the objective function of the second generator.
  • each retention factor is determined according to an objective function of the retention factor.
  • Exemplarily, first, the retention factors are kept unchanged, the second weight parameter of the second generator is determined by optimizing the objective function of the second generator, and the first weight parameter of the second discriminator is determined by optimizing the objective function of the second discriminator. Then, the first weight parameter of the second discriminator and the second weight parameter of the second generator are kept unchanged, and the retention factor corresponding to each convolution kernel is determined by optimizing the objective function of the retention factor.
  • when it is detected that the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold, the training can be ended; when the absolute value of the difference between the first loss difference and the second loss difference is detected to be greater than or equal to the first preset threshold, the method returns to the operations of "determining the second weight parameter of the second generator and determining the first weight parameter of the second discriminator" and "determining the retention factor corresponding to each convolution kernel" until the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
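  • A minimal sketch of this alternating procedure is shown below (illustrative only; update_weights, update_retention and the loss-difference callables are assumed helpers): weights are updated with the retention factors frozen, the retention factors are updated with the weights frozen, and the loop stops once the student loss difference is close enough to the teacher loss difference.

```python
# Illustrative alternating-optimization loop for jointly compressing the
# generator and the discriminator; all callables are assumed helpers.
def compress(student_G, student_D, retention_factors, update_weights,
             update_retention, teacher_loss_diff, student_loss_diff,
             first_threshold: float, max_rounds: int = 1000):
    for _ in range(max_rounds):
        # 1) freeze retention factors, update generator/discriminator weights
        for a in retention_factors:
            a.requires_grad_(False)
        update_weights(student_G, student_D)

        # 2) freeze weights, update retention factors
        params = list(student_G.parameters()) + list(student_D.parameters())
        for p in params:
            p.requires_grad_(False)
        for a in retention_factors:
            a.requires_grad_(True)
        update_retention(retention_factors)
        for p in params:
            p.requires_grad_(True)

        # 3) stop once |first loss difference - second loss difference| < threshold
        if abs(teacher_loss_diff() - student_loss_diff()) < first_threshold:
            break
```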
  • the second generator is obtained by pruning the first generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are active and the other convolution kernels are suppressed, so as to obtain the second discriminator; the loss difference between the first generator and the first discriminator is the first loss difference, the loss difference between the second generator and the second discriminator is the second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. Because the embodiments of the present disclosure compress the generator and the discriminator cooperatively, the compressed generator and the compressed discriminator can maintain Nash equilibrium, which avoids mode collapse.
  • Fig. 2 is a flow chart of a network model compression method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method provided by this embodiment includes the following S210-S280:
  • the specific implementation manner of S220 can be set by those skilled in the art according to actual conditions, and is not limited here.
  • in one possible implementation, the objective function of the second generator is as follows: L_G^S = E_{z~p(z)}[ f_G^S( -D_S(G_S(z)) ) ]
  • where L_G^S represents the objective function of the second generator; G_S(z) represents the fake picture generated by the second generator from the noise signal; D_S(G_S(z)) represents the response value of the second discriminator to the fake picture generated by the second generator; f_G^S( -D_S(G_S(z)) ) represents the loss function of the second generator; E(*) represents the expected value over the distribution; and p(z) represents the noise distribution.
  • in one possible implementation, the objective function of the second discriminator is as follows: L_D^S = E_{x~pdata}[ f_D^S( -D(x) ) ] + E_{z~p(z)}[ f_D^S( D_S(G_S(z)) ) ]
  • where L_D^S represents the objective function of the second discriminator; D(x) represents the response value of the second discriminator to the real picture; f_D^S( -D(x) ) represents the loss function of the second discriminator with respect to the real picture; E(*) represents the expected value over the distribution; pdata represents the distribution of the real pictures; G_S(z) represents the fake picture generated by the second generator from the noise signal; D_S(G_S(z)) represents the response value of the second discriminator to that fake picture; f_D^S( D_S(G_S(z)) ) represents the loss function of the second discriminator with respect to the fake picture generated by the second generator; and p(z) represents the noise distribution.
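  • As an illustration of the two objectives above, the sketch below instantiates the generic loss functions f_G^S and f_D^S with the softplus form used in logistic (non-saturating) GAN training; the choice of softplus is an assumption, since the disclosure leaves f(·) unspecified.

```python
# Illustrative instantiation of L_G^S and L_D^S with softplus losses.
import torch
import torch.nn.functional as F

def generator_objective(G_S, D_S, z):
    # L_G^S = E_z[ f_G^S( -D_S(G_S(z)) ) ]
    fake = G_S(z)
    return F.softplus(-D_S(fake)).mean()

def discriminator_objective(G_S, D_S, x_real, z):
    # L_D^S = E_x[ f_D^S( -D(x) ) ] + E_z[ f_D^S( D_S(G_S(z)) ) ]
    fake = G_S(z).detach()
    loss_real = F.softplus(-D_S(x_real)).mean()   # loss w.r.t. real pictures
    loss_fake = F.softplus(D_S(fake)).mean()      # loss w.r.t. fake pictures (L_Dfake^S)
    return loss_real + loss_fake, loss_fake
```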
  • in one possible implementation, the objective function of the retention factor is as follows: L_arch = L_G^S + L_D^S + | ΔL^S - ΔL^T |, where ΔL^S = | L_G^S - L_Dfake^S | is the second loss difference and ΔL^T = | L_G^T - L_Dfake^T | is the first loss difference
  • where L_arch represents the objective function of the retention factor; L_G^S represents the objective function of the second generator; L_D^S represents the objective function of the second discriminator; L_Dfake^S represents the loss function of the second discriminator with respect to the fake picture; L_G^T represents the objective function of the first generator; L_Dfake^T represents the loss function of the first discriminator with respect to the fake picture; G_T(z) represents the fake picture generated by the first generator from the noise signal; D_T(G_T(z)) represents the response value of the first discriminator to the fake picture generated by the first generator; f_G^T( -D_T(G_T(z)) ) represents the loss function of the first generator; f_D^T( D_T(G_T(z)) ) represents the loss function of the first discriminator with respect to the fake picture generated by the first generator; E(*) represents the expected value over the distribution; and p(z) represents the noise distribution.
  • the retention factors corresponding to the convolution kernels are kept unchanged, and the second weight parameter of the second generator is updated so that the objective function L_G^S of the second generator becomes smaller, thereby determining the second weight parameter of the second generator; the first weight parameter of the second discriminator is updated so that the objective function L_D^S of the second discriminator becomes smaller, thereby determining the first weight parameter of the second discriminator.
  • the specific implementation manner of S260 can be set by those skilled in the art according to actual conditions, and is not limited here.
  • the second weight parameter of the second generator and the first weight parameter of the second discriminator are kept unchanged, and the retention factors corresponding to the convolution kernels are updated so that the objective function L_arch of the retention factor becomes smaller, thereby determining each retention factor.
  • the specific implementation manner of S270 can be set by those skilled in the art according to actual conditions, and is not limited here.
  • the absolute value of the difference between the objective function of the second generator and the loss function of the second discriminator about the fake picture is used as the second loss difference.
  • in one possible implementation, the second loss difference is as follows: ΔL^S = | L_G^S - L_Dfake^S |
  • where ΔL^S represents the second loss difference.
  • S280 Determine whether the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. If yes, end the training; if not, return to S250.
  • S280 can be set by those skilled in the art according to actual conditions, and is not limited here.
  • in a possible implementation, the absolute value of the difference between the objective function of the first generator and the loss function of the first discriminator with respect to fake pictures is used as the first loss difference; the absolute value of the difference between the first loss difference and the second loss difference is calculated; and it is determined whether that absolute value is smaller than the first preset threshold.
  • in a possible implementation, the first loss difference is as follows: ΔL^T = | L_G^T - L_Dfake^T |
  • where ΔL^T represents the first loss difference, and L_G^T and L_Dfake^T are as defined above and will not be repeated here.
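  • For clarity, the stopping criterion built from these two loss differences can be written as a small check (illustrative sketch; the inputs are scalar loss values computed elsewhere).

```python
# First/second loss differences and the Nash-equilibrium stopping check.
def loss_difference(gen_objective: float, disc_fake_loss: float) -> float:
    return abs(gen_objective - disc_fake_loss)

def within_first_threshold(L_G_T, L_Dfake_T, L_G_S, L_Dfake_S,
                           first_threshold: float) -> bool:
    delta_T = loss_difference(L_G_T, L_Dfake_T)   # first loss difference
    delta_S = loss_difference(L_G_S, L_Dfake_S)   # second loss difference
    return abs(delta_T - delta_S) < first_threshold
```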
  • in the embodiments of the present disclosure, the objective function of the second generator is determined according to the loss function of the second generator, and the second weight parameter of the second generator is determined according to the objective function of the second generator; the objective function of the second discriminator is determined according to the loss function of the second discriminator with respect to real pictures and the loss function of the second discriminator with respect to fake pictures, and the first weight parameter of the second discriminator is determined according to the objective function of the second discriminator; the objective function of the retention factor is determined according to the objective function of the first generator, the loss function of the first discriminator with respect to fake pictures, the objective function of the second generator, the objective function of the second discriminator, and the loss function of the second discriminator with respect to fake pictures, and each retention factor is determined according to the objective function of the retention factor. In this way, while the second loss difference is made to approach the first loss difference, the loss function of the second generator and the loss function of the second discriminator with respect to fake pictures can be reduced, and the loss function of the second discriminator with respect to real pictures can be improved, thereby optimizing the performance of the second generator and the second discriminator.
  • Fig. 3 is a flow chart of a network model compression method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method provided by this embodiment includes the following S310-S390:
  • the first generator and the first discriminator are well-trained network models with good accuracy and stability. Using them as the teacher generative adversarial network to guide the student generative adversarial network is beneficial to improving the performance of the second generator. There is usually a distillation loss when the teacher generative adversarial network guides the student generative adversarial network, and this distillation loss can be reduced by optimizing the distillation objective function.
  • the specific determination method of the distillation objective function can be set by those skilled in the art according to the actual situation, which is not limited here.
  • the specific determination method of the distillation objective function is as follows: the first similarity measurement function is determined according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator; the fake picture generated by the first generator is input into the first discriminator to obtain the first intermediate feature map of at least one layer in the first discriminator; the fake picture generated by the second generator is input into the first discriminator to obtain the second intermediate feature map of at least one layer in the first discriminator; the second similarity measurement function is determined according to the similarity between the first intermediate feature map of at least one layer and the second intermediate feature map of at least one layer; and the distillation objective function is determined according to the first similarity measurement function and the second similarity measurement function.
  • the intermediate feature map of the first generator refers to the output information of a certain layer in the first generator
  • the intermediate feature map of the second generator refers to the output information of a certain layer in the second generator.
  • the intermediate feature maps used to determine the first similarity measurement function have the following characteristic: the intermediate feature maps obtained from the first generator correspond one-to-one with the intermediate feature maps obtained from the second generator, that is, when an intermediate feature map of a certain layer (for example, the first layer) is obtained from the first generator, an intermediate feature map of the layer with the same index (for example, the first layer) needs to be obtained from the second generator.
  • Those skilled in the art can set the intermediate feature maps obtained from which layers of the first generator and the second generator according to the actual situation, which is not limited here.
  • the first similarity metric function is used to characterize the degree of approximation of the intermediate layer information of the first generator and the second generator.
  • a specific manner of determining the first similarity measure function can be set by those skilled in the art according to actual conditions.
  • determining the first similarity measurement function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator includes: inputting the intermediate feature map of the i-th layer of the first generator and the intermediate feature map of the i-th layer of the second generator into the similarity measurement function to obtain the first sub-similarity measurement function corresponding to the i-th layer, where i is a positive integer, i takes a value from 1 to M, and M is the number of layers of the first generator and the second generator; and determining the first similarity measurement function according to the first sub-similarity measurement function corresponding to each layer.
  • the similarity measurement function includes an MSE loss function and a Texture loss function.
  • in one possible implementation, the similarity measurement function is as follows: d(O, O') = MSE(O, O') + MSE(G(O), G(O')), i.e. the sum of an MSE term on the two feature maps and a Texture term on their Gram matrices
  • where O and O' denote the two different intermediate feature maps input to the similarity measurement function; G_pq(O) represents the inner product between the feature of the p-th channel and the feature of the q-th channel of the intermediate feature map O; c_l represents the number of channels of the intermediate feature map O input to the similarity measurement function; and MSE(*) represents the MSE loss function.
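  • A possible implementation of this per-layer similarity measurement function is sketched below; the equal weighting of the MSE and Texture (Gram-matrix) terms is an assumption.

```python
# Per-layer similarity measure: MSE on feature maps plus a Texture (Gram) term.
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, C, C); entry (p, q) is the inner product of channels p and q."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)

def similarity_measure(o1: torch.Tensor, o2: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(o1, o2) + F.mse_loss(gram_matrix(o1), gram_matrix(o2))
```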
  • in one possible implementation, the first similarity measurement function is as follows: D1 = E_{z~p(z)}[ Σ_{i=1..L_G} d( f_i(O_i^S), O_i^T ) ]
  • where D1 denotes the first similarity measurement function; L_G denotes the number of intermediate feature maps obtained from the first generator; p(z) denotes the noise distribution; d( f_i(O_i^S), O_i^T ) denotes the first sub-similarity measurement function corresponding to the i-th layer; O_i^S denotes the intermediate feature map obtained from the i-th layer of the second generator; O_i^T denotes the intermediate feature map obtained from the i-th layer of the first generator; and f_i represents a learnable 1x1 convolutional layer used to convert the number of channels of O_i^S to the same number of channels as O_i^T.
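  • The first similarity measurement function can then be sketched as a sum of per-layer distances, with a learnable 1x1 convolution aligning the student's channel count to the teacher's. This is an illustrative module, not the patent's implementation, and it reuses the similarity_measure sketch above.

```python
# Layer-wise generator distillation term with learnable 1x1 channel adapters.
import torch.nn as nn

class GeneratorDistillLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # one learnable 1x1 convolution per distilled layer
        self.adapters = nn.ModuleList(
            nn.Conv2d(cs, ct, kernel_size=1)
            for cs, ct in zip(student_channels, teacher_channels))

    def forward(self, student_feats, teacher_feats):
        total = 0.0
        for adapter, f_s, f_t in zip(self.adapters, student_feats, teacher_feats):
            # similarity_measure is the per-layer MSE + Gram distance sketched earlier
            total = total + similarity_measure(adapter(f_s), f_t.detach())
        return total
```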
  • the first intermediate feature map refers to the output information of a certain layer in the first discriminator when the fake picture generated by the first generator is input to the first discriminator; the second intermediate feature map refers to the output information of a certain layer in the first discriminator when the fake picture generated by the second generator is input to the first discriminator.
  • using the first discriminator has the advantage that it is highly correlated with the generation tasks of the first generator and the second generator and has a good ability to distinguish whether an input picture is real or fake.
  • there is a one-to-one correspondence between the first intermediate feature maps and the second intermediate feature maps, that is, when the first intermediate feature map is the intermediate feature map of a certain layer (for example, the first layer) in the first discriminator, the corresponding second intermediate feature map is the intermediate feature map of the same layer (for example, the first layer) obtained when the fake picture generated by the second generator is input.
  • Those skilled in the art can set the intermediate feature maps obtained from the layers of the first discriminator and the second discriminator according to the actual situation, which is not limited here.
  • the second similarity metric function is used to characterize the degree of approximation of the intermediate layer information of the first discriminator and the second discriminator.
  • a specific manner of determining the second similarity measure function can be set by those skilled in the art according to actual conditions.
  • determining the second similarity measurement function according to the similarity between the first intermediate feature map and the second intermediate feature map of each layer includes: inputting the first intermediate feature map and the second intermediate feature map corresponding to the j-th layer into the similarity measurement function to obtain the second sub-similarity measurement function corresponding to the j-th layer, where j is a positive integer, j takes a value from 1 to N, and N is the number of layers of the first discriminator; and determining the second similarity measurement function according to the second sub-similarity measurement function corresponding to each layer.
  • in one possible implementation, the second similarity measurement function is as follows: D2 = E_{z~p(z)}[ Σ_{j=1..L_D} d( P_j, Q_j ) ]
  • where D2 denotes the second similarity measurement function; L_D denotes the number of intermediate feature maps obtained from the first discriminator; p(z) denotes the noise distribution; d( P_j, Q_j ) denotes the second sub-similarity measurement function corresponding to the j-th layer; P_j denotes the first intermediate feature map corresponding to the j-th layer; and Q_j denotes the second intermediate feature map corresponding to the j-th layer.
  • the specific implementation manner of determining the distillation objective function according to the first similarity metric function and the second similarity metric function can be set by those skilled in the art according to the actual situation, which is not limited here.
  • the distillation objective function is determined by summing the first similarity metric function and the second similarity metric function according to weights.
  • the distillation objective function is determined by directly summing the first similarity measure function and the second similarity measure function.
  • in one example, the distillation objective function is as follows: L_distill = D1 + D2
  • where D1 represents the first similarity measurement function and D2 represents the second similarity measurement function.
  • the distillation objective function and the objective function components determined according to the loss function of the second generator are directly summed to determine the objective function of the second generator.
  • the distillation objective function and the objective function components determined according to the loss function of the second generator are summed according to weights to determine the objective function of the second generator.
  • in one possible implementation, the objective function of the second generator is as follows: L_G^S = (L_G^S)' + λ * L_distill
  • where (L_G^S)' represents the objective function component determined according to the loss function of the second generator; (L_G^S)' may be the objective function of the second generator given in the example of the method shown in Fig. 2, so the specific explanation of (L_G^S)' can refer to the above and is not repeated here; L_distill represents the distillation objective function; and λ represents the weight parameter of the distillation objective function.
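  • As a brief illustration, the combined student-generator loss can be assembled as below; the default value of the weight parameter is an assumption, since the disclosure only states that a weighted sum is used.

```python
# Combined objective of the second generator: plain GAN term plus weighted distillation term.
def student_generator_total_loss(plain_generator_loss, distill_loss,
                                 lambda_distill: float = 1.0):
    return plain_generator_loss + lambda_distill * distill_loss
```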
  • S340 Determine an objective function of the second discriminator according to the loss function of the second discriminator with respect to the real picture and the loss function of the second discriminator with respect to the fake picture.
  • S350 Determine the objective function of the retention factor according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator with respect to fake pictures, the objective function of the first generator, and the loss function of the first discriminator with respect to fake pictures.
  • the second preset threshold can be set by those skilled in the art according to actual conditions, and is not limited here.
  • in this way, the compression process can be accelerated.
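  • A one-line sketch of the binarization against the second preset threshold (illustrative only) is shown below.

```python
# Binarize retention factors with the second preset threshold: 1 = active, 0 = suppressed.
import torch

def binarize_retention_factors(factors: torch.Tensor, second_threshold: float) -> torch.Tensor:
    return (factors >= second_threshold).float()
```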
  • S390 Determine whether the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. If yes, end the training; if not, return to S360.
  • the intermediate layer feature maps in the first generator and the first discriminator are used as additional supervision information to help the second generator generate high-quality images through the distillation method. In this way, a lightweight second generator can be obtained, and it can be ensured that the lightweight second generator can generate high-quality images.
  • Fig. 4 is a flowchart of an image generation method provided by an embodiment of the present disclosure, and the method may be executed by an electronic device.
  • the electronic device may be understood, by way of example, as a device with a computing function, such as a tablet computer, a notebook computer or a desktop computer.
  • the method provided in this embodiment includes the following S410-S420:
  • the second generator and the second discriminator are obtained by using the method in any one of the above-mentioned Fig. 1-Fig. 3 embodiments.
  • in the embodiments of the present disclosure, the second generator used for generating images and the second discriminator are obtained by using the network model compression method provided in the embodiments of the present disclosure. Since the second generator and the second discriminator are relatively lightweight, they will not introduce long delays even when deployed on resource-constrained edge devices.
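  • An illustrative use of the compressed generator for the image generation method of Fig. 4 is sketched below; the noise dimension and tensor shapes are assumptions.

```python
# Generate an image with the lightweight second generator on an edge device.
import torch

@torch.no_grad()
def generate_image(second_generator, noise_dim: int = 128, batch_size: int = 1):
    second_generator.eval()
    z = torch.randn(batch_size, noise_dim)   # noise signal z ~ p(z)
    return second_generator(z)               # generated picture
```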
  • Fig. 5 is a schematic structural diagram of an apparatus for compressing a network model provided by an embodiment of the present disclosure.
  • the apparatus for compressing a network model may be understood as the above-mentioned network model compression device or some functional modules in the above-mentioned network model compression device.
  • the network model compression apparatus can compress the network model to be compressed, the network model to be compressed includes a first generator and a first discriminator, and, as shown in Figure 5, the network model compression apparatus 500 includes: a pruning module 510 and a configuration module 520;
  • the pruning module 510 is configured to perform pruning processing on the first generator to obtain a second generator
  • the configuration module 520 is configured to configure the states of the convolution kernels in the first discriminator, so that a part of the convolution kernels is in an active state, and another part of the convolution kernels is in an inhibited state, so as to obtain a second discriminator;
  • the loss difference between the first generator and the first discriminator is a first loss difference
  • the loss difference between the second generator and the second discriminator is a second loss difference
  • the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
  • the configuration module 520 includes:
  • the first determining submodule can be configured to freeze the retention factors corresponding to the convolution kernels in the second discriminator, and determine the first weight parameter of the second discriminator; wherein the retention factor is used to characterize its The importance of the corresponding convolution kernel;
  • the second determining submodule may be configured to freeze the first weight parameter of the second discriminator and the second weight parameter of the second generator, and determine each of the retention factors;
  • the repetition submodule may be configured to repeat the above-mentioned operations of determining the first weight parameter of the second discriminator and determining each of the retention factors until the difference between the first loss difference and the second loss difference is The absolute value of the value is smaller than the first preset threshold.
  • the first weight parameter includes weight parameters corresponding to other elements in the second discriminator except the retention factor.
  • the second weight parameter includes weight parameters corresponding to each element in the second generator.
  • the first determining submodule includes:
  • the first determining unit may be configured to determine a second weight parameter of the second generator according to an objective function of the second generator
  • the second determining unit may be configured to determine a first weight parameter of the second discriminator according to an objective function of the second discriminator.
  • the device also includes:
  • the objective function determination module of the second generator may be configured to, before determining the second weight parameter of the second generator according to the objective function of the second generator, according to the loss of the second generator a function determining an objective function of said second generator;
  • the objective function determining module of the second discriminator may be configured to determine the objective function of the second discriminator according to the loss function of the second discriminator on real pictures and the loss function of the second discriminator on fake pictures .
  • the device further includes a teacher and student determination module, which can be configured to use the first generator and the first discriminator as a teacher generative adversarial network and use the second generator and the second discriminator as a student generative adversarial network;
  • the objective function determination module of the second generator can be specifically configured to determine the objective function of the second generator according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network and the loss function of the second generator.
  • the objective function determination module of the second generator may specifically be configured to add the distillation objective function and the objective function components determined according to the loss function of the second generator according to weights , determine the objective function of the second generator.
  • the device also includes:
  • the first similarity measure function determining module may be configured to determine the second generator based on the distillation objective function between the teacher generated confrontation network and the student generated confrontation network, and the loss function of the second generator. Before the objective function of the second generator, a first similarity measurement function is determined according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator;
  • the first intermediate feature map obtaining module may be configured to input the fake picture generated by the first generator into the first discriminator to obtain a first intermediate feature map of at least one layer in the first discriminator;
  • the second intermediate feature map obtaining module may be configured to input the fake picture generated by the second generator into the first discriminator to obtain a second intermediate feature map of at least one layer in the first discriminator;
  • the second similarity measure function determination module may be configured to determine a second similarity measure function according to the similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer ;
  • the distillation objective function determination module may be configured to determine the distillation objective function according to the first similarity measure function and the second similarity measure function.
  • the first similarity measure function determination module includes:
  • the first sub-similarity measure function determination submodule may be configured to input the intermediate feature map of the i-th layer of the first generator and the intermediate feature map of the i-th layer of the second generator into the similarity measure function, Obtain the first sub-similarity measurement function corresponding to the i-th layer; wherein, i is a positive integer, and i takes a value from 1-M, and M is the number of layers of the first generator and the second generator;
  • the first similarity measure function determining submodule may be configured to determine the first similarity measure function according to the first sub-similarity measure function corresponding to each layer.
  • the second similarity measure function determination module includes:
  • the second sub-similarity measurement function determination submodule can be configured to input the first intermediate feature map and the second intermediate feature map corresponding to the j-th layer into the similarity measurement function to obtain the second sub-similarity measurement function corresponding to the j-th layer; wherein j is a positive integer, j takes a value from 1 to N, and N is the number of layers of the first discriminator;
  • the second similarity measure function determining submodule may be configured to determine the second similarity measure function according to the second sub-similarity measure function corresponding to each layer.
  • the second determining submodule includes:
  • a third determination unit may be configured to determine each of the retention factors according to an objective function of the retention factors
  • the fourth determining unit may be configured to determine that the retention factor is 0 when the retention factor is smaller than a second preset threshold
  • the fifth determining unit may be configured to determine that the retention factor is 1 when the retention factor is greater than or equal to the second preset threshold.
  • the device further includes an objective function determination module of the retention factor, which can be configured to, before each of the retention factors is determined according to the objective function of the retention factor, determine the objective function of the retention factor according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator with respect to fake pictures, the objective function of the first generator, and the loss function of the first discriminator with respect to fake pictures.
  • the device provided in this embodiment is capable of executing the method in any of the above-mentioned embodiments in FIG. 1-FIG. 3 , and its execution mode and beneficial effect are similar, and details are not repeated here.
  • FIG. 6 is a schematic structural diagram of a network model compression device in an embodiment of the present disclosure.
  • the network model compression device 600 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the compression device of the network model shown in FIG. 6 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • the network model compression device 600 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 601, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the network model compression device 600.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
  • the communication device 609 may allow the network model compression device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 shows the network model compression device 600 with various means, it should be understood that it is not required to implement or possess all of the means shown; more or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network).
  • examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (for example, the Internet) and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the compression device of the above-mentioned network model; or it may exist independently without being assembled into the compression device of the network model.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the network model compression device, the network model compression device: performs pruning processing on the first generator to obtain the second generator; and configures the states of the convolution kernels in the first discriminator so that a part of the convolution kernels is in an active state and the other part of the convolution kernels is in a suppressed state, to obtain the second discriminator; wherein the loss difference between the first generator and the first discriminator is the first loss difference, the loss difference between the second generator and the second discriminator is the second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through an Internet connection provided by an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
  • Exemplary types of hardware logic components that may be used include, without limitation, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • An embodiment of the present disclosure also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the method of any one of the above-mentioned embodiments in FIGS. 1-3 can be implemented. Its implementation and beneficial effects are similar to those described above and will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A compression method for a network model, an image generation method, an apparatus, a device, and a medium. The compression method for a network model includes: performing pruning processing on a first generator to obtain a second generator (S110); and configuring the states of the convolution kernels in a first discriminator so that one part of the convolution kernels is in an active state and the other part is in a suppressed state, to obtain a second discriminator (S120); wherein the loss difference between the first generator and the first discriminator is a first loss difference, the loss difference between the second generator and the second discriminator is a second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. Because the method compresses the generator and the discriminator jointly, the compressed generator and the compressed discriminator can maintain a Nash equilibrium, thereby avoiding mode collapse.

Description

网络模型的压缩方法、图像生成方法、装置、设备及介质
本申请要求于2021年09月24日提交中国专利局、申请号为202111122307.5、申请名称为“网络模型的压缩方法、图像生成方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及计算机技术领域,尤其涉及一种网络模型的压缩方法、图像生成方法、装置、设备及介质。
背景技术
生成式对抗网络(Generative Adversarial Networks,GAN)是一种深度学习模型,是近年来复杂分布上无监督学习最具前景的方法之一,广泛应用于各种图像合成任务,例如图像生成、图像分辨率、超分辨率。
然而GAN庞大的计算量和内存需求严重阻碍其部署到资源受限的边缘设备上。因此,相关技术中通过压缩GAN中的生成器来解决该问题,但是,这会破坏生成器和辨别器之间的纳什均衡,导致生成器生成图像出现模式坍塌的现象。
发明内容
为了解决上述技术问题或者至少部分地解决上述技术问题,本公开提供了一种网络模型的压缩方法、图像生成方法、装置、设备及介质。
本公开实施例的第一方面提供了一种网络模型的压缩方法,待压缩的网络模型包括第一生成器和第一辨别器,该方法包括:
对所述第一生成器进行剪枝处理,得到第二生成器;
对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;
其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
在一种实施方式中,所述对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器,包括:
冻结所述第二辨别器中的各卷积核对应的保留因子,确定所述第二辨别器的第一权重参数;其中,所述保留因子用于表征其对应的卷积核的重要程度;
冻结所述第二辨别器的第一权重参数和所述第二生成器的第二权重参数,确定各所述保留因子;
重复执行上述确定所述第二辨别器的第一权重参数以及确定各所述保留因子的操作,直至所述第一损失差和所述第二损失差之间的差值绝对值小于所述第一预设阈值。
在一种实施方式中,所述第一权重参数包括所述第二辨别器中除去所述保留因子之外的其它要素对应的权重参数。
在一种实施方式中,所述第二权重参数包括所述第二生成器中各要素对应的权重参数。
在一种实施方式中,所述确定所述第二辨别器的第一权重参数,包括:
根据所述第二辨别器的目标函数确定所述第二辨别器的第一权重参数;
所述方法还包括:
根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数。
在一种实施方式中,所述根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数之前,所述方法还包括:
根据所述第二生成器的损失函数确定所述第二生成器的目标函数;
根据所述第二辨别器关于真实图片的损失函数、以及所述第二辨别器关于虚假图片的损失函数确定所述第二辨别器的目标函数。
在一种实施方式中,所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:
将所述第一生成器和所述第一辨别器作为教师生成对抗网络,将所述第二生成器和所述第二辨别器作为学生生成对抗网络;
所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:
根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数。
在一种实施方式中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:
将所述蒸馏目标函数、以及根据所述第二生成器的损失函数确定的目标函数分量,按照权重进行加和,确定所述第二生成器的目标函数。
在一种实施方式中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:
根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数;
将所述第一生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第一中间特征图;
将所述第二生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第二中间特征图;
根据所述至少一层的所述第一中间特征图和所述至少一层的所述第二中间特征图的相似度确定第二相似性度量函数;
根据所述第一相似性度量函数和所述第二相似性度量函数确定所述蒸馏目标函数。
在一种实施方式中,所述根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数,包括:
将所述第一生成器的第i层的中间特征图和所述第二生成器的第i层的中间特征图输入相似性度量函数,得到第i层对应的第一子相似性度量函数;其中,i为正整数,且i从1-M中取值,M为所述第一生成器和所述第二生成器的层数;
根据各层对应的所述第一子相似性度量函数确定所述第一相似性度量函数。
在一种实施方式中,所述根据所述至少一层的所述第一中间特征图和所述至少一层的 所述第二中间特征图的相似度确定第二相似性度量函数,包括:
将第j层对应的所述第一中间特征图和所述第二中间特征图输入相似性度量函数,得到第j层对应的第二子相似性度量函数;其中,j为正整数,1≤j≤N,且j从1-N中取值,N为所述第一辨别器的层数;
根据各层对应的所述第二子相似性度量函数确定所述第二相似性度量函数。
在一种实施方式中,所述确定各所述保留因子,包括:
根据所述保留因子的目标函数确定各所述保留因子;
当所述保留因子小于第二预设阈值时,确定所述保留因子为0;
当所述保留因子大于或等于所述第二预设阈值时,确定所述保留因子为1。
在一种实施方式中,所述根据所述保留因子的目标函数确定各所述保留因子之前,所述方法还包括:
根据所述第二生成器的目标函数、所述第二辨别器的目标函数、所述第二辨别器关于虚假图片的损失函数、所述第一生成器的目标函数、以及所述第一辨别器关于虚假图片的损失函数确定所述保留因子的目标函数。
本公开实施例的第二方面提供了一种图像生成方法,该方法包括:
将随机噪声信号输入至第二生成器中,以使所述第二生成器根据所述随机噪声信号生成虚假图像;
将所述虚假图像输入所述第二辨别器,以使所述第二辨别器辨别所述虚假图像为真后输出所述虚假图像;
其中,所述第二生成器和所述第二辨别器采用如第一方面所述的方法得到。
本公开实施例的第三方面提供了一种网络模型的压缩装置,待压缩的网络模型包括第一生成器和第一辨别器,所述装置包括:剪枝模块和配置模块;
所述剪枝模块,配置为对所述第一生成器进行剪枝处理,得到第二生成器;
所述配置模块,配置为对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;
其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
本公开实施例的第四方面提供了一种网络模型的压缩设备，该设备包括：
存储器,所述存储器中存储有计算机程序;
处理器,用于执行所述计算机程序,当所述计算机程序被所述处理器执行时,所述处理器执行如第一方面所述的方法。
本公开实施例的第五方面提供了一种计算机可读存储介质,所述存储介质中存储有计算机程序,当所述计算机程序被处理器执行时,实现如第一方面所述的方法。
本公开实施例的第六方面提供了一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行如第一方面所述的方法的程序代码。
本公开实施例提供的技术方案与现有技术相比具有如下优点:
本公开实施例,通过对第一生成器进行剪枝处理,得到第二生成器;并对第一辨别器 中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;其中,第一生成器和第一辨别器之间的损失差为第一损失差,第二生成器和第二辨别器之间的损失差为第二损失差,第一损失差和第二损失差之间的差值绝对值小于第一预设阈值。由于本公开实施例可协同压缩生成器和辨别器,因此压缩后的生成器和压缩后的辨别器可保持纳什均衡,避免了模式坍塌现象。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本公开实施例提供的一种网络模型的压缩方法的流程图;
图2是本公开实施例提供的一种网络模型的压缩方法的流程图;
图3是本公开实施例提供的一种网络模型的压缩方法的流程图;
图4是本公开实施例提供的一种图像生成方法的流程图;
图5是本公开实施例提供的一种网络模型的压缩装置的结构示意图;
图6是本公开实施例中的一种网络模型的压缩设备的结构示意图。
具体实施方式
为了能够更清楚地理解本公开的上述目的、特征和优点,下面将对本公开的方案进行进一步描述。需要说明的是,在不冲突的情况下,本公开的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本公开,但本公开还可以采用其他不同于在此描述的方式来实施;显然,说明书中的实施例只是本公开的一部分实施例,而不是全部的实施例。
对于模型较大的GAN,其耗费的计算资源通常较多,当其应用于手机等计算能力较差的设备上时,延时较长,不能够满足实时的应用要求。因此,相关技术中,通过压缩生成器来减小GAN的总体模型大小。但是,申请人发现仅压缩生成器而保留辨别器的结构不变时,会出现模式坍塌现象。
申请人经研究发现,对于训练好的GAN,它的生成器和辨别器处于一种势均力敌的状态,对生成器进行压缩后,生成器的性能会有所下降,而辨别器的结构保持不变,即辨别器的性能保持不变,如此,生成器和辨别器之间的纳什均衡被打破,导致产生模式坍塌现象。
有鉴于此，本公开实施例提供了一种网络模型的压缩方法，通过对第一生成器进行剪枝处理，得到第二生成器，并对第一辨别器中的卷积核的状态进行配置，使得其中一部分卷积核为激活状态，另一部分卷积核为抑制状态，得到第二辨别器，使得第一生成器和第一辨别器的第一损失差与第二生成器和第二辨别器的第二损失差相近，进而使得第二生成器和第二辨别器之间能够维持纳什均衡，防止出现模式坍塌现象。下面结合具体的实施例对该方法进行介绍。
图1是本公开实施例提供的一种网络模型的压缩方法的流程图,该方法可以由一种网络模型的压缩设备来执行。该网络模型的压缩设备可以示例性的理解为诸如平板电脑、笔记本电脑、台式机等具有计算功能的设备。该方法可以对包括第一生成器和第一辨别器的待压缩网络模型进行压缩。如图1所示,本实施例提供的方法包括如下S110-S120:
S110、对第一生成器进行剪枝处理,得到第二生成器。
具体地,对第一生成器进行剪枝处理的具体实施方式,本领域技术人员可根据实际情况设置,此处不作限定。在一种可能的实施方式中,对第一生成器进行剪枝处理包括:对第一生成器中的卷积核进行选择性删除,使得重要程度小于预设重要程度阈值的卷积核被删除,重要程度大于等于预设重要程度阈值的卷积核被保留。
示例性地，第一生成器中的每个卷积层(Convolutional layer,CL)对应设置有批归一化(Batch Normalization,BN)层，每个CL中包括至少一个卷积核。每个卷积核在其所属的CL对应的BN层中对应设置有缩放因子，其中，缩放因子用于表征其对应的卷积核的重要程度。将每个卷积核对应的缩放因子通过直接加和的方式添加至第一生成器的目标函数中，得到

$$L_G^T + \sum_{a=1}^{A} \mathrm{scale}(a)；$$

其中，$L_G^T$为第一生成器的目标函数，$A$为第一生成器中卷积核的数量，$\mathrm{scale}(a)$为第$a$个卷积核的缩放因子。然后，对第一生成器和第一辨别器进行训练，具体地，每次训练中包括根据

$$\min\Bigl(L_G^T + \sum_{a=1}^{A} \mathrm{scale}(a)\Bigr)$$

确定缩放因子，当训练次数达到后，将各卷积核的缩放因子按照从小到大进行排序，采用二分查找的方式删除较小的缩放因子对应的卷积核，直至第一生成器的计算量满足预设给定计算量。
示例性地，第一生成器中的每个CL中包括至少一个卷积核，每个卷积核对应设置有权重参数，其中，卷积核的权重参数用于表征其对应的卷积核的重要程度。将每个卷积核对应的权重参数通过直接加和的方式添加至第一生成器的目标函数中，得到

$$L_G^T + \sum_{a=1}^{A} L(a)；$$

其中，$L_G^T$为第一生成器的目标函数，$A$为第一生成器中卷积核的数量，$L(a)$为第$a$个卷积核的权重参数。然后，对第一生成器和第一辨别器进行训练，具体地，每次训练中包括根据

$$\min\Bigl(L_G^T + \sum_{a=1}^{A} L(a)\Bigr)$$

确定权重参数，当训练次数达到后，将各卷积核的权重参数按照从小到大进行排序，采用二分查找的方式删除较小的权重参数对应的卷积核，直至第一生成器的计算量满足预设给定计算量。
其中,在上述两种方式中,具体训练次数、以及预设给定计算量的具体值,本领域技术人员可根据实际情况设置,此处不作限定。
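As a non-authoritative illustration of the scaling-factor variant of S110 described above, the sketch below adds the BN scaling factors to the generator objective as a regularizer and suppresses the kernels whose factors fall below a binary-searched threshold. PyTorch is assumed, `keep_ratio` stands in for the preset computation budget, and masking the BN parameters stands in for physically removing kernels; none of these choices are taken from the original disclosure.

```python
import torch
import torch.nn as nn

def scale_regularizer(generator: nn.Module) -> torch.Tensor:
    """Sum of the BN scaling factors; added directly to the generator objective L_G^T."""
    return sum(bn.weight.abs().sum()
               for bn in generator.modules() if isinstance(bn, nn.BatchNorm2d))

def prune_by_scale(generator: nn.Module, keep_ratio: float = 0.5) -> float:
    """Binary-search a threshold on the BN scaling factors and suppress the kernels
    below it, as a stand-in for pruning until the preset computation budget is met."""
    scales = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in generator.modules() if isinstance(bn, nn.BatchNorm2d)])
    lo, hi = float(scales.min()), float(scales.max())
    for _ in range(30):
        mid = (lo + hi) / 2
        kept = float((scales >= mid).float().mean())
        if kept > keep_ratio:      # still keeping too many kernels: raise the threshold
            lo = mid
        else:
            hi = mid
    threshold = (lo + hi) / 2
    for bn in generator.modules():
        if isinstance(bn, nn.BatchNorm2d):
            mask = (bn.weight.detach().abs() >= threshold).float()
            bn.weight.data.mul_(mask)   # suppressed kernels contribute nothing downstream
            bn.bias.data.mul_(mask)
    return threshold
```

In practice the suppressed channels would also be removed from the following convolution so that the compute savings are actually realized.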
S120、对第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器。
具体地,当压缩后的网络模型投入使用时,处于抑制状态的卷积核不工作,处于激活状态的卷积核正常工作。
其中,第一生成器和第一辨别器之间的损失差为第一损失差,第二生成器和第二辨别器之间的损失差为第二损失差,第一损失差和第二损失差之间的差值绝对值小于第一预设阈值。
具体地,第一损失差用于表征第一生成器的性能和第一辨别器的性能之间的差异,可根据第一生成器的目标函数和第一辨别器关于虚假图片的损失函数计算得出。同理,第二损失差用于表征第二生成器的性能和第二辨别器的性能之间的差异,可根据第二生成器的目标函数和第二辨别器关于虚假图片的损失函数计算得出。
可以理解的是,由于待压缩的网络模型为训练好的GAN,因此,第一生成器和第一辨别器之间处于纳什均衡的状态,当对第一生成器进行剪枝处理,并对第一辨别器中的卷积核的状态进行配置之后,可使第一损失差与第二损失差相近,即可使第二生成器和第二辨别器之间能够维持纳什均衡,如此,可防止出现模式坍塌现象。
具体地,对第一辨别器中的卷积核的状态进行配置的具体实施方式,本领域技术人员可根据实际情况设置,此处不作限定。
在一种可能的实施方式中,S120包括:冻结第二辨别器中各卷积核对应的保留因子,确定第二辨别器的第一权重参数;冻结第二辨别器的第一权重参数和第二生成器的第二权重参数,确定各保留因子;重复执行上述确定第二辨别器的第一权重参数以及确定各保留因子的操作,直至第一损失差和第二损失差之间的差值绝对值小于第一预设阈值。
其中,保留因子用于表征其对应的卷积核的重要程度。
具体地,为第一辨别器中各卷积核配置保留因子,得到第二辨别器,各卷积核对应的保留因子的初始值可以为1。最终得到的第二辨别器中,处于激活状态的卷积核对应的保留因子的值可以为1,处于抑制状态的卷积核对应的保留因子的值可以为0。
具体地,这里所述的第一权重参数包括第二辨别器中除去卷积核对应的保留因子之外的其它要素对应的权重参数。这里所述第二权重参数包括第二生成器中各要素(例如卷积核)对应的权重参数。
具体地,确定第二辨别器的第一权重参数的具体实施方式,本领域技术人员可根据实际情况设置,此处不作限定。在一种可能的实施方式中,确定第二辨别器的第一权重参数包括:根据第二辨别器的目标函数确定第二辨别器的第一权重参数。在一种可能的实施方式中,在根据第二辨别器的目标函数确定第二辨别器的第一权重参数之前还可以包括根据第二生成器的目标函数确定第二生成器的第二权重参数。
具体地,确定各保留因子的具体实施方式,本领域技术人员可根据实际情况设置,此处不作限定。在一种可能的实施方式中,根据保留因子的目标函数确定各保留因子。
示例性地,首先,保持保留因子不变,通过优化第二生成器的目标函数,确定第二生成器的第二权重参数;通过优化第二辨别器的目标函数,确定第二辨别器的第一权重参数。然后,保持第二辨别器的第一权重参数和第二生成器的第二权重参数不变,通过优化保留因子的目标函数,确定各卷积核对应的保留因子。当检测到第一损失差和第二损失差之间的差值绝对值小于第一预设阈值,则可以结束训练;当检测到第一损失差和第二损失差之 间的差值绝对值大于等于第一预设阈值,返回继续执行“确定第二生成器的第二权重参数,确定第二辨别器的第一权重参数”以及“确定各卷积核对应的保留因子”的操作,直至第一损失差和第二损失差之间的差值绝对值小于第一预设阈值。
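The alternating schedule just described can be summarized as the following loop skeleton. It is only a sketch: the objective callables stand in for $L_G^S$, $L_D^S$ and $L_{\mathrm{arch}}$, and the gap helpers stand in for the first and second loss differences; all of them are assumed to be supplied by the caller.

```python
def compress_gan(G_s, D_s, retention_factors, data_iter,
                 gen_obj, disc_obj, arch_obj, student_gap, teacher_gap,
                 opt_g, opt_d, opt_arch, threshold=1e-2, max_steps=100_000):
    """gen_obj/disc_obj/arch_obj stand in for L_G^S, L_D^S and L_arch; student_gap and
    teacher_gap stand in for the second and first loss differences. All are caller-supplied."""
    for _, real in zip(range(max_steps), data_iter):
        # Phase 1: retention factors frozen; only the weight optimizers step.
        retention_factors.requires_grad_(False)
        opt_d.zero_grad(); disc_obj(G_s, D_s, real).backward(); opt_d.step()
        opt_g.zero_grad(); gen_obj(G_s, D_s).backward(); opt_g.step()

        # Phase 2: the weights are effectively frozen because opt_arch only holds the
        # retention factors; L_arch is minimized with respect to them.
        retention_factors.requires_grad_(True)
        opt_arch.zero_grad(); arch_obj(G_s, D_s, real).backward(); opt_arch.step()

        # Repeat until |ΔL_S - ΔL_T| drops below the first preset threshold.
        if abs(student_gap(G_s, D_s) - teacher_gap) < threshold:
            break
    return G_s, D_s, retention_factors
```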
本公开实施例,通过对第一生成器进行剪枝处理,得到第二生成器;并对第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;其中,第一生成器和第一辨别器之间的损失差为第一损失差,第二生成器和第二辨别器之间的损失差为第二损失差,第一损失差和第二损失差之间的差值绝对值小于第一预设阈值。由于本公开实施例可协同压缩生成器和辨别器,因此压缩后的生成器和压缩后的辨别器可保持纳什均衡,避免了模式坍塌现象。并且,通过交替确定各保留因子以及第二生成器的第二权重参数和第二辨别器的第一权重参数,直至第一损失差和第二损失差之间的差值绝对值小于第一预设阈值,可在将第二损失差逼近第一损失差的过程中优化第二生成器和第二辨别器的性能。
图2是本公开实施例提供的一种网络模型的压缩方法的流程图,如图2所示,本实施例提供的方法包括如下S210-S280:
S210、对第一生成器进行剪枝处理,得到第二生成器。
S220、根据第二生成器的损失函数确定第二生成器的目标函数。
具体地,S220的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
示例性地,第二生成器的目标函数如下:
$$L_G^S = \mathbb{E}_{z\sim p(z)}\bigl[f_G^S\bigl(-D^S(G^S(z))\bigr)\bigr]；$$

其中，$L_G^S$表示第二生成器的目标函数，$G^S(z)$表示第二生成器根据噪声信号生成的虚假图片，$D^S(G^S(z))$表示第二辨别器对第二生成器生成的虚假图片的响应值，$f_G^S\bigl(-D^S(G^S(z))\bigr)$表示第二生成器的损失函数，$\mathbb{E}[\cdot]$表示分布函数的期望值，$p(z)$表示噪声分布。
S230、根据第二辨别器关于真实图片的损失函数、以及第二辨别器关于虚假图片的损失函数确定第二辨别器的目标函数。
具体地,S230的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
示例性地,第二辨别器的目标函数如下:
$$L_D^S = \mathbb{E}_{x\sim p_{\mathrm{data}}}\bigl[f_D^S\bigl(-D(x)\bigr)\bigr] + \mathbb{E}_{z\sim p(z)}\bigl[f_D^S\bigl(D^S(G^S(z))\bigr)\bigr]；$$

其中，$L_D^S$表示第二辨别器的目标函数，$D(x)$表示第二辨别器对真实图片的响应值，$f_D^S\bigl(-D(x)\bigr)$表示第二辨别器关于真实图片的损失函数，$\mathbb{E}[\cdot]$表示分布函数的期望值，$p_{\mathrm{data}}$表示真实图片的分布，$G^S(z)$表示第二生成器根据噪声信号生成的虚假图片，$D^S(G^S(z))$表示第二辨别器对第二生成器生成的虚假图片的响应值，$f_D^S\bigl(D^S(G^S(z))\bigr)$表示第二辨别器关于第二生成器生成的虚假图片的损失函数，$p(z)$表示噪声分布。
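The description leaves the loss functions $f_G^S$ and $f_D^S$ generic. A minimal sketch that instantiates them with the common softplus (non-saturating) form, which is an assumption rather than the patent's stated choice, is:

```python
import torch.nn.functional as F

def student_generator_objective(G_s, D_s, z):
    """L_G^S = E_z[ f(-D_S(G_S(z))) ] with f instantiated as softplus (an assumption)."""
    return F.softplus(-D_s(G_s(z))).mean()

def student_discriminator_objective(G_s, D_s, z, real):
    """L_D^S = E_x[ f(-D(x)) ] + E_z[ f(D_S(G_S(z))) ], again with f = softplus."""
    fake = G_s(z).detach()   # the generator is not updated through the discriminator loss
    return F.softplus(-D_s(real)).mean() + F.softplus(D_s(fake)).mean()
```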
S240、根据第二生成器的目标函数、第二辨别器的目标函数、第二辨别器关于虚假图片的损失函数、第一生成器的目标函数、以及第一辨别器关于虚假图片的损失函数确定保留因子的目标函数。
具体地,S240的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
示例性地,保留因子的目标函数如下:
$$L_{\mathrm{arch}} = L_D^S + \bigl\lVert L_G^S - L^S_{\mathrm{Dfake}}\bigr\rVert - \bigl\lVert L_G^T - L^T_{\mathrm{Dfake}}\bigr\rVert；$$
$$L^S_{\mathrm{Dfake}} = \mathbb{E}_{z\sim p(z)}\bigl[f_D^S\bigl(D^S(G^S(z))\bigr)\bigr]；$$
$$L_G^T = \mathbb{E}_{z\sim p(z)}\bigl[f_G^T\bigl(-D^T(G^T(z))\bigr)\bigr]；$$
$$L^T_{\mathrm{Dfake}} = \mathbb{E}_{z\sim p(z)}\bigl[f_D^T\bigl(D^T(G^T(z))\bigr)\bigr]；$$

其中，$L_{\mathrm{arch}}$表示保留因子的目标函数，$L_G^S$表示第二生成器的目标函数，$L_D^S$表示第二辨别器的目标函数，$L^S_{\mathrm{Dfake}}$表示第二辨别器关于虚假图片的损失函数，其具体解释请见上文，此处不再赘述。$L_G^T$表示第一生成器的目标函数，$L^T_{\mathrm{Dfake}}$表示第一辨别器关于虚假图片的损失函数，$G^T(z)$表示第一生成器根据噪声信号生成的虚假图片，$D^T(G^T(z))$表示第一辨别器对第一生成器生成的虚假图片的响应值，$f_G^T\bigl(-D^T(G^T(z))\bigr)$表示第一生成器的损失函数，$f_D^T\bigl(D^T(G^T(z))\bigr)$表示第一辨别器关于第一生成器生成的虚假图片的损失函数，$\mathbb{E}[\cdot]$表示分布函数的期望值，$p(z)$表示噪声分布。
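Building on the two sketched objectives above, the retention-factor objective $L_{\mathrm{arch}}$ can be illustrated as follows; the teacher terms use the frozen first generator and first discriminator, and softplus remains an assumed choice of loss rather than the patent's own.

```python
import torch.nn.functional as F

def retention_factor_objective(G_s, D_s, G_t, D_t, z, real):
    """Sketch of L_arch = L_D^S + |L_G^S - L^S_Dfake| - |L_G^T - L^T_Dfake|.
    Reuses the softplus-based student objectives sketched earlier."""
    L_D_s = student_discriminator_objective(G_s, D_s, z, real)
    L_G_s = student_generator_objective(G_s, D_s, z)
    L_s_fake = F.softplus(D_s(G_s(z))).mean()           # L^S_Dfake
    L_G_t = F.softplus(-D_t(G_t(z))).mean().detach()    # L_G^T    (teacher, constant)
    L_t_fake = F.softplus(D_t(G_t(z))).mean().detach()  # L^T_Dfake (teacher, constant)
    return L_D_s + (L_G_s - L_s_fake).abs() - (L_G_t - L_t_fake).abs()
```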
S250、冻结第二辨别器中的各卷积核对应的保留因子,根据第二生成器的目标函数确定第二生成器的第二权重参数;根据第二辨别器的目标函数确定第二辨别器的第一权重参数。
具体地,S250的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
示例性地，保持各卷积核对应的保留因子不变，更新第二生成器的第二权重参数，以使第二生成器的目标函数$L_G^S$变小，从而确定第二生成器的第二权重参数；更新第二辨别器的第一权重参数，以使第二辨别器的目标函数$L_D^S$变小，从而确定第二辨别器的第一权重参数。
S260、冻结第二辨别器的第一权重参数和第二生成器的第二权重参数,根据保留因子的目标函数确定保留因子。
具体地,S260的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
示例性地，保持第二生成器的第二权重参数和第二辨别器的第一权重参数不变，更新各卷积核对应的保留因子，以使保留因子的目标函数$L_{\mathrm{arch}}$变小，从而确定各保留因子。
S270、根据第二生成器的目标函数和第二辨别器的关于虚假图片的损失函数确定第二生成器和第二辨别器之间的第二损失差。
具体地,S270的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。在 一种可能的实施方式中,将第二生成器的目标函数和第二辨别器的关于虚假图片的损失函数的差值的绝对值作为第二损失差。
示例性地,第二损失差如下:
$$\Delta L^S = \bigl|\,L_G^S - L^S_{\mathrm{Dfake}}\,\bigr|；$$

其中，$\Delta L^S$表示第二损失差，$L_G^S$和$L^S_{\mathrm{Dfake}}$的具体解释可参见上文，此处不作赘述。
S280、判断第一损失差和第二损失差之间的差值绝对值是否小于第一预设阈值。若是,则结束训练;若否,则返回执行S250。
具体地,S280的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
在一种可能的实施方式中,将第一生成器的目标函数和第一辨别器的虚假图片的损失函数的差值的绝对值作为第一损失差;计算第一损失差和第二损失差的差值绝对值;判断第一损失差和第二损失差的差值绝对值是否小于第一预设阈值。
示例性地,第一损失差如下:
$$\Delta L^T = \bigl|\,L_G^T - L^T_{\mathrm{Dfake}}\,\bigr|；$$

其中，$\Delta L^T$表示第一损失差，$L_G^T$和$L^T_{\mathrm{Dfake}}$的具体解释可参见上文，此处不作赘述。

则第一损失差和第二损失差的差值绝对值如下：

$$\Delta L = \bigl|\,\Delta L^S - \Delta L^T\,\bigr|；$$
然后,判断第一损失差和第二损失差的差值绝对值ΔL是否小于第一预设阈值;若是,则结束训练;若否,则返回执行S250。
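A small sketch of the check in S270 and S280, assuming the four loss values have already been computed as Python floats:

```python
def loss_gap(objective_value: float, fake_loss_value: float) -> float:
    """ΔL = |L_G - L_Dfake| for either the teacher pair or the student pair."""
    return abs(objective_value - fake_loss_value)

def should_stop(L_G_s, L_s_fake, L_G_t, L_t_fake, first_threshold=1e-2):
    """True once |ΔL_S - ΔL_T| is below the first preset threshold (S280)."""
    return abs(loss_gap(L_G_s, L_s_fake) - loss_gap(L_G_t, L_t_fake)) < first_threshold
```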
本公开实施例,根据第二生成器的损失函数确定第二生成器的目标函数,并根据第二生成器的目标函数确定第二生成器的第二权重参数;根据第二辨别器关于真实图片的损失函数、以及第二辨别器关于虚假图片的损失函数确定第二辨别器的目标函数,并根据第二辨别器的目标函数确定第二辨别器的第一权重参数;根据第一生成器的目标函数、第一辨别器关于虚假图片的损失函数、第二生成器的目标函数、第二辨别器的目标函数、以及第二辨别器关于虚假图片的损失函数确定保留因子的目标函数,并根据保留因子的目标函数确定各保留因子,可在将第二损失差逼近第一损失差的过程中降低第二生成器的损失函数、以及第二辨别器关于虚假图片的损失函数,并且,提高第二辨别器关于真实图片的损失函数,从而优化第二生成器和第二辨别器的性能。
图3是本公开实施例提供的一种网络模型的压缩方法的流程图,如图3所示,本实施例提供的方法包括如下S310-S390:
S310、对所述第一生成器进行剪枝处理,得到第二生成器。
S320、将第一生成器和第一辨别器作为教师生成对抗网络,将第二生成器和第二辨别器作为学生生成对抗网络。
S330、根据教师生成对抗网络和学生生成对抗网络之间的蒸馏目标函数、以及第二生成器的损失函数确定第二生成器的目标函数。
具体地,第一生成器和第一辨别器为训练好的网络模型,具有较好的精度和稳定性, 将其作为教师生成对抗网络指导学生生成对抗网络学习,有利于提高第二生成器的性能。教师生成对抗网络指导学生生成对抗网络学习时通常存在蒸馏损失,可以通过优化蒸馏目标函数的方式减少蒸馏损失。
具体地,蒸馏目标函数的具体确定方式,本领域技术人员可根据实际情况设置,此处不作限定。
在一种可能的实施方式中,蒸馏目标函数的具体确定方式如下:根据第一生成器和第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数;将第一生成器生成的虚假图片输入第一辨别器,得到第一辨别器中的至少一层的第一中间特征图;将第二生成器生成的虚假图片输入第一辨别器,得到第一辨别器中的至少一层的第二中间特征图;根据至少一层的第一中间特征图和至少一层的第二中间特征图的相似度确定第二相似性度量函数;根据第一相似性度量函数和第二相似性度量函数确定蒸馏目标函数。
具体地,第一生成器的中间特征图指的是第一生成器中某一层的输出信息;第二生成器的中间特征图指的是第二生成器中某一层的输出信息。
具体地,用于确定第一相似性度量函数的中间特征图具有如下特征:从第一生成器中获取的中间特征图与从第二生成器中获取的中间特征图一一对应,即当从第一生成器中获取某一层(例如第一层)的中间特征图时,需从第二生成器中获取与该中间特征图所在网络层数相同(例如第一层)的中间特征图。其中,从第一生成器以及第二生成器的哪些层中获取的中间特征图,本领域技术人员可根据实际情况设置,此处不作限定。
具体地,第一相似性度量函数用于表征第一生成器和第二生成器的中间层信息的逼近程度。第一相似性度量函数的具体确定方式本领域技术人员可根据实际情况设置。
在一种可能的实施方式中,根据第一生成器和第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数包括:将第一生成器的第i层的中间特征图和第二生成器的第i层的中间特征图输入相似性度量函数,得到第i层对应的第一子相似性度量函数;其中,i为正整数,且i从1-M中取值,M为第一生成器和第二生成器的层数;根据各层对应的第一子相似性度量函数确定第一相似性度量函数。
具体地,相似性度量函数的具体结构,本领域技术人员可根据实际情况设置,此处不作限定。在一种可能的实施方式中,相似性度量函数包括MSE损失函数和Texture损失函数。
示例性地，相似性度量函数如下：

$$d(\tilde{O}, O) = \frac{1}{c_l^{2}}\,\mathcal{L}_{\mathrm{MSE}}\bigl(G(\tilde{O}),\,G(O)\bigr) + \mathcal{L}_{\mathrm{MSE}}(\tilde{O},\,O)；$$

其中，$\tilde{O}$和$O$分别表示输入相似性度量函数的两个不同的中间特征图，$G_{pq}(\tilde{O})$代表中间特征图$\tilde{O}$中第$p$个通道的特征和第$q$个通道的特征进行内积，$G_{pq}(O)$代表中间特征图$O$中第$p$个通道的特征和第$q$个通道的特征进行内积，$c_l$表示输入该相似性度量函数的中间特征图$\tilde{O}$和$O$的通道数量，$\mathcal{L}_{\mathrm{MSE}}$表示MSE损失函数。
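A minimal sketch of this per-layer similarity in PyTorch is given below; the relative weighting of the texture term and the MSE term is not specified in the description, so equal weighting is assumed.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Pairwise inner products between the channels of a (B, C, H, W) feature map."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return torch.bmm(flat, flat.transpose(1, 2)) / (h * w)

def similarity(o_tilde: torch.Tensor, o: torch.Tensor) -> torch.Tensor:
    """Texture (Gram) term plus MSE term; mse_loss averages over the c_l x c_l Gram
    entries, which supplies the 1/c_l^2 normalization assumed above."""
    return F.mse_loss(gram_matrix(o_tilde), gram_matrix(o)) + F.mse_loss(o_tilde, o)
```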
示例性地，第一相似性度量函数如下：

$$D_1 = \mathbb{E}_{z\sim p(z)}\Bigl[\sum_{i=1}^{L_G} d\bigl(f_i(O_i^{S}),\,O_i^{T}\bigr)\Bigr]；$$

其中，$D_1$表示第一相似性度量函数，$L_G$表示从第一生成器中获取的中间特征图的数量，$p(z)$表示噪声分布，$d\bigl(f_i(O_i^{S}),O_i^{T}\bigr)$表示第$i$层对应的第一子相似性度量函数，$O_i^{S}$表示从第二生成器的第$i$层获取的中间特征图，$O_i^{T}$表示从第一生成器的第$i$层获取的中间特征图，$f_i$表示可学习的1x1大小卷积层，用于将$O_i^{S}$的通道数量转换成和$O_i^{T}$相同的通道数量。
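The aggregation of the per-layer similarities into $D_1$, with the learnable 1x1 convolutions matching the student channel counts to the teacher's, can be sketched as below. It reuses `similarity` from the previous sketch; the channel lists and the way feature maps are collected (e.g., forward hooks) are assumptions.

```python
import torch.nn as nn

class GeneratorFeatureDistill(nn.Module):
    """Sketch of D_1: learnable 1x1 convolutions map each student feature map to the
    teacher's channel count before the per-layer similarities are summed."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Conv2d(cs, ct, kernel_size=1)
            for cs, ct in zip(student_channels, teacher_channels))

    def forward(self, student_feats, teacher_feats):
        total = 0.0
        for adapter, o_s, o_t in zip(self.adapters, student_feats, teacher_feats):
            total = total + similarity(adapter(o_s), o_t.detach())  # teacher features are constants
        return total
```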
具体地,第一中间特征图指的是将第一生成器生成的虚假图片输入第一辨别器时,第一辨别器中某一层的输出信息;第二中间特征图指的是将第二生成器生成的虚假图片输入第一辨别器时,第一辨别器中某一层的输出信息。
可以理解的是,相比于额外引入用来提取第一生成器和第二生成器的中间特征图的网络模型,本公开实施例中用来提取第一生成器和第二生成器的中间特征图的第一辨别器具有与第一生成器和第二生成器的生成任务相关性高、具有良好的辨别输入图像真伪的能力的优势。
具体地,第一中间特征图和第二中间特征图一一对应,即第一中间特征图为第一辨别器中某一层(例如第一层)的中间特征图时,与其对应的第二中间特征图为第二辨别器中该层(例如第一层)的中间特征图。其中,从第一辨别器以及第二辨别器的哪些层中获取的中间特征图,本领域技术人员可根据实际情况设置,此处不作限定。
具体地,第二相似性度量函数用于表征第一辨别器和第二辨别器的中间层信息的逼近程度。第二相似性度量函数的具体确定方式本领域技术人员可根据实际情况设置。
在一种可能的实施方式中,根据各层的第一中间特征图和第二中间特征图的相似度确定第二相似性度量函数包括:将第j层对应的第一中间特征图和第二中间特征图输入相似性度量函数,得到第j层对应的第二子相似性度量函数;其中,j为正整数,1≤j≤N,且j从1-N中取值,N为第一辨别器的层数;根据各层对应的第二子相似性度量函数确定第二相似性度量函数。
示例性地，第二相似性度量函数如下：

$$D_2 = \mathbb{E}_{z\sim p(z)}\Bigl[\sum_{j=1}^{L_D} d\bigl(F_j^{T},\,F_j^{S}\bigr)\Bigr]；$$

其中，$D_2$表示第二相似性度量函数，$L_D$表示从第一辨别器中获取的中间特征图的数量，$p(z)$表示噪声分布，$d\bigl(F_j^{T},F_j^{S}\bigr)$表示第$j$层对应的第二子相似性度量函数，$F_j^{T}$表示第$j$层对应的第一中间特征图，$F_j^{S}$表示第$j$层对应的第二中间特征图。
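$D_2$ can be sketched in the same spirit: both fake images are passed through the first (teacher) discriminator and the per-layer similarities are summed. The `extract_features` helper, assumed to return the list of intermediate feature maps, is not part of the original disclosure.

```python
def discriminator_feature_distill(D_t, fake_teacher, fake_student, extract_features):
    """Sketch of D_2 over the first discriminator's intermediate feature maps."""
    feats_first = extract_features(D_t, fake_teacher)    # first intermediate feature maps
    feats_second = extract_features(D_t, fake_student)   # second intermediate feature maps
    return sum(similarity(f_s, f_t.detach())
               for f_s, f_t in zip(feats_second, feats_first))
```

Reusing the first discriminator itself as the feature extractor, rather than an extra network, keeps the distillation signal aligned with the generation task, as the description notes.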
具体地,根据第一相似性度量函数和第二相似性度量函数确定蒸馏目标函数的具体实施方式,本领域技术人员可根据实际情况设置,此处不作限定。
在一种可能的实施方式中,将第一相似性度量函数和第二相似性度量函数按照权重加 和确定蒸馏目标函数。
在一种可能的实施方式中,将第一相似性度量函数和第二相似性度量函数按照直接加和确定蒸馏目标函数。
示例性地，蒸馏目标函数如下：

$$L_{\mathrm{distill}} = D_1 + D_2；$$

其中，$L_{\mathrm{distill}}$表示蒸馏目标函数，$D_1$表示第一相似性度量函数，$D_2$表示第二相似性度量函数。
具体地,S330的具体实施方式本领域技术人员可根据实际情况设置,此处不限定。
在一种可能的实施方式中,将蒸馏目标函数、以及根据第二生成器的损失函数确定的目标函数分量按照直接进行加和,确定第二生成器的目标函数。
在一种可能的实施方式中,将蒸馏目标函数、以及根据第二生成器的损失函数确定的目标函数分量按照权重进行加和,确定第二生成器的目标函数。
示例性地，第二生成器的目标函数如下：

$$L_G^S = (L_G^S)' + \gamma L_{\mathrm{distill}}；$$
$$(L_G^S)' = \mathbb{E}_{z\sim p(z)}\bigl[f_G^S\bigl(-D^S(G^S(z))\bigr)\bigr]；$$

其中，$(L_G^S)'$表示根据第二生成器的损失函数确定的目标函数分量，本领域技术人员应当理解的是，当未将第一生成器和第一辨别器作为教师生成对抗网络时，$(L_G^S)'$可以为第二生成器的目标函数，如图2所示方法中的示例，因此，关于$(L_G^S)'$的具体解释，请见上文，此处不再赘述。$L_{\mathrm{distill}}$表示蒸馏目标函数，$\gamma$表示蒸馏目标函数的权重参数。
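Putting the pieces together, the student generator's total objective is the adversarial component plus the weighted distillation term; a trivial sketch, with the value of $\gamma$ assumed:

```python
def full_student_generator_objective(adv_term, distill_term, gamma=1.0):
    """L_G^S = (L_G^S)' + gamma * L_distill; gamma is a weighting hyperparameter (value assumed)."""
    return adv_term + gamma * distill_term
```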
S340、根据第二辨别器关于真实图片的损失函数、以及第二辨别器关于虚假图片的损失函数确定第二辨别器的目标函数。
S350、根据第二生成器的目标函数、第二辨别器的目标函数、第二辨别器关于虚假图片的损失函数、第一生成器的目标函数、以及第一辨别器关于虚假图片的损失函数确定保留因子的目标函数。
S360、冻结第二辨别器中的各卷积核对应的保留因子,根据第二生成器的目标函数确定第二生成器的第二权重参数;根据第二辨别器的目标函数确定第二辨别器的第一权重参数。
S370、冻结第二辨别器的第一权重参数和第二生成器的第二权重参数,根据保留因子的目标函数确定各保留因子;当保留因子小于第二预设阈值时,确定保留因子为0;当保留因子大于或等于第二预设阈值时,确定保留因子为1。
具体地，第二预设阈值的具体值，本领域技术人员可根据实际情况设置，此处不作限定。
可以理解的是,通过设置当保留因子的值小于第二预设阈值时,令该保留因子为0;当保留因子的值大于等于第二预设阈值时,令该保留因子为1,可加快将一部分第一辨别 器中的卷积核设置为激活状态、以及将另一部分第一辨别器中的卷积核设置为抑制状态的进程,从而提高网络模型的压缩效率。
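A one-line sketch of this binarization step, with the value of the second preset threshold assumed:

```python
import torch

def binarize_retention_factors(factors: torch.Tensor, second_threshold: float = 0.5) -> torch.Tensor:
    """Factors below the second preset threshold are set to 0 (suppressed kernels),
    the rest to 1 (active kernels); the 0.5 default is an assumption."""
    return (factors >= second_threshold).float()
```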
S380、根据第二生成器的目标函数和第二辨别器关于虚假图片的损失函数确定第二生成器和第二辨别器之间的第二损失差。
S390、判断第一损失差和第二损失差之间的差值绝对值是否小于第一预设阈值。若是,结束训练;若否,返回执行S360。
本公开实施例,通过蒸馏方法同时利用第一生成器和第一辨别器中的中间层特征图作为额外的监督信息帮助第二生成器生成高质量的图像。如此,既能够得到轻量级的第二生成器,又可确保轻量级的第二生成器可以生成高质量图像。
图4是本公开实施例提供的一种图像生成方法的流程图,该方法可以由一种电子设备来执行。该电子设备可以示例性的理解为诸如平板电脑、笔记本电脑、台式机等具有计算功能的设备。如图4所示,本实施例提供的方法包括如下S410-S420:
S410、将随机噪声信号输入至第二生成器中,以使第二生成器根据随机噪声信号生成虚假图像。
S420、将虚假图像输入第二辨别器,以使第二辨别器辨别虚假图像为真后输出虚假图像。
其中,第二生成器和第二辨别器采用上述图1-图3中任一实施例的方法得到。
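A sketch of this generation flow with the compressed pair is shown below; the latent dimensionality and the rule for "judged real" (a positive discriminator logit) are assumptions.

```python
import torch

@torch.no_grad()
def generate_images(G_s, D_s, latent_dim=128, batch=4):
    """Sketch of S410-S420: the compressed generator maps noise to fake images and the
    compressed discriminator gates which ones are output."""
    z = torch.randn(batch, latent_dim)          # random noise signal
    fake = G_s(z)                               # second generator produces fake images
    judged_real = (D_s(fake).flatten() > 0)     # second discriminator's verdict
    return fake[judged_real]
```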
本公开实施例,用于生成虚假图像的第二生成器以及用于输出虚假图像的第二辨别器,采用了本公开实施例提供的网络模型的压缩方法得到,由于第二生成器和第二辨别器的比较轻量,因此即使部署在资源受限的边缘设备上,也不会产生较长的时延。
图5是本公开实施例提供的一种网络模型的压缩装置的结构示意图，该网络模型的压缩装置可以被理解为上述网络模型的压缩设备或者上述网络模型的压缩设备中的部分功能模块。该网络模型的压缩装置可以压缩待压缩网络模型，待压缩的网络模型包括第一生成器和第一辨别器，如图5所示，该网络模型的压缩装置500包括：剪枝模块510和配置模块520；
剪枝模块510,配置为对所述第一生成器进行剪枝处理,得到第二生成器;
配置模块520,配置为对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;
其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
在一种实施方式中,配置模块520包括:
第一确定子模块,可以配置为冻结所述第二辨别器中的各卷积核对应的保留因子,确定所述第二辨别器的第一权重参数;其中,所述保留因子用于表征其对应的卷积核的重要程度;
第二确定子模块,可以配置为冻结所述第二辨别器的第一权重参数和所述第二生成器的第二权重参数,确定各所述保留因子;
重复子模块,可以配置为重复执行上述确定所述第二辨别器的第一权重参数以及确定各所述保留因子的操作,直至所述第一损失差和所述第二损失差之间的差值绝对值小于所 述第一预设阈值。
在一种实施方式中,所述第一权重参数包括所述第二辨别器中除去所述保留因子之外的其它要素对应的权重参数。
在一种实施方式中,所述第二权重参数包括所述第二生成器中各要素对应的权重参数。
在一种实施方式中,第一确定子模块包括:
第一确定单元,可以配置为根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数;
第二确定单元,可以配置为根据所述第二辨别器的目标函数确定所述第二辨别器的第一权重参数。
在一种实施方式中,该装置还包括:
第二生成器的目标函数确定模块,可以配置为,在所述根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数之前,根据所述第二生成器的损失函数确定所述第二生成器的目标函数;
第二辨别器的目标函数确定模块,可以配置为根据所述第二辨别器关于真实图片的损失函数、以及所述第二辨别器关于虚假图片的损失函数确定所述第二辨别器的目标函数。
在一种实施方式中,该装置还包括教师和学生确定模块,可以配置为在所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数之前,将所述第一生成器和所述第一辨别器作为教师生成对抗网络,将所述第二生成器和所述第二辨别器作为学生生成对抗网络;
第二生成器的目标函数确定模块,具体可以配置为根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数。
在一种实施方式中,第二生成器的目标函数确定模块,具体可以配置为将所述蒸馏目标函数、以及根据所述第二生成器的损失函数确定的目标函数分量,按照权重进行加和,确定所述第二生成器的目标函数。
在一种实施方式中,该装置还包括:
第一相似性度量函数确定模块,可以配置为在所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数之前,根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数;
第一中间特征图得到模块,可以配置为将所述第一生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第一中间特征图;
第二中间特征图得到模块,可以配置为将所述第二生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第二中间特征图;
第二相似性度量函数确定模块,可以配置为根据所述至少一层的所述第一中间特征图和所述至少一层的所述第二中间特征图的相似度确定第二相似性度量函数;
蒸馏目标函数确定模块,可以配置为根据所述第一相似性度量函数和所述第二相似性度量函数确定所述蒸馏目标函数。
在一种实施方式中,第一相似性度量函数确定模块包括:
第一子相似性度量函数确定子模块,可以配置为将所述第一生成器的第i层的中间特征图和所述第二生成器的第i层的中间特征图输入相似性度量函数,得到第i层对应的第一子相似性度量函数;其中,i为正整数,且i从1-M中取值,M为所述第一生成器和所述第二生成器的层数;
第一相似性度量函数确定子模块,可以配置为根据各层对应的所述第一子相似性度量函数确定所述第一相似性度量函数。
在一种实施方式中,第二相似性度量函数确定模块包括:
第二子相似性度量函数确定子模块,可以配置为将第j层对应的所述第一中间特征图和所述第二中间特征图输入相似性度量函数,得到第j层对应的第二子相似性度量函数;其中,j为正整数,1≤j≤N,且j从1-N中取值,N为所述第一辨别器的层数;
第二相似性度量函数确定子模块,可以配置为根据各层对应的所述第二子相似性度量函数确定所述第二相似性度量函数。
在一种实施方式中,第二确定子模块包括:
第三确定单元,可以配置为根据所述保留因子的目标函数确定各所述保留因子;
第四确定单元,可以配置为当所述保留因子小于第二预设阈值时,确定所述保留因子为0;
第五确定单元,可以配置为当所述保留因子大于或等于所述第二预设阈值时,确定所述保留因子为1。
在一种实施方式中,该装置还包括保留因子的目标函数确定模块,可以配置为在所述根据所述保留因子的目标函数确定各所述保留因子之前,根据所述第二生成器的目标函数、所述第二辨别器的目标函数、所述第二辨别器关于虚假图片的损失函数、所述第一生成器的目标函数、以及所述第一辨别器关于虚假图片的损失函数确定所述保留因子的目标函数。
本实施例提供的装置能够执行上述图1-图3中任一实施例的方法,其执行方式和有益效果类似,在这里不再赘述。
示例的,图6是本公开实施例中的一种网络模型的压缩设备的结构示意图。下面具体参考图6,其示出了适于用来实现本公开实施例中的网络模型的压缩设备600的结构示意图。本公开实施例中的网络模型的压缩设备600可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图6示出的网络模型的压缩设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图6所示,网络模型的压缩设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有网络模型的压缩设备600操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄 像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许网络模型的压缩设备600与其他设备进行无线或有线通信以交换数据。虽然图6示出了具有各种装置的网络模型的压缩设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述网络模型的压缩设备中所包含的;也可以是单独存在,而未装配入该网络模型的压缩设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该网络模型的压缩设备执行时,使得该网络模型的压缩设备:对第一生成器进行剪枝处理,得到第二生成器;对第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;其中,第一生成器和第一辨别器之间的损失差为第一损失差,第二生成器和第二辨别器之间的损失差为第二损失差,第一损失差和 第二损失差之间的差值绝对值小于第一预设阈值。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
本公开实施例还提供一种计算机可读存储介质,所述存储介质中存储有计算机程序,当所述计算机程序被处理器执行时可以实现上述图1-图3中任一实施例的方法,其执行方式和有益效果类似,在这里不再赘述。
需要说明的是,在本文中,诸如“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体 意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
以上所述仅是本公开的具体实施方式,使本领域技术人员能够理解或实现本公开。对这些实施例的多种修改对本领域的技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本公开的精神或范围的情况下,在其它实施例中实现。因此,本公开将不会被限制于本文所述的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (18)

  1. 一种网络模型的压缩方法,待压缩的网络模型包括第一生成器和第一辨别器,所述方法包括:
    对所述第一生成器进行剪枝处理,得到第二生成器;
    对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;
    其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
  2. 根据权利要求1所述的网络模型的压缩方法,其中,所述对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器,包括:
    冻结所述第二辨别器中的各卷积核对应的保留因子,确定所述第二辨别器的第一权重参数;其中,所述保留因子用于表征其对应的卷积核的重要程度;
    冻结所述第二辨别器的第一权重参数和所述第二生成器的第二权重参数,确定各所述保留因子;
    重复执行上述确定所述第二辨别器的第一权重参数以及确定各所述保留因子的操作,直至所述第一损失差和所述第二损失差之间的差值绝对值小于所述第一预设阈值。
  3. 根据权利要求2所述的网络模型的压缩方法,其中,所述第一权重参数包括所述第二辨别器中除去所述保留因子之外的其它要素对应的权重参数。
  4. 根据权利要求2所述的网络模型的压缩方法,其中,所述第二权重参数包括所述第二生成器中各要素对应的权重参数。
  5. 根据权利要求2所述的网络模型的压缩方法,其中,所述确定所述第二辨别器的第一权重参数,包括:
    根据所述第二辨别器的目标函数确定所述第二辨别器的第一权重参数;
    所述方法还包括:
    根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数。
  6. 根据权利要求5所述的网络模型的压缩方法,其中,所述根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数之前,所述方法还包括:
    根据所述第二生成器的损失函数确定所述第二生成器的目标函数;
    根据所述第二辨别器关于真实图片的损失函数、以及所述第二辨别器关于虚假图片的损失函数确定所述第二辨别器的目标函数。
  7. 根据权利要求6所述的网络模型的压缩方法,其中,所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:
    将所述第一生成器和所述第一辨别器作为教师生成对抗网络,将所述第二生成器和所述第二辨别器作为学生生成对抗网络;
    所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:
    根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及 所述第二生成器的损失函数确定所述第二生成器的目标函数。
  8. 根据权利要求7所述的网络模型的压缩方法,其中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:
    将所述蒸馏目标函数、以及根据所述第二生成器的损失函数确定的目标函数分量,按照权重进行加和,确定所述第二生成器的目标函数。
  9. 根据权利要求7所述的网络模型的压缩方法,其中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:
    根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数;
    将所述第一生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第一中间特征图;
    将所述第二生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第二中间特征图;
    根据所述至少一层的所述第一中间特征图和所述至少一层的所述第二中间特征图的相似度确定第二相似性度量函数;
    根据所述第一相似性度量函数和所述第二相似性度量函数确定所述蒸馏目标函数。
  10. 根据权利要求9所述的网络模型的压缩方法,其中,所述根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数,包括:
    将所述第一生成器的第i层的中间特征图和所述第二生成器的第i层的中间特征图输入相似性度量函数,得到第i层对应的第一子相似性度量函数;其中,i为正整数,且i从1-M中取值,M为所述第一生成器和所述第二生成器的层数;
    根据各层对应的所述第一子相似性度量函数确定所述第一相似性度量函数。
  11. 根据权利要求9所述的网络模型的压缩方法,其中,所述根据所述至少一层的所述第一中间特征图和所述至少一层的所述第二中间特征图的相似度确定第二相似性度量函数,包括:
    将第j层对应的所述第一中间特征图和所述第二中间特征图输入相似性度量函数,得到第j层对应的第二子相似性度量函数;其中,j为正整数,1≤j≤N,且j从1-N中取值,N为所述第一辨别器的层数;
    根据各层对应的所述第二子相似性度量函数确定所述第二相似性度量函数。
  12. 根据权利要求2所述的网络模型的压缩方法,其中,所述确定各所述保留因子,包括:
    根据所述保留因子的目标函数确定各所述保留因子;
    当所述保留因子小于第二预设阈值时,确定所述保留因子为0;
    当所述保留因子大于或等于所述第二预设阈值时,确定所述保留因子为1。
  13. 根据权利要求12所述的网络模型的压缩方法,其中,所述根据所述保留因子的目标函数确定各所述保留因子之前,所述方法还包括:
    根据所述第二生成器的目标函数、所述第二辨别器的目标函数、所述第二辨别器关于虚假图片的损失函数、所述第一生成器的目标函数、以及所述第一辨别器关于虚假图片的损失函数确定所述保留因子的目标函数。
  14. 一种图像生成方法,包括:
    将随机噪声信号输入至第二生成器中,以使所述第二生成器根据所述随机噪声信号生成虚假图像;
    将所述虚假图像输入所述第二辨别器,以使所述第二辨别器辨别所述虚假图像为真后输出所述虚假图像;
    其中,所述第二生成器和所述第二辨别器采用如权利要求1-13中任一所述的方法得到。
  15. 一种网络模型的压缩装置,待压缩的网络模型包括第一生成器和第一辨别器,所述装置包括:剪枝模块和配置模块;
    所述剪枝模块,配置为对所述第一生成器进行剪枝处理,得到第二生成器;
    所述配置模块,配置为对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;
    其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
  16. 一种网络模型的压缩设备,包括:
    存储器,所述存储器中存储有计算机程序;
    处理器,用于执行所述计算机程序,当所述计算机程序被所述处理器执行时,所述处理器执行如权利要求1-13中任一项所述的方法。
  17. 一种计算机可读存储介质,所述存储介质中存储有计算机程序,当所述计算机程序被处理器执行时,实现如权利要求1-13中任一项所述的方法。
  18. 一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行如权利要求1-13中任一项所述的方法的程序代码。
PCT/CN2022/119638 2021-09-24 2022-09-19 网络模型的压缩方法、图像生成方法、装置、设备及介质 WO2023045870A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22871916.7A EP4339836A1 (en) 2021-09-24 2022-09-19 Network model compression method, apparatus and device, image generation method, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111122307.5A CN113780534B (zh) 2021-09-24 2021-09-24 网络模型的压缩方法、图像生成方法、装置、设备及介质
CN202111122307.5 2021-09-24

Publications (1)

Publication Number Publication Date
WO2023045870A1 true WO2023045870A1 (zh) 2023-03-30

Family

ID=78853212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/119638 WO2023045870A1 (zh) 2021-09-24 2022-09-19 网络模型的压缩方法、图像生成方法、装置、设备及介质

Country Status (3)

Country Link
EP (1) EP4339836A1 (zh)
CN (1) CN113780534B (zh)
WO (1) WO2023045870A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780534B (zh) * 2021-09-24 2023-08-22 北京字跳网络技术有限公司 网络模型的压缩方法、图像生成方法、装置、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084281A (zh) * 2019-03-31 2019-08-02 华为技术有限公司 图像生成方法、神经网络的压缩方法及相关装置、设备
CN110796251A (zh) * 2019-10-28 2020-02-14 天津大学 基于卷积神经网络的图像压缩优化方法
CN110796619A (zh) * 2019-10-28 2020-02-14 腾讯科技(深圳)有限公司 一种图像处理模型训练方法、装置、电子设备及存储介质
US20200302295A1 (en) * 2019-03-22 2020-09-24 Royal Bank Of Canada System and method for knowledge distillation between neural networks
CN113780534A (zh) * 2021-09-24 2021-12-10 北京字跳网络技术有限公司 网络模型的压缩方法、图像生成方法、装置、设备及介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423700B (zh) * 2017-07-17 2020-10-20 广州广电卓识智能科技有限公司 人证核实的方法及装置
CN109978792A (zh) * 2019-03-28 2019-07-05 厦门美图之家科技有限公司 一种生成图像增强模型的方法
CN113313133A (zh) * 2020-02-25 2021-08-27 武汉Tcl集团工业研究院有限公司 一种生成对抗网络的训练方法、动画图像生成方法
CN112052948B (zh) * 2020-08-19 2023-11-14 腾讯科技(深圳)有限公司 一种网络模型压缩方法、装置、存储介质和电子设备
CN113159173B (zh) * 2021-04-20 2024-04-26 北京邮电大学 一种结合剪枝与知识蒸馏的卷积神经网络模型压缩方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200302295A1 (en) * 2019-03-22 2020-09-24 Royal Bank Of Canada System and method for knowledge distillation between neural networks
CN110084281A (zh) * 2019-03-31 2019-08-02 华为技术有限公司 图像生成方法、神经网络的压缩方法及相关装置、设备
CN110796251A (zh) * 2019-10-28 2020-02-14 天津大学 基于卷积神经网络的图像压缩优化方法
CN110796619A (zh) * 2019-10-28 2020-02-14 腾讯科技(深圳)有限公司 一种图像处理模型训练方法、装置、电子设备及存储介质
CN113780534A (zh) * 2021-09-24 2021-12-10 北京字跳网络技术有限公司 网络模型的压缩方法、图像生成方法、装置、设备及介质

Also Published As

Publication number Publication date
CN113780534B (zh) 2023-08-22
CN113780534A (zh) 2021-12-10
EP4339836A1 (en) 2024-03-20

Similar Documents

Publication Publication Date Title
WO2020155907A1 (zh) 用于生成漫画风格转换模型的方法和装置
WO2023273985A1 (zh) 一种语音识别模型的训练方法、装置及设备
CN112364860B (zh) 字符识别模型的训练方法、装置和电子设备
WO2023273579A1 (zh) 模型的训练方法、语音识别方法、装置、介质及设备
WO2022227886A1 (zh) 超分修复网络模型生成方法、图像超分修复方法及装置
WO2022247562A1 (zh) 多模态数据检索方法、装置、介质及电子设备
WO2023273611A1 (zh) 语音识别模型的训练方法、语音识别方法、装置、介质及设备
WO2020207174A1 (zh) 用于生成量化神经网络的方法和装置
WO2023143016A1 (zh) 特征提取模型的生成方法、图像特征提取方法和装置
WO2023143178A1 (zh) 对象分割方法、装置、设备及存储介质
WO2023005386A1 (zh) 模型训练方法和装置
WO2023088280A1 (zh) 意图识别方法、装置、可读介质及电子设备
WO2022171036A1 (zh) 视频目标追踪方法、视频目标追踪装置、存储介质及电子设备
CN113033580B (zh) 图像处理方法、装置、存储介质及电子设备
WO2023116138A1 (zh) 多任务模型的建模方法、推广内容处理方法及相关装置
WO2023273610A1 (zh) 语音识别方法、装置、介质及电子设备
WO2023051238A1 (zh) 动物形象的生成方法、装置、设备及存储介质
WO2023045870A1 (zh) 网络模型的压缩方法、图像生成方法、装置、设备及介质
CN111597825A (zh) 语音翻译方法、装置、可读介质及电子设备
WO2023211369A2 (zh) 语音识别模型的生成方法、识别方法、装置、介质及设备
CN110009101B (zh) 用于生成量化神经网络的方法和装置
JP2022541832A (ja) 画像を検索するための方法及び装置
WO2022012178A1 (zh) 用于生成目标函数的方法、装置、电子设备和计算机可读介质
CN114420135A (zh) 基于注意力机制的声纹识别方法及装置
WO2023185515A1 (zh) 特征提取方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22871916

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022871916

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 18571116

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2022871916

Country of ref document: EP

Effective date: 20231214

ENP Entry into the national phase

Ref document number: 2022871916

Country of ref document: EP

Effective date: 20231214

NENP Non-entry into the national phase

Ref country code: DE