WO2023045870A1 - 网络模型的压缩方法、图像生成方法、装置、设备及介质 (Network model compression method, image generation method, apparatus, device and medium) - Google Patents
网络模型的压缩方法、图像生成方法、装置、设备及介质 (Network model compression method, image generation method, apparatus, device and medium)
- Publication number: WO2023045870A1 (application PCT/CN2022/119638, CN2022119638W)
- Authority: WO — WIPO (PCT)
- Prior art keywords: generator, discriminator, function, objective function, loss
Classifications
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/088 — Non-supervised learning, e.g. competitive learning

(All under G — Physics; G06 — Computing, calculating or counting; G06N — Computing arrangements based on specific computational models; G06N3/00 — Computing arrangements based on biological models; G06N3/02 — Neural networks.)
Definitions
- the present disclosure relates to the field of computer technology, and in particular to a network model compression method, an image generation method, an apparatus, a device and a medium.
- GAN refers to a Generative Adversarial Network, which comprises a generator and a discriminator.
- the present disclosure provides a network model compression method, an image generation method, an apparatus, a device and a medium.
- the first aspect of the embodiments of the present disclosure provides a method for compressing a network model.
- the network model to be compressed includes a first generator and a first discriminator.
- the method includes:
- the loss difference between the first generator and the first discriminator is a first loss difference
- the loss difference between the second generator and the second discriminator is a second loss difference
- the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
- the configuring the states of the convolution kernels in the first discriminator so that a part of the convolution kernels is in an active state and another part of the convolution kernels is in a suppressed state, so as to obtain the second discriminator, includes:
- the first weight parameter includes weight parameters corresponding to other elements in the second discriminator except the retention factor.
- the second weight parameter includes weight parameters corresponding to each element in the second generator.
- the determining the first weight parameter of the second discriminator includes:
- the method also includes:
- a second weight parameter of the second generator is determined according to an objective function of the second generator.
- before determining the second weight parameter of the second generator according to the objective function of the second generator, the method further includes:
- the objective function of the second discriminator is determined according to the loss function of the second discriminator with respect to real pictures and the loss function of the second discriminator with respect to fake pictures.
- before determining the objective function of the second generator according to the loss function of the second generator, the method further includes:
- the determining the objective function of the second generator according to the loss function of the second generator includes:
- the objective function of the second generator is determined according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator.
- the determining the objective function of the second generator according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator, includes:
- before determining the objective function of the second generator according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network, and the loss function of the second generator, the method further includes:
- the distillation objective function is determined according to the first similarity measure function and the second similarity measure function.
- the determining the first similarity metric function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator includes:
- the first similarity measure function is determined according to the first sub-similarity measure function corresponding to each layer.
- the determining a second similarity metric function according to the similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer includes:
- the second similarity measure function is determined according to the second sub-similarity measure function corresponding to each layer.
- said determining each of said retention factors comprises:
- if the retention factor is greater than or equal to the second preset threshold, the retention factor is determined to be 1.
- before determining each of the retention factors according to the objective function of the retention factors, the method further includes:
- an objective function of the retention factor is determined according to the objective function of the second generator.
- a second aspect of an embodiment of the present disclosure provides an image generation method, the method comprising:
- the second generator and the second discriminator are obtained by using the method described in the first aspect.
- a third aspect of an embodiment of the present disclosure provides a device for compressing a network model.
- the network model to be compressed includes a first generator and a first discriminator, and the device includes: a pruning module and a configuration module;
- the pruning module is configured to perform pruning processing on the first generator to obtain a second generator
- the configuration module is configured to configure the states of the convolution kernels in the first discriminator, so that a part of the convolution kernels is in an active state, and another part of the convolution kernels is in an inhibited state, so as to obtain a second discriminator;
- the loss difference between the first generator and the first discriminator is a first loss difference
- the loss difference between the second generator and the second discriminator is a second loss difference
- the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
- a fourth aspect of the embodiments of the present disclosure provides a network model compression device, the device includes:
- a processor configured to execute a computer program, wherein when the computer program is executed by the processor, the processor performs the method as described in the first aspect.
- a fifth aspect of the embodiments of the present disclosure provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method described in the first aspect is implemented.
- a sixth aspect of the embodiments of the present disclosure provides a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program includes program code for executing the method as described in the first aspect.
- the second generator is obtained by pruning the first generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are in an active state and the others are in a suppressed state, so as to obtain the second discriminator; the loss difference between the first generator and the first discriminator is the first loss difference, the loss difference between the second generator and the second discriminator is the second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. Because the embodiments of the present disclosure compress the generator and the discriminator cooperatively, the compressed generator and the compressed discriminator can maintain a Nash equilibrium, avoiding the phenomenon of mode collapse.
- FIG. 1 is a flowchart of a network model compression method provided by an embodiment of the present disclosure
- Fig. 2 is a flow chart of a network model compression method provided by an embodiment of the present disclosure
- Fig. 3 is a flowchart of a network model compression method provided by an embodiment of the present disclosure
- FIG. 4 is a flow chart of an image generation method provided by an embodiment of the present disclosure.
- Fig. 5 is a schematic structural diagram of a network model compression device provided by an embodiment of the present disclosure.
- Fig. 6 is a schematic structural diagram of a network model compression device in an embodiment of the present disclosure.
- the embodiment of the present disclosure provides a network model compression method: the first generator is pruned to obtain the second generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are in an active state and the others are in a suppressed state, to obtain the second discriminator, so that the first loss difference between the first generator and the first discriminator is close to the second loss difference between the second generator and the second discriminator. In this way, a Nash equilibrium can be maintained between the second generator and the second discriminator, and the mode collapse phenomenon can be prevented.
- Fig. 1 is a flowchart of a network model compression method provided by an embodiment of the present disclosure, and the method may be executed by a network model compression device.
- the compression device of the network model can be understood as, by way of example, a device with computing functions such as a tablet computer, a notebook computer, or a desktop computer.
- the method can compress a network model to be compressed that includes the first generator and the first discriminator. As shown in Figure 1, the method provided in this embodiment includes the following S110-S120:
- performing pruning on the first generator includes: selectively deleting convolution kernels in the first generator, so that convolution kernels whose importance is less than a preset importance threshold are deleted, and convolution kernels whose importance is greater than or equal to the preset importance threshold are retained.
- each convolutional layer (Convolutional layer, CL) in the first generator is correspondingly provided with a batch normalization (Batch Normalization, BN) layer, and each CL includes at least one convolutional kernel.
- Each convolution kernel is correspondingly provided with a scaling factor in the BN layer corresponding to its CL, wherein the scaling factor is used to characterize the importance of its corresponding convolution kernel.
- each training iteration includes determining the scaling factor.
- when the preset number of training iterations is reached, the scaling factors of the convolution kernels are sorted from small to large, and binary search is used to delete the convolution kernels corresponding to the smaller scaling factors, until the calculation amount of the first generator satisfies the preset given calculation amount.
- each CL in the first generator includes at least one convolution kernel, and each convolution kernel is correspondingly set with a weight parameter, wherein the weight parameter of a convolution kernel is used to characterize the importance of the corresponding convolution kernel.
- the first generator and the first discriminator are trained; specifically, each training iteration includes determining the weight parameter. When the preset number of training iterations is reached, the weight parameters of the convolution kernels are sorted from small to large, and binary search is used to delete the convolution kernels corresponding to the smaller weight parameters, until the calculation amount of the first generator satisfies the preset given calculation amount.
- the specific number of training times and the specific value of the preset given calculation amount can be set by those skilled in the art according to the actual situation, and are not limited here.
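The pruning procedure above (rank convolution kernels by scaling factor or weight parameter, then binary-search the number of deletions until the remaining computation fits the preset budget) can be sketched as follows. The cost model and all numeric values are illustrative assumptions, not taken from the disclosure.

```python
def prune_by_scaling_factor(scaling_factors, kernel_flops, flops_budget):
    """Binary-search the number of kernels to delete (smallest scaling
    factors first) until the remaining computation fits the budget.

    scaling_factors: importance score per convolution kernel (from BN layers)
    kernel_flops:    computation contributed by each kernel (assumed cost model)
    flops_budget:    preset given calculation amount
    """
    # Sort kernel indices by ascending importance; prune from the front.
    order = sorted(range(len(scaling_factors)), key=lambda i: scaling_factors[i])
    lo, hi = 0, len(order)              # candidate number of kernels to delete
    while lo < hi:
        mid = (lo + hi) // 2
        kept = order[mid:]              # delete the `mid` least important kernels
        flops = sum(kernel_flops[i] for i in kept)
        if flops <= flops_budget:
            hi = mid                    # budget met: try deleting fewer
        else:
            lo = mid + 1                # still too large: delete more
    return sorted(order[lo:])           # indices of retained kernels

# Toy example: 6 kernels, each costing 10 units, budget of 30 units.
kept = prune_by_scaling_factor([0.9, 0.1, 0.5, 0.05, 0.7, 0.2], [10] * 6, 30)
```

The binary search finds the minimum number of deletions that satisfies the budget, so the most important kernels survive.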
- the convolution kernel in the suppressed state does not work, and the convolution kernel in the activated state works normally.
- the loss difference between the first generator and the first discriminator is the first loss difference
- the loss difference between the second generator and the second discriminator is the second loss difference
- the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold.
- the first loss difference is used to characterize the difference between the performance of the first generator and the performance of the first discriminator, and can be calculated from the objective function of the first generator and the loss function of the first discriminator with respect to fake pictures.
- the second loss difference is used to characterize the difference between the performance of the second generator and the performance of the second discriminator, and can be calculated from the objective function of the second generator and the loss function of the second discriminator with respect to fake pictures.
- the network model to be compressed is a trained GAN
- the first generator and the first discriminator are in a state of Nash equilibrium.
- the first loss difference can be made close to the second loss difference; that is, a Nash equilibrium can be maintained between the second generator and the second discriminator, so that mode collapse can be prevented.
- those skilled in the art may set the specific implementation manner of configuring the state of the convolution kernel in the first discriminator according to the actual situation, which is not limited here.
- S120 includes: freezing the retention factors corresponding to the convolution kernels in the second discriminator and determining the first weight parameter of the second discriminator; freezing the first weight parameter and the second weight parameter of the second generator and determining each retention factor; and repeating the above operations of determining the first weight parameter of the second discriminator and determining each retention factor, until the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
- the retention factor is used to characterize the importance of its corresponding convolution kernel.
- a retention factor is configured for each convolution kernel in the first discriminator to obtain a second discriminator, and the initial value of the retention factor corresponding to each convolution kernel may be 1.
- the value of the retention factor corresponding to the convolution kernel in the activated state may be 1
- the value of the retention factor corresponding to the convolution kernel in the suppressed state may be 0.
- the first weight parameter mentioned here includes weight parameters corresponding to other elements in the second discriminator except the retention factor corresponding to the convolution kernel.
- the second weight parameter includes weight parameters corresponding to each element (such as a convolution kernel) in the second generator.
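The 0/1 retention factors described above can be sketched with a small numpy example: a factor at or above the (second preset) threshold activates its kernel, a factor below it suppresses the kernel so it contributes nothing. All values here are hypothetical; real feature maps would come from the discriminator's convolution layers.

```python
import numpy as np

# Hypothetical learned retention factors, one per convolution kernel
# in the second discriminator (values are illustrative).
learned = np.array([0.8, 0.3, 0.6, 0.1])
second_preset_threshold = 0.5

# Factors at or above the threshold become 1 (active); the rest become 0 (suppressed).
retention = (learned >= second_preset_threshold).astype(float)

# A suppressed kernel contributes nothing: mask its output feature map to zero.
kernel_outputs = np.array([[2.0], [5.0], [3.0], [7.0]])  # per-kernel outputs
masked = retention[:, None] * kernel_outputs
```

Multiplying by the retention mask leaves the active kernels' outputs unchanged and zeroes the suppressed ones, which matches the "does not work / works normally" description above.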
- determining the first weight parameter of the second discriminator includes: determining the first weight parameter of the second discriminator according to an objective function of the second discriminator. In a possible implementation, before determining the first weight parameter of the second discriminator according to the objective function of the second discriminator, the method may further include determining the second weight parameter of the second generator according to the objective function of the second generator.
- each retention factor is determined according to an objective function of the retention factor.
- Exemplarily, first, keeping the retention factors constant, the second weight parameter of the second generator is determined by optimizing the objective function of the second generator, and the first weight parameter of the second discriminator is determined by optimizing the objective function of the second discriminator. Then, keeping the first weight parameter of the second discriminator and the second weight parameter of the second generator unchanged, the retention factor corresponding to each convolution kernel is determined by optimizing the objective function of the retention factor.
- when the absolute value of the difference between the first loss difference and the second loss difference is detected to be smaller than the first preset threshold, the training can be ended; when it is detected to be greater than or equal to the first preset threshold, the process returns to continue the operations of "determine the second weight parameter of the second generator, determine the first weight parameter of the second discriminator" and "determine the retention factor corresponding to each convolution kernel", until the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
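The alternation described above — optimise the weight parameters with the retention factors frozen, then optimise the retention factors with the weights frozen, until the student loss gap is close to the teacher loss gap — can be sketched as a control loop. The callback names and the toy closures below are assumptions for illustration, not the patent's API.

```python
def compress_discriminator(update_generator, update_discriminator,
                           update_retention, loss_gap, target_gap,
                           threshold, max_rounds=100):
    """Alternate the two optimisation phases until the second loss
    difference is within `threshold` of the first loss difference.

    update_* : callbacks performing one optimisation phase (stubs here)
    loss_gap : returns the current second loss difference
    target_gap : the first (teacher) loss difference
    """
    for _ in range(max_rounds):
        update_generator()       # retention factors frozen
        update_discriminator()   # retention factors frozen
        update_retention()       # weight parameters frozen
        if abs(loss_gap() - target_gap) < threshold:
            return True          # stopping condition met
    return False

# Toy closures: the gap halves every round until it is under the threshold.
state = {"gap": 1.0}
done = compress_discriminator(
    update_generator=lambda: None,
    update_discriminator=lambda: None,
    update_retention=lambda: state.__setitem__("gap", state["gap"] * 0.5),
    loss_gap=lambda: state["gap"],
    target_gap=0.0, threshold=0.1)
```

The loop structure mirrors S250-S280 of the embodiment: two weight-update phases, one retention-factor phase, then a threshold test.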
- the second generator is obtained by pruning the first generator, and the states of the convolution kernels in the first discriminator are configured so that some of the convolution kernels are in an active state and the others are in a suppressed state, so as to obtain the second discriminator; the loss difference between the first generator and the first discriminator is the first loss difference, the loss difference between the second generator and the second discriminator is the second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. Because the embodiments of the present disclosure compress the generator and the discriminator cooperatively, the compressed generator and the compressed discriminator can maintain a Nash equilibrium, avoiding the phenomenon of mode collapse.
- Fig. 2 is a flow chart of a network model compression method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method provided by this embodiment includes the following S210-S280:
- the specific implementation manner of S220 can be set by those skilled in the art according to actual conditions, and is not limited here.
- the objective function of the second generator is as follows:
- L_{G_S} = E_{z~p(z)}[ f^S_G( -D_S(G_S(z)) ) ]
- L_{G_S} represents the objective function of the second generator
- G_S(z) represents the fake picture generated by the second generator from the noise signal
- D_S(G_S(z)) represents the response value of the second discriminator to the fake picture generated by the second generator
- f^S_G(-D_S(G_S(z))) represents the loss function of the second generator
- E(·) represents the expected value of the distribution function
- p(z) represents the noise distribution.
- the objective function of the second discriminator is as follows:
- L_{D_S} = E_{x~pdata}[ f^S_D( -D(x) ) ] + E_{z~p(z)}[ f^S_D( D_S(G_S(z)) ) ]
- L_{D_S} represents the objective function of the second discriminator
- D(x) represents the response value of the second discriminator to the real picture
- f^S_D(-D(x)) represents the loss function of the second discriminator with respect to real pictures
- E(·) represents the expected value of the distribution function
- pdata represents the distribution of real pictures
- G_S(z) represents the fake picture generated by the second generator from the noise signal
- D_S(G_S(z)) represents the response value of the second discriminator to the fake picture generated by the second generator
- f^S_D(D_S(G_S(z))) represents the loss function of the second discriminator with respect to the fake picture generated by the second generator
- p(z) represents the noise distribution.
- the objective function of the retention factor is as follows:
- L_arch = L_{G_S} + L_{D_S} + | (L_{G_S} - L^S_{Dfake}) - (L_{G_T} - L^T_{Dfake}) |
- L_arch represents the objective function of the retention factor
- L_{G_S} represents the objective function of the second generator
- L_{D_S} represents the objective function of the second discriminator
- L^S_{Dfake} represents the loss function of the second discriminator with respect to fake pictures
- L_{G_T} represents the objective function of the first generator
- L^T_{Dfake} represents the loss function of the first discriminator with respect to fake pictures
- G_T(z) represents the fake picture generated by the first generator from the noise signal
- D_T(G_T(z)) represents the response value of the first discriminator to the fake picture generated by the first generator
- f^T_G(-D_T(G_T(z))) represents the loss function of the first generator
- f^T_D(D_T(G_T(z))) represents the loss function of the first discriminator with respect to the fake picture generated by the first generator
- E(·) represents the expected value of the distribution function
- p(z) represents the noise distribution.
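One hedged reading of the retention-factor objective is the sum of the student objectives plus the gap between the student and teacher loss differences; the sketch below assumes exactly that form (the exact weighting in the disclosure may differ) and uses illustrative numbers only.

```python
def retention_objective(l_gs, l_ds, l_dfake_s, l_gt, l_dfake_t):
    """Retention-factor objective under the assumed form: the student
    generator and discriminator objectives, plus the absolute gap between
    the student and teacher loss differences."""
    gap = abs((l_gs - l_dfake_s) - (l_gt - l_dfake_t))
    return l_gs + l_ds + gap

# Illustrative scalar losses (not from the disclosure).
val = retention_objective(l_gs=0.8, l_ds=0.6, l_dfake_s=0.6,
                          l_gt=0.7, l_dfake_t=0.55)
```

Minimising this quantity simultaneously improves the student networks and drives the second loss difference toward the first, which is the behaviour the surrounding text attributes to optimising L_arch.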
- the retention factors corresponding to the convolution kernels are kept unchanged, and the second weight parameter of the second generator is updated so that the objective function L_{G_S} of the second generator becomes smaller, thereby determining the second weight parameter of the second generator; the first weight parameter of the second discriminator is updated so that the objective function L_{D_S} of the second discriminator becomes smaller, thereby determining the first weight parameter of the second discriminator.
- the specific implementation manner of S260 can be set by those skilled in the art according to actual conditions, and is not limited here.
- the second weight parameter of the second generator and the first weight parameter of the second discriminator are kept unchanged, and the retention factors corresponding to the convolution kernels are updated so that the objective function L_arch of the retention factor becomes smaller, so as to determine each retention factor.
- the specific implementation manner of S270 can be set by those skilled in the art according to actual conditions, and is not limited here.
- the absolute value of the difference between the objective function of the second generator and the loss function of the second discriminator with respect to fake pictures is used as the second loss difference.
- the second loss difference is as follows:
- ΔL_S = | L_{G_S} - L^S_{Dfake} |
- ΔL_S represents the second loss difference
- S280 Determine whether the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. If yes, end the training; if not, return to S250.
- the specific implementation manner of S280 can be set by those skilled in the art according to actual conditions, and is not limited here.
- the absolute value of the difference between the objective function of the first generator and the loss function of the first discriminator with respect to fake pictures is used as the first loss difference; the absolute value of the difference between the first loss difference and the second loss difference is calculated; and it is determined whether that absolute value is smaller than the first preset threshold.
- the first loss difference is as follows:
- ΔL_T = | L_{G_T} - L^T_{Dfake} |
- ΔL_T represents the first loss difference
- L_{G_T} and L^T_{Dfake} are as described above, and will not be repeated here.
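The two loss differences and the stopping test can be computed directly from scalar loss values. The numbers below are illustrative only; the disclosure gives no numeric examples.

```python
def loss_difference(objective_g, loss_d_fake):
    """|generator objective - discriminator fake-picture loss| (a ΔL value)."""
    return abs(objective_g - loss_d_fake)

# Illustrative values: teacher (first) and student (second) networks.
delta_t = loss_difference(0.70, 0.55)   # first loss difference, ΔL_T
delta_s = loss_difference(0.80, 0.60)   # second loss difference, ΔL_S
first_preset_threshold = 0.10

# Training ends when the two differences are within the preset threshold.
converged = abs(delta_t - delta_s) < first_preset_threshold
```

With these toy values the gap is |0.15 - 0.20| = 0.05, which is under the threshold, so the S280 test would end training.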
- the objective function of the second generator is determined according to the loss function of the second generator, and the second weight parameter of the second generator is determined according to the objective function of the second generator; the objective function of the second discriminator is determined according to the loss function of the second discriminator with respect to real pictures and the loss function of the second discriminator with respect to fake pictures, and the first weight parameter of the second discriminator is determined according to the objective function of the second discriminator; the objective function of the retention factor is determined according to the objective function of the first generator, the loss function of the first discriminator with respect to fake pictures, the objective function of the second generator, the objective function of the second discriminator, and the loss function of the second discriminator with respect to fake pictures, and each retention factor is determined according to the objective function of the retention factor. In the process of making the second loss difference approach the first loss difference, this can reduce the loss function of the second generator and the loss function of the second discriminator with respect to fake pictures, and improve the loss function of the second discriminator with respect to real pictures, thereby optimizing the performance of the second generator and the second discriminator.
- Fig. 3 is a flow chart of a network model compression method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method provided by this embodiment includes the following S310-S390:
- the first generator and the first discriminator are well-trained network models with good accuracy and stability. Using them as the teacher generative adversarial network to guide the student generative adversarial network is beneficial for improving the performance of the second generator. There is usually a distillation loss when the teacher generative adversarial network guides the student generative adversarial network; the distillation loss can be reduced by optimizing the distillation objective function.
- the specific determination method of the distillation objective function can be set by those skilled in the art according to the actual situation, which is not limited here.
- the specific determination method of the distillation objective function is as follows: determine the first similarity metric function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator; input the fake picture generated by the first generator into the first discriminator to obtain the first intermediate feature map of at least one layer in the first discriminator; input the fake picture generated by the second generator into the first discriminator to obtain the second intermediate feature map of at least one layer in the first discriminator; determine the second similarity metric function according to the similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer; and determine the distillation objective function according to the first similarity metric function and the second similarity metric function.
- the intermediate feature map of the first generator refers to the output information of a certain layer in the first generator
- the intermediate feature map of the second generator refers to the output information of a certain layer in the second generator.
- the intermediate feature maps used to determine the first similarity measure function have the following characteristic: the intermediate feature maps obtained from the first generator have a one-to-one correspondence with the intermediate feature maps obtained from the second generator; that is, when an intermediate feature map of a certain layer (for example, the first layer) is obtained from the first generator, an intermediate feature map with the same layer number (for example, the first layer) must be obtained from the second generator.
- Those skilled in the art can set, according to the actual situation, from which layers of the first generator and the second generator the intermediate feature maps are obtained, which is not limited here.
- the first similarity metric function is used to characterize the degree of approximation of the intermediate layer information of the first generator and the second generator.
- a specific manner of determining the first similarity measure function can be set by those skilled in the art according to actual conditions.
- determining the first similarity metric function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator includes: inputting the intermediate feature map of the i-th layer of the first generator and the intermediate feature map of the i-th layer of the second generator into the similarity measurement function to obtain the first sub-similarity measurement function corresponding to the i-th layer, where i is a positive integer taking values from 1 to M, and M is the number of layers of the first generator and the second generator; and determining the first similarity measurement function according to the first sub-similarity measurement function corresponding to each layer.
- the similarity measurement function includes an MSE loss function and a Texture loss function.
- the similarity measurement function is as follows:
- where O denotes either of the two intermediate feature maps input to the similarity measure function, G_pq(O) represents the inner product of the feature of the p-th channel and the feature of the q-th channel in the intermediate feature map O, c_l represents the number of channels of the intermediate feature map O input to the similarity measure function, and L_MSE represents the MSE loss function.
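As a minimal illustration of a similarity measure combining an MSE term with a Gram-matrix texture term of the kind described above, the following numpy sketch shows one plausible form; the normalization by the channel count and the equal weighting of the two terms are assumptions for illustration, not the patent's exact formula.

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a feature map: entry (p, q) is the inner product
    of channel p and channel q, normalized by the channel count (c_l)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.T / c

def similarity_measure(o1, o2):
    """MSE loss plus texture (Gram) loss between two intermediate
    feature maps of identical shape, as outlined in the text."""
    mse = np.mean((o1 - o2) ** 2)
    texture = np.mean((gram_matrix(o1) - gram_matrix(o2)) ** 2)
    return mse + texture

# Identical feature maps give zero distance.
a = np.ones((4, 8, 8))
print(similarity_measure(a, a))  # -> 0.0
```

A larger value then indicates that the two intermediate feature maps are less similar, both element-wise and in channel-correlation (texture) structure.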
- the first similarity measurement function is as follows:
- where L_D1 denotes the first similarity measure function, L_G denotes the number of intermediate feature maps obtained from the first generator, p(z) denotes the noise distribution, d_i indicates the first sub-similarity measure function corresponding to the i-th layer, G_i^S(z) denotes the intermediate feature map obtained from the i-th layer of the second generator, G_i^T(z) denotes the intermediate feature map obtained from the i-th layer of the first generator, and f_i represents a learnable 1x1 convolutional layer used to convert the number of channels of G_i^S(z) to the same number of channels as G_i^T(z).
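The per-layer construction above (a 1x1 convolution matching the student map's channel count to the teacher's, followed by a sum of per-layer sub-similarities) can be sketched as follows; here the 1x1 weight is a fixed array rather than a learned parameter, and all names and values are illustrative assumptions.

```python
import numpy as np

def conv1x1(feat, weight):
    """A 1x1 convolution is a per-pixel linear map across channels:
    a (c_out, c_in) weight applied at every spatial location. In the
    method it adapts the student map's channel count to the teacher's."""
    return np.einsum('oc,chw->ohw', weight, feat)

def first_similarity(teacher_maps, student_maps, adapters, measure):
    """Sum of per-layer sub-similarity measures between corresponding
    generator feature maps, after channel adaptation of the student."""
    total = 0.0
    for t, s, w in zip(teacher_maps, student_maps, adapters):
        total += measure(t, conv1x1(s, w))
    return total

mse = lambda a, b: np.mean((a - b) ** 2)
t = [np.ones((4, 8, 8))]   # teacher layer: 4 channels
s = [np.ones((2, 8, 8))]   # student layer: 2 channels
# adapter maps 2 student channels to 4 teacher channels; averaging the
# student channels gives an exact match in this toy case
w = [np.full((4, 2), 0.5)]
print(first_similarity(t, s, w, mse))  # -> 0.0
```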
- the first intermediate feature map refers to the output information of a certain layer in the first discriminator when the fake picture generated by the first generator is input to the first discriminator; the second intermediate feature map refers to the output information of a certain layer in the first discriminator when the fake picture generated by the second generator is input to the first discriminator.
- the first discriminator has the advantage of being highly correlated with the generation tasks of the first generator and the second generator, and of having a good ability to distinguish the authenticity of the input image.
- there is a one-to-one correspondence between the first intermediate feature map and the second intermediate feature map; that is, when the first intermediate feature map is the intermediate feature map of a certain layer (for example, the first layer) in the first discriminator, the corresponding second intermediate feature map is the intermediate feature map of that same layer (for example, the first layer) in the first discriminator.
- Those skilled in the art can set, according to the actual situation, from which layers of the first discriminator the intermediate feature maps are obtained, which is not limited here.
- the second similarity metric function is used to characterize the degree of approximation of the intermediate layer information of the first discriminator and the second discriminator.
- a specific manner of determining the second similarity measure function can be set by those skilled in the art according to actual conditions.
- determining the second similarity metric function according to the similarity between the first intermediate feature map and the second intermediate feature map of each layer includes: inputting the first intermediate feature map and the second intermediate feature map corresponding to the j-th layer into the similarity measurement function to obtain the second sub-similarity measurement function corresponding to the j-th layer, where j is a positive integer, 1 ≤ j ≤ N, j takes values from 1 to N, and N is the number of layers of the first discriminator; and determining the second similarity measurement function according to the second sub-similarity measurement function corresponding to each layer.
- the second similarity measurement function is as follows:
- where L_D2 denotes the second similarity measure function, L_D denotes the number of intermediate feature maps obtained from the first discriminator, p(z) denotes the noise distribution, d_j indicates the second sub-similarity metric function corresponding to the j-th layer, D_j(G^T(z)) indicates the first intermediate feature map corresponding to the j-th layer, and D_j(G^S(z)) indicates the second intermediate feature map corresponding to the j-th layer.
- the specific implementation manner of determining the distillation objective function according to the first similarity metric function and the second similarity metric function can be set by those skilled in the art according to the actual situation, which is not limited here.
- the distillation objective function is determined by summing the first similarity metric function and the second similarity metric function according to weights.
- the distillation objective function is determined by directly summing the first similarity measure function and the second similarity measure function.
- the distillation objective function is as follows: L_distill = L_D1 + L_D2, where L_D1 represents the first similarity measure function and L_D2 represents the second similarity measure function.
- the distillation objective function and the objective function components determined according to the loss function of the second generator are directly summed to determine the objective function of the second generator.
- the distillation objective function and the objective function components determined according to the loss function of the second generator are summed according to weights to determine the objective function of the second generator.
- the objective function of the second generator is as follows: L_G^S = (L_G^S)' + β·L_distill, where (L_G^S)' represents the objective function component determined according to the loss function of the second generator, L_distill represents the distillation objective function, and β represents the weight parameter of the distillation objective function. (L_G^S)' can be the objective function of the second generator in the example of the method shown in Fig. 2; therefore, for the specific explanation of (L_G^S)', reference may be made to the above, and it is not repeated here.
- S340 Determine an objective function of the second discriminator according to the loss function of the second discriminator with respect to the real picture and the loss function of the second discriminator with respect to the fake picture.
- S350 Determine the objective function of the retention factors according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator about the fake picture, the objective function of the first generator, and the loss function of the first discriminator about the fake picture.
- the second preset threshold can be set by those skilled in the art according to actual conditions, and is not limited here.
- the process can be accelerated.
- S390 Determine whether the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold. If yes, end the training; if not, return to S360.
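S390 can be sketched as a simple stopping check. Here the "loss difference" between a generator and its discriminator is assumed, for illustration, to be the difference of their scalar losses, and the threshold value is arbitrary; the patent does not fix these specifics.

```python
def should_stop(teacher_g_loss, teacher_d_loss,
                student_g_loss, student_d_loss, threshold=0.05):
    """End training when the student GAN's generator/discriminator
    loss gap tracks the teacher's to within the first preset threshold."""
    first_diff = teacher_g_loss - teacher_d_loss    # first loss difference
    second_diff = student_g_loss - student_d_loss   # second loss difference
    return abs(first_diff - second_diff) < threshold

print(should_stop(1.2, 0.9, 1.25, 0.93))  # gaps 0.3 vs 0.32 -> True
```

If the check fails, the procedure returns to S360 and continues alternating the weight-parameter and retention-factor updates.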
- the intermediate layer feature maps in the first generator and the first discriminator are used as additional supervision information to help the second generator generate high-quality images through the distillation method. In this way, a lightweight second generator can be obtained, and it can be ensured that the lightweight second generator can generate high-quality images.
- Fig. 4 is a flowchart of an image generation method provided by an embodiment of the present disclosure, and the method may be executed by an electronic device.
- the electronic device can be understood as a device with a computing function, such as a tablet computer, a notebook computer, or a desktop computer, by way of example.
- the method provided in this embodiment includes the following S410-S420:
- the second generator and the second discriminator are obtained by using the method in any one of the above-mentioned Fig. 1-Fig. 3 embodiments.
- the second generator for generating fake images and the second discriminator for outputting fake images are obtained by using the network model compression method provided in the embodiments of the present disclosure. Since the second generator and the second discriminator are relatively lightweight, even if they are deployed on resource-constrained edge devices, they will not cause long delays.
- Fig. 5 is a schematic structural diagram of an apparatus for compressing a network model provided by an embodiment of the present disclosure.
- the apparatus for compressing a network model may be understood as the above-mentioned network model compression device or some functional modules in the above-mentioned network model compression device.
- the compression device of the network model can compress the network model to be compressed, where the network model to be compressed includes a first generator and a first discriminator. As shown in Figure 5, the compression device 500 of the network model includes: a pruning module 510 and a configuration module 520;
- the pruning module 510 is configured to perform pruning processing on the first generator to obtain a second generator
- the configuration module 520 is configured to configure the states of the convolution kernels in the first discriminator, so that a part of the convolution kernels is in an active state, and another part of the convolution kernels is in an inhibited state, so as to obtain a second discriminator;
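One plausible way to realize the active/suppressed kernel states described above is to multiply each convolution kernel by a binary retention factor, zeroing the suppressed kernels. This numpy sketch is illustrative of the idea, not the patent's exact mechanism.

```python
import numpy as np

def apply_retention(kernels, retention):
    """Zero out suppressed convolution kernels: a retention factor of 1
    keeps a kernel active, 0 suppresses it.
    kernels: array of shape (n_kernels, in_channels, k, k)."""
    return kernels * np.asarray(retention)[:, None, None, None]

k = np.ones((3, 2, 3, 3))          # three 3x3 kernels over 2 channels
masked = apply_retention(k, [1, 0, 1])
print(masked[1].sum(), masked[0].sum())  # -> 0.0 18.0
```

The suppressed kernel contributes nothing to the discriminator's output, which is how the second, lighter discriminator is obtained from the first.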
- the loss difference between the first generator and the first discriminator is a first loss difference
- the loss difference between the second generator and the second discriminator is a second loss difference
- the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold
- the configuration module 520 includes:
- the first determining submodule can be configured to freeze the retention factors corresponding to the convolution kernels in the second discriminator, and determine the first weight parameter of the second discriminator; wherein the retention factor is used to characterize the importance of its corresponding convolution kernel;
- the second determining submodule may be configured to freeze the first weight parameter of the second discriminator and the second weight parameter of the second generator, and determine each of the retention factors;
- the repetition submodule may be configured to repeat the above-mentioned operations of determining the first weight parameter of the second discriminator and determining each of the retention factors, until the absolute value of the difference between the first loss difference and the second loss difference is smaller than the first preset threshold.
- the first weight parameter includes weight parameters corresponding to other elements in the second discriminator except the retention factor.
- the second weight parameter includes weight parameters corresponding to each element in the second generator.
- the first determining submodule includes:
- the first determining unit may be configured to determine a second weight parameter of the second generator according to an objective function of the second generator
- the second determining unit may be configured to determine a first weight parameter of the second discriminator according to an objective function of the second discriminator.
- the device also includes:
- the objective function determination module of the second generator may be configured to, before determining the second weight parameter of the second generator according to the objective function of the second generator, according to the loss of the second generator a function determining an objective function of said second generator;
- the objective function determining module of the second discriminator may be configured to determine the objective function of the second discriminator according to the loss function of the second discriminator on real pictures and the loss function of the second discriminator on fake pictures .
- the device further includes a teacher and student determination module, which can be configured to use the first generator and the first discriminator as a teacher generative adversarial network, and use the second generator and the second discriminator as a student generative adversarial network;
- the objective function determination module of the second generator can be specifically configured to determine the second The objective function of the generator.
- the objective function determination module of the second generator may specifically be configured to sum, according to weights, the distillation objective function and the objective function component determined according to the loss function of the second generator, to determine the objective function of the second generator.
- the device also includes:
- the first similarity measure function determining module may be configured to, before the objective function of the second generator is determined according to the distillation objective function between the teacher generative adversarial network and the student generative adversarial network and the loss function of the second generator, determine a first similarity measurement function according to the similarity of at least one layer of intermediate feature maps in the first generator and the second generator;
- the first intermediate feature map obtaining module may be configured to input the fake picture generated by the first generator into the first discriminator to obtain a first intermediate feature map of at least one layer in the first discriminator;
- the second intermediate feature map obtaining module may be configured to input the fake picture generated by the second generator into the first discriminator to obtain a second intermediate feature map of at least one layer in the first discriminator;
- the second similarity measure function determination module may be configured to determine a second similarity measure function according to the similarity between the first intermediate feature map of the at least one layer and the second intermediate feature map of the at least one layer ;
- the distillation objective function determination module may be configured to determine the distillation objective function according to the first similarity measure function and the second similarity measure function.
- the first similarity measure function determination module includes:
- the first sub-similarity measure function determination submodule may be configured to input the intermediate feature map of the i-th layer of the first generator and the intermediate feature map of the i-th layer of the second generator into the similarity measure function, Obtain the first sub-similarity measurement function corresponding to the i-th layer; wherein, i is a positive integer, and i takes a value from 1-M, and M is the number of layers of the first generator and the second generator;
- the first similarity measure function determining submodule may be configured to determine the first similarity measure function according to the first sub-similarity measure function corresponding to each layer.
- the second similarity measure function determination module includes:
- the second sub-similarity measurement function determination submodule can be configured to input the first intermediate feature map and the second intermediate feature map corresponding to the j-th layer into the similarity measurement function to obtain the second sub-similarity measurement function corresponding to the j-th layer; wherein j is a positive integer, 1 ≤ j ≤ N, j takes values from 1 to N, and N is the number of layers of the first discriminator;
- the second similarity measure function determining submodule may be configured to determine the second similarity measure function according to the second sub-similarity measure function corresponding to each layer.
- the second determining submodule includes:
- a third determination unit may be configured to determine each of the retention factors according to an objective function of the retention factors
- the fourth determining unit may be configured to determine that the retention factor is 0 when the retention factor is smaller than a second preset threshold
- the fifth determining unit may be configured to determine that the retention factor is 1 when the retention factor is greater than or equal to the second preset threshold.
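The thresholding rule of the fourth and fifth determining units can be sketched as follows; the threshold value 0.5 is an illustrative assumption (the patent only calls it a second preset threshold, to be set according to the actual situation).

```python
def binarize_retention(factors, threshold=0.5):
    """Set each retention factor to 0 when below the second preset
    threshold, otherwise to 1, as described above."""
    return [0 if f < threshold else 1 for f in factors]

print(binarize_retention([0.2, 0.5, 0.9]))  # -> [0, 1, 1]
```

Binarizing the factors makes the keep/suppress decision explicit for every convolution kernel of the second discriminator.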
- the device further includes an objective function determination module of the retention factor, which can be configured to, before each of the retention factors is determined according to the objective function of the retention factors, determine the objective function of the retention factors according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator on fake pictures, the objective function of the first generator, and the loss function of the first discriminator on fake pictures.
- the device provided in this embodiment is capable of executing the method in any of the above-mentioned embodiments in FIG. 1-FIG. 3 , and its execution mode and beneficial effect are similar, and details are not repeated here.
- FIG. 6 is a schematic structural diagram of a network model compression device in an embodiment of the present disclosure.
- the compression device 600 of the network model in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Multimedia Players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), and stationary terminals such as digital TVs and desktop computers.
- the compression device of the network model shown in FIG. 6 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
- the network model compression device 600 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 601, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the compression device 600 of the network model.
- the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604 .
- the following devices can be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
- the communication device 609 may allow the network model compression device 600 to perform wireless or wired communication with other devices to exchange data. While FIG. 6 shows a network model compression device 600 having various devices, it should be understood that it is not required to implement or possess all of the devices shown; more or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
- when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
- the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
- a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
- Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
- the above-mentioned computer-readable medium may be included in the compression device of the above-mentioned network model; or it may exist independently without being assembled into the compression device of the network model.
- the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the compression device of the network model, the compression device of the network model: performs pruning processing on the first generator to obtain a second generator; and configures the states of the convolution kernels in the first discriminator so that a part of the convolution kernels are in an active state and another part of the convolution kernels are in a suppressed state, to obtain a second discriminator; wherein the loss difference between the first generator and the first discriminator is a first loss difference, the loss difference between the second generator and the second discriminator is a second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is smaller than a first preset threshold.
- Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
- For example and without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- An embodiment of the present disclosure also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the method of any one of the above-mentioned embodiments in FIGS. 1-3 can be implemented. Its execution method and beneficial effect are similar, and will not be repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
Claims (18)
- 一种网络模型的压缩方法,待压缩的网络模型包括第一生成器和第一辨别器,所述方法包括:对所述第一生成器进行剪枝处理,得到第二生成器;对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器;其中,所述第一生成器和所述第一辨别器之间的损失差为第一损失差,所述第二生成器和所述第二辨别器之间的损失差为第二损失差,所述第一损失差和所述第二损失差之间的差值绝对值小于第一预设阈值。
- 根据权利要求1所述的网络模型的压缩方法,其中,所述对所述第一辨别器中的卷积核的状态进行配置,使得其中一部分卷积核为激活状态,另一部分卷积核为抑制状态,得到第二辨别器,包括:冻结所述第二辨别器中的各卷积核对应的保留因子,确定所述第二辨别器的第一权重参数;其中,所述保留因子用于表征其对应的卷积核的重要程度;冻结所述第二辨别器的第一权重参数和所述第二生成器的第二权重参数,确定各所述保留因子;重复执行上述确定所述第二辨别器的第一权重参数以及确定各所述保留因子的操作,直至所述第一损失差和所述第二损失差之间的差值绝对值小于所述第一预设阈值。
- 根据权利要求2所述的网络模型的压缩方法,其中,所述第一权重参数包括所述第二辨别器中除去所述保留因子之外的其它要素对应的权重参数。
- 根据权利要求2所述的网络模型的压缩方法,其中,所述第二权重参数包括所述第二生成器中各要素对应的权重参数。
- 根据权利要求2所述的网络模型的压缩方法,其中,所述确定所述第二辨别器的第一权重参数,包括:根据所述第二辨别器的目标函数确定所述第二辨别器的第一权重参数;所述方法还包括:根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数。
- 根据权利要求5所述的网络模型的压缩方法,其中,所述根据所述第二生成器的目标函数确定所述第二生成器的第二权重参数之前,所述方法还包括:根据所述第二生成器的损失函数确定所述第二生成器的目标函数;根据所述第二辨别器关于真实图片的损失函数、以及所述第二辨别器关于虚假图片的损失函数确定所述第二辨别器的目标函数。
- 根据权利要求6所述的网络模型的压缩方法,其中,所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:将所述第一生成器和所述第一辨别器作为教师生成对抗网络,将所述第二生成器和所述第二辨别器作为学生生成对抗网络;所述根据所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及 所述第二生成器的损失函数确定所述第二生成器的目标函数。
- 根据权利要求7所述的网络模型的压缩方法,其中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数,包括:将所述蒸馏目标函数、以及根据所述第二生成器的损失函数确定的目标函数分量,按照权重进行加和,确定所述第二生成器的目标函数。
- 根据权利要求7所述的网络模型的压缩方法,其中,所述根据所述教师生成对抗网络和所述学生生成对抗网络之间的蒸馏目标函数、以及所述第二生成器的损失函数确定所述第二生成器的目标函数之前,所述方法还包括:根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数;将所述第一生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第一中间特征图;将所述第二生成器生成的虚假图片输入所述第一辨别器,得到所述第一辨别器中的至少一层的第二中间特征图;根据所述至少一层的所述第一中间特征图和所述至少一层的所述第二中间特征图的相似度确定第二相似性度量函数;根据所述第一相似性度量函数和所述第二相似性度量函数确定所述蒸馏目标函数。
- 根据权利要求9所述的网络模型的压缩方法,其中,所述根据所述第一生成器和所述第二生成器中的至少一层中间特征图的相似度确定第一相似性度量函数,包括:将所述第一生成器的第i层的中间特征图和所述第二生成器的第i层的中间特征图输入相似性度量函数,得到第i层对应的第一子相似性度量函数;其中,i为正整数,且i从1-M中取值,M为所述第一生成器和所述第二生成器的层数;根据各层对应的所述第一子相似性度量函数确定所述第一相似性度量函数。
- The network model compression method according to claim 9, wherein determining the second similarity measure function according to the similarity of the first intermediate feature map and the second intermediate feature map of the at least one layer comprises: inputting the first intermediate feature map and the second intermediate feature map corresponding to the j-th layer into a similarity measure function to obtain a second sub-similarity measure function corresponding to the j-th layer, wherein j is a positive integer taking values from 1 to N (1 ≤ j ≤ N), and N is the number of layers of the first discriminator; and determining the second similarity measure function according to the second sub-similarity measure functions corresponding to the respective layers.
- The network model compression method according to claim 2, wherein determining the retention factors comprises: determining each retention factor according to an objective function of the retention factors; setting a retention factor to 0 when the retention factor is less than a second preset threshold; and setting the retention factor to 1 when the retention factor is greater than or equal to the second preset threshold.
- The network model compression method according to claim 12, wherein before determining each retention factor according to the objective function of the retention factors, the method further comprises: determining the objective function of the retention factors according to the objective function of the second generator, the objective function of the second discriminator, the loss function of the second discriminator with respect to fake images, an objective function of the first generator, and a loss function of the first discriminator with respect to fake images.
- An image generation method, comprising: inputting a random noise signal into a second generator, so that the second generator generates a fake image according to the random noise signal; and inputting the fake image into a second discriminator, so that the second discriminator outputs the fake image after discriminating the fake image as real; wherein the second generator and the second discriminator are obtained by the method according to any one of claims 1-13.
- A network model compression apparatus, wherein the network model to be compressed comprises a first generator and a first discriminator, the apparatus comprising a pruning module and a configuration module; the pruning module being configured to prune the first generator to obtain a second generator; the configuration module being configured to configure the states of the convolution kernels in the first discriminator such that one portion of the convolution kernels is in an activated state and the other portion is in a suppressed state, to obtain a second discriminator; wherein the loss difference between the first generator and the first discriminator is a first loss difference, the loss difference between the second generator and the second discriminator is a second loss difference, and the absolute value of the difference between the first loss difference and the second loss difference is less than a first preset threshold.
- A network model compression device, comprising: a memory storing a computer program; and a processor configured to execute the computer program, wherein when the computer program is executed by the processor, the processor performs the method according to any one of claims 1-13.
- A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-13.
- A computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program comprising program code for performing the method according to any one of claims 1-13.
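The alternating scheme of claim 2 (freeze retention factors to update the discriminator's weight parameters, then freeze the weights to update the retention factors, repeated until the student loss gap tracks the teacher loss gap) can be sketched as follows. This is a minimal toy model, not the patented training procedure: the helper `alternating_compress`, the scalar loss-gap numbers, and the halving update are illustrative stand-ins for the real freeze-and-update steps.

```python
# Toy sketch of claim 2's alternating optimization: each round stands in for
# (1) updating first weight parameters with retention factors frozen, then
# (2) updating retention factors with the weights frozen. The joint effect is
# modeled as the student loss gap moving halfway toward the teacher loss gap.
def alternating_compress(teacher_loss_gap, student_loss_gap, threshold, max_rounds=100):
    rounds = 0
    while abs(teacher_loss_gap - student_loss_gap) >= threshold and rounds < max_rounds:
        # One "freeze retention factors / freeze weights" round, collapsed
        # into a single numeric step for illustration.
        student_loss_gap += 0.5 * (teacher_loss_gap - student_loss_gap)
        rounds += 1
    return student_loss_gap, rounds

gap, rounds = alternating_compress(teacher_loss_gap=1.0, student_loss_gap=0.0, threshold=1e-3)
print(rounds)  # 10: the mismatch halves each round until it drops below 1e-3
```

The stopping rule mirrors the claim's condition that the absolute difference between the first and second loss differences falls below the first preset threshold.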
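Claims 9-11 assemble the distillation objective from two layer-wise similarity terms: one over generator intermediate feature maps, one over the first discriminator's feature maps for teacher-generated versus student-generated fake images. A minimal sketch, assuming cosine similarity as the (otherwise unspecified) similarity measure function and flat Python lists as stand-ins for feature maps:

```python
# Illustrative similarity measure function; the claims do not fix its form.
def cosine(a, b):
    # Assumes non-zero vectors, which the toy inputs below satisfy.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def distillation_objective(gen_feats_t, gen_feats_s, disc_feats_t, disc_feats_s):
    # First similarity measure (claim 10): per-layer sub-similarities over
    # teacher/student generator feature maps, combined across layers.
    first = sum(cosine(t, s) for t, s in zip(gen_feats_t, gen_feats_s))
    # Second similarity measure (claim 11): per-layer sub-similarities over
    # the first discriminator's feature maps for the two fake images.
    second = sum(cosine(t, s) for t, s in zip(disc_feats_t, disc_feats_s))
    # Distillation objective (claim 9) from the two measures; plain addition
    # is an illustrative choice of combination.
    return first + second
```

Per claim 8, this distillation objective would then be summed with weights against the component derived from the second generator's loss to form the generator's full objective.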
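Claim 12's thresholding of retention factors is directly expressible as a binarization step; the helper name `binarize_retention` and the example values are illustrative, not from the patent:

```python
def binarize_retention(factors, threshold):
    """Claim 12's rule: a factor below the second preset threshold becomes 0
    (its convolution kernel is suppressed); otherwise it becomes 1 (kept active)."""
    return [0 if f < threshold else 1 for f in factors]

print(binarize_retention([0.1, 0.5, 0.9], 0.5))  # [0, 1, 1]
```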
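Claim 14's inference flow (noise in, fake image out, gated by the discriminator's judgment) can be sketched as below. The `generator` and `discriminator` callables are placeholders standing in for the compressed second generator and second discriminator; the noise dimension and retry loop are assumptions for illustration.

```python
import random

def generate_image(generator, discriminator, noise_dim=8, max_tries=10):
    """Sample a random noise signal, generate a fake image, and output it
    once the discriminator judges the fake image as real (claim 14)."""
    for _ in range(max_tries):
        noise = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
        image = generator(noise)
        if discriminator(image):
            return image
    return None  # no accepted image within the retry budget
```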
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22871916.7A EP4339836A1 (en) | 2021-09-24 | 2022-09-19 | Network model compression method, apparatus and device, image generation method, and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111122307.5A CN113780534B (zh) | 2021-09-24 | 2021-09-24 | Network model compression method, image generation method, apparatus, device and medium |
CN202111122307.5 | 2021-09-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023045870A1 (zh) | 2023-03-30 |
Family
ID=78853212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/119638 WO2023045870A1 (zh) | Network model compression method, image generation method, apparatus, device and medium | 2021-09-24 | 2022-09-19 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4339836A1 (zh) |
CN (1) | CN113780534B (zh) |
WO (1) | WO2023045870A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780534B (zh) * | 2021-09-24 | 2023-08-22 | 北京字跳网络技术有限公司 | Network model compression method, image generation method, apparatus, device and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084281A (zh) * | 2019-03-31 | 2019-08-02 | 华为技术有限公司 | Image generation method, neural network compression method, and related apparatus and device |
CN110796251A (zh) * | 2019-10-28 | 2020-02-14 | 天津大学 | Image compression optimization method based on convolutional neural network |
CN110796619A (zh) * | 2019-10-28 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Image processing model training method and apparatus, electronic device, and storage medium |
US20200302295A1 (en) * | 2019-03-22 | 2020-09-24 | Royal Bank Of Canada | System and method for knowledge distillation between neural networks |
CN113780534A (zh) * | 2021-09-24 | 2021-12-10 | 北京字跳网络技术有限公司 | Network model compression method, image generation method, apparatus, device and medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423700B (zh) * | 2017-07-17 | 2020-10-20 | 广州广电卓识智能科技有限公司 | Person-ID verification method and apparatus |
CN109978792A (zh) * | 2019-03-28 | 2019-07-05 | 厦门美图之家科技有限公司 | Method for generating an image enhancement model |
CN113313133A (zh) * | 2020-02-25 | 2021-08-27 | 武汉Tcl集团工业研究院有限公司 | Generative adversarial network training method and animation image generation method |
CN112052948B (zh) * | 2020-08-19 | 2023-11-14 | 腾讯科技(深圳)有限公司 | Network model compression method and apparatus, storage medium, and electronic device |
CN113159173B (zh) * | 2021-04-20 | 2024-04-26 | 北京邮电大学 | Convolutional neural network model compression method combining pruning and knowledge distillation |
- 2021
  - 2021-09-24: CN application CN202111122307.5A filed; granted as patent CN113780534B (active)
- 2022
  - 2022-09-19: PCT application PCT/CN2022/119638 filed, published as WO2023045870A1 (application filing)
  - 2022-09-19: EP application EP22871916.7A filed, published as EP4339836A1 (pending)
Also Published As
Publication number | Publication date |
---|---|
CN113780534B (zh) | 2023-08-22 |
CN113780534A (zh) | 2021-12-10 |
EP4339836A1 (en) | 2024-03-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020155907A1 (zh) | Method and apparatus for generating comic-style transfer model | |
WO2023273985A1 (zh) | Speech recognition model training method, apparatus and device | |
CN112364860B (zh) | Character recognition model training method and apparatus, and electronic device | |
WO2023273579A1 (zh) | Model training method, speech recognition method, apparatus, medium and device | |
WO2022227886A1 (zh) | Super-resolution restoration network model generation method, image super-resolution restoration method and apparatus | |
WO2022247562A1 (zh) | Multimodal data retrieval method, apparatus, medium and electronic device | |
WO2023273611A1 (zh) | Speech recognition model training method, speech recognition method, apparatus, medium and device | |
WO2020207174A1 (zh) | Method and apparatus for generating quantized neural network | |
WO2023143016A1 (zh) | Feature extraction model generation method, image feature extraction method and apparatus | |
WO2023143178A1 (zh) | Object segmentation method, apparatus, device and storage medium | |
WO2023005386A1 (zh) | Model training method and apparatus | |
WO2023088280A1 (zh) | Intent recognition method, apparatus, readable medium and electronic device | |
WO2022171036A1 (zh) | Video target tracking method, video target tracking apparatus, storage medium and electronic device | |
CN113033580B (zh) | Image processing method, apparatus, storage medium and electronic device | |
WO2023116138A1 (zh) | Multi-task model modeling method, promotion content processing method and related apparatus | |
WO2023273610A1 (zh) | Speech recognition method, apparatus, medium and electronic device | |
WO2023051238A1 (zh) | Animal avatar generation method, apparatus, device and storage medium | |
WO2023045870A1 (zh) | Network model compression method, image generation method, apparatus, device and medium | |
CN111597825A (zh) | Speech translation method, apparatus, readable medium and electronic device | |
WO2023211369A2 (zh) | Speech recognition model generation method, recognition method, apparatus, medium and device | |
CN110009101B (zh) | Method and apparatus for generating quantized neural network | |
JP2022541832A (ja) | Method and apparatus for retrieving images | |
WO2022012178A1 (zh) | Method, apparatus, electronic device and computer-readable medium for generating objective function | |
CN114420135A (zh) | Voiceprint recognition method and apparatus based on attention mechanism | |
WO2023185515A1 (zh) | Feature extraction method, apparatus, storage medium and electronic device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22871916; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 2022871916; Country of ref document: EP |
| WWE | WIPO information: entry into national phase | Ref document number: 18571116; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 2022871916; Country of ref document: EP; Effective date: 20231214 |
| NENP | Non-entry into the national phase | Ref country code: DE |