WO2020233709A1 - Model compression method and device

Model compression method and device (模型压缩方法及装置)

Info

Publication number
WO2020233709A1
Authority
WO
WIPO (PCT)
Prior art keywords
generator
model
generation
network structure
sub
Prior art date
Application number
PCT/CN2020/091824
Other languages
English (en)
French (fr)
Inventor
舒晗
王云鹤
韩凯
许春景
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020233709A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/12 Computing arrangements based on biological models using genetic models
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • This application relates to the field of computer vision, in particular to a model compression method and device.
  • FIG. 1 shows a schematic diagram of the results of the GAN model in portrait rendering.
  • The generator models in existing GAN models often require a large amount of memory because of their output results and optimization goals, and running these generator models usually requires a large computational overhead; they can generally only run on graphics processing unit (GPU) platforms and cannot be directly migrated to the mobile terminal.
  • The existing compression algorithms are all designed for the discriminator model in the GAN model, and they cannot be directly applied to the generator model to obtain satisfactory results.
  • The embodiments of the present application provide a model compression method and device, which are used to solve the problem that existing compression algorithms cannot obtain satisfactory results when directly applied to the generator model.
  • In a first aspect, a model compression method includes: obtaining a generator model before compression; performing binary encoding on the network structure of the generator model before compression to obtain a first generation subgroup, where the first generation subgroup includes the network structures of M first generation generator sub-models, the network structure of each first generation generator sub-model corresponds to a set of fixed-length binary codes, and M is a positive integer greater than 1; obtaining the fitness value of the network structure of each first generation generator sub-model; according to the fitness value of the network structure of each first generation generator sub-model, combined with a genetic algorithm, determining the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup, where N is a positive integer greater than 1; and determining the compressed generator model according to the network parameters in the generator model before compression and the network structure of the Nth generation generator sub-model with the optimal fitness value.
  • The Nth generation subgroup includes the network structures of M Nth generation generator sub-models, the network structure of each Nth generation generator sub-model corresponds to a set of fixed-length binary codes, and the difference between the average fitness value of the network structures of the M Nth generation generator sub-models and the average fitness value of the network structures of the M (N-1)th generation generator sub-models in the (N-1)th generation subgroup is less than a set value.
  • The model compression method provided by the embodiments of the present application performs global binary coding on the network structure of the generator model, and automatically selects the compressed network structure by combining a fitness calculation method for the network structures of the generator sub-models with a genetic algorithm. On the one hand, the network parameters of the compressed generator model are smaller than those of the generator model before compression; on the other hand, the FLOPs of the compressed generator model are smaller than the FLOPs of the generator model before compression, so the average time consumed per picture is reduced.
  • On the other hand, when the amount of compressed network parameters is equivalent, the generator model obtained based on the model compression method provided in the embodiments of this application can maintain the style transfer performance in cases where traditional compression methods fail; on the other hand, the network structure of the generator model obtained based on the model compression method provided in the embodiments of this application differs from task to task: relatively complex tasks retain more parameters, simple tasks retain fewer parameters, and the task-specific structure of the model minimizes parameter redundancy.
  • In this way, the problem that existing compression algorithms cannot obtain satisfactory results when directly applied to the generator model can be solved.
  • In a possible design, determining, according to the fitness value of the network structure of each first generation generator sub-model combined with the genetic algorithm, the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup includes: repeating the following step S1 until the Nth generation subgroup is obtained. Step S1: select the network structure of the kth generation generator sub-model with the optimal fitness value from the kth generation subgroup as the network structure of one (k+1)th generation generator sub-model in the (k+1)th generation subgroup, where k is a positive integer less than (N-1); according to the genetic algorithm, perform probability selection according to the fitness values of the network structures of the M generator sub-models in the kth generation subgroup, and perform selection, crossover and mutation operations according to preset probabilities to obtain the network structures of the other (M-1) (k+1)th generation generator sub-models in the (k+1)th generation subgroup. Then determine the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup.
  • In a possible design, the fitness value of the network structure of the pth generation generator sub-model is determined according to the normalized value of the network parameters of the pth generation generator sub-model, the generator perceptual loss and the discriminator perceptual loss. The generator perceptual loss is used to characterize the difference between the output result of the pth generation generator sub-model and the output result of the (p-1)th generation generator sub-model; the discriminator perceptual loss is used to characterize the difference between the output result of the pth generation generator sub-model and the output result of the (p-1)th generation generator sub-model after they pass through the discriminator, where p is a positive integer from 1 to N, and the 0th generation generator sub-model is the generator model before compression. Based on this scheme, the fitness value of the network structure of the pth generation generator sub-model can be determined.
  • In a possible design, the fitness value of the network structure of the pth generation generator sub-model satisfies a first formula, in which f(q) represents the fitness value, p(q) represents the normalized value of the network parameters, γ and λ are set values, L_GenA represents the perceptual loss of the generator, L_DisA represents the perceptual loss of the discriminator, and q represents the binary coding of all the convolutional layers of the network structure of the pth generation generator sub-model.
  • In a possible design, p(q) satisfies a second formula, in which q_{l-1} represents the binary coding of the (l-1)th convolutional layer in the network structure of the pth generation generator sub-model, q_l represents the binary coding of the lth convolutional layer in the network structure of the pth generation generator sub-model, H_l represents the height of the lth convolutional layer of the network structure of the pth generation generator sub-model, W_l represents the width of the lth convolutional layer, C_l represents the number of channels of the lth convolutional layer, N_l represents the number of convolution kernels of the lth convolutional layer, ||·||_1 denotes the L1 norm, and Σ denotes the sum.
  • In a possible design, the method further includes: determining the generator perceptual loss according to a third formula, in which x_i represents the ith input picture, m represents the number of input pictures, G(x_i) represents the output result of the ith input picture through the (p-1)th generation generator sub-model, the other term represents the output result of the ith input picture through the pth generation generator sub-model, Σ represents the sum, and the norm term represents the L2 norm difference. Based on this scheme, the generator perceptual loss can be determined.
  • In a possible design, the method further includes: determining the discriminator perceptual loss according to a fourth formula, in which x_i represents the ith input picture, m represents the number of input pictures, D(G(x_i)) represents the output result of the ith input picture after passing through the (p-1)th generation generator sub-model and then through the discriminator, the other term represents the output result of the ith input picture after passing through the pth generation generator sub-model and then through the discriminator, Σ represents the sum, and the norm term represents the L2 norm difference. Based on this scheme, the discriminator perceptual loss can be determined.
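  • Because the formula bodies themselves are not reproduced in this text, the following LaTeX restatement of the first to fourth formulas is a reconstruction from the symbol definitions above; the additive combination in f(q), the symbols γ and λ, and the averaging factor 1/m are assumptions of this restatement rather than the exact published equations:

    f(q) = p(q) + \gamma\, L_{GenA} + \lambda\, L_{DisA}

    p(q) = \frac{\sum_{l} \lVert q_{l-1} \rVert_1 \, \lVert q_l \rVert_1 \, H_l W_l}{\sum_{l} C_l N_l H_l W_l}

    L_{GenA} = \frac{1}{m} \sum_{i=1}^{m} \big\lVert G_{p-1}(x_i) - G_{p}(x_i) \big\rVert_2

    L_{DisA} = \frac{1}{m} \sum_{i=1}^{m} \big\lVert D\big(G_{p-1}(x_i)\big) - D\big(G_{p}(x_i)\big) \big\rVert_2

    Here G_{p-1} and G_p denote the (p-1)th generation and pth generation generator sub-models and D denotes the original discriminator.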
  • In a possible design, performing binary encoding on the network structure of the generator model before compression to obtain the first generation subgroup includes: if the binary code corresponding to a first channel in the network structure of the generator model before compression is 0, removing the calculation unit related to the first channel; or, if the binary code corresponding to a second channel in the network structure of the generator model before compression is 1, keeping the calculation unit related to the second channel, where the first channel or the second channel corresponds to a convolution kernel of any convolutional layer in the network structure of the generator model before compression.
  • Based on this scheme, the network parameters of the compressed generator model can be made smaller than those of the generator model before compression, the FLOPs of the compressed generator model can be made smaller than those of the generator model before compression, and the average time consumed for a single image on the CPU platform is reduced.
  • In a second aspect, a model compression method includes: obtaining a first generator model and a second generator model before compression, where the first generator model and the second generator model are symmetrical generator models; performing binary encoding on the network structure of the first generator model before compression to obtain a first generation subgroup corresponding to the first generator model; and performing binary encoding on the network structure of the second generator model before compression to obtain a first generation subgroup corresponding to the second generator model. The first generation subgroup corresponding to the first generator model includes the network structures of M1 first generation generator sub-models, the first generation subgroup corresponding to the second generator model includes the network structures of M2 first generation generator sub-models, the network structure of each first generation generator sub-model corresponds to a set of fixed-length binary codes, and both M1 and M2 are positive integers greater than 1.
  • The method further includes: obtaining the fitness value of the network structure of each first generation generator sub-model; and according to the fitness value of the network structure of each first generation generator sub-model, combined with a genetic algorithm, determining the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model, where N is a positive integer greater than 1. The Nth generation subgroup corresponding to the first generator model includes the network structures of M1 Nth generation generator sub-models, the Nth generation subgroup corresponding to the second generator model includes the network structures of M2 Nth generation generator sub-models, and the network structure of each Nth generation generator sub-model corresponds to a set of fixed-length binary codes.
  • The difference between the average fitness value of the network structures of the M1 Nth generation generator sub-models corresponding to the first generator model and the average fitness value of the network structures of the M1 (N-1)th generation generator sub-models corresponding to the first generator model is less than a first set value, and the difference between the average fitness value of the network structures of the M2 Nth generation generator sub-models corresponding to the second generator model and the average fitness value of the network structures of the M2 (N-1)th generation generator sub-models corresponding to the second generator model is less than a second set value.
  • The method further includes: determining the compressed first generator model according to the network parameters in the first generator model before compression and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model; and determining the compressed second generator model according to the network parameters in the second generator model before compression and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model.
  • The model compression method provided by the embodiments of the present application performs global binary coding on the network structures of the generator models, and automatically selects the compressed network structures by combining the fitness calculation method for the network structures of the generator sub-models with the genetic algorithm. On the one hand, the network parameters of the compressed generator models are smaller than those of the generator models before compression; on the other hand, the FLOPs of the compressed generator models are smaller than the FLOPs of the generator models before compression, so the average time consumed per picture is reduced; on the other hand, when the amount of compressed network parameters is equivalent, the generator models obtained based on the model compression method provided in the embodiments of this application can maintain the style transfer performance in cases where traditional compression methods fail; on the other hand, the network structures of the generator models obtained based on the model compression method provided in the embodiments of this application differ from task to task: relatively complex tasks retain more parameters, simple tasks retain fewer parameters, and the task-specific structure of the model minimizes parameter redundancy.
  • In this way, the problem that existing compression algorithms cannot obtain satisfactory results when directly applied to the generator model can be solved.
  • In a possible design, determining, according to the fitness value of the network structure of each first generation generator sub-model combined with the genetic algorithm, the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model includes: repeating the following steps S1 and S2 until the Nth generation subgroup corresponding to the first generator model and the Nth generation subgroup corresponding to the second generator model are obtained. Step S1: take the network structure of the kth generation generator sub-model with the optimal fitness value in the kth generation subgroup corresponding to the first generator model as the network structure of one (k+1)th generation generator sub-model in the (k+1)th generation subgroup corresponding to the second generator model, where k is a positive integer less than (N-1); according to the genetic algorithm, perform probability selection according to the fitness values of the network structures of the M2 generator sub-models in the kth generation subgroup corresponding to the second generator model, and perform selection, crossover and mutation operations according to preset probabilities to obtain the network structures of the other (M2-1) (k+1)th generation generator sub-models. Step S2: a corresponding operation is performed with the roles of the two generator models exchanged, so that the (k+1)th generation subgroup corresponding to the first generator model is obtained. Then determine the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model. Based on this scheme, these two network structures can be determined.
  • a model compression device for implementing the above-mentioned various methods.
  • The model compression device includes a module, unit, or means corresponding to the foregoing methods, and the module, unit, or means can be implemented by hardware, by software, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the above-mentioned functions.
  • A model compression device, including a processor and a memory; the memory is used to store computer instructions, and when the processor executes the instructions, the model compression device is enabled to execute the method described in the first aspect or the second aspect.
  • A model compression device, including a processor; the processor is configured to be coupled with a memory, and after reading instructions in the memory, execute the method described in the first aspect or the second aspect according to the instructions.
  • a computer-readable storage medium stores instructions that, when run on a computer, enable the computer to execute the method described in the first or second aspect.
  • a computer program product containing instructions which when run on a computer, enables the computer to execute the method described in the first or second aspect.
  • An apparatus is provided (for example, the apparatus may be a chip or a chip system).
  • the apparatus includes a processor for implementing the functions involved in the first aspect or the second aspect.
  • the device also includes a memory for storing necessary program instructions and data.
  • When the device is a chip system, it may be composed of chips, or may include chips and other discrete devices.
  • the technical effects brought by any one of the design methods of the third aspect to the eighth aspect can be referred to the technical effects brought about by the different design methods in the first aspect or the second aspect, which will not be repeated here.
  • Figure 1 is a schematic diagram of the results of the existing GAN model in portrait rendering
  • Figure 2 is a structural diagram of the existing CycleGAN, which uses the GAN model to complete image domain conversion;
  • Figure 3 is a schematic diagram of two image domain conversion tasks in the urban street view data set;
  • FIG. 4 is a schematic flowchart of a model compression method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of the comparison between each set of fixed-length binary codes and a compressed generator model provided by an embodiment of the application;
  • Fig. 6 is a schematic diagram of global binary coding of the generator model provided by an embodiment of the application.
  • FIG. 7 is a schematic flowchart of the Nth generation generator sub-model with the optimal fitness value obtained from the generator model before compression according to an embodiment of the application;
  • FIG. 8 is a schematic flowchart of another model compression method provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of alternate iteration optimization of a co-evolution algorithm provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of an automatically compressed image artistic style conversion model provided by an embodiment of the application;
  • FIG. 11 is an image art style conversion effect diagram before and after compression of the generator model provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of quick style transfer provided by an embodiment of this application.
  • FIG. 13 is a schematic diagram of the compression effect of a fast style transfer model provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of the comparison of the conversion effects before and after compression of the generated model for the conversion between horses and zebras provided by an embodiment of the application;
  • FIG. 15 is a first structural diagram of a model compression device provided by an embodiment of the application.
  • FIG. 16 is a second structural diagram of the model compression device provided by an embodiment of the application.
  • "At least one item" refers to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, at least one item of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can each be single or multiple.
  • In the embodiments of the present application, words such as “first” and “second” are used to distinguish identical items or similar items having substantially the same functions and effects.
  • Words such as “first” and “second” do not limit the quantity or the order of execution, nor do they indicate that the items must be different.
  • In the embodiments of the present application, words such as “exemplary” or “for example” are used to indicate examples, illustrations, or descriptions. Any embodiment or design solution described as “exemplary” or “for example” in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions.
  • words such as “exemplary” or “for example” are used to present related concepts in a specific manner to facilitate understanding.
  • As shown in FIG. 2, it is a structural diagram of CycleGAN, which uses the GAN model to complete image domain conversion.
  • The generator model G_AB completes the migration from style A pictures to style B pictures, and the generator model G_BA completes the migration from style B pictures to style A pictures.
  • The discriminator model D_B judges whether a picture comes from a real style B picture or from a style B picture generated by the generator model G_AB.
  • The generator model G_AB obtained by this adversarial training method can complete the task of style transfer.
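  • For reference, the widely published CycleGAN objective that this structure implements combines adversarial losses with a cycle-consistency loss; the following restatement is standard background and is not quoted from this patent:

    L_{GAN}(G_{AB}, D_B) = \mathbb{E}_{b}\big[\log D_B(b)\big] + \mathbb{E}_{a}\big[\log\big(1 - D_B(G_{AB}(a))\big)\big]

    L_{cyc} = \mathbb{E}_{a}\big[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1\big] + \mathbb{E}_{b}\big[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1\big]

    L_{total} = L_{GAN}(G_{AB}, D_B) + L_{GAN}(G_{BA}, D_A) + \lambda_{cyc}\, L_{cyc}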
  • The current generator models (also called generative models) in GAN models have (but are not limited to) the following specific problems:
  • the amount of network parameters of the generator model commonly used for image style conversion is too large.
  • the amount of network parameters of each convolutional layer can often reach tens of thousands or hundreds of thousands.
  • The total number of parameters of the N convolutional layers of the entire generator model can reach tens of millions (represented by 32-bit floating point numbers, this requires hundreds of megabytes of memory or cache).
  • On the mobile terminal, memory and cache resources are very limited, so how to reduce the amount of convolutional network parameters is an urgent problem to be solved.
  • the convolution operation in the generator model has a huge amount of calculation.
  • a generator model contains a convolution kernel with hundreds of thousands of network parameters, and the number of floating point operations (FLOPs) for convolution operations can reach tens of millions.
  • The generator model that can be calculated in real time on the GPU runs very slowly when migrated to the mobile terminal.
  • Since the computing resources of the mobile terminal cannot support real-time operation of the existing generator model, how to reduce the amount of convolution calculation and reduce the calculation overhead of the generator model is an urgent problem to be solved.
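  • As a rough illustration of these magnitudes, the following sketch computes the parameter count and multiply-accumulate count of a single convolutional layer; the layer sizes are made-up examples rather than figures taken from the patent:

```python
def conv_params_and_flops(c_in, c_out, k_h, k_w, out_h, out_w):
    """Parameter and multiply-accumulate count of one convolutional layer."""
    params = c_in * c_out * k_h * k_w      # weight count, ignoring the bias term
    flops = params * out_h * out_w         # every output position reuses all weights once
    return params, flops

# Hypothetical middle layer of an image-translation generator:
# 128 input channels, 128 kernels of size 3x3, 16x16 output feature map.
params, flops = conv_params_and_flops(128, 128, 3, 3, 16, 16)
print(f"{params:,} parameters, {flops:,} multiply-accumulates")
# -> 147,456 parameters and 37,748,736 multiply-accumulates for this single layer
```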
  • The difficulty of style transfer between different image domains is different, for example between the street view domain and the street view segmentation domain in the urban street view data set.
  • the conversion from the street view segmentation map to the street view requires a large amount of details to be restored, while the conversion from street view to the street view segmentation map requires a large amount of details to be erased.
  • the difficulty of the two tasks is obviously different.
  • However, the structure of the traditional generator models between the two domains is the same, and their network parameter amounts and computational complexity are the same, so the parameters of the traditional generator models obtained by generative adversarial training are redundant, and the redundancy is different for each image conversion task.
  • an embodiment of the present application provides a model compression method, as shown in FIG. 4, including the following steps:
  • S401 Obtain the generator model before compression.
  • S402 Perform binary encoding on the network structure of the generator model before compression to obtain the first generation subgroup.
  • the first-generation subgroup includes the network structure of M first-generation generator sub-models, and the network structure of each first-generation generator sub-model corresponds to a set of fixed-length binary codes, and M is a positive integer greater than 1. .
  • the generator sub-model in the embodiment of the present application may also be referred to as a sub-individual, which is described here in a unified manner, and will not be described in detail below.
  • the network structure of the generator model or the network structure of the generator sub-model in the embodiment of the present application may also be referred to as a generating convolutional neural network or a generating network, etc., which are described in a unified manner here, and will not be described in detail below.
  • S403 Obtain the fitness value of the network structure of each first generation generator sub-model.
  • S404 According to the fitness value of the network structure of each first generation generator sub-model, combined with the genetic algorithm, determine the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup, where N is a positive integer greater than 1.
  • The Nth generation subgroup includes the network structures of M Nth generation generator sub-models, and the network structure of each Nth generation generator sub-model corresponds to a set of fixed-length binary codes.
  • The difference between the average fitness value of the network structures of the M Nth generation generator sub-models and the average fitness value of the network structures of the M (N-1)th generation generator sub-models in the (N-1)th generation subgroup is less than the set value.
  • S405 Determine the compressed generator model according to the network parameters in the generator model before compression and the network structure of the Nth generation generator sub-model with the optimal fitness value.
  • Performing binary encoding on the network structure of the generator model before compression to obtain the first generation subgroup may include: if the binary code corresponding to a first channel in the network structure of the generator model before compression is 0, removing the calculation unit related to the first channel; or, if the binary code corresponding to a second channel in the network structure of the generator model before compression is 1, keeping the calculation unit related to the second channel, where the first channel or the second channel corresponds to a convolution kernel of any convolutional layer in the network structure of the generator model before compression.
  • the network structure of the generator model in the embodiment of the application is composed of several layers of convolutional neural networks and deconvolutional neural networks, and each layer of convolutional neural networks and deconvolutional neural networks is composed of several convolution kernels.
  • the number of these convolution kernels determines the amount of network parameters and calculations of the generator model.
  • Binary coding is performed on all convolution kernels in the generator model, satisfying formula (1).
  • In formula (1), q_l(n) represents the binary coding of the nth convolution kernel of the lth layer of the network structure of the generator model, and the other symbol represents the corresponding weight parameter of that convolution kernel.
  • The meaning of formula (1) is: if q_l(n) is 0, the network parameters of the nth convolution kernel of the lth convolutional layer of the network structure of the generator model are multiplied by 0; otherwise, the network parameters of the nth convolution kernel of the lth layer of the network structure of the generator model are multiplied by 1.
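  • A plausible LaTeX form of formula (1), reconstructed from this description (the symbol \hat{W} for the masked weights is an assumption of this restatement):

    \hat{W}_l^{(n)} = q_l(n) \cdot W_l^{(n)}, \qquad q_l(n) \in \{0, 1\}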
  • In this way, the first generation subgroup including the network structures of M first generation generator sub-models can be obtained.
  • The network structure of each first generation generator sub-model corresponds to a set of fixed-length binary codes.
  • Each set of fixed-length binary codes corresponds to a specific compressed generator sub-model network structure, and the all-1 code corresponds to the complete network structure of the generator model before compression.
  • Compared with the network structure of the generator model before compression, the network structure of a compressed generator sub-model removes a certain number of convolution kernels; therefore, the amount of network parameters of the compressed generator sub-model decreases, and the amount of convolution calculation involved in the calculation process is also reduced accordingly.
  • The part of the network remaining after binary coding is shown in Figure 6.
  • For the first convolutional layer, all channels whose corresponding binary code is 0 are removed.
  • For the second convolutional layer and subsequent convolutional layers, not only are the channels whose code is 0 removed, but the calculation units related to the channels that have already been removed from the previous convolutional layer are also removed accordingly, so the amount of calculation is further reduced.
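  • The following sketch illustrates this channel-removal rule on plain NumPy weight tensors; the layer shapes, code values and the helper name prune_generator are illustrative assumptions, not code from the patent:

```python
import numpy as np

def prune_generator(weights, codes):
    """Apply per-layer binary channel codes to a list of conv weight tensors.

    weights: list of arrays shaped (n_kernels, in_channels, k_h, k_w)
    codes:   list of 0/1 arrays; codes[l] has one entry per kernel of layer l
    A kernel whose code is 0 is removed, and the next layer then also drops
    the input channels that were fed by the removed kernels.
    """
    pruned = []
    prev_keep = None                       # kernels kept in the previous layer
    for w, q in zip(weights, codes):
        keep = np.flatnonzero(q)           # kernels coded 1 are kept
        w = w[keep]                        # drop kernels coded 0
        if prev_keep is not None:
            w = w[:, prev_keep]            # drop inputs from removed kernels
        pruned.append(w)
        prev_keep = keep
    return pruned

# Toy example: two 3x3 conv layers with 4 and 6 kernels on an RGB input.
weights = [np.random.randn(4, 3, 3, 3), np.random.randn(6, 4, 3, 3)]
codes = [np.array([0, 1, 0, 1]), np.array([1, 1, 0, 1, 0, 1])]
for w in prune_generator(weights, codes):
    print(w.shape)   # (2, 3, 3, 3) then (4, 2, 3, 3)
```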
  • In a possible implementation, determining, according to the fitness value of the network structure of each first generation generator sub-model combined with the genetic algorithm, the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup includes the following.
  • Step S1: select the network structure of the kth generation generator sub-model with the optimal fitness value from the kth generation subgroup as the network structure of one (k+1)th generation generator sub-model in the (k+1)th generation subgroup, where k is a positive integer less than (N-1); according to the genetic algorithm, perform probability selection according to the fitness values of the network structures of the M generator sub-models in the kth generation subgroup, and perform selection, crossover and mutation operations according to preset probabilities to obtain the network structures of the other (M-1) (k+1)th generation generator sub-models in the (k+1)th generation subgroup. Step S1 is repeated until the Nth generation subgroup is obtained, and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup is determined.
  • For example, the first generation subgroup G 1-M can be obtained, where the first generation subgroup G 1-M includes the network structure G 1_1 of first generation generator sub-model 1, the network structure G 1_2 of first generation generator sub-model 2, ..., and the network structure G 1_M of first generation generator sub-model M.
  • The network structure G 1_1 of the first generation generator sub-model corresponds to fitness value 1_1, the network structure G 1_2 of the first generation generator sub-model corresponds to fitness value 1_2, ..., and the network structure G 1_M of the first generation generator sub-model corresponds to fitness value 1_M.
  • the network structure of the first-generation generator sub-model with the best fitness value is selected from the first-generation subgroup as the network structure of a second-generation generator sub-model in the second-generation subgroup; according to the genetic algorithm , Probabilistic selection is made according to the fitness value of the network structure of the M generator sub-models in the first-generation subgroup, and selection, crossover, and mutation operations are performed according to the preset probability to obtain the other (M -1) The network structure of the second generation generator sub-model.
  • the second-generation subgroup G 2-M includes the network structure G 2_1 of the second-generation generator sub-model, the network structure G 2_2 of the second-generation generator sub-model, ..., the second generation The network structure G 2_M of the generator sub-model.
  • the network structure G 2_1 of the second-generation generator sub-model corresponds to the fitness value 2_1
  • the network structure G 2_2 of the second-generation generator sub-model corresponds to the fitness value 2_2
  • the network structure G 2_M of the second-generation generator sub-model corresponds to fitness value 2_M.
  • By analogy, the network structure of the (N-1)th generation generator sub-model with the optimal fitness value is selected from the (N-1)th generation subgroup as the network structure of one Nth generation generator sub-model in the Nth generation subgroup.
  • According to the genetic algorithm, probability selection is performed according to the fitness values of the network structures of the M generator sub-models in the (N-1)th generation subgroup, and selection, crossover and mutation operations are performed according to preset probabilities to obtain the network structures of the other (M-1) Nth generation generator sub-models in the Nth generation subgroup.
  • In this way, the Nth generation subgroup G N-M is obtained; the Nth generation subgroup G N-M includes the network structure G N_1 of the Nth generation generator sub-model, the network structure G N_2 of the Nth generation generator sub-model, ..., and the network structure G N_M of the Nth generation generator sub-model.
  • The network structure G N_1 of the Nth generation generator sub-model corresponds to fitness value N_1, the network structure G N_2 of the Nth generation generator sub-model corresponds to fitness value N_2, ..., and the network structure G N_M of the Nth generation generator sub-model corresponds to fitness value N_M.
  • The difference between the average fitness value of the network structures of the M Nth generation generator sub-models and the average fitness value of the network structures of the M (N-1)th generation generator sub-models in the (N-1)th generation subgroup is less than the set value.
  • That is, the Nth generation subgroup is the generation subgroup in which the network structures of the generator sub-models tend to be stable.
  • For example, in a selection operation, if the binary code corresponding to the network structure of the previous generation generator sub-model is 0101 0000 010, the binary code corresponding to the network structure of the next generation generator sub-model can be 0101 0000 010.
  • For another example, in a crossover operation, if the binary code corresponding to the network structure of previous generation generator sub-model 1 is 01010 1110010 0101 and the binary code corresponding to the network structure of previous generation generator sub-model 2 is 01010 01010 0110, then the binary code corresponding to the network structure of next generation generator sub-model 1 can be 01010 0101011 0101, and the binary code corresponding to the network structure of next generation generator sub-model 2 can be 01010 1110010 0110.
  • For another example, in a mutation operation, if the binary code corresponding to the network structure of the previous generation generator sub-model is 100 10010101 101010, the binary code corresponding to the network structure of the next generation generator sub-model obtained after the mutation operation can be 100 01101010 101010.
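  • A minimal sketch of these three operators over fixed-length binary codes is shown below; the elitism step mirrors carrying the sub-model with the optimal fitness value into the next generation, while the probabilities 0.8 and 0.05 are illustrative choices, not values taken from the patent:

```python
import random

def select(population, fitnesses):
    """Fitness-proportional (roulette-wheel) selection of one parent.

    Larger fitness is treated as better here; invert the values first if the
    fitness in use is a cost to be minimized.
    """
    total = sum(fitnesses)
    weights = [f / total for f in fitnesses]
    return random.choices(population, weights=weights, k=1)[0]

def crossover(parent_a, parent_b, p_cross=0.8):
    """Single-point crossover of two equal-length binary codes."""
    if random.random() > p_cross:
        return parent_a[:], parent_b[:]
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(code, p_mut=0.05):
    """Independent bit-flip mutation."""
    return [bit ^ 1 if random.random() < p_mut else bit for bit in code]

def next_generation(population, fitnesses):
    """Carry over the best code unchanged, fill the rest by selection, crossover and mutation."""
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    new_pop = [population[best][:]]
    while len(new_pop) < len(population):
        parent_a = select(population, fitnesses)
        parent_b = select(population, fitnesses)
        child, _ = crossover(parent_a, parent_b)
        new_pop.append(mutate(child))
    return new_pop
```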
  • In order to evaluate each candidate network structure, the embodiment of the application introduces the discriminator output, and calculates the fitness value of the generator model's network structure by calculating the difference between the generator models before and after compression at the discriminator.
  • the fitness value of the network structure of the p-th generation generator sub-model is determined according to the normalized value of the network parameter of the p-th generation generator sub-model, the generator perception loss, and the discriminator perception loss.
  • the generator Perceptual loss is used to characterize the difference between the output result of the p-th generation generator sub-model and the output result of the p-1th generation generator sub-model;
  • the discriminator perception loss is used to characterize the output result of the p-th generation generator sub-model and The difference between the output results of the generator sub-model of the p-1 generation and the discriminator, where p is a positive integer from 1 to N, and the generator sub-model of the 0th generation is the generator model before compression .
  • the normalized value of the network parameter quantity, the generator perception loss, and the discriminator perception loss of the p-th generation generator sub-model satisfy the following formula (2):
  • f(q) represents the fitness value of the network structure of the p-th generation generator sub-model
  • p(q) represents the normalized value of the network parameter of the p-th generation generator sub-model
  • γ and λ are the set values
  • L GenA represents the perceptual loss of the generator
  • L DisA represents the perceptual loss of the discriminator
  • q represents the binary coding of all convolutional layers of the network structure of the p- th generation generator sub-model.
  • p(q) can satisfy the following formula (3):
  • ||·||_1 denotes the L1 norm; Σ denotes the sum.
  • the generator perception loss can be determined according to the following formula (4):
  • x i represents the i-th input picture
  • m represents the number of input pictures
  • G(x_i) represents the output result of the ith input picture through the (p-1)th generation generator sub-model, and the other term represents the output result of the ith input picture through the pth generation generator sub-model
  • Σ represents the sum, and the norm term represents the L2 norm difference
  • the above formula (4) is the L2 norm difference of the pictures generated by the generator model before and after compression, and the physical meaning is to make the pictures generated by the generator model before and after compression similar at the pixel level.
  • the perceptual loss of the discriminator may be determined according to the following formula (5) including:
  • x i represents the i-th input picture
  • m represents the number of input pictures
  • D(G(x_i)) represents the output result of the ith input picture after passing through the (p-1)th generation generator sub-model and then through the discriminator
  • the other term represents the output result of the ith input picture after passing through the pth generation generator sub-model and then through the discriminator
  • Σ represents the sum, and the norm term represents the L2 norm difference.
  • The above formula (5) is the L2 difference, in the original discriminator model, between the pictures generated by the generator models before and after compression.
  • Its physical meaning is to make the discrimination results of the pictures generated by the generator models before and after compression similar on the original discriminator, that is, to let the discriminator judge that the pictures produced by the generators before and after compression are consistent in the style domain.
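  • Putting formulas (2) to (5) together, a fitness evaluation for one candidate binary code could look like the sketch below; the additive combination and the weights gamma and lam mirror the reconstruction given earlier and are assumptions, and g_full, g_sub and disc are placeholders for the original generator, the compressed generator sub-model and the original discriminator:

```python
import numpy as np

def normalized_params(codes, layer_shapes):
    """Normalized parameter count p(q): fraction of conv parameters kept by the codes.

    layer_shapes[l] = (C_l, N_l, H_l, W_l) for the l-th convolutional layer;
    codes[l] is the 0/1 vector over the N_l kernels of that layer.
    """
    kept, total = 0.0, 0.0
    prev_kept = layer_shapes[0][0]          # all input channels of the first layer are kept
    for (C, N, H, W), q in zip(layer_shapes, codes):
        kept += prev_kept * float(np.sum(q)) * H * W
        total += C * N * H * W
        prev_kept = float(np.sum(q))
    return kept / total

def fitness(codes, layer_shapes, images, g_full, g_sub, disc, gamma=1.0, lam=1.0):
    """Fitness of one candidate code set; smaller is better under this reconstruction."""
    gen_loss = np.mean([np.linalg.norm(g_full(x) - g_sub(x)) for x in images])              # generator perceptual loss
    dis_loss = np.mean([np.linalg.norm(disc(g_full(x)) - disc(g_sub(x))) for x in images])  # discriminator perceptual loss
    return normalized_params(codes, layer_shapes) + gamma * gen_loss + lam * dis_loss
```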
  • an embodiment of the present application also provides a model compression method, as shown in FIG. 8, including the following steps:
  • S801 Acquire the first generator model and the second generator model before compression.
  • the first generator model and the second generator model are symmetrical generator models;
  • S802 Perform binary encoding on the network structure of the first generator model before compression to obtain the first generation subgroup corresponding to the first generator model; and perform binary encoding on the network structure of the second generator model before compression to obtain the first generation subgroup corresponding to the second generator model.
  • The first generation subgroup corresponding to the first generator model includes the network structures of M1 first generation generator sub-models, and the first generation subgroup corresponding to the second generator model includes the network structures of M2 first generation generator sub-models.
  • The network structure of each first generation generator sub-model corresponds to a set of fixed-length binary codes, and M1 and M2 are both positive integers greater than 1.
  • the generator sub-model in the embodiment of the present application may also be referred to as a sub-individual, which is described here in a unified manner, and will not be described in detail below.
  • the network structure of the generator model or the network structure of the generator sub-model in the embodiment of the present application may also be referred to as a generating convolutional neural network or a generating network, etc., which are described in a unified manner here, and will not be described in detail below.
  • The Nth generation subgroup corresponding to the first generator model includes the network structures of M1 Nth generation generator sub-models, and the Nth generation subgroup corresponding to the second generator model includes the network structures of M2 Nth generation generator sub-models.
  • The network structure of each Nth generation generator sub-model corresponds to a set of fixed-length binary codes.
  • The difference between the average fitness value of the network structures of the M1 Nth generation generator sub-models corresponding to the first generator model and the average fitness value of the network structures of the M1 (N-1)th generation generator sub-models corresponding to the first generator model is less than the first set value, and the difference between the average fitness value of the network structures of the M2 Nth generation generator sub-models corresponding to the second generator model and the average fitness value of the network structures of the M2 (N-1)th generation generator sub-models corresponding to the second generator model is less than the second set value.
  • step S802 For the specific implementation of the foregoing step S802, reference may be made to step S402 in the embodiment shown in FIG. 4, which will not be repeated here.
  • In a possible implementation, determining, according to the fitness value of the network structure of each first generation generator sub-model combined with the genetic algorithm, the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model includes the following.
  • Step S1: take the network structure of the kth generation generator sub-model with the optimal fitness value in the kth generation subgroup corresponding to the first generator model as the network structure of one (k+1)th generation generator sub-model in the (k+1)th generation subgroup corresponding to the second generator model; according to the genetic algorithm, perform probability selection according to the fitness values of the network structures of the M2 generator sub-models in the kth generation subgroup corresponding to the second generator model, and perform selection, crossover and mutation operations according to preset probabilities to obtain the network structures of the other (M2-1) (k+1)th generation generator sub-models in the (k+1)th generation subgroup corresponding to the second generator model, where k is a positive integer less than (N-1).
  • Step S2: take the network structure of the (k+1)th generation generator sub-model with the optimal fitness value in the (k+1)th generation subgroup corresponding to the second generator model as the network structure of one (k+1)th generation generator sub-model in the (k+1)th generation subgroup corresponding to the first generator model; according to the genetic algorithm, perform probability selection according to the fitness values of the network structures of the M1 generator sub-models in the kth generation subgroup corresponding to the first generator model, and perform selection, crossover and mutation operations according to preset probabilities to obtain the network structures of the other (M1-1) (k+1)th generation generator sub-models in the (k+1)th generation subgroup corresponding to the first generator model.
  • Steps S1 and S2 are repeated until the Nth generation subgroup corresponding to the first generator model and the Nth generation subgroup corresponding to the second generator model are obtained, and then the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the first generator model and the network structure of the Nth generation generator sub-model with the optimal fitness value in the Nth generation subgroup corresponding to the second generator model are determined.
  • In the embodiment of the present application, a co-evolutionary algorithm is introduced for the conversion problem between two image domains, and a generator subgroup is maintained for each of the two symmetrical generator models.
  • In each round, the network structure of the optimal generator sub-model is selected from one subgroup and then trained together with the network structures of all the generator sub-models in the other subgroup; this process alternates between the two subgroups, and the alternate iterative optimization finally yields the network structures of the two compressed generator models at the same time.
  • As shown in FIG. 9, generator A and generator B are symmetrical generator models; generator A maintains subgroup A, and generator B maintains subgroup B.
  • In the first iteration, the network structure of the generator sub-model with the optimal fitness value in subgroup A and the network structures of the generator sub-models in subgroup B are used for training, and the network structure of the generator sub-model with the optimal fitness value in subgroup B is derived; in the second iteration, the network structure of the generator sub-model with the optimal fitness value in subgroup B and the network structures of the generator sub-models in subgroup A are used for training, and the network structure of the generator sub-model with the optimal fitness value in subgroup A is derived. Subsequent iterations proceed in the same way, alternating the iterative optimization, and finally the network structures of the two compressed generator models are obtained at the same time.
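  • A high-level sketch of this alternate iterative optimization follows; train_step, evaluate_fitness and evolve are placeholders for the fine-tuning step, the fitness evaluation of formulas (2) to (5) and the genetic-algorithm update, so the structure below is an assumption about how the pieces fit together rather than code from the patent:

```python
def coevolve(pop_a, pop_b, train_step, evaluate_fitness, evolve, num_iterations=50):
    """Alternately evolve the binary-code subgroups of two symmetric generators A and B."""
    for _ in range(num_iterations):
        # The best code of subgroup A seeds the next generation of subgroup B.
        fit_a = [evaluate_fitness('A', code) for code in pop_a]
        best_a = pop_a[min(range(len(pop_a)), key=lambda i: fit_a[i])]   # smaller fitness = better here
        train_step(best_a, pop_b)                                        # fine-tune B's candidates with A's best
        fit_b = [evaluate_fitness('B', code) for code in pop_b]
        pop_b = evolve(pop_b, fit_b, seed=best_a)

        # The best code of the updated subgroup B seeds the next generation of subgroup A.
        fit_b = [evaluate_fitness('B', code) for code in pop_b]
        best_b = pop_b[min(range(len(pop_b)), key=lambda i: fit_b[i])]
        train_step(best_b, pop_a)
        fit_a = [evaluate_fitness('A', code) for code in pop_a]
        pop_a = evolve(pop_a, fit_a, seed=best_b)
    return pop_a, pop_b
```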
  • model compression method provided in the embodiments of the present application can be applied to various image conversion and style transfer tasks in the computer vision field, such as portrait beautification, virtual wearing and try-on, character background rendering, automatic driving road scene generation, etc.
  • model compression method provided in the embodiments of the present application can be used to construct an efficient generator model.
  • Virtual wearing and try-on: render the image of the person captured by the video camera, and virtually wear the selected hat, glasses, clothing and other commodities; from the captured image of the person, an image of the person "wearing" the product is generated.
  • Smart camera artistic style rendering: in the smart camera, a variety of specific artistic style renderings are performed on the shooting scene, for example, a Van Gogh style scenery picture is generated in real time from a landscape picture taken.
  • Autopilot road scene generation: the training process of the autopilot model requires a large number of pictures of road scenes, but it is very expensive to use vehicles to actually collect these road scenes in various environments, so road scene pictures can be generated from a large number of racing game scene pictures to replace real road scene pictures.
  • model compression method provided by the embodiment of the present application is applicable to all the above-mentioned types of scenes and all other convolutional neural networks, including but not limited to the examples listed above.
  • Example 1 As shown in Figure 10, the input of the generator model for image style conversion is a landscape picture taken by the camera. After several layers of convolution and deconvolution operations, the output image is the converted artistic style image.
  • the generator model used for this image style conversion is compressed by the model compression method provided in this embodiment of the application. Among them, the original generator model has a large amount of network parameters.
  • The numbers of convolution kernels of the first three layers of the generating convolutional network are 64, 128 and 256 respectively; after genetic algorithm channel selection, the numbers of convolution kernels of the compressed generating network are 27, 55 and 124 respectively.
  • The calculation of the first layer becomes about one-half of the original, and the calculations of the second and third layers become about one-quarter of the original.
  • the style of the output image is basically the same as before compression.
  • the style of the pictures output by the generator model remains consistent.
  • The compression of the amount of calculation for the other layers of the network is similar to the compression of the network parameters.
  • Example 2 The model compression method provided in this embodiment of the application is applied to the conversion of landscape images to Van Gogh style images.
  • Table 1 shows the comparison between the structure of the generator model before compression and the structure of the compressed generator model obtained based on the model compression method provided in the embodiment of the present application. In terms of the number of channels, the compressed generator model is reduced to about half of the generator model before compression.
  • Except for the first convolutional layer and the last convolutional layer, whose network parameter compression ratios are more than 2 times, the network parameter compression ratios of the other convolutional layers are all more than 4 times.
  • Each group of pictures consists of three pictures: the first is a landscape picture input to the generator model, the second is a picture generated by the generator model before compression, and the third is a picture generated by the compressed generator model. It can be seen that the compressed model still completes the conversion from landscape pictures to Van Gogh style pictures well even when the model size is compressed on a large scale.
  • Table 2 shows the comparison of model parameters and calculations between the compressed generator model obtained by the model compression method provided by the embodiment of the application and the generator model before compression.
  • The comparison is performed on an Intel(R) Xeon(R) central processing unit (CPU) platform.
  • The network parameters and calculations of the compressed generator model are less than a quarter of those of the generator model before compression.
  • the generator model after compression is one third of the generator model before compression.
  • Example 3 Aiming at the problem of rapid image stylization, applying the model compression method provided in the embodiments of the present application can maintain the style transfer performance of the compressed model while the model is greatly compressed.
  • Figure 12 describes the task of quick style transfer. For a picture to be converted, a style transfer picture is superimposed to obtain a converted stylized picture.
  • Figure 13 describes the compression effect of the fast style transfer model: the model memory is compressed more than four times, from the original 6.36 MB to 1.17 MB, while maintaining the effect of the fast style transfer.
  • Example 4 For the problem of conversion between two image domains, such as the problem of mutual conversion between horse and zebra image domains, the parameters of the compressed generator model obtained by applying the co-evolution algorithm provided in the embodiments of the present application are shown in Table 3. It can be seen from Table 3 that the two image converters compress more than 4 times on the model memory and FLOPs. The compression effect obtained is shown in Figure 14.
  • an embodiment of the present application also provides a model compression device, which is used to implement the foregoing various methods.
  • the above model compression device includes hardware structures and/or software modules corresponding to various functions.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the model compression device into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • For example, in the case where the functional modules are divided in an integrated manner, FIG. 15 shows a schematic structural diagram of a model compression device 150. The model compression device 150 includes an acquisition module 1501 and a processing module 1502. In one possible implementation: the acquisition module 1501 is configured to obtain the generator model before compression; the processing module 1502 is configured to perform binary encoding on the network structure of the generator model before compression to obtain a first-generation subgroup, the first-generation subgroup including the network structures of M first-generation generator sub-models, where the network structure of each first-generation generator sub-model corresponds to a set of fixed-length binary codes and M is a positive integer greater than 1; the acquisition module 1501 is further configured to obtain the fitness value of the network structure of each first-generation generator sub-model; the processing module 1502 is further configured to determine, according to the fitness value of the network structure of each first-generation generator sub-model combined with a genetic algorithm, the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup, where N is a positive integer greater than 1, the Nth-generation subgroup includes the network structures of M Nth-generation generator sub-models, the network structure of each Nth-generation generator sub-model corresponds to a set of fixed-length binary codes, and the difference between the average fitness value of the network structures of the M Nth-generation generator sub-models and the average fitness value of the network structures of the M (N-1)th-generation generator sub-models in the (N-1)th-generation subgroup is less than a set value; the processing module 1502 is further configured to determine the compressed generator model according to the network parameters in the generator model before compression and the network structure of the Nth-generation generator sub-model with the optimal fitness value.
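  • As an illustration of the last of these steps, the following is a minimal PyTorch sketch of how a compressed convolutional layer could be assembled from the uncompressed weights and the optimal binary code; the function name, the boolean-mask interface, and the restriction to plain (non-grouped) Conv2d layers are assumptions made for this sketch rather than part of the claimed method.

```python
import torch
import torch.nn as nn

def build_compressed_conv(conv_l: nn.Conv2d, keep_in: torch.Tensor, keep_out: torch.Tensor) -> nn.Conv2d:
    """Build a smaller Conv2d that keeps only the kernels selected by the optimal code.

    keep_in / keep_out: boolean masks over the input and output channels of this layer,
    derived from the binary codes of layer l-1 and layer l respectively.
    """
    new_conv = nn.Conv2d(int(keep_in.sum()), int(keep_out.sum()), conv_l.kernel_size,
                         stride=conv_l.stride, padding=conv_l.padding,
                         bias=conv_l.bias is not None)
    with torch.no_grad():
        # copy the retained weights from the generator model before compression
        new_conv.weight.copy_(conv_l.weight[keep_out][:, keep_in])
        if conv_l.bias is not None:
            new_conv.bias.copy_(conv_l.bias[keep_out])
    return new_conv
```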
  • Optionally, the processing module 1502 being configured to determine, according to the fitness value of the network structure of each first-generation generator sub-model combined with the genetic algorithm, the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup includes: the processing module 1502 is configured to repeat the following step S1 until the Nth-generation subgroup is obtained. Step S1: select from the kth-generation subgroup the network structure of the kth-generation generator sub-model with the optimal fitness value as the network structure of one (k+1)th-generation generator sub-model in the (k+1)th-generation subgroup, where k is a positive integer less than (N-1); according to the genetic algorithm, perform probabilistic selection according to the fitness values of the network structures of the M generator sub-models in the kth-generation subgroup, and perform selection, crossover and mutation operations with preset probabilities to obtain the network structures of the other (M-1) (k+1)th-generation generator sub-models in the (k+1)th-generation subgroup. The processing module 1502 is then configured to determine the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup.
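  • A minimal Python sketch of one such evolutionary step is given below, assuming a NumPy bit-matrix representation of the subgroup; the elitism, fitness-proportional selection, single-point crossover and bit-flip mutation shown here are one conventional realisation of the selection, crossover and mutation operations, and the probability values are illustrative only.

```python
import numpy as np

def evolve_one_generation(population, fitness, p_crossover=0.9, p_mutation=0.01, rng=None):
    """One GA step: keep the fittest individual, then fill the rest of the next
    generation by fitness-proportional selection, crossover and mutation.

    population: (M, L) array of 0/1 codes, one row per generator sub-model.
    fitness:    (M,) array of fitness values f(q) for the current generation.
    """
    rng = rng or np.random.default_rng()
    M, L = population.shape
    next_gen = [population[np.argmax(fitness)].copy()]       # elitism: best individual survives

    probs = fitness / fitness.sum()                           # probabilistic selection by fitness
    while len(next_gen) < M:
        i, j = rng.choice(M, size=2, p=probs)                 # pick two parents
        child_a, child_b = population[i].copy(), population[j].copy()
        if rng.random() < p_crossover:                        # single-point crossover
            cut = rng.integers(1, L)
            child_a[cut:], child_b[cut:] = population[j][cut:], population[i][cut:]
        for child in (child_a, child_b):
            flip = rng.random(L) < p_mutation                 # bit-flip mutation
            child[flip] ^= 1
            if len(next_gen) < M:
                next_gen.append(child)
    return np.stack(next_gen)
```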
  • Optionally, the fitness value of the network structure of the pth-generation generator sub-model is determined based on the normalized number of network parameters of the pth-generation generator sub-model, a generator perceptual loss, and a discriminator perceptual loss. The generator perceptual loss characterizes the difference between the output of the pth-generation generator sub-model and the output of the (p-1)th-generation generator sub-model; the discriminator perceptual loss characterizes the difference between the results obtained after the output of the pth-generation generator sub-model and the output of the (p-1)th-generation generator sub-model are each passed through the discriminator, where p is a positive integer from 1 to N and the 0th-generation generator sub-model is the generator model before compression.
  • Optionally, the normalized number of network parameters of the pth-generation generator sub-model, the generator perceptual loss, and the discriminator perceptual loss satisfy the following first formula: f(q) = (p(q) + λ·L_GenA + γ·L_DisA)^(-1), where f(q) denotes the fitness value of the network structure of the pth-generation generator sub-model, p(q) denotes the normalized number of network parameters of the pth-generation generator sub-model, λ and γ are set values, L_GenA denotes the generator perceptual loss, L_DisA denotes the discriminator perceptual loss, and q denotes the binary codes of all convolutional layers of the network structure of the pth-generation generator sub-model.
  • Optionally, p(q) satisfies the following second formula: p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l), where q_{l-1} denotes the binary code of the (l-1)th-layer convolution in the network structure of the pth-generation generator sub-model, q_l denotes the binary code of the lth-layer convolution, H_l denotes the height of the lth-layer convolution, W_l denotes the width of the lth-layer convolution, C_l denotes the number of channels of the lth-layer convolution, N_l denotes the number of convolution kernels of the lth-layer convolution, ‖·‖_1 denotes the L1 norm, and ∑ denotes summation.
  • Optionally, the processing module 1502 is further configured to determine the generator perceptual loss according to the following third formula: L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2, where x_i denotes the ith input picture, m denotes the number of input pictures, G(x_i) denotes the output of the (p-1)th-generation generator sub-model for the ith input picture, Ĝ(x_i) denotes the output of the pth-generation generator sub-model for the ith input picture, ∑ denotes summation, and ‖·‖_2 denotes the L2-norm difference.
  • Optionally, the processing module 1502 is further configured to determine the discriminator perceptual loss according to the following fourth formula: L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2, where x_i denotes the ith input picture, m denotes the number of input pictures, D(G(x_i)) denotes the result obtained after the output of the (p-1)th-generation generator sub-model for the ith input picture is passed through the discriminator, D(Ĝ(x_i)) denotes the result obtained after the output of the pth-generation generator sub-model for the ith input picture is passed through the discriminator, ∑ denotes summation, and ‖·‖_2 denotes the L2-norm difference.
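  • As a hedged illustration of how the above quantities could be combined, the sketch below evaluates the fitness of one candidate code over a small batch of images. The helper names, the callable interfaces for the previous-generation generator, the candidate sub-model and the discriminator, and the default values of λ and γ are assumptions; only the overall form f(q) = (p(q) + λ·L_GenA + γ·L_DisA)^(-1) comes from the description.

```python
import numpy as np

def normalized_params(codes, heights, widths, channels, kernels):
    """p(q): parameters retained by the binary codes, normalised by the full model size.

    codes[l] is the 0/1 vector of layer l; codes[0] stands for the (fully kept) input channels.
    heights/widths/channels/kernels hold H_l, W_l, C_l and N_l for each convolutional layer.
    """
    kept = sum(np.sum(codes[l - 1]) * np.sum(codes[l]) * heights[l] * widths[l]
               for l in range(1, len(codes)))
    total = sum(channels[l] * kernels[l] * heights[l] * widths[l]
                for l in range(1, len(codes)))
    return kept / total

def fitness(codes, images, gen_prev, gen_cand, discriminator,
            heights, widths, channels, kernels, lam=0.1, gamma=0.1):
    """f(q) = (p(q) + lambda * L_GenA + gamma * L_DisA) ** (-1)."""
    l_gen = np.mean([np.linalg.norm(np.asarray(gen_prev(x)) - np.asarray(gen_cand(x)))
                     for x in images])                       # generator perceptual loss
    l_dis = np.mean([np.linalg.norm(np.asarray(discriminator(gen_prev(x)))
                                    - np.asarray(discriminator(gen_cand(x))))
                     for x in images])                       # discriminator perceptual loss
    p_q = normalized_params(codes, heights, widths, channels, kernels)
    return 1.0 / (p_q + lam * l_gen + gamma * l_dis)
```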
  • Optionally, the processing module 1502 being configured to perform binary encoding on the network structure of the generator model before compression to obtain the first-generation subgroup includes: the processing module 1502 is configured to, if the binary code corresponding to a first channel in the network structure of the generator model before compression is 0, remove the computation units related to the first channel; or, if the binary code corresponding to a second channel in the network structure of the generator model before compression is 1, retain the computation units related to the second channel, where the first channel or the second channel corresponds to one convolution kernel of any convolutional layer in the network structure of the generator model before compression.
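  • The channel-level binary encoding can be pictured as masking whole convolution kernels during the search. The PyTorch sketch below zeroes the filters whose code is 0 in one layer and, correspondingly, the matching input channels of the following layer; the layer shapes and the function name are assumptions for illustration.

```python
import torch
import torch.nn as nn

def apply_channel_code(conv_l: nn.Conv2d, conv_next: nn.Conv2d, code_l: torch.Tensor) -> None:
    """Mask layer l by its binary code and propagate the removal to layer l+1.

    code_l: 0/1 tensor of length conv_l.out_channels; 0 removes a convolution kernel,
    1 keeps it (together with the computation that depends on it).
    """
    with torch.no_grad():
        conv_l.weight.mul_(code_l.view(-1, 1, 1, 1))      # drop whole output filters of layer l
        if conv_l.bias is not None:
            conv_l.bias.mul_(code_l)
        conv_next.weight.mul_(code_l.view(1, -1, 1, 1))   # drop the matching input channels of layer l+1

# usage: mask the first two layers of a generator with one sampled code
conv1 = nn.Conv2d(3, 64, kernel_size=7, padding=3)
conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
code = (torch.rand(64) > 0.5).float()
apply_channel_code(conv1, conv2, code)
```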
  • Alternatively, in another possible implementation: the acquisition module 1501 is configured to obtain a first generator model and a second generator model before compression, the first generator model and the second generator model being symmetrical generator models; the processing module 1502 is configured to perform binary encoding on the network structure of the first generator model before compression to obtain a first-generation subgroup corresponding to the first generator model, and to perform binary encoding on the network structure of the second generator model before compression to obtain a first-generation subgroup corresponding to the second generator model. The first-generation subgroup corresponding to the first generator model includes the network structures of M1 first-generation generator sub-models, the first-generation subgroup corresponding to the second generator model includes the network structures of M2 first-generation generator sub-models, the network structure of each first-generation generator sub-model corresponds to a set of fixed-length binary codes, and M1 and M2 are both positive integers greater than 1.
  • The acquisition module 1501 is further configured to obtain the fitness value of the network structure of each first-generation generator sub-model. The processing module 1502 is further configured to determine, according to the fitness value of the network structure of each first-generation generator sub-model combined with the genetic algorithm, the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the first generator model and the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the second generator model, where N is a positive integer greater than 1, the Nth-generation subgroup corresponding to the first generator model includes the network structures of M1 Nth-generation generator sub-models, the Nth-generation subgroup corresponding to the second generator model includes the network structures of M2 Nth-generation generator sub-models, and the network structure of each Nth-generation generator sub-model corresponds to a set of fixed-length binary codes. The difference between the average fitness value of the network structures of the M1 Nth-generation generator sub-models corresponding to the first generator model and the average fitness value of the network structures of the M1 (N-1)th-generation generator sub-models corresponding to the first generator model is less than a first set value, and the difference between the average fitness value of the network structures of the M2 Nth-generation generator sub-models corresponding to the second generator model and the average fitness value of the network structures of the M2 (N-1)th-generation generator sub-models corresponding to the second generator model is less than a second set value. The processing module 1502 is further configured to determine the compressed first generator model according to the network parameters in the first generator model before compression and the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the first generator model, and to determine the compressed second generator model according to the network parameters in the second generator model before compression and the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the second generator model.
  • Optionally, the processing module 1502 being configured to determine, according to the fitness value of the network structure of each first-generation generator sub-model combined with the genetic algorithm, the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the first generator model and the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the second generator model includes: the processing module 1502 is configured to repeat the following steps S1 and S2 until the Nth-generation subgroup corresponding to the first generator model and the Nth-generation subgroup corresponding to the second generator model are obtained. Step S1: take the network structure of the kth-generation generator sub-model with the optimal fitness value in the kth-generation subgroup corresponding to the first generator model as the network structure of one (k+1)th-generation generator sub-model in the (k+1)th-generation subgroup corresponding to the second generator model; according to the genetic algorithm, perform probabilistic selection according to the fitness values of the network structures of the M2 generator sub-models in the kth-generation subgroup corresponding to the second generator model, and perform selection, crossover and mutation operations with preset probabilities to obtain the network structures of the other (M2-1) (k+1)th-generation generator sub-models in the (k+1)th-generation subgroup corresponding to the second generator model, where k is a positive integer less than (N-1). Step S2: take the network structure of the (k+1)th-generation generator sub-model with the optimal fitness value in the (k+1)th-generation subgroup corresponding to the second generator model as the network structure of one (k+1)th-generation generator sub-model in the (k+1)th-generation subgroup corresponding to the first generator model; according to the genetic algorithm, perform probabilistic selection according to the fitness values of the network structures of the M1 generator sub-models in the kth-generation subgroup corresponding to the first generator model, and perform selection, crossover and mutation operations with preset probabilities to obtain the network structures of the other (M1-1) (k+1)th-generation generator sub-models in the (k+1)th-generation subgroup corresponding to the first generator model. The processing module 1502 then determines the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the first generator model and the network structure of the Nth-generation generator sub-model with the optimal fitness value in the Nth-generation subgroup corresponding to the second generator model.
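  • For the two-domain case, the alternating optimisation of the two subgroups could look like the Python sketch below, which reuses the single-population step from the earlier sketch; the convergence test on the change in average fitness and the callable interfaces are assumptions made for readability rather than part of the claimed method.

```python
import numpy as np

def co_evolve(pop_a, pop_b, fitness_a, fitness_b, evolve_one_generation, tol=1e-3, max_gen=100):
    """Alternately evolve the subgroups of two symmetrical generators A and B.

    fitness_a / fitness_b: callables that score every code in one population,
    given the best code currently held by the opposite generator.
    """
    best_a, best_b = pop_a[0], pop_b[0]
    prev_avg_a = prev_avg_b = -np.inf
    for _ in range(max_gen):
        # step S1: evaluate B's subgroup against A's current best structure, then evolve B
        f_b = fitness_b(pop_b, best_a)
        best_b = pop_b[np.argmax(f_b)]
        pop_b = evolve_one_generation(pop_b, f_b)
        # step S2: evaluate A's subgroup against B's current best structure, then evolve A
        f_a = fitness_a(pop_a, best_b)
        best_a = pop_a[np.argmax(f_a)]
        pop_a = evolve_one_generation(pop_a, f_a)
        avg_a, avg_b = f_a.mean(), f_b.mean()
        if abs(avg_a - prev_avg_a) < tol and abs(avg_b - prev_avg_b) < tol:
            break                                   # average fitness has stabilised for both subgroups
        prev_avg_a, prev_avg_b = avg_a, avg_b
    return best_a, best_b
```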
  • In this embodiment, the model compression device 150 is presented in a form in which the functional modules are divided in an integrated manner. The "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the aforementioned functions.
  • In a simple embodiment, those skilled in the art will appreciate that the model compression device 150 may take the form shown in FIG. 16.
  • As shown in FIG. 16, the model compression device 160 includes one or more processors 1601. Optionally, the model compression device 160 further includes a communication line 1602, at least one communication interface, and/or a memory 1603 (FIG. 16 illustratively shows only the communication interface 1604 and one processor 1601 as an example).
  • The processor 1601 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control the execution of programs of the solution of the present application.
  • the communication line 1602 may include a path for connecting different components.
  • the communication interface 1604 may be used to communicate with other devices or communication networks, such as Ethernet, radio access network (RAN), wireless local area networks (WLAN), etc.
  • For example, the transceiver module may be a device such as a transceiver.
  • the communication interface 1604 may also be a transceiver circuit located in the processor 1601 to implement signal input and signal output of the processor.
  • the memory 1603 may be a device having a storage function.
  • For example, it may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a communication line 1602. The memory can also be integrated with the processor.
  • the memory 1603 is used to store computer-executed instructions for executing the solution of the present application, and the processor 1601 controls the execution.
  • the processor 1601 is configured to execute computer-executable instructions stored in the memory 1603, so as to implement the model compression method provided in the embodiment of the present application.
  • Alternatively, optionally, the processor 1601 may perform the processing-related functions in the model compression method provided in the foregoing embodiments of the present application, and the communication interface 1604 is responsible for communicating with other devices or communication networks; this is not specifically limited in the embodiments of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program code, which is not specifically limited in the embodiments of the present application.
  • the processor 1601 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 16.
  • In a specific implementation, as an embodiment, the model compression device 160 may include multiple processors, such as the processor 1601 and the processor 1608 in FIG. 16. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the model compression apparatus 160 may further include an output device 1605 and an input device 1606.
  • the output device 1605 communicates with the processor 1601 and can display information in a variety of ways.
  • For example, the output device 1605 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
  • the input device 1606 communicates with the processor 1601 and can receive user input in a variety of ways.
  • the input device 1606 may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the aforementioned model compression device 160 may be a general-purpose device or a special-purpose device.
  • For example, the model compression device 160 may be a server, a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that shown in FIG. 16.
  • the embodiment of the present application does not limit the type of the model compression device 160.
  • the function/implementation process of the acquisition module 1501 and the processing module 1502 in FIG. 15 may be implemented by the processor 1601 in the model compression device 160 shown in FIG. 16 calling the computer execution instructions stored in the memory 1603. Since the model compression device 160 provided in this embodiment can execute the above-mentioned model compression method, the technical effects that can be obtained can refer to the above-mentioned method embodiment, and will not be repeated here.
  • one or more of the above modules or units can be implemented by software, hardware or a combination of both.
  • the software exists in the form of computer program instructions and is stored in the memory, and the processor can be used to execute the program instructions and implement the above method flow.
  • The processor may be built into an SoC (system on chip) or an ASIC, or it may be an independent semiconductor chip. In addition to the core used to execute software instructions for computation or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations.
  • When the above modules or units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microprocessor, a digital signal processing (DSP) chip, a microcontroller unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or a non-integrated discrete device, which may run the necessary software, or perform the above method flows without depending on software.
  • Optionally, an embodiment of the present application further provides a model compression device (for example, the model compression device may be a chip or a chip system); the model compression device includes a processor configured to implement the method in any of the foregoing method embodiments.
  • the model compression device further includes a memory.
  • the memory is used to store necessary program instructions and data, and the processor can call the program code stored in the memory to instruct the model compression device to execute the method in any of the foregoing method embodiments.
  • the memory may not be in the model compression device.
  • When the model compression device is a chip system, it may be composed of a chip, or may include a chip and other discrete devices; this is not specifically limited in the embodiments of the present application.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

A model compression method and device. The method includes: performing binary encoding on the network structure of a generator model before compression to obtain a first-generation subgroup including the network structures of M first-generation generator sub-models (S402); obtaining the fitness value of the network structure of each first-generation generator sub-model (S403); determining, according to the fitness values and a genetic algorithm, the network structure of the Nth-generation generator sub-model with the optimal fitness value in an Nth-generation subgroup (S404), where the difference between the average fitness value of the network structures of the M Nth-generation generator sub-models in the Nth-generation subgroup and the average fitness value of the network structures of the M (N-1)th-generation generator sub-models in the (N-1)th-generation subgroup is less than a set value; and determining the compressed generator model according to the network parameters in the generator model before compression and the network structure of the Nth-generation generator sub-model with the optimal fitness value (S405). The method is used to solve the problem that existing compression algorithms cannot achieve satisfactory results when directly applied to generator models.

Description

模型压缩方法及装置
本申请要求于2019年5月22日提交国家知识产权局、申请号为2019104308762、申请名称为“模型压缩方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机视觉领域,尤其涉及模型压缩方法及装置。
背景技术
随着手机等智能终端的广泛普及,基于移动端的图像风格迁移或者人像渲染等应用有着广泛的需求,在智能相机、移动社交、或虚拟穿戴等领域有着巨大的应用前景。而生成对抗神经网络(generative adversarial network,GAN)模型则在图像风格迁移、人像渲染等应用中取得了良好的效果。比如,图1所示为GAN模型在人像渲染中的结果示意图。
然而,现有的GSN模型中的生成器模型由于其本身输出结果和优化目标的特点,往往需要较大的内存,并且运行这些生成器模型通常需要较大的计算开销,一般只能在图形处理器(graphics processing unit,GPU)平台上运行,不能直接将这些生成器模型迁移到移动端上。而现有的压缩算法都是针对GSN模型中的判别器模型设计的,直接应用在生成器模型上不能取得令人满意的结果。
发明内容
本申请实施例提供模型压缩方法及装置,用于解决现有的压缩算法直接应用在生成器模型上不能取得令人满意的结果的问题。
为达到上述目的,本申请的实施例采用如下技术方案:
第一方面,提供一种模型压缩方法,该方法包括:获取压缩前的生成器模型;对该压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,该第一代子群包括M个第一代生成器子模型的网络结构,其中,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M为大于1的正整数;获取该每个第一代生成器子模型的网络结构的适应值;根据该每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,该第N代子群包括M个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组该固定长度的二值编码,该M个第N代生成器子模型的网络结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于设定值;根据该压缩前的生成器模型中的网络参数和该适应值最优的第N代生成器子模型的网络结构,确定压缩后的生成器模型。本申请实施例提供的模型压缩方法通过对生成器模型的网络结构进行全局二值编码压缩,以及基于生成器子模型的网络结构的适应值计算方法和遗传算法自动选择压缩,一方面,使得压缩后的生成器模型的网络参数量小于压缩前的生成器模型的网络参数量;另一方面,使得压缩后的生成器模型的FLOPs小于压缩前的生成器 模型的FLOPs,在CPU平台上单张图片的平均耗时降低;又一方面,在压缩网络参数量相当的情况下,基于本申请实施例提供的模型压缩方法得到的生成器模型能保持风格迁移性能,传统压缩方法失效;再一方面,对不同的图像转换任务,基于本申请实施例提供的模型压缩方法得到的生成器模型的网络结构不同,相对复杂的任务保留参数较多,简单的任务保留参数较少,模型结构具有任务相关的独特性,最大程度减少参数冗余。综上,基于本申请实施例提供的模型压缩方法,可以解决现有的压缩算法直接应用在生成器模型上不能取得令人满意的结果的问题。
在一种可能的设计中,该根据该每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:重复执行下述步骤S1,直至得到第N代子群:步骤S1、从第k代子群中选出适应值最优的第k代生成器子模型的网络结构作为第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构,k为小于(N-1)的正整数;根据该遗传算法,按照第k代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第(k+1)代子群中的其他(M-1)个第(k+1)代生成器子模型的网络结构;确定第N代子群中适应值最优的第N代生成器子模型的网络结构。基于该方案,可以确定出第N代子群中适应值最优的第N代生成器子模型的网络结构。
在一种可能的设计中,第p代生成器子模型的网络结构的适应值是根据该第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失确定的,该生成器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果的差值;该判别器感知损失用于表征该第p代生成器子模型的输出结果与该第p-1代生成器子模型的输出结果分别再经过判别器后的输出结果的差值,其中,p为1至N的正整数,第0代生成器子模型为该压缩前的生成器模型。基于该方案,可以确定出第p代生成器子模型的网络结构的适应值。
在一种可能的设计中,该第p代生成器子模型的网络参数量的归一化值、该生成器感知损失和该判别器感知损失满足如下第一公式:f(q)=(p(q)+λL GenA+γL DisA) -1;其中,f(q)表示该第p代生成器子模型的网络结构的适应值;p(q)表示该第p代生成器子模型模型的网络参数量的归一化值,λ和γ为设定值;L GenA表示该生成器感知损失;L DisA表示该判别器感知损失,q表示该第p代生成器子模型的网络结构的所有卷积层的二值编码。
在一种可能的设计中,p(q)满足如下第二公式:
p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l)
其中，q_{l-1}表示该第p代生成器子模型的网络结构中第(l-1)层卷积的二值编码；q_l表示该第p代生成器子模型的网络结构中第l层卷积的二值编码；H_l表示该第p代生成器子模型的网络结构的第l层卷积的高度；W_l表示该第p代生成器子模型的网络结构的第l层卷积的宽度；C_l表示该第p代生成器子模型的网络结构的第l层卷积的通道数；N_l表示该第p代生成器子模型的网络结构的第l层卷积的个数；‖·‖_1表示L1范数；∑表示求和。
在一种可能的设计中,该方法还包括:根据如下第三公式确定该生成器感知损失,该第三公式包括:
L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2
其中，x_i表示第i张输入图片，m表示输入图片的张数，G(x_i)表示第i张输入图片经过该第p-1代生成器子模型的输出结果；Ĝ(x_i)表示该第i张输入图片经过该第p代生成器子模型的输出结果，∑表示求和；‖·‖_2表示L2范数差。基于该方案，可以确定出生成器感知损失。
在一种可能的设计中,该方法还包括:根据如下第四公式确定该判别器感知损失,该第四公式包括:
L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2
其中，x_i表示第i张输入图片，m表示输入图片的张数，D(G(x_i))表示第i张输入图片经过该第(p-1)代生成器子模型的输出结果再经过判别器后的输出结果；D(Ĝ(x_i))表示该第i张输入图片经过该第p代生成器子模型的输出结果再经过该判别器后的输出结果，∑表示求和；‖·‖_2表示L2范数差。基于该方案，可以确定出判别器感知损失。
在一种可能的设计中,该对该压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,包括:若该压缩前的生成器模型的网络结构中的第一通道对应的二值编码为0,去除该第一通道相关的计算单元;或者,若该压缩前的生成器模型的网络结构中的第二通道对应的二值编码为1,保留该第二通道相关的计算单元,其中,该第一通道或该第二通道对应该压缩前的生成器模型的网络结构中的任一层卷积的一个卷积核。基于该方案,通过对压缩前的生成器模型的网络结构进行二值编码,可以使得压缩后的生成器模型的网络参数量小于压缩前的生成器模型的网络参数量以及使得压缩后的生成器模型的FLOPs小于压缩前的生成器模型的FLOPs,在CPU平台上单张图片的平均耗时降低。
第二方面,提供了一种模型压缩方法,该方法包括:获取压缩前的第一生成器模型和第二生成器模型,该第一生成器模型和该第二生成器模型为对称的生成器模型;对该压缩前的第一生成器模型的网络结构进行二值编码,得到该第一生成器模型对应的第一代子群;以及,对该压缩前的第二生成器模型的网络结构进行二值编码,得到该第二生成器模型对应的第一代子群;该第一生成器模型对应的第一代子群包括M1个第一代生成器子模型的网络结构,该第二生成器模型对应的第一代子群包括M2个第一代生成器子模型的网络结构,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M1和M2均为大于1的正整数;获取该每个第一代生成器子模型的 网络结构的适应值;根据该每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,该第一生成器模型对应的第N代子群包括M1个第N代生成器子模型的网络结构,该第二生成器模型对应的第N代子群包括M2个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组该固定长度的二值编码,该第一生成器模型对应的M1个第N代生成器子模型的网络结构的适应值的平均值与该第一生成器模型对应的M1个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第一设定值,该第二生成器模型对应的M2个第N代生成器子模型的网络结构的适应值的平均值与该第二生成器模型对应的M2个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第二设定值;根据该压缩前的第一生成器模型中的网络参数和该第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第一生成器模型;以及,根据该压缩前的第二生成器模型中的网络参数和该第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第二生成器模型。本申请实施例提供的模型压缩方法通过对生成器模型的网络结构进行全局二值编码压缩,以及基于生成器子模型的网络结构的适应值计算方法和遗传算法自动选择压缩,一方面,使得压缩后的生成器模型的网络参数量小于压缩前的生成器模型的网络参数量;另一方面,使得压缩后的生成器模型的FLOPs小于压缩前的生成器模型的FLOPs,在CPU平台上单张图片的平均耗时降低;又一方面,在压缩网络参数量相当的情况下,基于本申请实施例提供的模型压缩方法得到的生成器模型能保持风格迁移性能,传统压缩方法失效;再一方面,对不同的图像转换任务,基于本申请实施例提供的模型压缩方法得到的生成器模型的网络结构不同,相对复杂的任务保留参数较多,简单的任务保留参数较少,模型结构具有任务相关的独特性,最大程度减少参数冗余。综上,基于本申请实施例提供的模型压缩方法,可以解决现有的压缩算法直接应用在生成器模型上不能取得令人满意的结果的问题。
在一种可能的设计中,该根据该每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定该第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及该第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:重复执行下述步骤S1和步骤S2,直至得到第一生成器模型对应的第N代子群以及第二生成器模型对应的第N代子群:步骤S1、将该第一生成器模型对应的第k代子群中适应值最优的第k代生成器子模型的网络结构作为该第二生成器模型对应的第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构;根据该遗传算法,按照该第二生成器模型对应的第k代子群中M2个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到该第二生成器模型对应的第k+1代子群中的其他(M2-1)个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;步骤S2、将该第二生成器模型对应的第k+1代子群中适应值最优的第k+1代生成器子模型的网络结构作为该第一生成器模型对应的第(k+1)代子群中的一个第k+1代生成器子模型的网络结构;根据该 遗传算法,按照该第一生成器模型对应的第k代子群中M1个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到该第一生成器模型对应的第k+1代子群中的其他(M1-1)个第k+1代生成器子模型的网络结构;确定该第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及该第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。基于该方案,可以确定出第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。
第三方面,提供了一种模型压缩装置用于实现上述各种方法。所述模型压缩装置包括实现上述方法相应的模块、单元、或手段(means),该模块、单元、或means可以通过硬件实现,软件实现,或者通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块或单元。
第四方面,提供了一种模型压缩装置,包括:处理器和存储器;该存储器用于存储计算机指令,当该处理器执行该指令时,以使该模型压缩装置执行上述第一方面或第二方面所述的方法。
第五方面,提供了一种模型压缩装置,包括:处理器;所述处理器用于与存储器耦合,并读取存储器中的指令之后,根据所述指令执行如上述第一方面或第二方面所述的方法。
第六方面,提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机可以执行上述第一方面或第二方面所述的方法。
第七方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机可以执行上述第一方面或第二方面所述的方法。
第八方面,提供了一种装置(例如,该装置可以是芯片或芯片系统),该装置包括处理器,用于实现上述第一方面或第二方面中所涉及的功能。在一种可能的设计中,该装置还包括存储器,该存储器,用于保存必要的程序指令和数据。该装置是芯片系统时,可以由芯片构成,也可以包含芯片和其他分立器件。
其中,第三方面至第八方面中任一种设计方式所带来的技术效果可参见上述第一方面或第二方面中不同设计方式所带来的技术效果,此处不再赘述。
附图说明
图1为现有的GAN模型在人像渲染中的结果示意图;
图2为现有的CycleGAN提出的利用GAN模型完成图像域转换的结构图;
图3为城市街景数据集中的两个图像转换域的任务示意图;
图4为本申请实施例提供的一种模型压缩方法流程示意图;
图5为本申请实施例提供的每一组固定长度的二值编码与压缩后的生成器模型的对照示意图;
图6为本申请实施例提供的生成器模型的全局二值编码示意图;
图7为本申请实施例提供的由压缩前的生成器模型得到适应值最优的第N代生成器子模型的流程示意图;
图8为本申请实施例提供的另一种模型压缩方法流程示意图;
图9为本申请实施例提供的协同进化算法交替迭代优化示意图;
图10为本申请实施例提供的基于自动压缩图像艺术风格转换模型;
图11为本申请实施例提供的生成器模型压缩前后的图像艺术风格转换效果图;
图12为本申请实施例提供的快速风格迁移示意图;
图13为本申请实施例提供的快速风格迁移模型压缩效果示意图;
图14为本申请实施例提供的马和斑马互相转换的生成模型压缩前后转换效果对比示意图;
图15为本申请实施例提供的模型压缩装置的结构示意图一;
图16为本申请实施例提供的模型压缩装置的结构示意图二。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请的描述中,除非另有说明,“/”表示前后关联的对象是一种“或”的关系,例如,A/B可以表示A或B;本申请中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。并且,在本申请的描述中,除非另有说明,“多个”是指两个或多于两个。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。同时,在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
此外,本申请实施例描述的各种场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着其他类似新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
如图2所示,为CycleGAN提出的利用GAN模型完成图像域转换的结构图。其中,利用GAN模型的训练方法,将风格A图片和风格B图片作为两个域,生成器模型G AB完成从风格A图片到风格B图片的迁移,生成器模型G BA完成从风格B图片到风格A图片的迁移,判别器模型D B判断图片来自真实的风格B图片还是由生成器模型G AB生成的风格B图片。通过对抗训练的方法得到的生成器模型G AB可以完成风格迁移的任务。但是,目前GSN模型中的生成器模型(也可以称之为生成式模型)包含(但不限于)如下的具体问题:
技术问题1、生成器模型的网络参数量过大
现有常用的用于图像风格转换的生成器模型的网络参数量过大,每个卷积层的网络参数量常常能够达到几万、几十万,整个生成器模型的N层卷积层的参数加起来,能够达到几千万(用32位浮点数表示,需要上百兆字节的内存或缓存)。而在移动端中,内存和缓存资源非常有限,因此如何减少卷积网络参数量,是个亟待解决的问题。
技术问题2、生成器模型的计算开销大
生成器模型中的卷积操作计算量巨大,一个生成器模型含有几十万网络参数量的卷积核,卷积操作的浮点计算次数(floating point operations,FLOPs)可达几千万。在GPU上能够实时运算的生成器模型到了移动端则十分缓慢。在移动端的计算资源难以满足现有生成器模型的实时运算的情况下,如何降低卷积计算量,减少生成器模型的计算开销,是个亟待解决的问题。
技术问题3、传统压缩算法的无效性
传统的针对卷积神经网络的压缩和加速算法都是针对分类或检测等判别器模型(也可以称之为判别式模型)设计,这些算法的前提假设是压缩前后模型在像素级别完全一致,而对于图像风格迁移等生成任务,不需要压缩前后生成式模型产生的图像结果完全一致,只需要风格一致即可,所以传统的压缩算法对于生成器模型的压缩任务无效。
技术问题4、不同图像风格迁移任务的参数冗余
不同图像域之间的风格迁移难度是不同的,比如城市地形数据集中的街景和街景分割图。如图3所示,从街景分割图转换到街景需要恢复大量细节,而反过来从街景转换到街景分割图则需要抹去大量细节,两种任务的难易程度显然是不同的。而在传统的生成对抗的图像转换的任务中,两个域之间的生成器模型的结构一样,网络参数量和计算复杂度一样,所以传统的生成对抗训练的生成器模型的参数存在冗余,且每个图像转换任务的冗余程度不一样。
基于上述问题,本申请实施例提供了一种模型压缩方法,如图4所示,包括如下步骤:
S401、获取压缩前的生成器模型。
S402、对压缩前的生成器模型的网络结构进行二值编码,得到第一代子群。
其中,第一代子群包括M个第一代生成器子模型的网络结构,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M为大于1的正整数。
可选的,本申请实施例中的生成器子模型也可以称之为子个体,在此统一说明,以下不再赘述。
可选的,本申请实施例中的生成器模型的网络结构或者生成器子模型的网络结构也可以称之为生成卷积神经网络或生成网络等,在此统一说明,以下不再赘述。
S403、获取每个第一代生成器子模型的网络结构的适应值。
S404、根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数。
其中,第N代子群包括M个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组固定长度的二值编码,M个第N代生成器子模型的网络 结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于设定值。
S405、根据压缩前的生成器模型中的网络参数和适应值最优的第N代生成器子模型的网络结构,确定压缩后的生成器模型。
其中,在上述步骤S402中:
对压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,可以包括:若压缩前的生成器模型的网络结构中的第一通道对应的二值编码为0,去除第一通道相关的计算单元;或者,若压缩前的生成器模型的网络结构中的第二通道对应的二值编码为1,保留第二通道相关的计算单元,其中,第一通道或第二通道对应压缩前的生成器模型的网络结构中的任一层卷积的一个卷积核。
具体的,本申请实施例中的生成器模型的网络结构由若干层卷积神经网络和反卷积神经网络组成,每一层卷积神经网络和反卷积神经网络由若干个卷积核构成,这些卷积核的多少决定了生成器模型的网络参数量和计算量。但是,一个生成器模型的网络结构确定时,该生成器模型中的所有卷积核就确定了,我们可以用一组固定长度的二值编码表示所有的卷积核是否参与压缩后的生成器模型的计算,0表示去除该卷积核,则该卷积核相关的所有的计算就不需要了,1表示保留该卷积核,则与该卷积核相关的计算得到保留,如公式(1)所示。
Ŵ_l(n) = q_l(n)·W_l(n)                公式(1)
其中，q_l(n)表示生成器模型的网络结构的第l层卷积的第n个卷积核的二值编码；W_l(n)表示权重参数。公式(1)的含义为：若q_l(n)=0，则给生成器模型的网络结构的第l层卷积的第n个卷积核的网络参数乘以0，否则给生成器模型的网络结构的第l层卷积的第n个卷积核的网络参数乘以1。
其中,采用M组固定长度的二值编码分别对压缩前的生成器模型的网络结构进行二值编码,则可以得到包括M个第一代生成器子模型的网络结构的第一代子群,其中,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码。
如图5所示,每一组对应的固定长度的二值编码就对于一个特定的压缩后的生成器子模型的网络结构,其中编码全部为1的是压缩前的完整生成器模型的网络结构。压缩后的生成器子模型的网络结构相比于压缩前的生成器模型的网络结构,去除了一定数量的卷积核,因此压缩后的生成器子模型相比于压缩前的生成器模型的网络参数量变少,在计算过程中涉及到的卷积计算量也相应减少。
对于多层卷积计算,经过二值编码后的剩余部分如图6所示。其中,第一层卷积是相应的二值编码为0的通道全部去除,对于第二层卷积及后面的卷积层来说,不仅通道编码为0的通道相应去除,而且与前面已经去除的卷积层相关的计算单元也相应去除,所以计算量进一步减少。
其中,在上述步骤S404中:
根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
重复执行下述步骤S1,直至得到第N代子群:
步骤S1、从第k代子群中选出适应值最优的第k代生成器子模型的网络结构作为第k+1代子群中的一个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;根据遗传算法,按照第k代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第k+1代子群中的其他(M-1)个第k+1代生成器子模型的网络结构;确定第N代子群中适应值最优的第N代生成器子模型的网络结构。
示例性的,如图7所示,根据压缩前的生成器模型的网络结构,可以得到第一代子群G 1-M;其中,第一代子群G 1-M包括第一代生成器子模型的网络结构G 1_1、第一代生成器子模型的网络结构G 1_2、……、第一代生成器子模型的网络结构G 1_M。其中,第一代生成器子模型的网络结构G 1_1对应适应值 1_1、第一代生成器子模型的网络结构G 1_2对应适应值 1_2、……、第一代生成器子模型的网络结构G 1_M对应适应值 1_M
进一步的,从第一代子群中选出适应值最优的第一代生成器子模型的网络结构作为第二代子群中的一个第二代生成器子模型的网络结构;根据遗传算法,按照第一代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第二代子群中的其他(M-1)个第二代生成器子模型的网络结构。比如,如图7所示,第二代子群G 2-M包括第二代生成器子模型的网络结构G 2_1、第二代生成器子模型的网络结构G 2_2、……、第二代生成器子模型的网络结构G 2_M。其中,第二代生成器子模型的网络结构G 2_1对应适应值 2_1、第二代生成器子模型的网络结构G 2_2对应适应值 2_2、……、第二代生成器子模型的网络结构G 2_M对应适应值 2_M
以此类推,从第(N-1)代子群中选出适应值最优的第(N-1)代生成器子模型的网络结构作为第N代子群中的一个第N代生成器子模型的网络结构;根据遗传算法,按照第(N-1)代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第N代子群中的其他(M-1)个第N代生成器子模型的网络结构。比如,如图7所示,第N代子群G N-M包括第N代生成器子模型的网络结构G N_1、第N代生成器子模型的网络结构G N_2、……、第N代生成器子模型的网络结构G N_M。其中,第N代生成器子模型的网络结构G N_1对应适应值 N_1、第N代生成器子模型的网络结构G N_2对应适应值 N_2、……、第N代生成器子模型的网络结构G N_M对应适应值 N_M
最后,可以得到第N代子群中适应值最优的第N代生成器子模型的网络结构。
其中,本申请实施例中,M个第N代生成器子模型的网络结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的 差值小于设定值。也就是说,第N代子群为生成器子模型的网络结构的适应值趋于稳定的一代子群。
其中,根据遗传算法,按照预设的概率进行选择、交叉和突变操作,得到下一代子群中的生成器子模型的网络结构的示例可以如下:
示例性的,假设上一代生成器子模型(或者上一代生成器模型)的网络结构对应的二值编码为0101 0000 010,则经过选择操作后,得到的下一代生成器子模型的网络结构对应的二值编码可以为0101 0000 010。
或者,示例性的,假设上一代生成器子模型1(或者上一代生成器模型1)的网络结构对应的二值编码为01010 1110010 0101,上一代生成器子模型2(或者上一代生成器模型2)的网络结构对应的二值编码为01010 0101011 0110,则经过交叉操作后,得到的下一代生成器子模型1的网络结构对应的二值编码可以为01010 0101011 0101,下一代生成器子模型2的网络结构对应的二值编码可以为01010 1110010 0110。
或者,示例性的,假设上一代生成器子模型(或者上一代生成器模型)的网络结构对应的二值编码为100 10010101 101010,则经过突变操作后,得到的下一代生成器子模型的网络结构对应的二值编码可以为100 01101010 101010。
其中,在上述步骤S403和步骤S404中:
考虑到与判别器模型的优化目标不同,生成器模型的优化并不需要保证压缩前后模型输出结果完全一致,只需要保证压缩后输出结果的域一致即可,因此本申请实施例引入判别器的输出,通过计算压缩前后生成器模型在判别器的差异来计算生成器模型的网络结构的适应值。
示例性的,第p代生成器子模型的网络结构的适应值是根据第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失确定的,生成器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果的差值;判别器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果分别再经过判别器后的输出结果的差值,其中,p为1至N的正整数,第0代生成器子模型为压缩前的生成器模型。
可选的,本申请实施例中,第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失满足如下公式(2):
f(q)=(p(q)+λL_GenA+γL_DisA)^(-1);               公式(2)
其中,f(q)表示第p代生成器子模型的网络结构的适应值;p(q)表示第p代生成器子模型的网络参数量的归一化值,λ和γ为设定值;L GenA表示生成器感知损失;L DisA表示判别器感知损失,q表示第p代生成器子模型的网络结构的所有卷积层的二值编码。
可选的,p(q)可以满足如下公式(3):
p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l)                公式(3)
其中，q_{l-1}表示第p代生成器子模型的网络结构中第(l-1)层卷积的二值编码；q_l表示第p代生成器子模型的网络结构中第l层卷积的二值编码；H_l表示第p代生成器子模型的网络结构的第l层卷积的高度；W_l表示第p代生成器子模型的网络结构的第l层卷积的宽度；C_l表示第p代生成器子模型的网络结构的第l层卷积的通道数；N_l表示第p代生成器子模型的网络结构的第l层卷积的个数；‖·‖_1表示L1范数；∑表示求和。
可选的,本申请实施例中,可以根据如下公式(4)确定生成器感知损失:
L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2                公式(4)
其中，x_i表示第i张输入图片，m表示输入图片的张数，G(x_i)表示第i张输入图片经过第p-1代生成器子模型的输出结果；Ĝ(x_i)表示第i张输入图片经过第p代生成器子模型的输出结果，∑表示求和；‖·‖_2表示L2范数差。
需要说明的是,上述公式(4)是压缩前后的生成器模型产生图片的L2范数差,物理含义是让压缩前后的生成器模型产生的图片在像素级别相似。
可选的,本申请实施例中,可以根据如下公式(5)确定判别器感知损失包括:
L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2                公式(5)
其中，x_i表示第i张输入图片，m表示输入图片的张数，D(G(x_i))表示第i张输入图片经过第p-1代生成器子模型的输出结果再经过判别器后的输出结果；D(Ĝ(x_i))表示第i张输入图片经过第p代生成器子模型的输出结果再经过判别器后的输出结果，∑表示求和；‖·‖_2表示L2范数差。
需要说明的是,上述公式(5)是压缩前后的生成器模型产生的图片在原来的判别器模型判别结果的L2差值,物理含义是让压缩前后的生成器模型产生的图片在原来的判别器上的判别结果相近,即让判别器判定压缩前后的生成器产生的图片在风格域一致。
可选的,本申请实施例还提供了一种模型压缩方法,如图8所示,包括如下步骤:
S801、获取压缩前的第一生成器模型和第二生成器模型。其中,第一生成器模型和第二生成器模型为对称的生成器模型;
S802、对压缩前的第一生成器模型的网络结构进行二值编码,得到第一生成器模型对应的第一代子群;以及,对压缩前的第二生成器模型的网络结构进行二值编码,得到第二生成器模型对应的第一代子群。
其中,第一生成器模型对应的第一代子群包括M1个第一代生成器子模型的网络结构,第二生成器模型对应的第一代子群包括M2个第一代生成器子模型的网络结构, 每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M1和M2均为大于1的正整数。
可选的,本申请实施例中的生成器子模型也可以称之为子个体,在此统一说明,以下不再赘述。
可选的,本申请实施例中的生成器模型的网络结构或者生成器子模型的网络结构也可以称之为生成卷积神经网络或生成网络等,在此统一说明,以下不再赘述。
S803、获取每个第一代生成器子模型的网络结构的适应值。
S804、根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数。
其中,第一生成器模型对应的第N代子群包括M1个第N代生成器子模型的网络结构,第二生成器模型对应的第N代子群包括M2个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组固定长度的二值编码,第一生成器模型对应的M1个第N代生成器子模型的网络结构的适应值的平均值与第一生成器模型对应的M1个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第一设定值,第二生成器模型对应的M2个第N代生成器子模型的网络结构的适应值的平均值与第二生成器模型对应的M2个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第二设定值。
S805、根据压缩前的第一生成器模型中的网络参数和第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第一生成器模型;以及,根据压缩前的第二生成器模型中的网络参数和第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第二生成器模型。
其中,上述步骤S802的具体实现可参考图4所示的实施例中的步骤S402,在此不再赘述。
其中,在上述步骤S804中:
根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
重复执行下述步骤S1和步骤S2,直至得到第一生成器模型对应的第N代子群以及第二生成器模型对应的第N代子群:
步骤S1、将第一生成器模型对应的第k代子群中适应值最优的第k代生成器子模型的网络结构作为第二生成器模型对应的第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构;根据遗传算法,按照第二生成器模型对应的第k代子群中M2个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第二生成器模型对应的第k+1代子群中的其他(M2-1)个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;
步骤S2、将第二生成器模型对应的第k+1代子群中适应值最优的第k+1代生成器 子模型的网络结构作为第一生成器模型对应的第(k+1)代子群中的一个第k+1代生成器子模型的网络结构;根据遗传算法,按照第一生成器模型对应的第k代子群中M1个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第一生成器模型对应的第k+1代子群中的其他(M1-1)个第k+1代生成器子模型的网络结构;
确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。
也就是说,本申请实施例中,针对两个图像域的转换问题,引入协同进化算法,对两个对称的生成器模型分别维持一个生成器子群。在每次迭代中,分别用一个子群中适应值最优的生成器子模型的网络结构和另一个子群中的生成器子模型的网络结构进行训练,选出第二个子群中适应值最优的生成器子模型的网络结构,然后将该生成器子模型的网络结构和另一个子群中所有生成器子模型的网络结构进行训练,依次类推,交替迭代优化,最终同时得到两个压缩后的生成器模型的网络结构。
示例性的,如图9所示,生成器A和生成器B为对称的生成器模型,生成器A维持子群A,生成器B维持子群B。在第1次迭代中,用子群A中适应值最优的生成器子模型的网络结构和子群B中的生成器子模型的网络结构进行训练,选出子群B中适应值最优的生成器子模型的网络结构;在第2次迭代中,用子群B中适应值最优的生成器子模型的网络结构和子群A中的生成器子模型的网络结构进行训练,选出子群A中适应值最优的生成器子模型的网络结构,进而用子群A中适应值最优的生成器子模型的网络结构和子群B中的生成器子模型的网络结构进行训练,选出子群B中适应值最优的生成器子模型的网络结构,在后续的迭代中,依次类推,交替迭代优化,最终同时得到压缩后的生成器A的网络结构和压缩后的生成器B的网络结构。
可选的,本申请实施例提供的模型压缩方法可应用于计算机视觉领域的多种图像转化和风格迁移任务,比如:人像美化、虚拟佩戴试穿、人物背景渲染,自动驾驶道路场景生成等。其中每类场景中都可以用到本申请实施例提供的模型压缩方法构建高效的生成器模型。这里举几个具体的例子:
1、视频图像的实时渲染:给图像中的人物添加不同风格的装饰物,在现在的视频通话、短视频拍摄等应用中十分常见和广泛。
2、虚拟佩戴试穿:对视频摄像头拍摄到的人物图像进行渲染,虚拟佩戴上选的帽子、眼镜、衣物等商品。由拍摄到的人物图像生成“穿戴”商品的人物图像。
3、智能相机艺术风格渲染:在智能相机中,对拍摄场景进行多种特定艺术风格的渲染,比如,由一张拍摄的风景图片实时生成梵高风格的风景图片。
4、自动驾驶道路场景生成:自动驾驶模型的训练过程需要大量的道路场景的图片,但是用车辆去现实采集这些不用环境下的道路场景是十分昂贵的,可以由大量的赛车游戏场景图片生成真实的道路场景图片来替代真实采集的道路场景图片。
其中,本申请实施例提供的模型压缩方法适用于上述所有类型的场景、以及其他所有卷积神经网络,包括但不限于上面列举的例子。
下面结合一些应用场景,给出本申请实施例提供的模型压缩方法所带来的技术效 果:
示例1、如图10所示,用于图像风格转换的生成器模型的输入是一张相机拍摄的风景图片,经过若干层卷积和反卷积运算,输出图像为转换后的艺术风格图像,其中用于这个图像风格转换的生成器模型是由本申请实施例提供的模型压缩方法压缩得到的。其中,原始的生成器模型网络参数量大,前三层生成卷积网络的卷积核个数分别为64,128,256,经过遗传算法通道选择后,压缩后的生成网络的卷积核个数为27,55和124,第一层变为原来的二分之一左右,第二层和第三层的计算量变为原来的四分之一左右,但是输出图片的风格基本和压缩之前的生成器模型输出的图片的风格保持一致。对于网络其他层的计算量和网络参数量的压缩类似。
示例2、将本申请实施例提供的模型压缩方法应用在风景图到梵高风格图像的转换上。压缩前的生成器模型和基于本申请实施例提供的模型压缩方法得到的压缩后的生成器模型结构的对比如表一所示。其中,在通道数上,压缩后的生成器模型降到压缩前的生成器模型的一半左右,网络参数的压缩比,除了第一层卷积层和最后一层卷积层的网络参数量的压缩比为2倍以上,其他卷积层的网络参数量压缩比都为4倍以上。
表一
如图11所示,为生成器模型压缩前后的从风景图像到梵高风格图像的转换效果。其中,每一组图片由三张图片构成,第一张是输入生成器模型的风景图片,第二张是压缩前的生成器模型产生的图片,第三张是压缩后的生成器模型产生的图片。可以看出,压缩模型在模型大小大规模压缩的情况下,依然较好完成了从风景图片到梵高风格图片的转换。
表二中给出了本申请实施例提供的模型压缩方法得到的压缩后的生成器模型和压缩前的生成器模型在模型参数和计算量大小的对比,在Intel(R)Xeon(R)中央处理器(central processing unit/processor,CPU)E5-2690v4@2.60GHz进行了测试,压缩后的生成器模型是压缩前的生成器模型的网络参数量和计算量的不到四分之一,在CPU运行时间上,压缩后的生成器模型是压缩前生成器模型的三分之一。
表二
  模型大小 网络参数量 FLOPs 推理时延
压缩前 43.42MB 11378179 56887M 2.26s
压缩后 10.16MB 2661795 13448M 0.73s
示例3、针对图像快速风格化的问题,应用本申请实施例提供的模型压缩方法,可以将模型大幅压缩的情况下,保持压缩模型的风格迁移性能。图12描述了快速风格迁移的任务,针对一张拟转换图片,叠加上一幅风格迁移图片,得到一张转换后的风格化图片。图13描述了快速风格迁移模型的压缩效果,风格模型在压缩四倍以上模型内存的情况下,模型内存从原始的6.36MB压缩到1.17MB,可以保持快速风格迁移的效果。
示例4、针对两个图像域的转换问题,比如马和斑马图像域的互相转换问题,应用本申请实施例提供的协同进化算法得到的压缩后的生成器模型的参数如表三所示。由表三可以看出,两个图像转换器在模型内存和FLOPs上压缩4倍以上。其中得到的压缩效果如图14所示。
表三
综上,下面以列表的形式,针对所要解决的技术问题,给出本申请实施例提供的模型压缩方法所带来的有益效果,如表四所示。
表四
上述主要从方法流程角度对本申请实施例提供的方案进行了介绍。相应的,本申请实施例还提供了模型压缩装置,该模型压缩装置用于实现上述各种方法。可以理解的是,上述模型压缩装置为了实现上述方法,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对模型压缩装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
比如,以采用集成的方式划分各个功能模块的情况下,图15示出了一种模型压缩装置150的结构示意图。该模型压缩装置150包括获取模块1501和处理模块1502。其中,一种可能的实现方式中:
获取模块1501,用于获取压缩前的生成器模型;处理模块1502,用于对压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,第一代子群包括M个第一代 生成器子模型的网络结构,其中,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M为大于1的正整数;获取模块1501,还用于获取每个第一代生成器子模型的网络结构的适应值;处理模块1502,还用于根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,第N代子群包括M个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组固定长度的二值编码,M个第N代生成器子模型的网络结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于设定值;处理模块1502,还用于根据压缩前的生成器模型中的网络参数和适应值最优的第N代生成器子模型的网络结构,确定压缩后的生成器模型。
可选的,处理模块1502,用于根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:处理模块1502,用于重复执行下述步骤S1,直至得到第N代子群:步骤S1、从第k代子群中选出适应值最优的第k代生成器子模型的网络结构作为第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构,k为小于(N-1)的正整数;根据遗传算法,按照第k代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第(k+1)代子群中的其他(M-1)个第(k+1)代生成器子模型的网络结构;处理模块1502,用于确定第N代子群中适应值最优的第N代生成器子模型的网络结构。
可选的,第p代生成器子模型的网络结构的适应值是根据第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失确定的,生成器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果的差值;判别器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果分别再经过判别器后的输出结果的差值,其中,p为1至N的正整数,第0代生成器子模型为压缩前的生成器模型。
可选的,第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失满足如下第一公式:f(q)=(p(q)+λL GenA+γL DisA) -1;其中,f(q)表示第p代生成器子模型的网络结构的适应值;p(q)表示第p代生成器子模型模型的网络参数量的归一化值,λ和γ为设定值;L GenA表示生成器感知损失;L DisA表示判别器感知损失,q表示第p代生成器子模型的网络结构的所有卷积层的二值编码。
可选的,p(q)满足如下第二公式:
p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l)
其中，q_{l-1}表示第p代生成器子模型的网络结构中第(l-1)层卷积的二值编码；q_l表示第p代生成器子模型的网络结构中第l层卷积的二值编码；H_l表示第p代生成器子模型的网络结构的第l层卷积的高度；W_l表示第p代生成器子模型的网络结构的第l层卷积的宽度；C_l表示第p代生成器子模型的网络结构的第l层卷积的通道数；N_l表示第p代生成器子模型的网络结构的第l层卷积的个数；‖·‖_1表示L1范数；∑表示求和。
可选的,处理模块1502,还用于根据如下第三公式确定生成器感知损失,第三公式包括:
L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2
其中，x_i表示第i张输入图片，m表示输入图片的张数，G(x_i)表示第i张输入图片经过第p-1代生成器子模型的输出结果；Ĝ(x_i)表示第i张输入图片经过第p代生成器子模型的输出结果，∑表示求和；‖·‖_2表示L2范数差。
可选的,处理模块1502,还用于根据如下第四公式确定判别器感知损失,第四公式包括:
L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2
其中，x_i表示第i张输入图片，m表示输入图片的张数，D(G(x_i))表示第i张输入图片经过第(p-1)代生成器子模型的输出结果再经过判别器后的输出结果；D(Ĝ(x_i))表示第i张输入图片经过第p代生成器子模型的输出结果再经过判别器后的输出结果，∑表示求和；‖·‖_2表示L2范数差。
可选的,处理模块1502,用于对压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,包括:处理模块1502,用于若压缩前的生成器模型的网络结构中的第一通道对应的二值编码为0,去除第一通道相关的计算单元;或者,处理模块1502,用于若压缩前的生成器模型的网络结构中的第二通道对应的二值编码为1,保留第二通道相关的计算单元,其中,第一通道或第二通道对应压缩前的生成器模型的网络结构中的任一层卷积的一个卷积核。
或者,另一种可能的实现方式中:
获取模块1501,用于获取压缩前的第一生成器模型和第二生成器模型,第一生成器模型和第二生成器模型为对称的生成器模型;处理模块1502,用于对压缩前的第一生成器模型的网络结构进行二值编码,得到第一生成器模型对应的第一代子群;以及,对压缩前的第二生成器模型的网络结构进行二值编码,得到第二生成器模型对应的第一代子群;第一生成器模型对应的第一代子群包括M1个第一代生成器子模型的网络结构,第二生成器模型对应的第一代子群包括M2个第一代生成器子模型的网络结构,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M1和M2均为大于1的正整数;获取模块1501,还用于获取每个第一代生成器子模型的网络结构的适应值;处理模块1502,还用于根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,第一生成器模型对应的第N代子群包括M1个第N代生成器子模型的网络结构,第二生成器模型对应的第N代子群包括 M2个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组固定长度的二值编码,第一生成器模型对应的M1个第N代生成器子模型的网络结构的适应值的平均值与第一生成器模型对应的M1个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第一设定值,第二生成器模型对应的M2个第N代生成器子模型的网络结构的适应值的平均值与第二生成器模型对应的M2个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第二设定值;处理模块1502,还用于根据压缩前的第一生成器模型中的网络参数和第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第一生成器模型;以及,根据压缩前的第二生成器模型中的网络参数和第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第二生成器模型。
可选的,处理模块1502,用于根据每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:处理模块1502,用于重复执行下述步骤S1和步骤S2,直至得到第一生成器模型对应的第N代子群以及第二生成器模型对应的第N代子群:步骤S1、将第一生成器模型对应的第k代子群中适应值最优的第k代生成器子模型的网络结构作为第二生成器模型对应的第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构;根据遗传算法,按照第二生成器模型对应的第k代子群中M2个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第二生成器模型对应的第k+1代子群中的其他(M2-1)个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;步骤S2、将第二生成器模型对应的第k+1代子群中适应值最优的第k+1代生成器子模型的网络结构作为第一生成器模型对应的第(k+1)代子群中的一个第k+1代生成器子模型的网络结构;根据遗传算法,按照第一生成器模型对应的第k代子群中M1个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第一生成器模型对应的第k+1代子群中的其他(M1-1)个第k+1代生成器子模型的网络结构;处理模块1502,用于确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。
其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在本实施例中,该模型压缩装置150以采用集成的方式划分各个功能模块的形式来呈现。这里的“模块”可以指特定应用集成电路(application-specific integrated circuit,ASIC),电路,执行一个或多个软件或固件程序的处理器和存储器,集成逻辑电路,和/或其他可以提供上述功能的器件。在一个简单的实施例中,本领域的技术人员可以想到该模型压缩装置150可以采用图16所示的形式。
如图16所示,该模型压缩装置160包括一个或多个处理器1601。可选的,该模型压缩装置160通信线路1602、至少一个通信接口(图16中仅是示例性的以包括通信接口1604,以及一个处理器1601为例进行说明)或者存储器1603。
处理器1601可以是一个中央处理器(central processing unit,CPU),微处理器,特定ASIC,或一个或多个用于控制本申请方案程序执行的集成电路。
通信线路1602可包括一通路,用于连接不同组件之间。
通信接口1604,可以用于与其他设备或通信网络通信,如以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN)等。例如,所述收发模块可以是收发器、收发机一类的装置。可选的,所述通信接口1604也可以是位于处理器1601内的收发电路,用以实现处理器的信号输入和信号输出。
存储器1603可以是具有存储功能的装置。例如可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路1602与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1603用于存储执行本申请方案的计算机执行指令,并由处理器1601来控制执行。处理器1601用于执行存储器1603中存储的计算机执行指令,从而实现本申请实施例中提供的模型压缩方法。
或者,可选的,本申请实施例中,也可以是处理器1601执行本申请上述实施例提供的模型压缩方法中的处理相关的功能,通信接口1604负责与其他设备或通信网络通信,本申请实施例对此不作具体限定。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
在具体实现中,作为一种实施例,处理器1601可以包括一个或多个CPU,例如图16中的CPU0和CPU1。
在具体实现中,作为一种实施例,模型压缩装置160可以包括多个处理器,例如图16中的处理器1601和处理器1608。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,模型压缩装置160还可以包括输出设备1605和输入设备1606。输出设备1605和处理器1601通信,可以以多种方式来显示信息。例如,输出设备1605可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等。输入设备1606和处理器1601通信,可以以多种方式接收用户的输入。例如,输入设备1606可以是鼠标、键盘、触摸屏设备或传感设备等。
上述的模型压缩装置160可以是一个通用设备或者是一个专用设备。例如模型压缩装置160可以是服务器、台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备、或具有图16 中类似结构的设备。本申请实施例不限定模型压缩装置160的类型。
具体的,图15中的获取模块1501和处理模块1502的功能/实现过程可以通过图16所示的模型压缩装置160中的处理器1601调用存储器1603中存储的计算机执行指令来实现。由于本实施例提供的模型压缩装置160可执行上述的模型压缩方法,因此其所能获得的技术效果可参考上述方法实施例,在此不再赘述。
需要说明的是,以上模块或单元的一个或多个可以软件、硬件或二者结合来实现。当以上任一模块或单元以软件实现的时候,所述软件以计算机程序指令的方式存在,并被存储在存储器中,处理器可以用于执行所述程序指令并实现以上方法流程。该处理器可以内置于SoC(片上系统)或ASIC,也可是一个独立的半导体芯片。该处理器内处理用于执行软件指令以进行运算或处理的核外,还可进一步包括必要的硬件加速器,如现场可编程门阵列(field programmable gate array,FPGA)、PLD(可编程逻辑器件)、或者实现专用逻辑运算的逻辑电路。
当以上模块或单元以硬件实现的时候,该硬件可以是CPU、微处理器、数字信号处理(digital signal processing,DSP)芯片、微控制单元(microcontroller unit,MCU)、人工智能处理器、ASIC、SoC、FPGA、PLD、专用数字电路、硬件加速器或非集成的分立器件中的任一个或任一组合,其可以运行必要的软件或不依赖于软件以执行以上方法流程。
可选的,本申请实施例还提供了一种模型压缩装置(例如,该模型压缩装置可以是芯片或芯片系统),该模型压缩装置包括处理器,用于实现上述任一方法实施例中的方法。在一种可能的设计中,该模型压缩装置还包括存储器。该存储器,用于保存必要的程序指令和数据,处理器可以调用存储器中存储的程序代码以指令该模型压缩装置执行上述任一方法实施例中的方法。当然,存储器也可以不在该模型压缩装置中。该模型压缩装置是芯片系统时,可以由芯片构成,也可以包含芯片和其他分立器件,本申请实施例对此不作具体限定。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可以用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带),光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
尽管在此结合各实施例对本申请进行了描述,然而,在实施所要求保护的本申请过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理 解并实现所述公开实施例的其他变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其他单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
尽管结合具体特征及其实施例对本申请进行了描述,显而易见的,在不脱离本申请的精神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本申请的示例性说明,且视为已覆盖本申请范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (20)

  1. 一种模型压缩方法,其特征在于,所述方法包括:
    获取压缩前的生成器模型;
    对所述压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,所述第一代子群包括M个第一代生成器子模型的网络结构,其中,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M为大于1的正整数;
    获取所述每个第一代生成器子模型的网络结构的适应值;
    根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,所述第N代子群包括M个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组所述固定长度的二值编码,所述M个第N代生成器子模型的网络结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于设定值;
    根据所述压缩前的生成器模型中的网络参数和所述适应值最优的第N代生成器子模型的网络结构,确定压缩后的生成器模型。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
    重复执行下述步骤S1,直至得到第N代子群:
    步骤S1、从第k代子群中选出适应值最优的第k代生成器子模型的网络结构作为第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构,k为小于(N-1)的正整数;根据所述遗传算法,按照第k代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第(k+1)代子群中的其他(M-1)个第(k+1)代生成器子模型的网络结构;
    确定第N代子群中适应值最优的第N代生成器子模型的网络结构。
  3. 根据权利要求1或2所述的方法,其特征在于,第p代生成器子模型的网络结构的适应值是根据所述第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失确定的,所述生成器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果的差值;所述判别器感知损失用于表征所述第p代生成器子模型的输出结果与所述第p-1代生成器子模型的输出结果分别再经过判别器后的输出结果的差值,其中,p为1至N的正整数,第0代生成器子模型为所述压缩前的生成器模型。
  4. 根据权利要求3所述的方法,其特征在于,所述第p代生成器子模型的网络参数量的归一化值、所述生成器感知损失和所述判别器感知损失满足如下第一公式:
    f(q)=(p(q)+λL_GenA+γL_DisA)^(-1)；
    其中,f(q)表示所述第p代生成器子模型的网络结构的适应值;p(q)表示所述第p代生成器子模型模型的网络参数量的归一化值,λ和γ为设定值;L GenA表示所述生成器感知损失;L DisA表示所述判别器感知损失,q表示所述第p代生成器子模型的网络 结构的所有卷积层的二值编码。
  5. 根据权利要求4所述的方法,其特征在于,p(q)满足如下第二公式:
    p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l)
    其中，q_{l-1}表示所述第p代生成器子模型的网络结构中第(l-1)层卷积的二值编码；q_l表示所述第p代生成器子模型的网络结构中第l层卷积的二值编码；H_l表示所述第p代生成器子模型的网络结构的第l层卷积的高度；W_l表示所述第p代生成器子模型的网络结构的第l层卷积的宽度；C_l表示所述第p代生成器子模型的网络结构的第l层卷积的通道数；N_l表示所述第p代生成器子模型的网络结构的第l层卷积的个数；‖·‖_1表示L1范数；∑表示求和。
  6. 根据权利要求3-5任一项所述的方法,其特征在于,所述方法还包括:
    根据如下第三公式确定所述生成器感知损失,所述第三公式包括:
    L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2
    其中，x_i表示第i张输入图片，m表示输入图片的张数，G(x_i)表示第i张输入图片经过所述第p-1代生成器子模型的输出结果；Ĝ(x_i)表示所述第i张输入图片经过所述第p代生成器子模型的输出结果，∑表示求和；‖·‖_2表示L2范数差。
  7. 根据权利要求3-6任一项所述的方法,其特征在于,所述方法还包括:
    根据如下第四公式确定所述判别器感知损失,所述第四公式包括:
    L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2
    其中，x_i表示第i张输入图片，m表示输入图片的张数，D(G(x_i))表示第i张输入图片经过所述第(p-1)代生成器子模型的输出结果再经过判别器后的输出结果；D(Ĝ(x_i))表示所述第i张输入图片经过所述第p代生成器子模型的输出结果再经过所述判别器后的输出结果，∑表示求和；‖·‖_2表示L2范数差。
  8. 根据权利要求1-7任一项所述的方法,其特征在于,所述对所述压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,包括:
    若所述压缩前的生成器模型的网络结构中的第一通道对应的二值编码为0,去除所述第一通道相关的计算单元;或者,
    若所述压缩前的生成器模型的网络结构中的第二通道对应的二值编码为1,保留 所述第二通道相关的计算单元,其中,所述第一通道或所述第二通道对应所述压缩前的生成器模型的网络结构中的任一层卷积的一个卷积核。
  9. 一种模型压缩方法,其特征在于,所述方法包括:
    获取压缩前的第一生成器模型和第二生成器模型,所述第一生成器模型和所述第二生成器模型为对称的生成器模型;
    对所述压缩前的第一生成器模型的网络结构进行二值编码,得到所述第一生成器模型对应的第一代子群;以及,对所述压缩前的第二生成器模型的网络结构进行二值编码,得到所述第二生成器模型对应的第一代子群;所述第一生成器模型对应的第一代子群包括M1个第一代生成器子模型的网络结构,所述第二生成器模型对应的第一代子群包括M2个第一代生成器子模型的网络结构,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M1和M2均为大于1的正整数;
    获取所述每个第一代生成器子模型的网络结构的适应值;
    根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,所述第一生成器模型对应的第N代子群包括M1个第N代生成器子模型的网络结构,所述第二生成器模型对应的第N代子群包括M2个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组所述固定长度的二值编码,所述第一生成器模型对应的M1个第N代生成器子模型的网络结构的适应值的平均值与所述第一生成器模型对应的M1个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第一设定值,所述第二生成器模型对应的M2个第N代生成器子模型的网络结构的适应值的平均值与所述第二生成器模型对应的M2个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第二设定值;
    根据所述压缩前的第一生成器模型中的网络参数和所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第一生成器模型;以及,根据所述压缩前的第二生成器模型中的网络参数和所述第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第二生成器模型。
  10. 根据权利要求9所述的方法,其特征在于,所述根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及所述第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
    重复执行下述步骤S1和步骤S2,直至得到第一生成器模型对应的第N代子群以及第二生成器模型对应的第N代子群:
    步骤S1、将所述第一生成器模型对应的第k代子群中适应值最优的第k代生成器子模型的网络结构作为所述第二生成器模型对应的第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构;根据所述遗传算法,按照所述第二生成器模型对应的第k代子群中M2个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到所述第二生成器模型对应的第k+1代子群中的 其他(M2-1)个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;
    步骤S2、将所述第二生成器模型对应的第k+1代子群中适应值最优的第k+1代生成器子模型的网络结构作为所述第一生成器模型对应的第(k+1)代子群中的一个第k+1代生成器子模型的网络结构;根据所述遗传算法,按照所述第一生成器模型对应的第k代子群中M1个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到所述第一生成器模型对应的第k+1代子群中的其他(M1-1)个第k+1代生成器子模型的网络结构;
    确定所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及所述第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。
  11. 一种模型压缩装置,其特征在于,所述装置包括:获取模块和处理模块;
    所述获取模块,用于获取压缩前的生成器模型;
    所述处理模块,用于对所述压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,所述第一代子群包括M个第一代生成器子模型的网络结构,其中,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M为大于1的正整数;
    所述获取模块,还用于获取所述每个第一代生成器子模型的网络结构的适应值;
    所述处理模块,还用于根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,所述第N代子群包括M个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组所述固定长度的二值编码,所述M个第N代生成器子模型的网络结构的适应值的平均值与第(N-1)代子群中M个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于设定值;
    所述处理模块,还用于根据所述压缩前的生成器模型中的网络参数和所述适应值最优的第N代生成器子模型的网络结构,确定压缩后的生成器模型。
  12. 根据权利要求11所述的装置,其特征在于,所述处理模块,用于根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
    所述处理模块,用于重复执行下述步骤S1,直至得到第N代子群:
    步骤S1、从第k代子群中选出适应值最优的第k代生成器子模型的网络结构作为第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构,k为小于(N-1)的正整数;根据所述遗传算法,按照第k代子群中M个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到第(k+1)代子群中的其他(M-1)个第(k+1)代生成器子模型的网络结构;
    所述处理模块,用于确定第N代子群中适应值最优的第N代生成器子模型的网络结构。
  13. 根据权利要求11或12所述的装置,其特征在于,第p代生成器子模型的网络结构的适应值是根据所述第p代生成器子模型的网络参数量的归一化值、生成器感知损失和判别器感知损失确定的,所述生成器感知损失用于表征第p代生成器子模型的输出结果与第p-1代生成器子模型的输出结果的差值;所述判别器感知损失用于表 征所述第p代生成器子模型的输出结果与所述第p-1代生成器子模型的输出结果分别再经过判别器后的输出结果的差值,其中,p为1至N的正整数,第0代生成器子模型为所述压缩前的生成器模型。
  14. 根据权利要求13所述的装置,其特征在于,所述第p代生成器子模型的网络参数量的归一化值、所述生成器感知损失和所述判别器感知损失满足如下第一公式:
    f(q)=(p(q)+λL_GenA+γL_DisA)^(-1)；
    其中,f(q)表示所述第p代生成器子模型的网络结构的适应值;p(q)表示所述第p代生成器子模型模型的网络参数量的归一化值,λ和γ为设定值;L GenA表示所述生成器感知损失;L DisA表示所述判别器感知损失,q表示所述第p代生成器子模型的网络结构的所有卷积层的二值编码。
  15. 根据权利要求14所述的装置,其特征在于,p(q)满足如下第二公式:
    p(q) = (∑_l ‖q_{l-1}‖_1·‖q_l‖_1·H_l·W_l) / (∑_l C_l·N_l·H_l·W_l)
    其中，q_{l-1}表示所述第p代生成器子模型的网络结构中第(l-1)层卷积的二值编码；q_l表示所述第p代生成器子模型的网络结构中第l层卷积的二值编码；H_l表示所述第p代生成器子模型的网络结构的第l层卷积的高度；W_l表示所述第p代生成器子模型的网络结构的第l层卷积的宽度；C_l表示所述第p代生成器子模型的网络结构的第l层卷积的通道数；N_l表示所述第p代生成器子模型的网络结构的第l层卷积的个数；‖·‖_1表示L1范数；∑表示求和。
  16. 根据权利要求13-15任一项所述的装置,其特征在于,所述处理模块,还用于根据如下第三公式确定所述生成器感知损失,所述第三公式包括:
    L_GenA = (1/m)·∑_{i=1}^{m} ‖G(x_i) − Ĝ(x_i)‖_2
    其中，x_i表示第i张输入图片，m表示输入图片的张数，G(x_i)表示第i张输入图片经过所述第p-1代生成器子模型的输出结果；Ĝ(x_i)表示所述第i张输入图片经过所述第p代生成器子模型的输出结果，∑表示求和；‖·‖_2表示L2范数差。
  17. 根据权利要求13-16任一项所述的装置,其特征在于,所述处理模块,还用于根据如下第四公式确定所述判别器感知损失,所述第四公式包括:
    L_DisA = (1/m)·∑_{i=1}^{m} ‖D(G(x_i)) − D(Ĝ(x_i))‖_2
    其中，x_i表示第i张输入图片，m表示输入图片的张数，D(G(x_i))表示第i张输入图片经过所述第(p-1)代生成器子模型的输出结果再经过判别器后的输出结果；D(Ĝ(x_i))表示所述第i张输入图片经过所述第p代生成器子模型的输出结果再经过所述判别器后的输出结果，∑表示求和；‖·‖_2表示L2范数差。
  18. 根据权利要求11-17任一项所述的装置,其特征在于,所述处理模块,用于对所述压缩前的生成器模型的网络结构进行二值编码,得到第一代子群,包括:
    所述处理模块,用于若所述压缩前的生成器模型的网络结构中的第一通道对应的二值编码为0,去除所述第一通道相关的计算单元;或者,
    所述处理模块,用于若所述压缩前的生成器模型的网络结构中的第二通道对应的二值编码为1,保留所述第二通道相关的计算单元,其中,所述第一通道或所述第二通道对应所述压缩前的生成器模型的网络结构中的任一层卷积的一个卷积核。
  19. 一种模型压缩装置,其特征在于,所述装置包括:获取模块和处理模块;
    所述获取模块,用于获取压缩前的第一生成器模型和第二生成器模型,所述第一生成器模型和所述第二生成器模型为对称的生成器模型;
    所述处理模块,用于对所述压缩前的第一生成器模型的网络结构进行二值编码,得到所述第一生成器模型对应的第一代子群;以及,对所述压缩前的第二生成器模型的网络结构进行二值编码,得到所述第二生成器模型对应的第一代子群;所述第一生成器模型对应的第一代子群包括M1个第一代生成器子模型的网络结构,所述第二生成器模型对应的第一代子群包括M2个第一代生成器子模型的网络结构,每个第一代生成器子模型的网络结构对应一组固定长度的二值编码,M1和M2均为大于1的正整数;
    所述获取模块,还用于获取所述每个第一代生成器子模型的网络结构的适应值;
    所述处理模块,还用于根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,N为大于1的正整数,其中,所述第一生成器模型对应的第N代子群包括M1个第N代生成器子模型的网络结构,所述第二生成器模型对应的第N代子群包括M2个第N代生成器子模型的网络结构,每个第N代生成器子模型的网络结构对应一组所述固定长度的二值编码,所述第一生成器模型对应的M1个第N代生成器子模型的网络结构的适应值的平均值与所述第一生成器模型对应的M1个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第一设定值,所述第二生成器模型对应的M2个第N代生成器子模型的网络结构的适应值的平均值与所述第二生成器模型对应的M2个第(N-1)代生成器子模型的网络结构的适应值的平均值的差值小于第二设定值;
    所述处理模块,还用于根据所述压缩前的第一生成器模型中的网络参数和所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第一生成器模型;以及,根据所述压缩前的第二生成器模型中的网络参数和所述第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,确定压缩后的第二生成器模型。
  20. 根据权利要求19所述的装置,其特征在于,所述处理模块,用于根据所述每个第一代生成器子模型的网络结构的适应值,结合遗传算法,确定所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及所述第二生 成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构,包括:
    所述处理模块,用于重复执行下述步骤S1和步骤S2,直至得到第一生成器模型对应的第N代子群以及第二生成器模型对应的第N代子群:
    步骤S1、将所述第一生成器模型对应的第k代子群中适应值最优的第k代生成器子模型的网络结构作为所述第二生成器模型对应的第(k+1)代子群中的一个第(k+1)代生成器子模型的网络结构;根据所述遗传算法,按照所述第二生成器模型对应的第k代子群中M2个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到所述第二生成器模型对应的第k+1代子群中的其他(M2-1)个第k+1代生成器子模型的网络结构,k为小于(N-1)的正整数;
    步骤S2、将所述第二生成器模型对应的第k+1代子群中适应值最优的第k+1代生成器子模型的网络结构作为所述第一生成器模型对应的第(k+1)代子群中的一个第k+1代生成器子模型的网络结构;根据所述遗传算法,按照所述第一生成器模型对应的第k代子群中M1个生成器子模型的网络结构的适应值大小进行概率选择,并按照预设的概率进行选择、交叉和突变操作,得到所述第一生成器模型对应的第k+1代子群中的其他(M1-1)个第k+1代生成器子模型的网络结构;
    所述处理模块,用于确定所述第一生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构以及所述第二生成器模型对应的第N代子群中适应值最优的第N代生成器子模型的网络结构。
PCT/CN2020/091824 2019-05-22 2020-05-22 模型压缩方法及装置 WO2020233709A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910430876.2 2019-05-22
CN201910430876.2A CN111985597B (zh) 2019-05-22 2019-05-22 模型压缩方法及装置

Publications (1)

Publication Number Publication Date
WO2020233709A1 true WO2020233709A1 (zh) 2020-11-26

Family

ID=73436031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091824 WO2020233709A1 (zh) 2019-05-22 2020-05-22 模型压缩方法及装置

Country Status (2)

Country Link
CN (1) CN111985597B (zh)
WO (1) WO2020233709A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994309A (zh) * 2023-05-06 2023-11-03 浙江大学 一种公平性感知的人脸识别模型剪枝方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727633A (zh) * 2019-09-17 2020-01-24 广东高云半导体科技股份有限公司 基于SoC FPGA的边缘人工智能计算系统构架
CN112580639B (zh) * 2021-03-01 2021-08-13 四川大学 一种基于进化神经网络模型压缩的早期胃癌图像识别方法
CN114239792B (zh) * 2021-11-01 2023-10-24 荣耀终端有限公司 利用量化模型进行图像处理的系统、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018100325A4 (en) * 2018-03-15 2018-04-26 Nian, Xilai MR A New Method For Fast Images And Videos Coloring By Using Conditional Generative Adversarial Networks
CN108171266A (zh) * 2017-12-25 2018-06-15 中国矿业大学 一种多目标深度卷积生成式对抗网络模型的学习方法
CN108171762A (zh) * 2017-12-27 2018-06-15 河海大学常州校区 一种深度学习的压缩感知同类图像快速重构系统与方法
CN108665432A (zh) * 2018-05-18 2018-10-16 百年金海科技有限公司 一种基于生成对抗网络的单幅图像去雾方法
CN109472757A (zh) * 2018-11-15 2019-03-15 央视国际网络无锡有限公司 一种基于生成对抗神经网络的图像去台标方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424737B1 (en) * 2000-01-24 2002-07-23 Sony Corporation Method and apparatus of compressing images using localized radon transforms
US7225376B2 (en) * 2002-07-30 2007-05-29 International Business Machines Corporation Method and system for coding test pattern for scan design
US10984308B2 (en) * 2016-08-12 2021-04-20 Xilinx Technology Beijing Limited Compression method for deep neural networks with load balance
US20190147320A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. "Matching Adversarial Networks"
CN108334497A (zh) * 2018-02-06 2018-07-27 北京航空航天大学 自动生成文本的方法和装置
CN108615073B (zh) * 2018-04-28 2020-11-03 京东数字科技控股有限公司 图像处理方法及装置、计算机可读存储介质、电子设备
CN109783910B (zh) * 2018-12-29 2020-08-28 西安交通大学 一种利用生成对抗网络加速的结构优化设计方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171266A (zh) * 2017-12-25 2018-06-15 中国矿业大学 一种多目标深度卷积生成式对抗网络模型的学习方法
CN108171762A (zh) * 2017-12-27 2018-06-15 河海大学常州校区 一种深度学习的压缩感知同类图像快速重构系统与方法
AU2018100325A4 (en) * 2018-03-15 2018-04-26 Nian, Xilai MR A New Method For Fast Images And Videos Coloring By Using Conditional Generative Adversarial Networks
CN108665432A (zh) * 2018-05-18 2018-10-16 百年金海科技有限公司 一种基于生成对抗网络的单幅图像去雾方法
CN109472757A (zh) * 2018-11-15 2019-03-15 央视国际网络无锡有限公司 一种基于生成对抗神经网络的图像去台标方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994309A (zh) * 2023-05-06 2023-11-03 浙江大学 一种公平性感知的人脸识别模型剪枝方法
CN116994309B (zh) * 2023-05-06 2024-04-09 浙江大学 一种公平性感知的人脸识别模型剪枝方法

Also Published As

Publication number Publication date
CN111985597B (zh) 2023-10-24
CN111985597A (zh) 2020-11-24

Similar Documents

Publication Publication Date Title
WO2020233709A1 (zh) 模型压缩方法及装置
CN107066239A (zh) 一种实现卷积神经网络前向计算的硬件结构
US11651194B2 (en) Layout parasitics and device parameter prediction using graph neural networks
CN114118347A (zh) 用于神经网络量化的细粒度每向量缩放
TWI775210B (zh) 用於卷積運算的資料劃分方法及處理器
JP7085600B2 (ja) 画像間の類似度を利用した類似領域強調方法およびシステム
US20230237342A1 (en) Adaptive lookahead for planning and learning
WO2021147276A1 (zh) 数据处理方法、装置及芯片、电子设备、存储介质
US20230062503A1 (en) Pruning and accelerating neural networks with hierarchical fine-grained structured sparsity
CN112906865A (zh) 神经网络架构搜索方法、装置、电子设备及存储介质
US11282258B1 (en) Adaptive sampling at a target sampling rate
US20230298243A1 (en) 3d digital avatar generation from a single or few portrait images
US20220398283A1 (en) Method for fast and better tree search for reinforcement learning
US11925860B2 (en) Projective hash maps
US11830145B2 (en) Generation of differentiable, manifold meshes of arbitrary genus
US11935194B2 (en) Constrained BSDF sampling
CN113808183B (zh) 使用扭曲的复合估计乘积积分
US20230229916A1 (en) Scalable tensor network contraction using reinforcement learning
CN111382835A (zh) 一种神经网络压缩方法、电子设备及计算机可读介质
US20230360278A1 (en) Table dictionaries for compressing neural graphics primitives
US11595152B1 (en) Forward error correction encoding using binary clustering
US20240144000A1 (en) Fairness-based neural network model training using real and generated data
US11972188B2 (en) Rail power density aware standard cell placement for integrated circuits
CN117112145B (zh) 训练模型分配方法、装置、计算机设备和存储介质
US20230111375A1 (en) Augmenting and dynamically configuring a neural network model for real-time systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20809322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20809322

Country of ref document: EP

Kind code of ref document: A1