WO2022141189A1 - Automatic search method and device for precision and decomposition rank of recurrent neural network - Google Patents

Automatic search method and device for precision and decomposition rank of recurrent neural network Download PDF

Info

Publication number
WO2022141189A1
WO2022141189A1 (PCT/CN2020/141379)
Authority
WO
WIPO (PCT)
Prior art keywords
network
performance evaluation
sub
decomposition rank
layer
Prior art date
Application number
PCT/CN2020/141379
Other languages
English (en)
French (fr)
Inventor
常成
朱雪娟
管子义
李凯
杜来民
毛伟
余浩
Original Assignee
南方科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南方科技大学
Priority to CN202080003956.0A priority Critical patent/CN112771545A/zh
Priority to PCT/CN2020/141379 priority patent/WO2022141189A1/zh
Publication of WO2022141189A1 publication Critical patent/WO2022141189A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, for example, to an automatic search method and device for accuracy and decomposition rank of a recurrent neural network.
  • Deep learning can automatically learn useful features, freeing models from dependence on feature engineering, and has achieved results surpassing other algorithms in tasks such as image recognition, video understanding, and natural language processing. This success is largely due to the introduction of the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN).
  • An RNN is a type of recursive neural network that takes sequence data as input and performs recursion along the evolution direction of the sequence, with all nodes (recurrent units) connected in a chain; it is often used in models that analyze time-series information (expressions, actions, sounds, etc.).
  • However, with the development of artificial intelligence, the content that an RNN needs to analyze has gradually become a high-dimensional, large-scale information flow, which causes the fully connected layers of the RNN to contain extremely dense parameters and computations, placing enormous demands on computing resources and memory.
  • There are many existing model compression methods.
  • For RNNs, tensor decomposition and model quantization are mainly used for model compression.
  • Specifically, a fixed decomposition rank is usually used to decompose the weight tensor of the model, or a fixed precision is used to quantize the model, that is, the quantization value of every network layer is the same, so as to realize model compression.
  • the embodiments of the present application provide an automatic search method and device for the precision and decomposition rank of a recurrent neural network, so as to achieve the purpose of quickly searching for the optimal decomposition rank required for model compression and the optimal quantization value of each network layer.
  • an embodiment of the present application provides an automatic search method for the accuracy and decomposition rank of a recurrent neural network, the method comprising:
  • the target hyperparameter combination is automatically searched, and the hyperparameters of the hypernetwork are updated according to the target hyperparameter combination, wherein the target hyperparameter combination includes the decomposition rank and the respective quantized value of each layer of the network;
  • the current decomposition rank of the super network and the quantized value of each layer of the network are output.
  • an embodiment of the present application provides an automatic search device for the accuracy and decomposition rank of a recurrent neural network, the device comprising:
  • the initialization module is used to initialize the super network
  • the target hyperparameter combination is automatically searched, and the hyperparameters of the hypernetwork are updated according to the target hyperparameter combination, wherein the target hyperparameter combination includes the decomposition rank and the respective quantized value of each layer of the network;
  • a compression confirmation module configured to output the current decomposition rank of the super network and the quantized value of each layer of the network in response to the performance evaluation result satisfying a preset condition.
  • the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously searched and updated, so as to ensure that sub-networks whose performance satisfies the conditions can be sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank required for model compression and the optimal quantization value of each network layer.
  • Fig. 1a is a schematic flowchart of an automatic search method for accuracy and decomposition rank of a recurrent neural network according to the first embodiment of the present application;
  • Fig. 1b is a schematic diagram of a process of cyclically adjusting hyperparameters according to the first embodiment of the present application
  • FIG. 2 is a schematic flowchart of an automatic search method for the precision and decomposition rank of a recurrent neural network according to the second embodiment of the present application;
  • Fig. 3 is a schematic structural diagram of an automatic search device for the precision and decomposition rank of a recurrent neural network according to the third embodiment of the present application.
  • Fig. 1a is a schematic flowchart of an automatic search method for the accuracy and decomposition rank of a recurrent neural network according to the first embodiment of the present application.
  • This embodiment can be applied to situations where the optimal decomposition rank of a super network and the optimal quantization value of each network layer are quickly determined through equipment such as a server.
  • The method can be performed by an automatic search device for the precision and decomposition rank of a recurrent neural network, which can be implemented in software and/or hardware and can be integrated on electronic equipment, such as a server or computer device.
  • the automatic search method of RNN accuracy and decomposition rank includes the following process:
  • The hyper-network is a network composed of hyper-parameters; optionally, the hyper-network is constructed based on the structure of a recurrent neural network model. Initializing the hyper-network refers to determining the initial values of its hyper-parameters, mainly the decomposition rank of the hyper-network and the quantization value of each layer of the hyper-network, where the quantization value can characterize the precision of the neural network. It should be noted here that, during initialization, the hyper-parameters can be set according to values specified by the user or according to default values; after initialization, the quantization values of different network layers can differ and can subsequently be readjusted through the steps S102-S104.
  • the steps of S102-S104 can be performed cyclically to update the hyper-parameters of the super-network, so as to ensure that high-performance sub-networks can be sampled from the updated hyper-parameters.
  • Gumbel-softmax can be used to sample the configuration of a sub-network, and a corresponding sub-network is then obtained according to this configuration, wherein the configuration information of the sub-network includes the hyper-parameters of the sub-network, such as the decomposition rank of the sub-network and the quantization value of each layer of the sub-network.
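As a hedged sketch, the Gumbel-softmax sampling of a sub-network configuration could look like the following; the candidate rank and bit-width sets, the three-layer example, and every variable name are illustrative assumptions rather than values fixed by the patent:

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=np.random):
    """Relaxed one-hot sample over candidate choices via the Gumbel-softmax trick."""
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=len(logits))))
    y = (np.asarray(logits, dtype=float) + gumbel) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

# Hypothetical candidate spaces (the patent text does not fix these).
RANK_CHOICES = [2, 3, 4, 5, 6, 7, 8]
BIT_CHOICES = [2, 4, 6, 8]

# One learnable logit vector per decision: one for the rank, one per layer.
rank_logits = np.zeros(len(RANK_CHOICES))
layer_logits = [np.zeros(len(BIT_CHOICES)) for _ in range(3)]  # a 3-layer net

rank = RANK_CHOICES[int(np.argmax(gumbel_softmax_sample(rank_logits)))]
bits = [BIT_CHOICES[int(np.argmax(gumbel_softmax_sample(l)))] for l in layer_logits]
config = {"rank": rank, "bits_per_layer": bits}  # the sampled sub-network config
```

Because the Gumbel noise makes the relaxed sample differentiable with respect to the logits, the same logits can later be tuned by gradient descent.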
  • the performance of the sub-network is evaluated, that is, the accuracy and compression rate of the sub-network are determined.
  • perform performance evaluation on the sampled sub-network based on a preset performance evaluation strategy.
  • for example, to judge the accuracy of the sub-network, it needs to be trained using sample data, and the accuracy is determined based on the training result.
  • the sample data may include training set samples and validation set samples. Training may be performed with the training set samples, and after the training is completed, the validation set samples may be used for verification to determine the accuracy of the sub-network.
  • for the trained sub-network, tensor decomposition (for example, TT decomposition, in which the weight tensor is represented by several second-order and third-order tensors) and model quantization can be performed according to the decomposition rank of the sub-network and the quantization value of each network layer in the sub-network, so as to compress the trained sub-network, and the compression ratio is determined according to the compression result.
  • the quantized value of each layer of network is used to define the data bit width of the network weight of this layer and the data bit width of the network activation function of this layer.
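A minimal sketch of what quantizing one layer's data to a given bit width can mean; the symmetric uniform scheme below is one common choice and is an assumption here, since the patent only states that the quantization value fixes the bit width of the layer's weights and activations:

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize a tensor to a given bit width (symmetric, per-tensor).

    One common scheme, used for illustration only; the patent does not
    prescribe a particular quantizer.
    """
    levels = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = float(np.abs(x).max())
    scale = max_abs / levels if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -levels - 1, levels)
    return q * scale                                  # de-quantized ("fake-quant") values

w = np.linspace(-1.0, 1.0, 9)
w8 = quantize_uniform(w, 8)   # fine grid, small rounding error
w2 = quantize_uniform(w, 2)   # only a handful of levels, large error
```

The coarser the bit width, the fewer representable levels, which is exactly why per-layer bit widths trade accuracy against compression.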
  • since each network layer is assigned its own quantization value, compared with using the same quantization value for every layer, the effect of model compression can be better ensured; the decomposition rank refers to the number of non-zero singular values in each intermediate diagonal matrix when the weight tensor undergoes singular value decomposition.
  • TT decomposition can be used to convert high-order weight tensors into multiple low-order tensors, which can greatly reduce the number of model parameters.
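The parameter saving from TT decomposition can be illustrated with a small TT-SVD sketch; the shapes, the rank, and the helper name are hypothetical, and this is one standard way (not necessarily the patent's exact procedure) to obtain the second- and third-order cores:

```python
import numpy as np

def tt_decompose(tensor, rank):
    """TT-SVD sketch: factor a d-way tensor into d cores with TT-ranks <= `rank`.

    Illustrative only; the patent merely states that TT decomposition
    represents the weight tensor with several low-order tensors.
    """
    cores, r_prev, mat = [], 1, tensor
    for n in tensor.shape[:-1]:
        mat = mat.reshape(r_prev * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))   # third-order core
        mat = s[:r, None] * vt[:r]                      # carry remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, tensor.shape[-1], 1))  # last core
    return cores

# A 4096-parameter weight tensor reshaped as 8x8x8x8, truncated to rank 3:
w = np.random.default_rng(0).normal(size=(8, 8, 8, 8))
cores = tt_decompose(w, rank=3)
n_params = sum(c.size for c in cores)   # 192 parameters instead of 4096
```

With the 8x8x8x8 example, the cores hold 192 parameters instead of 4096, roughly a 21x reduction before quantization is even applied.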
  • moreover, because the present application combines the two model compression techniques of model quantization and tensor decomposition, it can achieve a higher model compression rate than using only one compression method.
  • through the above process, the prediction accuracy and compression rate of the sub-network can be determined. It should be noted here that, compared with using a single compression rate or accuracy rate as the evaluation index, the present application considers both accuracy and compression rate, which better balances model accuracy against compression.
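A joint evaluation that considers both metrics can be sketched as below; the weighting and the threshold values are illustrative assumptions, since the patent only states that accuracy and compression rate are both evaluated against preset conditions:

```python
def evaluation_score(accuracy, compression_rate, alpha=0.5):
    """Weighted score trading off accuracy against compression rate.

    Hypothetical: one simple way to combine the two metrics.
    """
    return alpha * accuracy + (1 - alpha) * compression_rate

def meets_preset_condition(accuracy, compression_rate,
                           acc_threshold=0.9, comp_threshold=0.8):
    # The search loop stops only when BOTH metrics reach their thresholds;
    # the numeric thresholds here are placeholders.
    return accuracy >= acc_threshold and compression_rate >= comp_threshold
```

A sub-network with 95% accuracy but only 50% compression would keep the loop running under this condition, steering the next hyperparameter search toward smaller models.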
  • both the decomposition rank and the quantization value of the super network have a value range.
  • in one optional implementation, automatically searching for the target hyperparameter combination means randomly extracting a decomposition rank and quantization values within their value ranges and randomly combining them to obtain the target hyperparameter combination; in another optional implementation, all possible combinations of decomposition rank and quantization values can be predetermined and saved, and when automatically searching, a combination can be selected directly from all the pre-saved combinations as the target hyperparameter combination.
  • the performance evaluation result includes the accuracy rate and compression rate of the sub-network; therefore, the target hyperparameter combination can be automatically searched according to the accuracy rate and compression rate of the sub-network, wherein the target hyperparameter combination includes the decomposition rank and the respective quantization value of each network layer.
  • the method of alternately updating the decomposition rank and the quantization values of different network layers can be used to search for the target hyperparameter combination; for example, keep one of the two kinds of hyper-parameters (the decomposition rank of the super network or the quantization values of the network layers) unchanged and search the other; in the next search, the fixed and searched hyper-parameters are swapped relative to the last time.
  • for example, if the compression rate determined from the performance evaluation result is relatively small, that is, the model still has many parameters, the decomposition rank can be fixed and the quantization value (that is, the bit-width value) of each network layer reduced; in other words, when searching for a quantization value, a value smaller than the current quantization value needs to be searched.
  • for example, the super network includes three network layers, the hyperparameters include quantization values of 2 bits, 6 bits, and 8 bits for the three layers respectively, and the decomposition rank is 3.
  • in this case the decomposition rank 3 can be kept unchanged, and the searched quantization values can be 2 bits, 4 bits, and 6 bits.
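This alternating step, fixing the rank and shrinking per-layer bit widths, can be sketched as follows; the candidate set and the random proposal rule are assumptions for illustration:

```python
import random

BIT_CHOICES = [2, 4, 6, 8]

def search_smaller_bits(current_bits, rng=random):
    """One alternating-update step with the decomposition rank held fixed:
    propose per-layer bit widths no larger than the current ones (used when
    the compression rate is still too low). Hypothetical illustration; a
    layer already at the minimum bit width simply stays where it is."""
    return [rng.choice([b for b in BIT_CHOICES if b <= cur])
            for cur in current_bits]

# The example from the text: a 3-layer network at (2, 6, 8) bits, rank fixed at 3.
new_bits = search_smaller_bits([2, 6, 8])
# A proposal such as (2, 4, 6) keeps the rank at 3 while shrinking bit widths.
```

On the next round, the roles would swap: the bit widths stay fixed while a new rank is searched.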
  • S104: for the hyper-network with updated hyper-parameters, return to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network.
  • the target hyper-parameter combination is searched again and the hyper-parameters of the super network are updated, and so on, until the performance of the sub-network satisfies the preset condition, where the preset condition optionally includes thresholds for the accuracy and compression rate of the sub-network; only when both the accuracy and the compression rate of the sub-network reach the preset thresholds is the step of S105 triggered.
  • Figure 1b shows a schematic diagram of the process of looping hyperparameter tuning.
  • the sub-network structure is sampled from the super-network, the sampled sub-network structure is evaluated based on the performance evaluation strategy, and the evaluation result (that is, the accuracy and compression rate of the sub-network structure) is fed back, so that the target hyperparameter combination is searched and the hyperparameters of the hypernetwork are updated according to the evaluation result.
  • The process of sampling, evaluating, feeding back the evaluation result, and searching and updating the hyperparameters is performed cyclically, until the evaluated accuracy and compression rate of the sub-network structure meet the preset conditions.
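The sample-evaluate-feedback-update loop described above can be sketched as a driver function; the callback names, the dictionary layout, and the thresholds are all illustrative assumptions:

```python
def auto_search(supernet, sample, evaluate, update,
                acc_threshold=0.9, comp_threshold=0.8, max_iters=100):
    """Sample -> evaluate -> feed back -> search/update loop of Fig. 1b (sketch).

    `sample`, `evaluate`, and `update` stand in for the patent's Gumbel-softmax
    sampling, performance-evaluation strategy, and hyperparameter search; none
    of these names or thresholds are fixed by the source text.
    """
    for _ in range(max_iters):
        subnet = sample(supernet)                    # S102: sample a sub-network
        accuracy, compression = evaluate(subnet)     # S102: evaluate it
        if accuracy >= acc_threshold and compression >= comp_threshold:
            break                                    # S105: preset condition met
        update(supernet, accuracy, compression)      # S103: search + update
    return supernet["rank"], supernet["bits"]        # current (near-)optimal values
```

The returned rank and per-layer bit widths are exactly what S105 outputs once the preset condition holds.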
  • when the performance evaluation result of the sub-network satisfies the condition, the decomposition rank of the super network and the quantization value of each network layer are already optimal and can be output directly; subsequent settings can be made directly according to the output decomposition rank and per-layer quantization values, ensuring the optimal compression effect of the subsequent model.
  • in the embodiments of the present application, after the super network is initialized, the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously searched and updated according to the performance evaluation results of the sampled sub-networks, so as to ensure that sub-networks whose performance satisfies the conditions can be sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression; this ensures that the recurrent neural network is compressed based on the optimal decomposition rank and the optimal quantization value of each network layer.
  • the compressed recurrent neural network can be deployed on resource-constrained hardware platforms, so that it can achieve high accuracy in practical tasks using only a small amount of computing resources and memory.
  • FIG. 2 is a schematic flowchart of an automatic search method for accuracy and decomposition rank of a recurrent neural network according to a second embodiment of the present application. This embodiment is optimized on the basis of the above-mentioned embodiment. Referring to FIG. 2 , the method includes:
  • the search space refers to the selection of the decomposition rank of the neural network and the quantization value of each network layer.
  • the search space includes a variety of preset hyperparameter combinations, and each hyperparameter combination includes the decomposition rank of the hypernetwork and the quantization value corresponding to each network layer in the hypernetwork.
  • the quantization value can be optionally a bit width, such as 2, 4, 6, 8, etc., and the unit is bit; the decomposition rank has multiple possibilities such as 2, 3, 4, 5, 6, 7, and 8.
  • a random combination of quantized values and decomposition ranks constitutes the search space.
  • taking a two-layer network as an example, the search space can illustratively include quantization values of 2 bits and 4 bits with decomposition rank 2; quantization values of 4 bits and 8 bits with decomposition rank 3; quantization values of 4 bits and 8 bits with decomposition rank 6; since there are many hyperparameter combinations, they are not listed one by one here.
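Enumerating such a search space can be sketched as below; the candidate bit widths and ranks follow the examples in the text, while the function name and dictionary layout are assumptions:

```python
from itertools import product

BIT_CHOICES = [2, 4, 6, 8]
RANK_CHOICES = [2, 3, 4, 5, 6, 7, 8]

def build_search_space(num_layers):
    """Enumerate every (per-layer bit widths, decomposition rank) combination.

    For the two-layer example this yields 4 * 4 * 7 = 112 combinations,
    including (2, 4, rank 2), (4, 8, rank 3), and (4, 8, rank 6).
    """
    return [{"bits": bits, "rank": rank}
            for bits in product(BIT_CHOICES, repeat=num_layers)
            for rank in RANK_CHOICES]

space = build_search_space(num_layers=2)   # 112 candidate combinations
```

The combinatorial growth with layer count is what motivates searching this space automatically rather than by hand.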
  • any combination of hyperparameters in the search space is updated to the hypernetwork, that is, the hyperparameters of the hypernetwork are set according to any combination of hyperparameters.
  • the super network is a two-layer network.
  • the hyperparameters of the two-layer network include: quantized values of 4 bits and 8 bits, and a decomposition rank of 3.
  • since the search space is discrete, the softmax function can be used to relax the search space into a continuous one, and then the gradient descent method is used to find the best hyperparameter combination; this search method is called a differentiable search strategy.
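The softmax relaxation that makes the discrete space continuous can be sketched for a single layer's bit width; treating the soft choice as an expectation under learnable logits is one standard reading of a differentiable search and is an assumption here:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Discrete candidates for one layer's bit width.
bit_choices = np.array([2.0, 4.0, 6.0, 8.0])

# Learnable logits make the discrete choice continuous: the "soft" bit width
# is an expectation under the softmax distribution, so gradients with respect
# to the logits exist and gradient descent can shift probability mass.
logits = np.zeros(4)
probs = softmax(logits)
soft_bits = float(probs @ bit_choices)   # 5.0 when all choices are equally likely

# After optimization, the hardest (argmax) choice is taken as the final value.
final_bits = bit_choices[int(np.argmax(logits))]
```

During training, a task loss evaluated with `soft_bits` would backpropagate into `logits`; at the end, only the argmax candidate survives as the discrete quantization value.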
  • for example, if the compression rate determined from the performance evaluation result is relatively small, meaning the model still has many parameters, then when the differentiable search strategy is used to search for the target hyperparameter combination, the current decomposition rank can first be fixed; multiple hyperparameter combinations whose decomposition rank equals the current one are searched in the search space, and one target hyperparameter combination is selected from them, where, among the per-layer quantization values of the target hyperparameter combination, the quantization value of at least one network layer is smaller than that layer's current quantization value.
  • for example, the current hyperparameters of the super network are quantization values of 4 bits and 8 bits with decomposition rank 3; keeping decomposition rank 3 unchanged, the multiple hyperparameter combinations sharing the current rank 3 may be: 4 bits, 6 bits, rank 3; 4 bits, 4 bits, rank 3; 2 bits, 8 bits, rank 3; 2 bits, 4 bits, rank 3; other combinations may also be included and are not listed here. One hyperparameter combination can then be selected from these as the target hyperparameter combination.
  • after the target hyperparameter combination is determined, the decomposition rank and the per-layer quantization values in the target hyperparameter combination replace the original decomposition rank and quantization values of the super network.
  • the optimal hyperparameter combination can be obtained from the search space, where the optimal hyperparameter combination is one for which the accuracy and compression rate of the sub-network extracted on its basis satisfy the preset conditions.
  • compared with the traditional manual design of quantization and decomposition-rank strategies, the present application uses a differentiable search strategy to continuously search target hyperparameter combinations in the search space until the optimal decomposition rank and the optimal quantization value of each network layer are found, which improves the efficiency of obtaining the optimal hyperparameter combination and thus ensures the efficiency of model compression.
  • FIG. 3 is a schematic structural diagram of an automatic search device for the precision and decomposition rank of a cyclic neural network according to the third embodiment of the present application.
  • the device is used to quickly determine the optimal decomposition rank of the super network and the optimal quantization of each layer of the network through equipment such as a server. value, see Figure 3, the device includes:
  • an initialization module 301 used to initialize the super network
  • the loop module 302 is used to execute:
  • the target hyperparameter combination is automatically searched, and the hyperparameters of the hypernetwork are updated according to the target hyperparameter combination, wherein the target hyperparameter combination includes the decomposition rank and the respective quantized value of each layer of the network;
  • the compression confirmation module 303 is configured to output the current decomposition rank of the super network and the quantized value of each layer of the network in response to the performance evaluation result satisfying a preset condition.
  • the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously adjusted, so as to ensure that sub-networks whose performance satisfies the conditions are sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression.
  • the performance evaluation result includes the accuracy rate and compression rate of the sub-network
  • the target hyperparameter combination is automatically searched, and the hyperparameters of the hypernetwork are updated according to the target hyperparameter combination, including:
  • the target hyperparameter combination is automatically searched, and the hyperparameters of the super network are updated according to the target hyperparameter combination.
  • the device further includes:
  • the search space determination module is used to determine the search space before initializing the super network, wherein the search space includes a variety of preset hyperparameter combinations, and each hyperparameter combination includes the decomposition rank of the super network and each layer of the network in the super network. the corresponding quantization value.
  • the initialization module is used for:
  • automatically search for the target hyperparameter combination including:
  • the target hyperparameter combination is automatically determined from the search space.
  • the quantized value of each layer of network is used to define the data bit width of the network weight of this layer and the data bit width of the activation function of this layer of network.
  • the automatic search device for the accuracy and decomposition rank of a recurrent neural network provided by the embodiments of the present application can execute the automatic search method for the accuracy and decomposition rank of a recurrent neural network provided by any embodiment of the present application, and has functional modules and beneficial effects corresponding to the executed method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose an automatic search method and device for the precision and decomposition rank of a recurrent neural network. The method includes: initializing a super network and cyclically performing the following operations: sampling a sub-network from the super network and performing performance evaluation on the sampled sub-network; automatically searching for a target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination; for the super network with adjusted hyperparameters, returning to the operation of sampling a sub-network and evaluating its performance; and, in response to the performance evaluation result satisfying a preset condition, outputting the current decomposition rank of the super network and the quantization value of each network layer. In the embodiments of the present application, according to the performance evaluation results, the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously searched and updated, so as to achieve the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression.

Description

Automatic search method and device for precision and decomposition rank of recurrent neural network
Technical Field
The present application relates to the technical field of artificial intelligence, for example, to an automatic search method and device for the precision and decomposition rank of a recurrent neural network.
Background
Deep learning can automatically learn useful features, freeing models from dependence on feature engineering, and has achieved results surpassing other algorithms in tasks such as image recognition, video understanding, and natural language processing. This success is largely due to the introduction of the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). An RNN is a type of recursive neural network that takes sequence data as input and performs recursion along the evolution direction of the sequence, with all nodes (recurrent units) connected in a chain; it is often used in models that analyze time-series information (expressions, actions, sounds, etc.). However, with the development of artificial intelligence, the content that an RNN needs to analyze has gradually become a high-dimensional, large-scale information flow, which causes the fully connected layers of the RNN to contain extremely dense parameters and computations, placing enormous demands on computing resources and memory.
There are many existing model compression methods; for RNNs, tensor decomposition and model quantization are mainly used. Specifically, a fixed decomposition rank is usually used to decompose the weight tensor of the model, or a fixed precision is used to quantize the model, that is, the quantization value of every network layer is the same, so as to realize model compression.
However, because different network layers differ in redundancy and hardware requirements, the quantization values they need also differ, and it is difficult to achieve the optimal effect by quantizing the model with a fixed precision. In addition, when the model tensor is decomposed, different decomposition ranks affect model performance differently. If a manually set decomposition rank is used to decompose the model tensor, obtaining the optimal decomposition rank consumes a great deal of manpower and material resources.
Summary
The embodiments of the present application provide an automatic search method and device for the precision and decomposition rank of a recurrent neural network, so as to achieve the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression.
In a first aspect, an embodiment of the present application provides an automatic search method for the precision and decomposition rank of a recurrent neural network, the method including:
initializing a super network, and cyclically performing the following operations:
sampling a sub-network from the super network, and performing performance evaluation on the sampled sub-network;
automatically searching for a target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination, where the target hyperparameter combination includes a decomposition rank and a quantization value for each network layer;
for the super network with updated hyperparameters, returning to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network;
in response to the performance evaluation result satisfying a preset condition, outputting the current decomposition rank of the super network and the quantization value of each network layer.
In a second aspect, an embodiment of the present application provides an automatic search device for the precision and decomposition rank of a recurrent neural network, the device including:
an initialization module, configured to initialize a super network;
a loop module, configured to perform:
sampling a sub-network from the super network, and performing performance evaluation on the sampled sub-network;
automatically searching for a target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination, where the target hyperparameter combination includes a decomposition rank and a quantization value for each network layer;
for the super network with updated hyperparameters, returning to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network;
a compression confirmation module, configured to output the current decomposition rank of the super network and the quantization value of each network layer in response to the performance evaluation result satisfying a preset condition.
In the embodiments of the present application, after the super network is initialized, the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously searched and updated according to the performance evaluation results of the sub-networks sampled from the super network, so as to ensure that sub-networks whose performance satisfies the conditions are sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression.
Brief Description of the Drawings
Fig. 1a is a schematic flowchart of an automatic search method for the precision and decomposition rank of a recurrent neural network according to the first embodiment of the present application;
Fig. 1b is a schematic diagram of the process of cyclically adjusting hyperparameters according to the first embodiment of the present application;
Fig. 2 is a schematic flowchart of an automatic search method for the precision and decomposition rank of a recurrent neural network according to the second embodiment of the present application;
Fig. 3 is a schematic structural diagram of an automatic search device for the precision and decomposition rank of a recurrent neural network according to the third embodiment of the present application.
Detailed Description
The present application is further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Fig. 1a is a schematic flowchart of an automatic search method for the precision and decomposition rank of a recurrent neural network according to the first embodiment of the present application. This embodiment is applicable to situations where the optimal decomposition rank of a super network and the optimal quantization value of each network layer are quickly determined through equipment such as a server. The method can be performed by an automatic search device for the precision and decomposition rank of a recurrent neural network, which can be implemented in software and/or hardware and integrated on electronic equipment, for example on a server or computer device.
As shown in Fig. 1a, the automatic search method for the precision and decomposition rank of a recurrent neural network includes the following process:
S101. Initialize the super network, and cyclically perform S102-S104.
In the embodiments of the present application, the super network is a network composed of hyperparameters, and is optionally constructed based on the structure of a recurrent neural network model. Initializing the super network refers to determining the initial values of its hyperparameters, mainly determining the decomposition rank of the super network and the quantization value of each network layer in the super network, where the quantization value can characterize the precision of the neural network. It should be noted here that, during initialization, the hyperparameters can be set according to values specified by the user or according to default values; after initialization, the quantization values of different network layers can differ and can subsequently be readjusted through the steps S102-S104.
After the super network is initialized, the steps S102-S104 can be performed cyclically to update the hyperparameters of the super network, so as to ensure that high-performance sub-networks can be sampled from the super network with updated hyperparameters.
S102. Sample a sub-network from the super network, and perform performance evaluation on the sampled sub-network.
In the embodiments of the present application, based on the hyperparameters of the super network, Gumbel-softmax can be used to sample the configuration of a sub-network, and a corresponding sub-network is then obtained according to this configuration, where the configuration information of the sub-network includes its hyperparameters, such as the decomposition rank of the sub-network and the quantization value of each of its network layers.
After the sub-network is obtained, its performance is evaluated, that is, its accuracy and compression rate are determined. Optionally, the sampled sub-network is evaluated based on a preset performance evaluation strategy. For example, to judge the accuracy of the sub-network, it needs to be trained with sample data, and the accuracy is determined according to the training result. For example, the sample data may include training-set samples and validation-set samples; training is performed with the training-set samples, and after training is complete, verification is performed with the validation-set samples to determine the accuracy of the sub-network. Further, for the trained sub-network, tensor decomposition (for example, TT decomposition, in which the weight tensor is represented by several second-order and third-order tensors) and model quantization can be performed according to the decomposition rank of the sub-network and the quantization value of each of its network layers, so as to compress the trained sub-network, and the compression ratio is determined according to the compression result.
It should be noted here that, in the embodiments of the present application, the quantization value of each network layer is used to define the data bit width of that layer's weights and the data bit width of that layer's activation function. Since each layer is assigned its own quantization value, compared with using the same quantization value for every layer, the effect of model compression can be better ensured. The decomposition rank refers to the number of non-zero singular values in each intermediate diagonal matrix when the weight tensor undergoes singular value decomposition. The present application uses TT decomposition to convert a high-order weight tensor into multiple low-order tensors, which can greatly reduce the number of model parameters. Moreover, because the present application combines the two model compression techniques of model quantization and tensor decomposition, it can achieve a higher model compression rate than using only one compression method.
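The gain from combining the two techniques can be made concrete with a hypothetical accounting of model size; the formula and the example numbers below are illustrative assumptions, not figures from the patent:

```python
def compression_ratio(orig_params, comp_params, orig_bits=32, comp_bits=8):
    """Overall compression from combining tensor decomposition (fewer
    parameters) with quantization (fewer bits per parameter).

    Hypothetical accounting: model size is taken as parameter count times
    bit width; the patent does not give an explicit formula.
    """
    orig_size = orig_params * orig_bits
    comp_size = comp_params * comp_bits
    return orig_size / comp_size

# Example numbers: TT decomposition shrinking 4096 parameters to 192,
# combined with 32-bit -> 8-bit quantization.
ratio = compression_ratio(4096, 192, orig_bits=32, comp_bits=8)  # about 85.33x
```

Either technique alone would give roughly 21x (decomposition) or 4x (quantization); multiplying the two effects is what yields the much larger combined ratio.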
Through the above process, the prediction accuracy and compression rate of the sub-network can be determined. It should be noted here that, compared with using a single compression rate or accuracy rate as the evaluation index, the present application considers both accuracy and compression rate, which better balances model accuracy against compression.
S103. According to the performance evaluation result, automatically search for a target hyperparameter combination, and update the hyperparameters of the super network according to the target hyperparameter combination, where the target hyperparameter combination includes a decomposition rank and a quantization value for each network layer.
In the embodiments of the present application, both the decomposition rank and the quantization value of the super network have a value range. In one optional implementation, automatically searching for the target hyperparameter combination means randomly extracting a decomposition rank and quantization values within their value ranges and randomly combining them to obtain the target hyperparameter combination; in another optional implementation, all possible combinations of decomposition rank and quantization values can be predetermined and saved, and when automatically searching, one combination is selected directly from all the pre-saved combinations as the target hyperparameter combination.
Further, in the embodiments of the present application, the performance evaluation result includes the accuracy rate and compression rate of the sub-network; therefore, the target hyperparameter combination can be automatically searched according to the accuracy rate and compression rate of the sub-network, where the target hyperparameter combination includes a decomposition rank and a quantization value for each network layer.
In one optional implementation, according to the accuracy rate and compression rate of the sub-network, the target hyperparameter combination can be searched by alternately updating the decomposition rank and the quantization values of different network layers; for example, keep one of the two kinds of hyperparameters, the decomposition rank of the super network or the quantization values of the network layers, unchanged and search the other; in the next search, the fixed and searched hyperparameters are swapped relative to the last time.
For example, if the compression rate determined from the performance evaluation result is relatively small, that is, the model still has many parameters, the decomposition rank can be fixed and the quantization value (i.e., the bit-width value) of each network layer reduced; in other words, when searching for a quantization value, a value smaller than the current one needs to be searched. For example, the super network includes three network layers whose quantization values are 2 bits, 6 bits, and 8 bits respectively, and the decomposition rank is 3; in this case the decomposition rank 3 can be kept unchanged, and the searched quantization values can be 2 bits, 4 bits, and 6 bits. After the target hyperparameter combination is obtained, the hyperparameters of the super network can be updated according to it, and S104 can then be performed.
S104. For the super network with updated hyperparameters, return to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network.
After the super network with updated hyperparameters is obtained, return to S102, that is, sample a sub-network from the updated super network and evaluate its performance; then perform S103 according to the evaluation result to search for the target hyperparameter combination again and update the hyperparameters of the super network, and so on, until the performance of the sub-network satisfies the preset condition, where the preset condition optionally includes thresholds for the accuracy and compression rate of the sub-network; only when the accuracy and compression rate of the sub-network reach the preset thresholds is the step S105 triggered.
To explain the loop in detail, refer to Fig. 1b, which shows a schematic diagram of the process of cyclically adjusting hyperparameters. After the super network is initialized, a sub-network structure is sampled from it, the sampled sub-network structure is evaluated based on the performance evaluation strategy, and the evaluation result (i.e., the accuracy rate and compression rate of the sub-network structure) is fed back, so that the target hyperparameter combination is searched and the hyperparameters of the super network are updated according to the evaluation result. The process of sampling, evaluating, feeding back the evaluation result, and searching and updating hyperparameters is performed cyclically until the evaluated accuracy rate and compression rate of the sub-network structure satisfy the preset conditions.
S105. In response to the performance evaluation result satisfying the preset condition, output the current decomposition rank of the super network and the quantization value of each network layer.
When the performance evaluation result of the sub-network satisfies the condition, the decomposition rank of the super network and the quantization value of each network layer are already optimal and can be output directly; subsequent settings can be made directly according to the output decomposition rank and per-layer quantization values, ensuring the optimal compression effect of the subsequent model.
In the embodiments of the present application, after the super network is initialized, the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously searched and updated according to the performance evaluation results of the sub-networks sampled from it, so as to ensure that sub-networks whose performance satisfies the conditions are sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression. This ensures that, after the recurrent neural network is compressed based on the optimal decomposition rank and the optimal per-layer quantization value, the compressed network can be deployed on resource-constrained hardware platforms and achieve high accuracy in practical tasks using only a small amount of computing resources and memory.
Fig. 2 is a schematic flowchart of an automatic search method for the precision and decomposition rank of a recurrent neural network according to the second embodiment of the present application. This embodiment is optimized on the basis of the above embodiment. Referring to Fig. 2, the method includes:
S201. Determine the search space.
The search space refers to the choices for the decomposition rank of the neural network and the quantization value of each network layer. In some embodiments, the search space includes a variety of preset hyperparameter combinations, and each hyperparameter combination includes the decomposition rank of the super network and the quantization value corresponding to each network layer in the super network.
For example, the quantization value can optionally be a bit width, such as 2, 4, 6, or 8 bits; the decomposition rank has multiple possibilities such as 2, 3, 4, 5, 6, 7, and 8. Random combinations of quantization values and decomposition ranks constitute the search space. Taking a two-layer network as an example, the search space can illustratively include quantization values of 2 bits and 4 bits with decomposition rank 2; quantization values of 4 bits and 8 bits with decomposition rank 3; quantization values of 4 bits and 8 bits with decomposition rank 6; since there are many hyperparameter combinations, they are not listed one by one here.
S202. Update any hyperparameter combination in the search space to the super network, and cyclically perform S203-S206.
In the embodiments of the present application, updating any hyperparameter combination in the search space to the super network means setting the hyperparameters of the super network according to that combination. For example, the super network is a two-layer network; after initialization, the hyperparameters of the two-layer network include quantization values of 4 bits and 8 bits and a decomposition rank of 3.
S203. Sample a sub-network from the super network, and perform performance evaluation on the sampled sub-network.
For a detailed description, refer to the above embodiment; details are not repeated here.
S204. According to the performance evaluation result, combined with a preset differentiable search strategy, automatically determine the target hyperparameter combination from the search space.
In the embodiments of the present application, since the search space is discrete, the softmax function can be used to make the search space continuous, and the gradient descent method is then used to find the best hyperparameter combination; this search method is called a differentiable search strategy.
For example, if the compression rate determined from the performance evaluation result is relatively small, meaning the model still has many parameters, then when the differentiable search strategy is used to search for the target hyperparameter combination, the current decomposition rank can first be fixed; multiple hyperparameter combinations whose decomposition rank equals the current one can be searched in the search space, and one target hyperparameter combination is selected from them, where, among the per-layer quantization values of the target hyperparameter combination, the quantization value of at least one network layer is smaller than that layer's current quantization value.
For example, the current hyperparameters of the super network are quantization values of 4 bits and 8 bits with decomposition rank 3; keeping decomposition rank 3 unchanged, the multiple hyperparameter combinations sharing the current rank 3 may be: 4 bits, 6 bits, rank 3; 4 bits, 4 bits, rank 3; 2 bits, 8 bits, rank 3; 2 bits, 4 bits, rank 3; other combinations may also be included and are not listed one by one here. One hyperparameter combination can then be selected from these as the target hyperparameter combination.
S205. Update the hyperparameters of the super network with the target hyperparameter combination.
In the embodiments of the present application, after the target hyperparameter combination is determined, the decomposition rank and per-layer quantization values in the target hyperparameter combination replace the original decomposition rank and quantization values of the super network.
S206. For the super network with adjusted hyperparameters, return to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network.
In the embodiments of the present application, by cyclically performing S203-S206, the optimal hyperparameter combination can be obtained from the search space, where the optimal hyperparameter combination is one for which the accuracy rate and compression rate of the sub-network extracted on its basis satisfy the preset conditions.
S207. In response to the performance evaluation result satisfying the preset condition, output the current decomposition rank of the super network and the quantization value of each network layer.
In the embodiments of the present application, compared with the traditional manual design of quantization and decomposition-rank strategies, the present application uses a differentiable search strategy to continuously search for target hyperparameter combinations in the search space until the optimal decomposition rank and the optimal quantization value of each network layer are found, which improves the efficiency of obtaining the optimal hyperparameter combination and thus ensures the efficiency of model compression.
Fig. 3 is a schematic structural diagram of an automatic search device for the precision and decomposition rank of a recurrent neural network according to the third embodiment of the present application. The device is used to quickly determine the optimal decomposition rank of the super network and the optimal quantization value of each network layer through equipment such as a server. Referring to Fig. 3, the device includes:
an initialization module 301, configured to initialize the super network;
a loop module 302, configured to perform:
sampling a sub-network from the super network, and performing performance evaluation on the sampled sub-network;
automatically searching for a target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination, where the target hyperparameter combination includes a decomposition rank and a quantization value for each network layer;
for the super network with updated hyperparameters, returning to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network;
a compression confirmation module 303, configured to output the current decomposition rank of the super network and the quantization value of each network layer in response to the performance evaluation result satisfying a preset condition.
In the embodiments of the present application, after the super network is initialized, the decomposition rank of the super network and the quantization value corresponding to each network layer are continuously adjusted according to the performance evaluation results of the sub-networks sampled from the super network, so as to ensure that sub-networks whose performance satisfies the conditions are sampled, finally achieving the purpose of quickly searching for the optimal decomposition rank and the optimal per-layer quantization value required for model compression.
On the basis of the above embodiments, optionally, the performance evaluation result includes the accuracy rate and compression rate of the sub-network;
correspondingly, automatically searching for a target hyperparameter combination and updating the hyperparameters of the super network according to the target hyperparameter combination includes:
automatically searching for a target hyperparameter combination according to the accuracy rate and compression rate of the sub-network, and updating the hyperparameters of the super network according to the target hyperparameter combination.
On the basis of the above embodiments, optionally, the device further includes:
a search space determination module, configured to determine a search space before the super network is initialized, where the search space includes a variety of preset hyperparameter combinations, and each hyperparameter combination includes the decomposition rank of the super network and the quantization value corresponding to each network layer in the super network.
On the basis of the above embodiments, optionally, the initialization module is configured to:
update any hyperparameter combination in the search space to the super network.
On the basis of the above embodiments, optionally, automatically searching for a target hyperparameter combination according to the performance evaluation result includes:
according to the performance evaluation result, combined with a preset differentiable search strategy, automatically determining the target hyperparameter combination from the search space.
On the basis of the above embodiments, optionally, the quantization value of each network layer is used to define the data bit width of that layer's weights and the data bit width of that layer's activation function.
The automatic search device for the precision and decomposition rank of a recurrent neural network provided by the embodiments of the present application can execute the automatic search method for the precision and decomposition rank of a recurrent neural network provided by any embodiment of the present application, and has functional modules and beneficial effects corresponding to the executed method.
Note that the above are only optional embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

  1. An automatic search method for the precision and decomposition rank of a recurrent neural network, comprising:
    initializing a super network, and cyclically performing the following operations:
    sampling a sub-network from the super network, and performing performance evaluation on the sampled sub-network;
    automatically searching for a target hyperparameter combination according to a performance evaluation result, and updating hyperparameters of the super network according to the target hyperparameter combination, wherein the target hyperparameter combination comprises a decomposition rank and a quantization value for each network layer;
    for the super network with updated hyperparameters, returning to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network;
    in response to the performance evaluation result satisfying a preset condition, outputting a current decomposition rank of the super network and the quantization value of each network layer.
  2. The method according to claim 1, wherein the performance evaluation result comprises an accuracy rate and a compression rate of the sub-network;
    correspondingly, automatically searching for the target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination, comprises:
    automatically searching for the target hyperparameter combination according to the accuracy rate and the compression rate of the sub-network, and updating the hyperparameters of the super network according to the target hyperparameter combination.
  3. The method according to claim 1, wherein before initializing the super network, the method further comprises:
    determining a search space, wherein the search space comprises a variety of preset hyperparameter combinations, and each hyperparameter combination comprises a decomposition rank of the super network and a quantization value corresponding to each network layer in the super network.
  4. The method according to claim 3, wherein initializing the super network comprises:
    updating any hyperparameter combination in the search space to the super network.
  5. The method according to claim 4, wherein automatically searching for the target hyperparameter combination according to the performance evaluation result comprises:
    according to the performance evaluation result, combined with a preset differentiable search strategy, automatically determining the target hyperparameter combination from the search space.
  6. The method according to claim 1, wherein the quantization value of each network layer is used to define a data bit width of that layer's weights and a data bit width of that layer's activation function.
  7. An automatic search device for the precision and decomposition rank of a recurrent neural network, comprising:
    an initialization module, configured to initialize a super network;
    a loop module, configured to perform:
    sampling a sub-network from the super network, and performing performance evaluation on the sampled sub-network;
    automatically searching for a target hyperparameter combination according to a performance evaluation result, and updating hyperparameters of the super network according to the target hyperparameter combination, wherein the target hyperparameter combination comprises a decomposition rank and a quantization value for each network layer;
    for the super network with updated hyperparameters, returning to the operation of sampling a sub-network and performing performance evaluation on the sampled sub-network;
    a compression confirmation module, configured to output a current decomposition rank of the super network and the quantization value of each network layer in response to the performance evaluation result satisfying a preset condition.
  8. The device according to claim 7, wherein the performance evaluation result comprises an accuracy rate and a compression rate of the sub-network;
    correspondingly, automatically searching for the target hyperparameter combination according to the performance evaluation result, and updating the hyperparameters of the super network according to the target hyperparameter combination, comprises:
    automatically searching for the target hyperparameter combination according to the accuracy rate and the compression rate of the sub-network, and updating the hyperparameters of the super network according to the target hyperparameter combination.
  9. The device according to claim 7, wherein the device further comprises:
    a search space determination module, configured to determine a search space before the super network is initialized, wherein the search space comprises a variety of preset hyperparameter combinations, and each hyperparameter combination comprises a decomposition rank of the super network and a quantization value corresponding to each network layer in the super network.
  10. The device according to claim 9, wherein the initialization module is configured to:
    update any hyperparameter combination in the search space to the super network.
PCT/CN2020/141379 2020-12-30 2020-12-30 Automatic search method and device for precision and decomposition rank of recurrent neural network WO2022141189A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080003956.0A CN112771545A (zh) 2020-12-30 2020-12-30 Automatic search method and device for precision and decomposition rank of recurrent neural network
PCT/CN2020/141379 WO2022141189A1 (zh) 2020-12-30 2020-12-30 Automatic search method and device for precision and decomposition rank of recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/141379 WO2022141189A1 (zh) 2020-12-30 2020-12-30 Automatic search method and device for recurrent neural network precision and decomposition rank

Publications (1)

Publication Number Publication Date
WO2022141189A1 true WO2022141189A1 (zh) 2022-07-07

Family

ID=75699464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/141379 WO2022141189A1 (zh) 2020-12-30 2020-12-30 Automatic search method and device for recurrent neural network precision and decomposition rank

Country Status (2)

Country Link
CN (1) CN112771545A (zh)
WO (1) WO2022141189A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312855B (zh) * 2021-07-28 2021-12-10 北京大学 Machine learning optimization method based on search space decomposition, electronic device, and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263913A (zh) * 2019-05-23 2019-09-20 深圳先进技术研究院 Deep neural network compression method and related device
WO2020028890A1 (en) * 2018-08-03 2020-02-06 Edifecs, Inc. Prediction of healthcare outcomes and recommendation of interventions using deep learning
CN110956262A (zh) * 2019-11-12 2020-04-03 北京小米智能科技有限公司 Super-network training method and apparatus, electronic device, and storage medium
CN111582454A (zh) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and apparatus for generating a neural network model
CN111582453A (zh) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and apparatus for generating a neural network model
CN111652354A (zh) * 2020-05-29 2020-09-11 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for training a super-network
CN111738418A (zh) * 2020-06-19 2020-10-02 北京百度网讯科技有限公司 Training method and apparatus for a super-network
CN112116090A (zh) * 2020-09-28 2020-12-22 腾讯科技(深圳)有限公司 Neural network architecture search method, apparatus, computer device, and storage medium


Also Published As

Publication number Publication date
CN112771545A (zh) 2021-05-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20967530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20967530

Country of ref document: EP

Kind code of ref document: A1
