WO2023024577A1 - Re-parameterized neural network architecture search method for edge computing - Google Patents

Re-parameterized neural network architecture search method for edge computing

Info

Publication number
WO2023024577A1
Authority
WO
WIPO (PCT)
Prior art keywords
branch
parameter
convolution
network
re-parameterization
Prior art date
Application number
PCT/CN2022/091907
Other languages
English (en)
French (fr)
Inventor
高丰
白文媛
Original Assignee
之江实验室
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 之江实验室 filed Critical 之江实验室
Priority to US17/888,513 (US11645495B2)
Publication of WO2023024577A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/09 Supervised learning
    • G06N 3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn

Definitions

  • The invention relates to the technical field of neural network architecture search, and in particular to an edge-computing-oriented re-parameterized neural network architecture search method.
  • Neural network architecture search has been a research hotspot in machine learning in recent years. The technique covers the design of search operators and search spaces as well as the design of search algorithms. At present, neural network architecture search can automatically design neural network models of various sizes and avoids complex manual parameter tuning. One of its most promising applications is the design of lightweight neural network models, which improves the applicability of neural networks on mobile devices.
  • Ding et al. proposed training neural networks with structural re-parameterization, i.e. the network has a multi-branch structure during training and a single-branch structure during inference (Xiaohan Ding and Xiangyu Zhang and Ningning Ma and Jungong Han and Guiguang Ding and Jian Sun.: RepVGG: Making VGG-style ConvNets Great Again. In CVPR, 2021).
  • Most deep learning inference frameworks are optimized for 3×3 convolutions, so a single-branch structure composed entirely of 3×3 convolutions achieves very fast inference.
  • An easy-to-implement, highly applicable re-parameterized neural network model that can be searched for and deployed under edge computing is proposed, with the aim of improving real-time detection speed while keeping the network highly accurate.
  • The present invention adopts the following technical scheme:
  • A re-parameterized neural network architecture search method for edge computing comprises the following steps:
  • Branches containing convolutions of other scales and a shortcut are attached to each original K×K convolution, forming multi-branch blocks used to extract image features under different receptive fields.
  • The importance of each branch is computed as Z_{i,j} = exp((α_{i,j} + ζ_{i,j}) / λ_{i,j}) / Σ_{j'} exp((α_{i,j'} + ζ_{i,j'}) / λ_{i,j'}) (1), where Z_{i,j} denotes the importance of the j-th branch in the i-th multi-branch block, exp(·) denotes the exponential function, α_{i,j} denotes the structure parameter, ζ_{i,j} the sampling noise and λ_{i,j} the temperature coefficient of the j-th branch in the i-th multi-branch block, the temperature coefficient being initialized to 1.
  • Each branch is then activated or not: its value in formula (2) is 1 if Rank(R_{i,j}) ≤ s and 0 otherwise, where R_{i,j} = α_{i,j} + ζ_{i,j}, Rank(R_{i,j}) denotes the importance rank of the j-th branch among all branches of the i-th multi-branch block and s denotes the rank threshold; s is adjusted to satisfy the maximum memory limit C, and branches ranked below the threshold are not activated.
  • Each branch of a multi-branch block extracts different image features; the branches whose value in formula (2) is 1 are activated, forward inference is performed, and the loss function L (cross-entropy) between the predicted image classification labels and the ground-truth labels is then computed.
  • In S6 the single-branch best sub-network is used to obtain image features and perform real-time inference, and the fused single-branch network performs image classification. There is no difference in accuracy between the network before and after fusion, but fusion greatly reduces the parameter count and inference time of the network.
  • In S5, the multi-branch best sub-network is fused into the single-branch best sub-network by the re-parameterization method, comprising the following steps: S51, the convolution-layer and BN-layer weight parameters of each retained branch are fused by re-parameterization; S52, each branch is converted by re-parameterization into a convolution of the same scale as the original convolution and then fused with the original K×K convolution into one convolution F_j; S53, the multiple K×K convolutions F_j of the same multi-branch block are fused into one K×K convolution.
  • The re-parameterization fusion formula in S51 is F′_{m,:,:,:} = (γ_m / σ_m) F_{m,:,:,:}, b′_m = (b_m − μ_m) γ_m / σ_m + β_m (5), where γ denotes the scaling parameter of the BN layer, μ the mean of the BN-layer features, σ the standard deviation of the BN-layer features and β the translation parameter of the BN layer; F′_{m,:,:,:} and F_{m,:,:,:} denote the weight parameters of the m-th output channel of the convolution layer after and before fusion respectively, b′_m and b_m denote the bias parameters of the m-th output channel of the convolution layer after and before fusion respectively, and ':' denotes all elements of that dimension.
  • Each branch is converted to the same scale as the original K×K convolution, and the converted K×K convolution of each branch is then fused with the original K×K convolution into a K×K convolution according to the re-parameterization formula F_j = F_2 ∗ TRANS(F_1), b_j^m = Σ_{d=1}^{D} Σ_{u=1}^{K_1} Σ_{v=1}^{K_2} b_1^d F_2^{m,d,u,v} + b_2^m (6), where TRANS denotes the transpose operation on a tensor, ∗ the convolution of the two kernels, F_1 the converted K×K convolution of the j-th branch, F_2 the original K×K convolution, D the number of input channels, K_1 and K_2 the kernel sizes, F_j the fused K×K convolution corresponding to the j-th branch, b_j^m the bias of the m-th output channel of the fused convolution layer, b_1^d the bias of the d-th channel of the converted K×K convolution of the j-th branch, b_2^m the bias of the m-th output channel of the original K×K convolution, and F_2^{m,d,u,v} the kernel weight of the original K×K convolution in row u, column v for the m-th output channel and d-th input channel.
  • The branch convolutions in S52 (1×1 convolution, 1×K convolution, K×1 convolution, 1×1-AVG and the shortcut) are converted by zero padding to the same scale as the original K×K convolution.
  • Multiple convolutions are fused into one as F′ = F_1 + F_2 + ... + F_N, b′ = b_1 + b_2 + ... + b_N (7), where N is the number of branches and b′ is the fused bias.
  • The sampling noise in S32 follows a Logistic distribution with location 0 and scale 1, log(−log(u_1)) − log(−log(u_2)), where u_1 and u_2 are both drawn as u_i ~ U(0, 1), i.e. u_i follows the uniform distribution on [0, 1].
  • The original convolution in S1 is the original K×K convolution with 6 branches, whose operators are: 1×1 convolution, 1×K convolution, K×1 convolution, 1×1-K×K convolution, 1×1-AVG convolution, and the shortcut.
  • Although the multi-branch structure strengthens the feature extraction ability of the network, it greatly reduces the inference speed. To raise inference speed through re-parameterization, the operator of every branch must be linear, and no nonlinearity is applied directly after it; instead a BN layer with scaling and translation is used so that the result of each operator undergoes a certain nonlinear transformation.
  • To further enhance the nonlinear capacity of the network, the output of the current block passes through a ReLU layer: a batch normalization (BN) operation is added after the operator of each branch, the outputs of all branches are added element-wise and passed through the ReLU nonlinearity, and the combination serves as the output of the current multi-branch block.
  • This application greatly improves training efficiency and network accuracy when the re-parameterization technique is used to train a network, i.e. it reduces the computation and memory needed for training while the trained model achieves better performance; and after training, the multi-branch network can be converted into a single-branch network without any loss, reducing the parameter count and inference time at inference.
  • Fig. 1 is a structural diagram of the search operators and the multi-branch block in the present invention.
  • Fig. 2 is a flow chart of the super network training phase in the present invention.
  • Fig. 3 is a scheme diagram of the fusion of multiple branches into a single branch in the present invention.
  • Fig. 4 is a structural diagram of the super network formed by the remaining branches in the present invention.
  • The present invention first constructs a multi-branch block as the search space. The multi-branch block can be fused into a single branch through re-parameterization and consists of a 1×1 convolution, a 1×K convolution, a K×1 convolution, a 1×1-K×K convolution, a 1×1-AVG convolution and a shortcut.
  • A super network containing all sub-network structures is built by stacking multi-branch blocks. The super network is then trained, and the best branch structure is searched progressively for each block during training; the branch structure may differ from block to block. At the beginning of training, every branch has a certain probability of being sampled in each iteration.
  • Sampled branches update their weight parameters and structure parameters (sampling probabilities). As training proceeds, useless branches are sampled less and less often until they are not sampled at all. After training, the branches that are no longer sampled are removed, and the remaining branches are fused into a single branch by re-parameterization to improve the inference speed of the network.
  • This embodiment relates to an edge-computing-oriented re-parameterized neural network architecture search method, including the following steps:
  • Step 1: design of the search operators and the multi-branch block structure, as shown in Fig. 1;
  • Step 1-1: convolutions with different kernel sizes extract image features under different receptive fields, so each branch uses a convolution or average-pooling operator with a different kernel size; the skip connection (shortcut) used in ResNet-series networks can be regarded as a 1×1 convolution whose weights are always 1.
  • Step 1-2: although the multi-branch structure strengthens the feature extraction ability of the network, it greatly reduces inference speed. To improve inference speed with re-parameterization, the operator of every branch must be linear and no nonlinearity is added directly after it; instead a BN layer with scaling and translation gives the result of each operator a certain nonlinear transformation.
  • To further enhance the nonlinear capacity of the network, the output of the current block passes through a ReLU layer for nonlinear transformation.
  • Step 2: build the super network.
  • Step 2-1: drawing on the experience of many hand-designed networks, the multi-branch blocks designed in step 1 are stacked repeatedly to form a super network with redundant branches. The super network built here contains 22 multi-branch blocks, whose output channel numbers are 48, 48, 48, 96, 96, 96, 96, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192 and 1280. Every operator in the 1st, 2nd, 4th, 8th and 22nd multi-branch blocks has stride 2 and down-samples the image. The feature map output by the network finally passes through a global average pooling layer and then a fully connected layer with a 1280-dimensional input and a 1000-dimensional output that produces the score of each class.
  • Step 3 Train the constructed super network on the ImageNet-1K dataset, the training flow chart is shown in Figure 2;
  • Step 3-1: initialize the weight parameter θ and structure parameter α of the super network and set the training hyperparameters as follows: the weight-parameter optimizer is SGD with momentum, initial learning rate 0.1, momentum 0.9, weight decay 0.0001, and the learning rate is decayed by CosineAnnealingLR over all iterations; the structure-parameter optimizer is Adam, initial learning rate 0.0001, betas (0.5, 0.999), no weight decay; the training batch size is 256 and the super network is trained for 120 epochs in total, with random sampling and weight-parameter updates only in the first 15 epochs, structure-parameter and weight-parameter updates in the middle 50 epochs, and a fixed structure with weight-parameter updates only in the final 55 epochs. The total branch count C is set to 75, i.e. the given maximum memory limit C is 75;
  • Step 3-2: if random sampling is performed, every branch is activated with probability 50%; otherwise, the importance of each branch is computed according to formula (1) and branches whose importance is higher than 0.5 are activated according to formula (2);
  • Step 3-3: obtain a batch of training data, perform forward inference with the activated branches and compute the loss function. Cross-entropy is used as the loss, and the gradient of the weight parameter θ is computed by back-propagation; if sampling is not random, the gradient of the structure parameter α must also be computed according to formula (4);
  • Step 3-4 Update the weight parameter ⁇ with the SGD optimizer, and update the structural parameter ⁇ with the Adam optimizer;
  • Step 3-5 If the training is not over, return to step 3-2, and if the training is over, output the trained super network.
  • Step 4: delete the inactive branches of the trained super network and keep the remaining branches and their weight parameters. The super network formed by the remaining branches is shown in Fig. 4; tested on the test set, it achieves 72.96% top-1 accuracy, and the inference time needed per batch of images is 0.68 seconds.
  • Step 5: merge the branches of each block in the super network, as shown in Fig. 3;
  • Step 5-1: merge the BN layer after each operator with the operator according to formula (5);
  • Step 5-2: convert the 1×1 convolution, 1×3 convolution, 3×1 convolution, AVG and shortcut into 3×3 convolutions by zero padding, then merge the 3×3 convolution of each branch with the original 3×3 convolution according to formula (6).
  • Step 6: perform image classification on the test set with the fused single-branch model; the test device is an Intel Core i7 CPU.
  • Compared with the single-branch training model, the model trained by the method of this patent has the same inference speed and model size but a much higher accuracy.
  • Moreover, because the method can fuse multiple branches into a single branch, it greatly reduces the model's parameter count and computation without any loss of performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A re-parameterized neural network architecture search method for edge computing, comprising the following steps: step 1, design linear operators and a multi-branch block structure; step 2, build a super network by stacking the multi-branch blocks; step 3, train the super network with a gradient-based one-stage search algorithm; step 4, delete the redundant branches of the super network to obtain the best sub-network; step 5, convert the multi-branch best sub-network into a single-branch network; step 6, use the single-branch network to perform task inference. The method searches for network structures that can be re-parameterized, and guarantees inference accuracy while ensuring real-time inference and efficient model computation.

Description

Re-Parameterized Neural Network Architecture Search Method for Edge Computing
Technical Field
The present invention relates to the technical field of neural network architecture search, and in particular to a re-parameterized neural network architecture search method for edge computing.
Background Art
Neural network architecture search has been a research hotspot in machine learning in recent years. The technique covers the design of search operators and search spaces as well as the design of search algorithms. At present, neural network architecture search can automatically design neural network models of various sizes, avoiding complex manual parameter tuning. One of its most promising applications is the design of lightweight neural network models, which improves the applicability of neural networks on mobile devices.
On mobile devices, real-time performance and accuracy of neural network inference are the two major considerations. Among early hand-designed lightweight models, Howard et al. proposed MobileNet, a single-branch network that alternates 1×1 pointwise convolutions and 3×3 depthwise convolutions to greatly reduce the parameter count and thus increase inference speed (Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient convolutional neural networks for mobile vision applications. In ArXiv abs/1704.04861, 2017). However, because the model is a single-branch structure, it is difficult to achieve high accuracy. Moreover, when designing lightweight models by hand, many works use the parameter count or the number of floating-point operations as the measure of model speed, yet parameter-free, low-FLOP operations such as skip connections still slow inference down.
To alleviate the low accuracy of deep single-branch networks and the slow inference of multi-branch networks, Ding et al. proposed training neural networks with structural re-parameterization: the network has a multi-branch structure during training and a single-branch structure during inference (Xiaohan Ding and Xiangyu Zhang and Ningning Ma and Jungong Han and Guiguang Ding and Jian Sun.: RepVGG: Making VGG-style ConvNets Great Again. In CVPR, 2021). In addition, most deep learning inference frameworks are optimized for 3×3 convolutions, so a single-branch structure composed entirely of 3×3 convolutions achieves very fast inference.
Although the RepVGG family has greatly improved practical inference speed, the branch structure is fixed by hand, so there is still considerable room to improve model accuracy. Furthermore, too many branches greatly increase the GPU memory needed to train the model. How to improve model performance efficiently through re-parameterization has therefore become a problem to be solved.
Summary of the Invention
To overcome the shortcomings of the prior art, an easy-to-implement, highly applicable method is proposed that searches for re-parameterized neural network models deployable under edge computing, improving real-time detection speed while keeping the network highly accurate. The present invention adopts the following technical scheme:
A re-parameterized neural network architecture search method for edge computing comprises the following steps:
S1, design the operators of each branch and the search space: to strengthen the feature extraction ability of the K×K convolution, branches including convolutions of other scales and a shortcut are attached to each original K×K convolution, forming multi-branch blocks used to extract image features under different receptive fields;
S2, build a super network containing all branches: following the straight-through design of VGG, the K×K multi-branch blocks are stacked repeatedly to construct a single-path network structure with redundant branches;
S3, train the super network under a given GPU memory limit using a discrete neural network architecture search method, comprising the following steps:
S31, given the maximum memory limit C, initialize the structure parameter α and the weight parameter θ of every branch;
S32, compute the importance of every branch:
Z_{i,j} = exp((α_{i,j} + ζ_{i,j}) / λ_{i,j}) / Σ_{j'} exp((α_{i,j'} + ζ_{i,j'}) / λ_{i,j'})    (1)
where Z_{i,j} denotes the importance of the j-th branch in the i-th multi-branch block, exp(·) denotes the exponential function, α_{i,j} denotes the structure parameter of the j-th branch in the i-th multi-branch block, ζ_{i,j} denotes the sampling noise of the j-th branch in the i-th multi-branch block, and λ_{i,j} denotes the temperature coefficient of the j-th branch in the i-th multi-branch block, whose initial value here is 1;
S33, determine whether each branch is activated:
value in formula (2) = 1 if Rank(R_{i,j}) ≤ s, and 0 otherwise    (2)
where R_{i,j} = α_{i,j} + ζ_{i,j}, Rank(R_{i,j}) denotes the importance rank of the j-th branch among all branches of the i-th multi-branch block, and s denotes the rank threshold; s is adjusted so as to satisfy the maximum memory limit C, and branches ranked below the threshold are not activated;
S34, fetch a batch of training data; each branch of a multi-branch block extracts different image features; activate the branches whose value in formula (2) is 1, perform forward inference, and then compute the loss function L (cross-entropy) between the predicted image classification labels and the ground-truth image classification labels;
S35, through back-propagation, compute the gradients of the weight parameter θ and of the activation parameter Z with respect to the loss function L, the activation parameter Z being the vector formed by the Z_{i,j}; at the same time compute the gradient of the structure parameter α on log p(Z), where Z is the discretization of α and p(α) is the result of normalizing the structure parameter α into a probability by the following formula:
p(α_{i,j}) = exp(α_{i,j}) / Σ_{j'} exp(α_{i,j'})    (3)
S36, update the weight parameter θ according to the gradient of L, and at the same time update the structure parameter α according to the following formula (when sampling is not random, the gradient of the structure parameter α is computed by this formula):
∇_{α_{i,j}} L = E_{Z~p(Z_{i,j})} [ L · ∇_{α_{i,j}} log p(Z) ]    (4)
where E_{Z~p(Z_{i,j})} denotes the expectation with Z sampled from the probability distribution p(Z_{i,j}), and ∇_{α_{i,j}} log p(Z) is the gradient of the structure parameter α of the j-th branch in the i-th multi-branch block on log p(Z);
S37, return to S32 until the weight parameters and structure parameters of the super network are trained to convergence;
S4, remove the redundant branches from the trained super network to obtain the best sub-network: the branches of the super network trained in step 3 that are not activated according to formula (2) are removed, and the weight parameters of the remaining branches are inherited directly from the super network without retraining or fine-tuning;
S5, fuse the multi-branch best sub-network into a single-branch best sub-network;
S6, use the single-branch best sub-network to obtain image features, perform real-time inference, and classify images with the fused single-branch network; the networks before and after fusion show no difference in accuracy, but fusion greatly reduces the parameter count and inference time of the network.
Further, in S5 the multi-branch best sub-network is fused into the single-branch best sub-network by the re-parameterization method, comprising the following steps:
S51, fuse the weight parameters of the convolution layer and the BN layer within each retained branch by re-parameterization;
S52, convert each branch by re-parameterization into a convolution of the same scale as the original convolution, then fuse each of them with the original K×K convolution into one convolution F_j;
S53, fuse the multiple K×K convolutions F_j of the same multi-branch block into one K×K convolution.
Further, the re-parameterization fusion formula in S51 is as follows:
F′_{m,:,:,:} = (γ_m / σ_m) F_{m,:,:,:},  b′_m = (b_m − μ_m) γ_m / σ_m + β_m    (5)
where γ denotes the scaling parameter of the BN layer, μ the mean of the BN-layer features, σ the standard deviation of the BN-layer features, and β the translation parameter of the BN layer; F′_{m,:,:,:} and F_{m,:,:,:} denote the weight parameters of the m-th output channel of the convolution layer after and before fusion respectively, b′_m and b_m denote the bias parameters of the m-th output channel of the convolution layer after and before fusion respectively, and ':' denotes all elements of that dimension.
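Formula (5) is the familiar folding of a BN layer into the convolution that precedes it. A minimal PyTorch sketch is given below; it is illustrative only, the function name fuse_conv_bn is ours, and the convolution is assumed to carry its own bias b_m (taken as zero if absent).

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BN layer into the preceding convolution, following formula (5)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)            # sigma_m
    scale = bn.weight / std                              # gamma_m / sigma_m
    # F'_{m,:,:,:} = (gamma_m / sigma_m) * F_{m,:,:,:}
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    # b'_m = beta_m + (b_m - mu_m) * gamma_m / sigma_m
    b = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = bn.bias + (b - bn.running_mean) * scale
    return fused
```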
Further, in S52 each branch is first converted to the same scale as the original K×K convolution, and then the converted K×K convolution of each branch is fused with the original K×K convolution into one K×K convolution according to the following re-parameterization formula:
F_j = F_2 ∗ TRANS(F_1),  b_j^m = Σ_{d=1}^{D} Σ_{u=1}^{K_1} Σ_{v=1}^{K_2} b_1^d F_2^{m,d,u,v} + b_2^m    (6)
where TRANS denotes the transpose operation on a tensor, ∗ the convolution of the two kernels, F_1 the converted K×K convolution of the j-th branch, F_2 the original K×K convolution, D the number of input channels, K_1 and K_2 the kernel sizes, F_j the fused K×K convolution corresponding to the j-th branch, b_j^m the bias of the m-th output channel of the fused convolution layer, b_1^d the bias of the d-th channel of the converted convolution of the j-th branch, b_2^m the bias of the m-th output channel of the original K×K convolution, and F_2^{m,d,u,v} the kernel weight of the original K×K convolution in row u, column v for the m-th output channel and d-th input channel.
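The 1×1-K×K and 1×1-AVG branches contain two linear operators in sequence. A common way to collapse such a pair, consistent with the reconstruction of formula (6) above, is sketched below; it assumes the 1×1 convolution comes first, and the function and argument names are ours rather than part of the patented method.

```python
import torch
import torch.nn.functional as F

def merge_1x1_then_kxk(k1: torch.Tensor, b1: torch.Tensor,
                       k2: torch.Tensor, b2: torch.Tensor):
    """Collapse a 1x1 convolution (kernel k1: [D, C, 1, 1], bias b1: [D])
    followed by a KxK convolution (kernel k2: [M, D, K, K], bias b2: [M])
    into a single KxK convolution, in the spirit of formula (6)."""
    # merged kernel: convolve k2 with the channel-transposed 1x1 kernel (TRANS)
    k_merged = F.conv2d(k2, k1.permute(1, 0, 2, 3))                   # [M, C, K, K]
    # merged bias: b_m = sum_{d,u,v} b1_d * k2[m, d, u, v] + b2_m
    b_merged = (k2 * b1.reshape(1, -1, 1, 1)).sum(dim=(1, 2, 3)) + b2
    return k_merged, b_merged
```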
Further, the branch convolutions in S52 (1×1 convolution, 1×K convolution, K×1 convolution, 1×1-AVG and the shortcut) are converted by zero padding to the same scale as the original K×K convolution.
Further, the re-parameterization formula that fuses multiple convolutions into one convolution in S53 is as follows:
F′ = F_1 + F_2 + ... + F_N,  b′ = b_1 + b_2 + ... + b_N    (7)
where N is the number of branches and b′ is the fused bias.
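A sketch of the scale conversion by zero padding in S52 and the parallel fusion of formula (7) might look as follows; the helper names are ours, and the shortcut-to-kernel conversion assumes equal input and output channel counts.

```python
import torch
import torch.nn.functional as F

def pad_to_kxk(kernel: torch.Tensor, K: int = 3) -> torch.Tensor:
    """Zero-pad a 1x1, 1xK or Kx1 kernel to KxK (the scale conversion of S52)."""
    kh, kw = kernel.shape[-2:]
    top, left = (K - kh) // 2, (K - kw) // 2
    return F.pad(kernel, [left, K - kw - left, top, K - kh - top])

def shortcut_as_kxk(channels: int, K: int = 3) -> torch.Tensor:
    """Express the shortcut as a KxK kernel: 1 at the centre of each diagonal
    input/output channel pair, 0 elsewhere."""
    k = torch.zeros(channels, channels, K, K)
    for c in range(channels):
        k[c, c, K // 2, K // 2] = 1.0
    return k

def merge_parallel(kernels, biases):
    """Formula (7): parallel KxK branches are fused by element-wise addition."""
    return sum(kernels), sum(biases)
```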
Further, after the initialization in S31, the branches are first sampled randomly and only the weight parameter θ is updated; importance sampling of the branches is then performed and both the structure parameter α and the weight parameter θ are updated; finally, importance sampling of the branches is performed with the structure parameter α fixed and only the weight parameter θ updated.
Further, the sampling noise in S32 follows a Logistic distribution with location 0 and scale 1, log(−log(u_1)) − log(−log(u_2)), where u_1 and u_2 are both drawn as u_i ~ U(0, 1), meaning that u_i follows the uniform distribution on [0, 1].
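Taken together with formulas (1) and (2), the sampling just described can be sketched as follows. This is a sketch only; the helper names and the convention that rank 1 denotes the most important branch are ours.

```python
import torch

def logistic_noise(shape):
    """zeta = log(-log(u1)) - log(-log(u2)), with u1, u2 ~ U(0, 1)."""
    u1 = torch.rand(shape).clamp_min(1e-8)
    u2 = torch.rand(shape).clamp_min(1e-8)
    return torch.log(-torch.log(u1)) - torch.log(-torch.log(u2))

def branch_importance_and_activation(alpha, lam, s):
    """Formulas (1)-(2) for one multi-branch block; alpha and lam hold the
    structure parameters and temperatures of its branches."""
    zeta = logistic_noise(alpha.shape)
    z = torch.softmax((alpha + zeta) / lam, dim=-1)      # importance Z_{i,j}
    r = alpha + zeta                                     # R_{i,j}
    rank = r.argsort(descending=True).argsort() + 1      # rank 1 = most important
    active = (rank <= s).float()                         # 1 -> branch is activated
    return z, active
```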
Further, the original convolution in S1 is the original K×K convolution with 6 branches, whose operators are: 1×1 convolution, 1×K convolution, K×1 convolution, 1×1-K×K convolution, 1×1-AVG convolution, and the shortcut.
Further, in S1, although the multi-branch structure strengthens the feature extraction ability of the network, it greatly reduces the inference speed. To improve inference speed through re-parameterization, the operator of every branch must be linear and no nonlinearity is added directly after it; instead a BN layer with scaling and translation is used so that the result of each operator undergoes a certain nonlinear transformation. To further enhance the nonlinear capacity of the network, the output of the current block passes through a ReLU layer: a batch normalization (BN) operation is added after the operator of each branch, the outputs of all branches are added element-wise and passed through the nonlinearity (ReLU), and the combination serves as the output of the current multi-branch block.
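A minimal PyTorch sketch of such a multi-branch block is given below. It is illustrative only: the channel widths inside the compound 1×1-K×K and 1×1-AVG branches and the BN-only shortcut are our assumptions, and branch sampling is omitted.

```python
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, kh, kw):
    """A linear operator (plain convolution) followed by BN; no nonlinearity
    inside the branch, as required for re-parameterization."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, (kh, kw), padding=(kh // 2, kw // 2), bias=False),
        nn.BatchNorm2d(c_out))

class MultiBranchBlock(nn.Module):
    """Searchable multi-branch block (K = 3, stride 1): every branch ends in BN,
    the branch outputs are summed element-wise, and one ReLU follows the sum."""
    def __init__(self, c_in, c_out, K=3):
        super().__init__()
        branches = [
            conv_bn(c_in, c_out, K, K),                       # original KxK
            conv_bn(c_in, c_out, 1, 1),                       # 1x1
            conv_bn(c_in, c_out, 1, K),                       # 1xK
            conv_bn(c_in, c_out, K, 1),                       # Kx1
            nn.Sequential(conv_bn(c_in, c_in, 1, 1),
                          conv_bn(c_in, c_out, K, K)),        # 1x1-KxK
            nn.Sequential(conv_bn(c_in, c_out, 1, 1),
                          nn.AvgPool2d(K, stride=1, padding=K // 2),
                          nn.BatchNorm2d(c_out)),             # 1x1-AVG
        ]
        if c_in == c_out:                                     # shortcut: BN only
            branches.append(nn.BatchNorm2d(c_in))
        self.branches = nn.ModuleList(branches)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(sum(b(x) for b in self.branches))
```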
The advantages and beneficial effects of the present invention are:
The present application greatly improves training efficiency and network accuracy when a network is trained with the re-parameterization technique, i.e. it reduces the computation and memory needed for training while the trained model achieves better performance; and after training, the multi-branch network can be converted into a single-branch network without any loss, reducing the parameter count and inference time at inference.
Brief Description of the Drawings
Fig. 1 is a structural diagram of the search operators and the multi-branch block in the present invention.
Fig. 2 is a flow chart of the super network training phase in the present invention.
Fig. 3 is a diagram of the scheme for fusing multiple branches into a single branch in the present invention.
Fig. 4 is a structural diagram of the super network formed by the remaining branches in the present invention.
Detailed Description of Embodiments
Specific embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here are only intended to illustrate and explain the present invention and are not intended to limit it.
The present invention first constructs a multi-branch block as the search space; the multi-branch block can be fused into a single branch through re-parameterization and consists of a 1×1 convolution, a 1×K convolution, a K×1 convolution, a 1×1-K×K convolution, a 1×1-AVG convolution and a shortcut. A super network containing all sub-network structures is built by stacking multi-branch blocks. The super network is then trained, and during training the best branch structure is searched progressively for each block; the branch structure may differ from block to block. At the beginning of training, every branch has a certain probability of being sampled in each iteration. Sampled branches update their weight parameters and structure parameters (sampling probabilities). As training proceeds, useless branches are sampled less and less often until they are not sampled at all. After training, the branches that are no longer sampled are removed, and the remaining branches are fused into a single branch by re-parameterization to improve the inference speed of the network.
Specifically, this embodiment relates to a re-parameterized neural network architecture search method for edge computing, comprising the following steps:
Step 1: design of the search operators and the multi-branch block structure, as shown in Fig. 1;
Step 1-1: convolutions with different kernel sizes extract image features under different receptive fields, so each branch uses a convolution or average-pooling operator with a different kernel size. The skip connection (shortcut) used in ResNet-series networks can be regarded as a 1×1 convolution whose weights are always 1. Here the kernel size is K = 3, and the six designed branch operators are: 1×1 convolution, 1×3 convolution, 3×1 convolution, 1×1-3×3 convolution, 1×1-AVG convolution and the shortcut; the final results of all branches are combined by element-wise addition as the output of the current block;
Step 1-2: although the multi-branch structure strengthens feature extraction, it greatly reduces inference speed. To raise inference speed with re-parameterization, the operator of every branch must be linear and no nonlinearity is added directly after it; instead a BN layer with scaling and translation gives the result of each operator a certain nonlinear transformation. To further enhance the nonlinear capacity of the network, the output of the current block passes through a ReLU layer for nonlinear transformation.
Step 2: build the super network;
Step 2-1: drawing on the experience of many hand-designed networks, the multi-branch blocks designed in step 1 are stacked repeatedly to form a super network with redundant branches. The super network built here contains 22 multi-branch blocks, whose output channel numbers are 48, 48, 48, 96, 96, 96, 96, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192, 192 and 1280. Every operator in the 1st, 2nd, 4th, 8th and 22nd multi-branch blocks has stride 2 and down-samples the image. The feature map output by the network finally passes through a global average pooling layer and then a fully connected layer with a 1280-dimensional input and a 1000-dimensional output, which outputs the score of each class.
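The stacking described in step 2-1 can be sketched as follows. This is a sketch only: MultiBranchBlock is the block sketched earlier and is assumed here to accept a stride argument, and the 3-channel RGB input is also an assumption.

```python
import torch.nn as nn

# Output channels of the 22 blocks and the blocks (1-indexed) whose operators
# use stride 2, as listed in step 2-1.
CHANNELS = [48, 48, 48, 96, 96, 96, 96] + [192] * 14 + [1280]
STRIDE2_BLOCKS = {1, 2, 4, 8, 22}

class SuperNet(nn.Module):
    """22 stacked multi-branch blocks, global average pooling, and a
    1280 -> 1000 fully connected classifier."""
    def __init__(self, num_classes=1000):
        super().__init__()
        blocks, c_in = [], 3
        for idx, c_out in enumerate(CHANNELS, start=1):
            stride = 2 if idx in STRIDE2_BLOCKS else 1
            blocks.append(MultiBranchBlock(c_in, c_out, stride=stride))
            c_in = c_out
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1280, num_classes)

    def forward(self, x):
        x = self.pool(self.blocks(x)).flatten(1)
        return self.fc(x)
```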
Step 3: train the constructed super network on the ImageNet-1K dataset; the training flow chart is shown in Fig. 2;
Step 3-1: initialize the weight parameter θ and structure parameter α of the super network and set the training hyperparameters as follows: the weight-parameter optimizer is SGD with momentum, initial learning rate 0.1, momentum 0.9, weight decay 0.0001, with the learning rate decayed by CosineAnnealingLR at every iteration; the structure-parameter optimizer is Adam, initial learning rate 0.0001, betas (0.5, 0.999), no weight decay; the training batch size is 256 and the super network is trained for 120 epochs in total, with random sampling and weight-parameter updates only in the first 15 epochs, structure-parameter and weight-parameter updates in the middle 50 epochs, and a fixed structure with weight-parameter updates only in the final 55 epochs. In this implementation the total branch count C is set to 75, i.e. the given maximum memory limit C is 75;
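The optimizer settings of step 3-1 can be written in PyTorch roughly as follows; this is a sketch, and the split of the parameters into the θ and α groups is assumed to be done by the caller.

```python
import torch

def build_optimizers(weight_params, arch_params, iters_per_epoch: int):
    """SGD for the weight parameters theta, Adam for the structure parameters
    alpha, with cosine learning-rate decay applied at every iteration."""
    weight_opt = torch.optim.SGD(weight_params, lr=0.1,
                                 momentum=0.9, weight_decay=1e-4)
    arch_opt = torch.optim.Adam(arch_params, lr=1e-4,
                                betas=(0.5, 0.999), weight_decay=0.0)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        weight_opt, T_max=120 * iters_per_epoch)
    return weight_opt, arch_opt, scheduler
```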
Step 3-2: if random sampling is performed, every branch is activated with probability 50%; otherwise, the importance of each branch is computed according to formula (1) and branches whose importance is higher than 0.5 are activated according to formula (2);
Step 3-3: fetch a batch of training data, perform forward inference with the activated branches and compute the loss function; cross-entropy is used here, and the gradient of the weight parameter θ is computed by back-propagation; if sampling is not random, the gradient of the structure parameter α must also be computed according to formula (4);
Step 3-4: update the weight parameter θ with the SGD optimizer and the structure parameter α with the Adam optimizer;
Step 3-5: if training is not finished, return to step 3-2; if training is finished, output the trained super network.
Step 4: delete the inactive branches of the trained super network and keep the remaining branches and their weight parameters. The super network formed by the remaining branches is shown in Fig. 4. Tested on the test set, the super network achieves 72.96% top-1 accuracy and requires 0.68 s of inference time per batch of images.
Step 5: merge the branches of every block in the super network, as illustrated in Fig. 3;
Step 5-1: merge the BN layer after each operator with the operator according to formula (5);
Step 5-2: convert the 1×1 convolution, 1×3 convolution, 3×1 convolution, AVG and shortcut into 3×3 convolutions by zero padding, then merge the 3×3 convolution of each branch with the original 3×3 convolution according to formula (6).
Step 6: perform image classification with the fused single-branch model on the test set; the test device is an Intel Core i7 CPU. The accuracy, inference speed, parameter count and FLOPs of the model are shown in Table 1. Compared with the single-branch training model, the model trained by the method of this patent has the same inference speed and model size but a far higher accuracy; compared with the multi-branch inference model, the method of this patent fuses the multiple branches into a single branch and therefore greatly reduces the parameter count and computation without any loss of performance.
Table 1: comparison of model inference results (presented as an image in the original filing).
The above embodiments are only intended to illustrate the technical scheme of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical schemes recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical schemes to depart from the scope of the technical schemes of the embodiments of the present invention.

Claims (10)

  1. A re-parameterized neural network architecture search method for edge computing, characterized by comprising the following steps:
    S1, designing the operators of each branch and the search space, attaching branches to the original convolution to form multi-branch blocks used to extract image features under different receptive fields;
    S2, building a super network containing all branches by repeatedly stacking the multi-branch blocks to construct a branched network structure;
    S3, training the super network under a given memory limit using a discrete neural network architecture search method, comprising the following steps:
    S31, given the maximum memory limit C, initializing the structure parameter α and the weight parameter θ of every branch;
    S32, computing the importance of every branch:
    Z_{i,j} = exp((α_{i,j} + ζ_{i,j}) / λ_{i,j}) / Σ_{j'} exp((α_{i,j'} + ζ_{i,j'}) / λ_{i,j'})    (1)
    wherein Z_{i,j} denotes the importance of the j-th branch in the i-th multi-branch block, exp(·) denotes the exponential function, α_{i,j} denotes the structure parameter of the j-th branch in the i-th multi-branch block, ζ_{i,j} denotes the sampling noise of the j-th branch in the i-th multi-branch block, and λ_{i,j} denotes the temperature coefficient of the j-th branch in the i-th multi-branch block;
    S33, determining whether each branch is activated:
    value in formula (2) = 1 if Rank(R_{i,j}) ≤ s, and 0 otherwise    (2)
    wherein R_{i,j} = α_{i,j} + ζ_{i,j}, Rank(R_{i,j}) denotes the importance rank of the j-th branch among all branches of the i-th multi-branch block, and s denotes the rank threshold, which is adjusted so as to satisfy the maximum memory limit C; branches ranked below the threshold are not activated;
    S34, acquiring training data, each branch of a multi-branch block extracting different image features, activating the branches whose value in formula (2) is 1, performing forward inference, and then computing the loss function L between the predicted image classification labels and the ground-truth image classification labels;
    S35, computing, through back-propagation, the gradients of the weight parameter θ and of the activation parameter Z with respect to the loss function L, the activation parameter Z being the vector formed by the Z_{i,j}, and simultaneously computing the gradient of the structure parameter α on log p(Z), wherein p(α) is the result of normalizing the structure parameter α into a probability by the following formula:
    p(α_{i,j}) = exp(α_{i,j}) / Σ_{j'} exp(α_{i,j'})    (3)
    S36, updating the weight parameter θ according to the gradient of L, and simultaneously updating the structure parameter α according to the following formula:
    ∇_{α_{i,j}} L = E_{Z~p(Z_{i,j})} [ L · ∇_{α_{i,j}} log p(Z) ]    (4)
    wherein E_{Z~p(Z_{i,j})} denotes the expectation with Z sampled from the probability distribution p(Z_{i,j}), and ∇_{α_{i,j}} log p(Z) is the gradient of the structure parameter α of the j-th branch in the i-th multi-branch block on log p(Z);
    S37, returning to S32 until the weight parameters and structure parameters of the super network are trained to convergence;
    S4, removing the redundant branches from the trained super network to obtain the best sub-network, the branches of the super network trained in step 3 that are not activated being removed according to formula (2);
    S5, fusing the multi-branch best sub-network into a single-branch best sub-network;
    S6, using the single-branch best sub-network to obtain image features, performing real-time inference, and performing image classification with the fused single-branch network.
  2. The re-parameterized neural network architecture search method for edge computing according to claim 1, characterized in that in S5 the multi-branch best sub-network is fused into the single-branch best sub-network by the re-parameterization method, comprising the following steps:
    S51, fusing the weight parameters of the convolution layer and the BN layer within each retained branch by re-parameterization;
    S52, converting each branch by re-parameterization into a convolution of the same scale as the original convolution, and then fusing each of them with the original convolution into one convolution F_j;
    S53, fusing the multiple convolutions F_j of the same multi-branch block into one convolution.
  3. The re-parameterized neural network architecture search method for edge computing according to claim 2, characterized in that the re-parameterization fusion formula in S51 is as follows:
    F′_{m,:,:,:} = (γ_m / σ_m) F_{m,:,:,:},  b′_m = (b_m − μ_m) γ_m / σ_m + β_m    (5)
    wherein γ denotes the scaling parameter of the BN layer, μ denotes the mean of the BN-layer features, σ denotes the standard deviation of the BN-layer features, β denotes the translation parameter of the BN layer, F′_{m,:,:,:} and F_{m,:,:,:} denote the weight parameters of the m-th output channel of the convolution layer after and before fusion respectively, b′_m denotes the bias parameter of the m-th output channel of the convolution layer after fusion, and ':' in the subscripts denotes all elements of that dimension of the convolution layer after or before fusion.
  4. The re-parameterized neural network architecture search method for edge computing according to claim 2, characterized in that in S52 each branch is first converted to the same scale as the original convolution, and the converted convolution of each branch is then fused with the original convolution into one convolution according to the following re-parameterization formula:
    F_j = F_2 ∗ TRANS(F_1),  b_j^m = Σ_{d=1}^{D} Σ_{u=1}^{K_1} Σ_{v=1}^{K_2} b_1^d F_2^{m,d,u,v} + b_2^m    (6)
    wherein TRANS denotes the transpose operation on a tensor, ∗ the convolution of the two kernels, F_1 denotes the converted convolution of the j-th branch, F_2 denotes the original convolution, D is the number of input channels, K_1 and K_2 are the kernel sizes, F_j denotes the fused convolution corresponding to the j-th branch, b_j^m denotes the bias of the m-th output channel of the fused convolution layer, b_1^d denotes the bias of the d-th channel of the converted convolution of the j-th branch, b_2^m denotes the bias of the m-th output channel of the original convolution, and F_2^{m,d,u,v} denotes the kernel weight of the original convolution in row u, column v for the m-th output channel and d-th input channel.
  5. The re-parameterized neural network architecture search method for edge computing according to claim 2, characterized in that the branch convolutions in S52 are converted by zero padding to the same scale as the original convolution.
  6. The re-parameterized neural network architecture search method for edge computing according to claim 2, characterized in that the re-parameterization formula that fuses multiple convolutions into one convolution in S53 is as follows:
    F′ = F_1 + F_2 + ... + F_N,  b′ = b_1 + b_2 + ... + b_N    (7)
    wherein N is the number of branches and b′ is the fused bias.
  7. The re-parameterized neural network architecture search method for edge computing according to claim 1, characterized in that after the initialization in S31, the branches are first sampled randomly and only the weight parameter θ is updated, importance sampling of the branches is then performed and the structure parameter α and the weight parameter θ are updated, and finally importance sampling of the branches is performed with the structure parameter α fixed and only the weight parameter θ updated.
  8. The re-parameterized neural network architecture search method for edge computing according to claim 1, characterized in that the sampling noise in S32 follows a Logistic distribution with location 0 and scale 1, log(−log(u_1)) − log(−log(u_2)), wherein u_1 and u_2 are both drawn as u_i ~ U(0, 1), indicating that u_i follows the uniform distribution on [0, 1].
  9. The re-parameterized neural network architecture search method for edge computing according to claim 1, characterized in that the original convolution in S1 is the original K×K convolution, whose 6 branches are respectively: 1×1 convolution, 1×K convolution, K×1 convolution, 1×1-K×K convolution, 1×1-AVG convolution and the shortcut.
  10. The re-parameterized neural network architecture search method for edge computing according to claim 1, characterized in that in S1 a batch normalization (BN) operation is added after the operator of each branch, the output results of all branches are added element-wise, a nonlinear operation is applied, and the combination serves as the output of the current multi-branch block.
PCT/CN2022/091907 2021-08-27 2022-05-10 Re-parameterized neural network architecture search method for edge computing WO2023024577A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/888,513 US11645495B2 (en) 2021-08-27 2022-08-16 Edge calculation-oriented reparametric neural network architecture search method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110991876.7 2021-08-27
CN202110991876.7A CN113435590B (zh) 2021-08-27 2021-08-27 Re-parameterized neural network architecture search method for edge computing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/888,513 Continuation US11645495B2 (en) 2021-08-27 2022-08-16 Edge calculation-oriented reparametric neural network architecture search method

Publications (1)

Publication Number Publication Date
WO2023024577A1 true WO2023024577A1 (zh) 2023-03-02

Family

ID=77798164

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091907 WO2023024577A1 (zh) 2021-08-27 2022-05-10 面向边缘计算的重参数神经网络架构搜索方法

Country Status (3)

Country Link
US (1) US11645495B2 (zh)
CN (1) CN113435590B (zh)
WO (1) WO2023024577A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703950A (zh) * 2023-08-07 2023-09-05 中南大学 Camouflaged object image segmentation method and system based on multi-level feature fusion

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529978B (zh) * 2020-12-07 2022-10-14 四川大学 Human-computer interactive abstract painting generation method
CN113435590B (zh) * 2021-08-27 2021-12-21 之江实验室 Re-parameterized neural network architecture search method for edge computing
CN116091372B (zh) * 2023-01-03 2023-08-15 江南大学 Infrared and visible image fusion method based on layer separation and re-parameterization
CN116205856B (zh) * 2023-02-01 2023-09-08 哈尔滨市科佳通用机电股份有限公司 Deep-learning-based fault detection method and system for broken axle chains of hand brakes
CN116205284A (zh) * 2023-05-05 2023-06-02 北京蔚领时代科技有限公司 Super-resolution network, method, apparatus and device based on a novel re-parameterized structure
CN116805423B (zh) * 2023-08-23 2023-11-17 江苏源驶科技有限公司 Lightweight human pose estimation algorithm based on structural re-parameterization
CN117195951B (zh) * 2023-09-22 2024-04-16 东南大学 Learning-gene inheritance method based on architecture search and self-knowledge distillation
CN117353811A (zh) * 2023-10-17 2024-01-05 国网吉林省电力有限公司 Multi-condition state monitoring and analysis method for electric power optical communication systems

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN110533179A (zh) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司 Network structure search method and apparatus, readable storage medium, and electronic device
CN111553480A (zh) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network search method and apparatus, computer-readable medium, and electronic device
CN112183491A (zh) * 2020-11-04 2021-01-05 北京百度网讯科技有限公司 Expression recognition model, training method, recognition method, apparatus, and computing device
CN112766466A (zh) * 2021-02-23 2021-05-07 北京市商汤科技开发有限公司 Neural network architecture search method, apparatus, and electronic device
CN113435590A (zh) * 2021-08-27 2021-09-24 之江实验室 Re-parameterized neural network architecture search method for edge computing

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2572734A (en) * 2017-12-04 2019-10-16 Alphanumeric Ltd Data modelling method
CN109002358B (zh) * 2018-07-23 2021-08-31 厦门大学 Adaptive optimization scheduling method for mobile terminal software based on deep reinforcement learning
US11550686B2 (en) * 2019-05-02 2023-01-10 EMC IP Holding Company LLC Adaptable online breakpoint detection over I/O trace time series via deep neural network autoencoders re-parameterization
CN110780923B (zh) * 2019-10-31 2021-09-14 合肥工业大学 Hardware accelerator for binarized convolutional neural networks and data processing method thereof
CN111079923B (zh) * 2019-11-08 2023-10-13 中国科学院上海高等研究院 Spark convolutional neural network system for edge computing platforms and circuit thereof
CN111563533B (zh) * 2020-04-08 2023-05-02 华南理工大学 Subject classification method fusing multiple human brain atlases based on graph convolutional neural networks
CN111612134B (zh) * 2020-05-20 2024-04-12 鼎富智能科技有限公司 Neural network structure search method and apparatus, electronic device, and storage medium
CN111882040B (zh) * 2020-07-30 2023-08-11 中原工学院 Convolutional neural network compression method based on channel-number search
CN112036512B (zh) * 2020-11-03 2021-03-26 浙江大学 Image classification neural network architecture search method and apparatus based on network pruning
CN112381208B (zh) * 2020-11-13 2023-10-31 中国科学院计算技术研究所 Image classification method and system based on neural network architecture search
CN112508104A (zh) * 2020-12-08 2021-03-16 浙江工业大学 Cross-task image classification method based on fast network architecture search

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN110533179A (zh) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司 Network structure search method and apparatus, readable storage medium, and electronic device
CN111553480A (zh) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network search method and apparatus, computer-readable medium, and electronic device
CN112183491A (zh) * 2020-11-04 2021-01-05 北京百度网讯科技有限公司 Expression recognition model, training method, recognition method, apparatus, and computing device
CN112766466A (zh) * 2021-02-23 2021-05-07 北京市商汤科技开发有限公司 Neural network architecture search method, apparatus, and electronic device
CN113435590A (zh) * 2021-08-27 2021-09-24 之江实验室 Re-parameterized neural network architecture search method for edge computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU, KAI; LI, YIDONG; LIN, WEIPENG: "A survey on vehicle re-identification", CHINESE JOURNAL OF INTELLIGENT SCIENCE AND TECHNOLOGY, vol. 2, no. 1, 15 March 2020 (2020-03-15), pages 10 - 25, XP009543786, ISSN: 2096-6652 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703950A (zh) * 2023-08-07 2023-09-05 中南大学 Camouflaged object image segmentation method and system based on multi-level feature fusion
CN116703950B (zh) * 2023-08-07 2023-10-20 中南大学 Camouflaged object image segmentation method and system based on multi-level feature fusion

Also Published As

Publication number Publication date
CN113435590B (zh) 2021-12-21
US11645495B2 (en) 2023-05-09
CN113435590A (zh) 2021-09-24
US20230076457A1 (en) 2023-03-09

Similar Documents

Publication Publication Date Title
WO2023024577A1 (zh) Re-parameterized neural network architecture search method for edge computing
Thangaraj et al. Automated tomato leaf disease classification using transfer learning-based deep convolution neural network
CN109299396B (zh) 融合注意力模型的卷积神经网络协同过滤推荐方法及系统
CN111461322B (zh) 一种深度神经网络模型压缩方法
CN110110080A (zh) 文本分类模型训练方法、装置、计算机设备及存储介质
CN110969191B (zh) 基于相似性保持度量学习方法的青光眼患病概率预测方法
CN110633747A (zh) 目标检测器的压缩方法、装置、介质以及电子设备
US20180101765A1 (en) System and method for hierarchically building predictive analytic models on a dataset
AU2021245165B2 (en) Method and device for processing quantum data
CN107491782A (zh) 利用语义空间信息的针对少量训练数据的图像分类方法
Kongsorot et al. Multi-label classification with extreme learning machine
CN113283524A (zh) 一种基于对抗攻击的深度神经网络近似模型分析方法
CN116432037A (zh) 一种在线迁移学习方法、装置、设备和存储介质
Khan et al. Unsupervised domain adaptation using fuzzy rules and stochastic hierarchical convolutional neural networks
Guo et al. Weak sub-network pruning for strong and efficient neural networks
CN111079011A (zh) 一种基于深度学习的信息推荐方法
CN107273971A (zh) 基于神经元显著性的前馈神经网络结构自组织方法
Li et al. Pruner to predictor: An efficient pruning method for neural networks compression
CN115601578A (zh) 基于自步学习与视图赋权的多视图聚类方法及系统
KR20240034804A (ko) 자동 회귀 언어 모델 신경망을 사용하여 출력 시퀀스 평가
Xiong et al. Convergence of batch gradient method based on the entropy error function for feedforward neural networks
US20230289533A1 (en) Neural Topic Modeling with Continuous Learning
CN113051408A (zh) 一种基于信息增强的稀疏知识图谱推理方法
JP7118882B2 (ja) 変数変換装置、潜在パラメータ学習装置、潜在パラメータ生成装置、これらの方法及びプログラム
Hemkiran et al. Design of Automatic Credit Card Approval System Using Machine Learning

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22859925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE