CN116958862A - End-side layered neural network model training method, device and computer equipment - Google Patents

End-side layered neural network model training method, device and computer equipment

Info

Publication number
CN116958862A
CN116958862A
Authority
CN
China
Prior art keywords
model
neural network
training
network model
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310745026.8A
Other languages
Chinese (zh)
Inventor
廖丽平
林俊龙
蔡君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202310745026.8A
Publication of CN116958862A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/092 - Reinforcement learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computers and relates to an end-side hierarchical neural network model training method and device, a computer device, and a storage medium. The method comprises: obtaining a neural network; improving the convolution layer of the neural network to reduce the redundancy of its feature maps, obtaining an improved convolution layer; performing dynamic model segmentation on the neural network model to obtain model dynamic segmentation points; and designing a hierarchical training architecture based on distributed machine learning according to the improved convolution layer and the model dynamic segmentation points. While a certain model accuracy is guaranteed, the model is improved and segmented, which reduces the computation load of training the model on a single device and achieves the goal of deploying the neural network model on the end side; this solves the problems of the large computation load of the model and the small data volume of a single device in end-side model training, reduces time delay, improves transmission security, and promotes the reasonable utilization of end-side computing power resources.

Description

End-side layered neural network model training method, device and computer equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an end-side hierarchical neural network model training method, an end-side hierarchical neural network model training device, a computer device, and a storage medium.
Background
With the continuous development of technologies such as the Internet of Things, intelligent chips, and 5G/6G, the computing capability of end-side devices has steadily improved, and more and more devices now possess strong computing and storage capabilities. Traditional applications (such as file-processing software and browsers) use only part of a device's computing resources and cannot fully exploit its computing potential, leaving device computing resources wasted and idle. This provides good basic conditions for end-side neural network model training and at the same time promotes the reasonable utilization of end-side computing power resources.
Traditional cloud-based model training requires data to be transmitted from the client to the cloud, which exposes problems such as response delay, cloud bandwidth consumption, and data privacy protection, seriously hindering the landing of end-side artificial intelligence applications. The concept of end-side model training has therefore emerged: end-side neural network model training uses the computing resources of end-side devices to migrate the training of artificial intelligence models, originally performed in the cloud, to the end side. Compared with traditional cloud-side model training, end-side training offers lower delay, stronger privacy protection, and better user experience, and can be widely applied in fields such as the industrial Internet, smart homes, and smart cities.
Although end-side artificial intelligence training has broad application prospects and important theoretical significance, it faces a series of challenges. Convolutional neural networks (CNNs), among the most widely used artificial intelligence models and a driving force behind the prosperity of artificial intelligence, inevitably suffer in end-side applications from conflicts between device computing power and model size, and between data volume and model effect. Specifically, under the requirements of emerging delay-sensitive applications such as industrial anomaly detection, automatic driving, and real-time querying of surveillance video streams, deploying CNN training on the end side faces great challenges: existing large CNN models generally require high computing power and storage capacity, so a single end-side device can hardly complete the training of a whole model; and training a CNN model on a single end-side device causes problems such as poor model effect due to the small data volume of a single device.
Disclosure of Invention
Aiming at the defects of the prior art, embodiments of the present invention provide an end-side hierarchical neural network model training method and device, a computer device, and a storage medium. To solve the above technical problems, the present invention provides an end-side hierarchical neural network model training method adopting the following technical scheme:
Acquiring a neural network;
improving the convolutional layer of the neural network to reduce the redundancy of its feature maps, obtaining an improved convolutional layer;
performing dynamic model segmentation on the neural network model to obtain model dynamic segmentation points;
and designing a hierarchical training architecture based on distributed machine learning according to the improved convolutional layer and the model dynamic segmentation points.
In order to solve the technical problems, the invention also provides an end-side layered neural network model training device, which adopts the following technical scheme that:
the acquisition module is used for acquiring the neural network;
the improvement module is used for improving the convolutional layer of the neural network to reduce the redundancy of its feature maps and obtain an improved convolutional layer;
the dynamic segmentation module is used for performing dynamic model segmentation on the neural network model to obtain model dynamic segmentation points;
and the layering module is used for designing a hierarchical training architecture based on distributed machine learning according to the improved convolutional layer and the model dynamic segmentation points.
In order to solve the technical problem, the invention also provides a computer device, which adopts the technical scheme that the computer device comprises a memory and a processor, wherein the memory stores computer readable instructions, and the processor realizes the steps of the end-side hierarchical neural network model training method when executing the computer readable instructions.
In order to solve the technical problem, the invention also provides a computer readable storage medium, which adopts the technical scheme that the computer readable storage medium stores computer readable instructions, and the computer readable instructions realize the steps of the end-side hierarchical neural network model training method when being executed by a processor.
Compared with the prior art, the invention has the following main beneficial effects: by reducing the redundancy of the model's feature maps and fully exploiting the advantages of federated learning and segmentation learning, it overcomes the limitations of current end-side model research, namely training the whole model on a single device and compressing the model at the cost of some accuracy, and provides a method for multiple devices to train the same model. While a certain model accuracy is guaranteed, the model is improved and segmented, reducing the computation load of training the model on a single device and achieving the goal of deploying the neural network model on the end side; this solves the problems of the large computation load of the model and the small data volume of a single device in end-side model training, reduces time delay, improves transmission security, and promotes the reasonable utilization of end-side computing power resources.
Drawings
For a clearer illustration of the solution of the present invention, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of one embodiment of an end-side hierarchical neural network model training method of the present invention;
FIG. 2 is a schematic diagram of an end-side hierarchical neural network model training architecture used in the end-side hierarchical neural network model training method of the present invention;
FIG. 3 is a schematic diagram of reduced model redundancy used in the end-side hierarchical neural network model training method of the present invention;
FIG. 4 is a schematic diagram of three layers of computational force partitions of an end-side device used in the end-side hierarchical neural network model training method of the present invention;
FIG. 5 is a schematic diagram of an automatic encoder used in the end-side hierarchical neural network model training method of the present invention;
FIG. 6 is a diagram of a layered training architecture used in the end-side layered neural network model training method of the present invention;
FIG. 7 is a schematic diagram of one embodiment of an end-side hierarchical neural network model training device of the present invention;
FIG. 8 is a schematic diagram of the architecture of one embodiment of a computer device of the present invention.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; the terms "comprising" and "having" and any variations thereof in the description of the invention and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present invention, the technical solution of the embodiment of the present invention will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that, the end-side hierarchical neural network model training method provided by the embodiment of the present invention is generally executed by a server/terminal device, and accordingly, the end-side hierarchical neural network model training device is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers may be merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Example 1
With continued reference to FIG. 1, a flow chart of one embodiment of an end-side hierarchical neural network model training method of the present invention is shown. The end-side layered neural network model training method comprises the following steps:
And S1, acquiring a neural network.
The neural network in this embodiment may be a convolutional neural network (CNN) or another neural network containing convolution layers; CNNs are the most widely applied artificial intelligence models and can process data such as images, audio, and video.
In this embodiment, the electronic device (e.g., server/terminal device) on which the end-side hierarchical neural network model training method runs may receive the end-side hierarchical neural network model training request through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra wideband) connections, and other wireless connection means now known or developed in the future.
Among neural networks, the currently widely applied models are the LeNet, AlexNet, VGG, and GoogLeNet series and the ResNet and MobileNet series, all of which are open source; the model code can be viewed on open-source platforms such as GitHub.
First, a pre-trained convolutional neural network model is downloaded. The pre-trained model can be downloaded from an official website or an open-source platform such as GitHub.
Next, a suitable convolutional neural network framework is selected. Common convolutional neural network frameworks include TensorFlow, PyTorch, and the like. Selecting an appropriate framework facilitates loading and using the pre-trained model.
Then, the pre-trained convolutional neural network model is loaded. Depending on the selected framework and model type, the pre-trained model is loaded using the corresponding function or API. For example, in TensorFlow, a model may be loaded using the tf.keras.models.load_model() function; in PyTorch, model parameters may be loaded using the torch.load() function.
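As an illustration of this loading step, a minimal sketch in both frameworks follows; the file names and the MobileNetV2 choice are placeholders for illustration, not prescribed by this embodiment.

```python
# Illustrative only: paths and the model choice are placeholder assumptions.
import tensorflow as tf
import torch
import torchvision.models as models

# TensorFlow/Keras: load a complete saved model from disk.
keras_model = tf.keras.models.load_model("pretrained_cnn.h5")

# PyTorch: instantiate an architecture, then load pretrained parameters.
torch_model = models.mobilenet_v2()
torch_model.load_state_dict(torch.load("mobilenet_v2.pth"))
torch_model.eval()  # inference mode before further fine-tuning decisions
```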
And S2, improving the convolutional layer of the neural network, reducing redundancy of the characteristic diagram of the convolutional layer, and obtaining the improved convolutional layer.
Deep neural networks typically contain a large number of convolution layers, resulting in high computation costs. Although models such as MobileNet and ShuffleNet introduce depthwise convolution or shuffle operations to construct efficient CNNs with smaller convolution filters and fewer floating point operations (FLOPs), the remaining 1 × 1 convolution layers still occupy considerable memory and computing power.
Fig. 2 is a schematic diagram of an end-side hierarchical neural network model training architecture used in the end-side hierarchical neural network model training method of the present invention. FIG. 3 is a schematic diagram of reduced model redundancy for use in the end-side hierarchical neural network model training method of the present invention. As shown in fig. 2 and 3, in this embodiment, step S2 may further include the steps of:
S21, for each convolution layer, outputting a preset percentage of original feature maps through a standard traditional convolution operation.
To reduce the large amount of redundancy in the intermediate feature maps computed by mainstream CNN networks, the convolution layers of the neural network are improved: for each convolution layer, a preset percentage of original feature maps is output through a standard traditional convolution operation.
In some alternative implementations of this embodiment, the preset percentage is 35%-45%; this embodiment chooses 40%. The feature maps generated by a standard convolution layer are called original feature maps. Generating them consumes substantial computing resources, and the generated maps contain a large amount of redundancy (similar feature maps and similar features) that can instead be obtained by linear operations with relatively little computation. Therefore, 40% of the original feature maps are generated by a standard convolution layer, and the remaining phantom feature maps are obtained from the original feature maps by simple linear operations, which reduces the model's computation cost.
Given input feature map data $X \in \mathbb{R}^{H_{in} \times W_{in} \times C}$, where $H_{in}$ and $W_{in}$ are the height and width of the input feature map and $C$ is the number of input channels, the standard convolution layer that generates $n$ original feature maps can be defined as:

$Y = X * f + b$ (formula 1),

where $b$ is the bias term, $Y \in \mathbb{R}^{H_{out} \times W_{out} \times n}$ is the output feature map with $n$ channels, $f \in \mathbb{R}^{C \times K_H \times K_W \times n}$ is the convolution filter of this layer, $C$ is the number of convolution kernel channels of the filter, and $K_H \times K_W$ is the convolution kernel size of the filter.
The parameters $f$ and $b$ to be optimized are determined by the sizes of the input and output feature maps. The output feature maps of convolution layers typically contain much redundancy, and some of them may be similar to each other; it is unnecessary to generate these redundant feature maps one by one at the cost of large numbers of FLOPs and parameters. The generation of $m$ original feature maps using a standard traditional convolution layer can be defined as:
$Y' = X * f'$ (formula 2),

where $Y' \in \mathbb{R}^{H_{out} \times W_{out} \times m}$ is the output feature map with $m$ channels and $f' \in \mathbb{R}^{C \times K_H \times K_W \times m}$ is the convolution filter of this layer, with $m \le n$. For simplicity the bias term $b$ is omitted, and the convolution kernel size, stride, padding, and other parameters of the filter are kept the same as in the standard convolution (formula 1), so that the size of the output feature map stays consistent.
S22, generating phantom feature maps from the original feature maps by linear operations, while retaining the identity mapping of the convolution layer.
To further obtain the required $n$ feature maps, a series of simple linear operations (such as affine transformations and wavelet transformations) is applied to each original feature map in $Y'$ to generate $s$ phantom feature maps (ghost feature maps), defined as:

$y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, s$ (formula 3),

where $y'_i$ is the $i$-th original feature map in $Y'$ and $\Phi_{i,j}$ is the linear operation (except the last one) that generates the $j$-th phantom feature map from the $i$-th original feature map; that is, $y'_i$ can generate one or more phantom feature maps, and the last operation $\Phi_{i,s}$ is the identity mapping that preserves the original feature map. Through the above steps, $n = m \times s$ feature maps $Y = \{y_{11}, y_{12}, \ldots, y_{ms}\}$ are obtained as the output data of the improved convolution layer. The linear operation $\Phi$ runs on each channel, and its computational cost is far lower than that of ordinary convolution.
Through step S2, the redundancy of the convolution layer's feature maps is reduced, lowering the computation cost of the model.
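For illustration, a minimal PyTorch sketch of such an improved convolution layer follows, in the style of formulas 1-3. Using a depthwise convolution as the cheap linear operation $\Phi$, and all module and parameter names, are assumptions rather than the patent's prescribed implementation.

```python
import math
import torch
import torch.nn as nn

class ImprovedConv(nn.Module):
    """Sketch of the improved convolution layer (formulas 1-3): a standard
    convolution produces the m (~40%) original feature maps, cheap per-channel
    (depthwise) linear ops produce the phantom maps, and concatenation keeps
    the identity mapping of the originals in the output."""
    def __init__(self, in_ch, out_ch, kernel_size=3, ratio=0.4, dw_size=3):
        super().__init__()
        m = max(1, math.ceil(out_ch * ratio))    # original maps (preset percentage)
        s = max(2, math.ceil(out_ch / m))        # maps per original, incl. identity
        self.out_ch = out_ch
        self.primary = nn.Conv2d(in_ch, m, kernel_size,
                                 padding=kernel_size // 2, bias=False)  # formula 2
        # Depthwise conv as the cheap linear operation Phi (formula 3).
        self.cheap = nn.Conv2d(m, m * (s - 1), dw_size,
                               padding=dw_size // 2, groups=m, bias=False)

    def forward(self, x):
        y_prime = self.primary(x)                 # original feature maps Y'
        phantom = self.cheap(y_prime)             # phantom feature maps
        y = torch.cat([y_prime, phantom], dim=1)  # identity mapping + phantoms
        return y[:, :self.out_ch, :, :]           # n = m * s maps, trimmed
```

Since $m \times s$ may slightly exceed the requested channel count, the output is trimmed to `out_ch`; the depthwise convolution operates on each channel independently, matching the per-channel linear operation described above.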
And S3, carrying out model dynamic segmentation on the neural network model to obtain model dynamic segmentation points.
The neural network computing power of a device is usually measured in TOPS (Trillion Operations Per Second), the number of floating point operations that can be performed per second: 1 TOPS means the processor can perform one trillion ($10^{12}$) floating point operations per second.
Fig. 4 is a schematic diagram of three layers of computational force division of an end-side device used in the end-side hierarchical neural network model training method of the present invention. As shown in fig. 4, in this embodiment, step S3 may further include the steps of:
S31, dividing the computing power of the end-side devices by combining the processor type, dominant frequency, and core number they use.
The end-side devices may be various types of hardware devices, such as sensors, smart phones, smart watches, smart home devices, smart vehicles, smart cameras, and the like. These devices, through built-in computing and storage capabilities, can perform preliminary processing and analysis, such as data compression, feature extraction, etc., on the collected data locally, thereby reducing the bandwidth and cost of data transmission. Meanwhile, the terminal side equipment can also transmit the processed data to a cloud server or other edge equipment through network connection for further processing and analysis.
In calculating the computing power of a device, factors such as the type of processor used by the device, its dominant frequency, and its number of cores need to be considered. Different types of processors have different architectures and instruction sets, and their performance varies across computing tasks; common processor types include CPU, GPU, and TPU. The dominant frequency is the clock frequency of the processor, typically in hertz (Hz); the higher the dominant frequency, the more instructions the processor can execute per second and the greater its computing power. The core number is the number of compute cores contained in a processor, and different processors may have different numbers of cores; the more cores, the more tasks a processor can perform at the same time and the stronger its computing power. The evaluation is usually performed by measuring the run time of the device executing a certain workload (model training) with a program called a benchmark. The end-side devices can then be hierarchically grouped by computing power value through a clustering algorithm.
The neural network computing performance of a device is evaluated with a benchmark program: a neural network model is run on the end-side device, the number of floating point operations executed per unit time is measured, and the device's per-second computing power value $E_{TOPS}$ is calculated as:

$E_{TOPS} = \dfrac{\sum_{Layer} FLOPS}{T}$,

where $\sum_{Layer} FLOPS$ is the total number of floating point operations for running the model once, obtained by accumulating the per-layer FLOPS computed with formulas 6-11 below, and $T$ is the average run time of a single execution.

Based on the per-second computing power value $E_{TOPS}$, the devices are hierarchically grouped into high-level computing power devices, intermediate computing power devices, and primary computing power devices.
In specific implementations, high-level computing power devices may be desktop computers, notebook computers, local servers, and the like, with a computing power level of 10-100 TOPS; intermediate computing power devices may be mobile phones, tablet computers, gateways, and the like, with a computing power level of 1-10 TOPS; and primary computing power devices may be cameras, smart wristbands, PLCs, and the like, with a computing power level of 0-1 TOPS.
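To illustrate this grouping, a minimal sketch follows, using the $E_{TOPS}$ definition above and the tier thresholds just quoted; the function names and the tera-scale conversion are illustrative assumptions.

```python
def e_tops(total_model_flops: float, avg_runtime_s: float) -> float:
    """Per-second computing power: total FLOPs of one model run divided by
    the average run time T, scaled to tera-operations per second."""
    return total_model_flops / avg_runtime_s / 1e12

def power_tier(tops: float) -> str:
    """Three-level grouping with the thresholds given in the text."""
    if tops >= 10:
        return "high-level"     # desktops, local servers (10-100 TOPS)
    if tops >= 1:
        return "intermediate"   # phones, tablets, gateways (1-10 TOPS)
    return "primary"            # cameras, smart wristbands, PLCs (0-1 TOPS)
```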
S32, calculating the computing power required by each layer of the neural network model.
The amount of computation (FLOPS) required for a single training pass of the neural network model is related to many factors, such as feature map size, convolution/pooling kernel size, channel number, stride, and number of operations. Different types of layers have different calculation formulas: the convolution layer mainly involves convolution and addition operations; the pooling layer mainly involves operations such as max pooling and average pooling; each element of the activation layer performs one nonlinear function operation; and the fully connected layer mainly involves matrix multiplication and addition operations.
According to the composition structure of the convolutional neural network, the required computing power value FLOPS is calculated layer by layer. Assume that each layer's input feature map has size $H_{in} \times W_{in}$, the output feature map has size $H_{out} \times W_{out}$, the convolution/pooling kernel size is $K_H \times K_W$, the numbers of input and output channels are $C_{in}$ and $C_{out}$ respectively, and the stride is $S$. The total floating point operation count FLOPS of each layer is then:

The input layer involves no actual computation, so:

$FLOPS = 0$ (formula 6);

For a convolution layer:

$FLOPS = H_{in} \times W_{in} \times K_H \times K_W \times C_{in} \times C_{out} \div S^2$ (formula 7);

For a pooling layer:

$FLOPS = H_{in} \times W_{in} \times K_H \times K_W \times C_{in} \times C_{out}$ (formula 8);

Each element of the activation layer performs one nonlinear function computation:

$FLOPS = H_{in} \times W_{in} \times C_{in}$ (formula 9);

Each element of the fully connected layer/output layer performs one multiplication and one addition:

$FLOPS = 2 \times H_{in} \times W_{in} \times C_{in} \times H_{out} \times W_{out} \times C_{out}$ (formula 10);

For a normalization layer:

$FLOPS = 4 \times W_{in} \times H_{in} \times C_{in}$ (formula 11).
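As a worked illustration, the following sketch evaluates formulas 6-11 for one layer; the layer-type strings and the function signature are assumptions made for illustration only.

```python
def layer_flops(layer_type, H_in, W_in, C_in, H_out=1, W_out=1, C_out=1,
                K_h=1, K_w=1, S=1):
    """Per-layer floating point operation counts following formulas 6-11."""
    if layer_type == "input":
        return 0                                                  # formula 6
    if layer_type == "conv":
        return H_in * W_in * K_h * K_w * C_in * C_out / S**2      # formula 7
    if layer_type == "pool":
        return H_in * W_in * K_h * K_w * C_in * C_out             # formula 8
    if layer_type == "activation":
        return H_in * W_in * C_in                                 # formula 9
    if layer_type == "fc":
        return 2 * H_in * W_in * C_in * H_out * W_out * C_out     # formula 10
    if layer_type == "norm":
        return 4 * W_in * H_in * C_in                             # formula 11
    raise ValueError(f"unknown layer type: {layer_type}")

# Accumulating over all layers gives the model's total FLOPS used above.
```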
S33, determining the model's dynamic segmentation points using a reinforcement learning algorithm.
A CUI curve built with the Grad-CAM algorithm is used to find suitable candidate segmentation points on the neural network, reducing the search range of the action space. The Grad-CAM (Gradient-weighted Class Activation Mapping) algorithm is a neural network visualization method from the field of interpretability; through the gradient information of a convolution layer, it can infer which regions of an input image are important to the neural network, without additional data or manual labeling. Specifically, Grad-CAM uses gradient weights to compute the degree of importance of each feature map to the classification result, then uses these weights to weight the layer's feature maps and obtain their weighted average, and finally generates a CUI curve indicating which regions are important in the classification process. Unlike other methods based on deconvolution or gradient ascent, Grad-CAM has the advantages of being simple, easy to understand, and requiring no modification of the model structure, so it is applicable to various deep learning models (e.g., CNN, RNN, Transformer) and their application fields (e.g., image recognition, natural language processing, video processing).
The optimal model segmentation point is determined using the reinforcement learning Q-Learning algorithm. The specific algorithm design is as follows:
first, a state space is defined. In a state environment in which the device residual calculation force, the model size and the model channel number change are considered, a state space vector is defined:
S i =(a p ,a i ,b p ,b i ) (equation 12),
wherein a is p For the minimum remaining computing power of the primary computing power equipment, a i Minimum remaining computing power for middle-level computing power equipment, b p Partitioning the model W p Calculated force magnitude of b i Partitioning the model W i P represents the device state corresponding to the primary computing device and the model running thereon, and i represents the device state corresponding to the intermediate computing device and the model running thereon. The calculation of the remaining computing power of the device is defined as:
a=δ×E TOPS (equation 13),
delta is the CPU occupancy rate of the equipment in the current environment; the calculation of the calculation force of each layer of the model is given by the previous section; the model channel number change can be directly obtained through a model structure and is represented by a group of ternary variables, wherein-1 represents that the model channel number of the segmentation position is reduced, 0 represents that the model channel number is unchanged, and 1 represents that the model channel number is increased.
Next, the action space is defined. For the model environment, all possible actions (i.e., the choices of different segmentation points) are defined. To reduce the range of the action space, the Grad-CAM algorithm is first adopted to find candidate segmentation points on the model. Suppose the model $W$ has $C$ data classification results, each consisting of $J$ data items ($j = 1, 2, \ldots, J$), and the model has $I$ layers ($i = 1, 2, \ldots, I$). For the data in each class, the feature map importance coefficients of each layer are calculated with the Grad-CAM algorithm, defined as:

$w_{i,j}^{c,k} = \dfrac{1}{n \times m} \sum_{u=1}^{n} \sum_{v=1}^{m} \dfrac{\partial y^{c}}{\partial F_{i,j}^{k,u,v}}$ (formula 14),

where $w_{i,j}^{c,k}$ is the importance coefficient of the $k$-th channel of the feature map of the $i$-th layer for the $j$-th data item in class $c$, $F_{i,j} \in \mathbb{R}^{n \times m \times z}$ is the feature map of the $i$-th layer of the neural network for data item $j$, $n$, $m$, and $z$ are respectively the height, width, and channel number of the feature map, and $y^{c}$ is the probability score of the class-$c$ prediction result. The coefficients are then weighted and summed with the selected layer's feature map $F_{i,j}$, and finally the ReLU activation function is applied to set the negative values of the gradient to zero, giving the weighted activation map $A_{i,j}^{c}$ of the $i$-th layer for data item $j$ in class $c$, defined as:

$A_{i,j}^{c} = \mathrm{ReLU}\left(\sum_{k=1}^{z} w_{i,j}^{c,k}\, F_{i,j}^{k}\right)$ (formula 15).

From $A_{i,j}^{c}$, the CUI value $CUI_{i,j}^{c}$ of the $i$-th layer for the $j$-th data item in class $c$ is obtained; fitting the $CUI_{i,j}^{c}$ values forms a $CUI^{c}$ curve for each class, and the model's $CUI_i$ curve is obtained by linearly fitting the $C$ per-class curves. The local maxima on the $CUI_i$ curve are defined as the candidate segmentation points on the model, i.e., the action space $D_i$ ($i = 1, 2, \ldots, I$).
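A small sketch of extracting the action space from a fitted CUI curve follows, assuming the curve is given as one value per layer; the strict-inequality test for local maxima is an illustrative choice, not specified by this embodiment.

```python
import numpy as np

def candidate_split_points(cui):
    """Action space D: indices of local maxima on the model's CUI curve,
    where cui[i] is the fitted CUI value of layer i (i = 0..I-1)."""
    cui = np.asarray(cui, dtype=float)
    # Interior points strictly greater than both neighbours are local maxima.
    interior = np.where((cui[1:-1] > cui[:-2]) & (cui[1:-1] > cui[2:]))[0] + 1
    return interior.tolist()
```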
Then, the reward function is defined. The reward function gives the reward obtained after performing a certain action in the current state. In this example, it is defined in terms of the model's accuracy on the test set, the time delay, and the degree of matching with the state space: if the model predicts more accurately and needs less time to complete the task, a higher reward value should be given. The model's accuracy and completion time are taken as input parameters, and a reward value positively correlated with accuracy and negatively correlated with time is output (formula 16), where $k_1, k_2, k_3, k_4, k_5$ are positive numbers that adjust the extent to which each factor affects the reward. The Q-value table is initialized to all zeros; a Q value represents the expected return obtainable by taking an action in a certain state.
Again, iterative training begins. At each time step, the agent selects an action from the Q-value table according to the current state and performs it. After the action is executed, the agent observes the environment's feedback (the reward signal) and uses it to update the Q-value table with the Q-Learning algorithm, gradually learning the optimal policy and dynamically adjusting the segmentation points according to that policy. The Q-value table is updated with the Bellman equation from the current state, the action performed, and the obtained reward signal, so that the Q values gradually approach the optimal values:

$Q(S_i, D_i) = Q(S_i, D_i) + \alpha \times \left[reward + \gamma \times \max\big(Q(S_{i+1}, D_{i+1})\big) - Q(S_i, D_i)\right]$ (formula 17),

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $\max(Q(S_{i+1}, D_{i+1}))$ is the maximum Q value over all possible actions $D$ in the next state $S$. By dynamically determining the optimal segmentation point, the neural network model $W$ is split into two partitions $W_p$ and $W_i$, called the primary computing power network and the intermediate computing power network, which are processed and computed on primary and intermediate computing power devices respectively. The primary computing power device where the data resides submits only the parameters of the $W_p$ partition to the high-level computing power device, and the intermediate computing power device submits only the parameters of the $W_i$ partition to the high-level computing power device. Communication consists of transmitting the activations of the primary computing power network's cut layer (referred to as crushed data) to the intermediate computing power device and receiving the gradients of the crushed data from it.
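The Q-table update of formula 17 can be sketched as follows; the table keyed by (state, action) pairs and the default hyperparameter values are assumptions for illustration.

```python
from collections import defaultdict

# Q-value table initialized to zero, as in the text.
Q = defaultdict(float)

def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One Bellman update of the Q table (formula 17)."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
```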
Finally, an automatic encoder is designed. Fig. 5 is a schematic diagram of the automatic encoder structure used in the end-side hierarchical neural network model training method of the present invention. The automatic encoder structure shown in fig. 5 is used; its asymmetric architecture design reduces the computation load on the primary computing power device. The encoder consists of a convolution layer with stride 2 × 2 and 3 × 3 kernels, which performs the spatial and channel compression of the feature map in a single step. The convolution layer is followed by a generalized divisive normalization (GDN) layer, which is commonly used in most deep compression schemes, e.g., as a substitute for BN. The GDN operation is defined as:

$w_i^{(k)}(m, n) = \dfrac{u_i^{(k)}(m, n)}{\left(\beta_{k,i} + \sum_j \gamma_{k,i,j}\,\big(u_j^{(k)}(m, n)\big)^2\right)^{1/2}}$ (formula 18),

where $w_i^{(k)}(m, n)$ denotes the $i$-th output channel at spatial position $(m, n)$ of the $k$-th stage of the encoder, and $u_i^{(k)}(m, n)$ denotes the corresponding input value. iGDN is the approximate inverse of GDN, defined as:

$\hat{u}_i^{(k)}(m, n) = \hat{w}_i^{(k)}(m, n) \times \left(\hat{\beta}_{k,i} + \sum_j \hat{\gamma}_{k,i,j}\,\big(\hat{w}_j^{(k)}(m, n)\big)^2\right)^{1/2}$ (formula 19),

where $\hat{w}_i^{(k)}(m, n)$ and $\hat{u}_i^{(k)}(m, n)$ are respectively the input and output of the iGDN. Finally, a parametric rectified linear unit (PReLU) is used as the activation function to further improve the model's learning ability. The output of the encoder network is transmitted directly through the channel.
In the decoder, a single convolution with stride 1 × 1 and kernel size 3 × 3 is first applied to the compressed feature map. This is followed by the iGDN operation, PReLU activation, and upsampling to recover the original spatial dimensions of the intermediate feature map. Finally, another convolution layer with the same stride and kernel size is applied to restore the depth of the feature map to its original value, followed by BN and PReLU.
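A hedged PyTorch sketch of this bottleneck autoencoder follows; the simplified GDN/iGDN parameterization (formulas 18-19 without the usual non-negativity reparameterization), the nearest-neighbour upsampling, and all module names are assumptions, not the patent's exact design.

```python
import torch
import torch.nn as nn

class SimpleGDN(nn.Module):
    """Simplified GDN (formula 18): w_i = u_i / sqrt(beta_i + sum_j gamma_ij u_j^2).
    inverse=True gives the approximate inverse iGDN (formula 19)."""
    def __init__(self, channels, inverse=False):
        super().__init__()
        self.inverse = inverse
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))

    def forward(self, x):
        # Denominator: 1x1 convolution of squared activations with gamma, plus beta.
        norm = nn.functional.conv2d(x * x,
                                    self.gamma.view(*self.gamma.shape, 1, 1),
                                    self.beta)
        norm = torch.sqrt(torch.clamp(norm, min=1e-6))
        return x * norm if self.inverse else x / norm

class BottleneckEncoder(nn.Module):
    """Encoder: one stride-2, 3x3 conv compresses space and channels in a
    single step, then GDN and PReLU; its output crosses the network link."""
    def __init__(self, in_ch, code_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, code_ch, kernel_size=3, stride=2, padding=1)
        self.gdn = SimpleGDN(code_ch)
        self.act = nn.PReLU(code_ch)

    def forward(self, x):
        return self.act(self.gdn(self.conv(x)))

class BottleneckDecoder(nn.Module):
    """Decoder: 3x3 stride-1 conv, iGDN, PReLU, upsampling, then a second
    conv restores the original channel depth, followed by BN and PReLU."""
    def __init__(self, code_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(code_ch, code_ch, kernel_size=3, stride=1, padding=1)
        self.igdn = SimpleGDN(code_ch, inverse=True)
        self.act1 = nn.PReLU(code_ch)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv2 = nn.Conv2d(code_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act2 = nn.PReLU(out_ch)

    def forward(self, z):
        z = self.up(self.act1(self.igdn(self.conv1(z))))
        return self.act2(self.bn(self.conv2(z)))
```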
S34, splitting the neural network model $W$ into two partitions, a primary computing power network $W_p$ and an intermediate computing power network $W_i$, which are processed and computed on primary and intermediate computing power devices.
The model is divided into two parts by reinforcement learning: a primary computing power network and an intermediate computing power network. The part placed on primary computing power devices for operation is called the primary computing power network; the part placed on intermediate computing power devices is called the intermediate computing power network. In this way the computing power required by each model partition is adapted to the computing power of its device. Processing and computation on the primary and intermediate computing power devices consist mainly of the model's forward and backward propagation on the CPU/GPU and the updating of the model's parameters. Combining the candidate segmentation points on the CUI curve, the model state at each candidate segmentation point together with the current remaining device computing power can be regarded as a state, and the optimal action (i.e., the optimal segmentation point) is selected according to the current state. By dynamically adjusting the segmentation point, the model's performance can be optimized: the optimal segmentation point is found, and the neural network model $W$ is split into the two partitions $W_p$ and $W_i$, called the primary and intermediate computing power networks, which are processed and computed on primary and intermediate computing power devices respectively.
By dynamically matching the device computing power with the model size, the optimal segmentation point is found, reducing communication overhead and protecting the security of data transmission.
An automatic encoder is a structure that can be embedded in a neural network and includes an encoder and a decoder. The encoder compresses the input data into a low-dimensional intermediate representation, and the decoder restores it to the same dimensions as the original input; the intermediate representation is called an injected bottleneck. Input data can be effectively mapped into the intermediate representation space of the bottleneck, achieving data compression. Therefore, using the automatic encoder at the segmentation point as a bottleneck-injection segmentation method reduces the communication overhead of the training process while ensuring a certain degree of data transmission security.
And S4, designing a hierarchical training architecture based on distributed machine learning according to the improved convolution layer and the model dynamic segmentation points.
Federated learning and segmentation learning are two distributed machine learning methods. Unlike traditional centralized learning, distributed machine learning does not need to transmit data to any untrusted party and avoids sending the original data sets to a central server; it is widely used for its inherent data privacy protection, reduced communication cost, and improved utilization of computing resources.
In this embodiment, step S4 may further include the steps of:
S41, based on federated learning, training a complete neural network model in parallel on distributed clients using local data.
Federated learning (FL) trains a complete neural network model in parallel on distributed clients using local data, sends the locally trained complete models to a server to be aggregated into a global model, and then sends the global model's learned parameters back to all clients for the next round of training, until the algorithm converges. However, clients participating in federated learning must have a certain computing capability, and clients with limited computing resources cannot run the complete model; moreover, during federated training both the server and the clients can access the local and global models, which poses certain security risks.
S42, sending the locally trained complete neural network models to a server to be aggregated into a global model (see the aggregation sketch below).
S43, sending the learned parameters of the global model back to all clients for the next round of training, until the algorithm converges.
The algorithm here refers to the various activation functions on the neural network model, which are written in the code when the model is acquired.
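Steps S42-S43 can be sketched as a weighted FedAvg aggregation over client parameters; weighting by the clients' local data sizes is a common convention and an assumption here, not something this embodiment specifies.

```python
import torch

def fedavg(client_state_dicts, client_sizes):
    """Weighted FedAvg: aggregate locally trained parameter dicts into the
    global model's parameters, weighting each client by its data size."""
    total = float(sum(client_sizes))
    global_state = {}
    for key in client_state_dicts[0]:
        global_state[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_sizes)
        )
    return global_state  # sent back to all clients for the next round
```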
S44, splitting the complete neural network model into multiple parts based on segmentation learning.
S45, training the split parts of the neural network model on different devices respectively.
For example, in an industrial Internet scenario where production anomaly detection is performed using data collected by smart cameras, the many cameras in a workshop correspond to primary computing power devices, the several computers in the monitoring room correspond to intermediate computing power devices, and the factory's server corresponds to the high-level computing power device.
Combining the advantages of federated learning and segmentation learning, an AI training model architecture with end-side hierarchical computing power coordination is designed. The architecture consists of one high-level computing power device, H intermediate computing power devices, and K primary computing power devices. Fig. 6 is the hierarchical training architecture diagram used in the end-side hierarchical neural network model training method of the present invention. As shown in fig. 6, the specific training steps are as follows:
(1) The $W_p$ partition of the neural network model is sent to the primary computing power devices for computation; these devices usually also serve as the original data-collection devices. The second partition of the neural network model is placed on the intermediate computing power devices for computation and learning, and the whole neural network model is placed on the high-level computing power device for model weight aggregation and inference;
(2) The neural network of the $W_p$ partition on a primary computing power device starts forward propagation, and the primary device then sends the intermediate data (crushed data), i.e., the activation results, to the corresponding intermediate computing power device to perform the forward propagation of the $W_i$ partition;
(3) The intermediate computing power device obtains the output labels and computes the value of the loss function to start backpropagation; after backpropagating to the cut layer and computing its gradient, it sends the gradient back to the primary computing power device for the backpropagation of the $W_p$ partition (steps (2) and (3) are sketched in the code after this list);
(4) The corresponding primary and intermediate computing power devices respectively send the weight parameters computed for the two partitions to the high-level computing power device, which executes the FedAvg algorithm and updates the whole neural network model; after the update, the model's weight parameters are sent back to the primary and intermediate computing power devices for the next round of training;
(5) After multiple rounds of cyclic training, training stops once the accuracy of the neural network model on the high-level computing power device reaches the desired value; inference is finally performed on the high-level computing power device, and the inference results are then issued to the corresponding devices for execution.
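A minimal sketch of steps (2) and (3), the crushed-data exchange between one primary and one intermediate device, follows; it assumes both partitions are ordinary PyTorch modules and simulates the network link with a detach, so the transport layer and FedAvg step (4) are out of scope here.

```python
import torch

def split_training_round(w_p, w_i, x, labels, opt_p, opt_i, loss_fn):
    """One round of split training between a primary device (w_p) and an
    intermediate device (w_i); the activation crossing the cut layer is the
    crushed data of step (2), and its gradient flows back in step (3)."""
    opt_p.zero_grad()
    opt_i.zero_grad()

    # (2) Primary device: forward through W_p, send activation across the link.
    smashed = w_p(x)
    sent = smashed.detach().requires_grad_(True)  # simulated transmission

    # (3) Intermediate device: finish the forward pass, compute the loss,
    # and backpropagate down to the cut layer.
    out = w_i(sent)
    loss = loss_fn(out, labels)
    loss.backward()
    opt_i.step()

    # Gradient of the crushed data returns to the primary device.
    smashed.backward(sent.grad)
    opt_p.step()
    return loss.item()
```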
Segmentation learning (SL) reduces the processing load while improving model security (compared with running the full network in FL), which is of great significance for developing artificial intelligence model training on resource-constrained devices. Meanwhile, the accuracy of the segmented neural network model is consistent with that of the original model.
By designing the hierarchical training architecture, the advantages of federated learning and segmentation learning are integrated, and the end-side computing resources are reasonably utilized.
The implementation of the embodiment has the beneficial effects that:
Firstly, simple linear operations replace part of the traditional convolution layer operations, reducing the large amount of redundancy in the intermediate feature maps computed by traditional CNN networks and lowering the model's computation overhead;
Secondly, the end-side hierarchical artificial intelligence training architecture based on federated segmentation learning trains the neural network model with the computing resources of a distributed end-side computing power network, improving the hierarchical utilization of those resources and reducing the computation load of the model on end-side devices. The training architecture changes from cloud modeling and terminal use to terminal-side training and inference, removing the communication overhead of long-distance data transmission to the cloud, increasing the training and inference speed of the neural network model, and realizing the deployment of artificial intelligence models on end-side devices;
Finally, a model dynamic segmentation mechanism based on the reinforcement learning Q-Learning algorithm is provided. End-side devices realizing the training and inference of an artificial intelligence model face the difficulty of model segmentation: the computation and data volume of each part must be considered when designing the segmentation points, while the computing power changes of the end-side devices must also be monitored. By constructing a dynamically matching model segmentation mechanism, the segmentation points can be adaptively adjusted according to the computing power changes of the end-side devices, improving training efficiency and performance.
The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by computer readable instructions stored in a computer readable storage medium that, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
Example two
With further reference to fig. 7, as an implementation of the method shown in fig. 1, the present invention provides an embodiment of an end-side hierarchical neural network model training device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 1, and the device may be specifically applied to various electronic devices.
As shown in fig. 7, the end-side hierarchical neural network model training device 50 according to the present embodiment includes: an acquisition module 51, an improvement module 52, a dynamic segmentation module 53 and a layering module 54. Wherein:
An acquisition module 51 for acquiring a neural network;
the improvement module 52 is configured to improve a convolutional layer of the neural network, reduce redundancy of a feature map of the convolutional layer, and obtain an improved convolutional layer;
the dynamic segmentation module 53 is configured to perform model dynamic segmentation on the neural network model to obtain model dynamic segmentation points;
and the layering module 54 is used for designing a hierarchical training architecture based on distributed machine learning according to the improved convolution layer and the model dynamic segmentation points.
The implementation of the embodiment has the beneficial effects that:
Firstly, simple linear operations replace part of the traditional convolution layer operations, reducing the large amount of redundancy in the intermediate feature maps computed by traditional CNN networks and lowering the model's computation overhead;
Secondly, the end-side hierarchical artificial intelligence training architecture based on federated segmentation learning trains the neural network model with the computing resources of a distributed end-side computing power network, improving the hierarchical utilization of those resources and reducing the computation load of the model on end-side devices. The training architecture changes from cloud modeling and terminal use to terminal-side training and inference, removing the communication overhead of long-distance data transmission to the cloud, increasing the training and inference speed of the neural network model, and realizing the deployment of artificial intelligence models on end-side devices;
Finally, a model dynamic segmentation mechanism based on the reinforcement learning Q-Learning algorithm is provided. End-side devices realizing the training and inference of an artificial intelligence model face the difficulty of model segmentation: the computation and data volume of each part must be considered when designing the segmentation points, while the computing power changes of the end-side devices must also be monitored. By constructing a dynamically matching model segmentation mechanism, the segmentation points can be adaptively adjusted according to the computing power changes of the end-side devices, improving training efficiency and performance.
Example III
In order to solve the technical problems, the embodiment of the invention also provides computer equipment. Referring specifically to fig. 8, fig. 8 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 which are communicatively connected to each other via a system bus. It is noted that only the computer device 6 with the memory 61, the processor 62, and the network interface 63 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer device may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 6. Of course, the memory 61 may also include both an internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed on the computer device 6, such as the computer readable instructions of the end-side hierarchical neural network model training method. In addition, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6 described above. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or process data, for example, execute the computer readable instructions of the end-side hierarchical neural network model training method.
The network interface 63 may comprise a wireless or wired network interface, and is typically used to establish communication connections between the computer device 6 and other electronic devices.
Implementing this embodiment yields the following beneficial effects:
first, part of the traditional convolution-layer operations are replaced with simple linear operations, reducing the large amount of redundancy in the intermediate feature maps computed by a traditional CNN and lowering the model's computational overhead;
second, the end-side hierarchical artificial-intelligence training architecture based on federated split learning trains the neural network model with distributed end-side computing-power network resources, improves the tiered utilization of those resources, and reduces the model's computational load on end-side devices. The training architecture changes from "cloud modeling, terminal use" to training and inference on the terminal, removing the communication overhead of long-distance data transmission to the cloud, raising the training and inference speed of the neural network model, and enabling deployment of the artificial-intelligence model on end-side devices;
and finally, a dynamic model-splitting mechanism based on the reinforcement-learning Q-Learning algorithm. End-side devices must carry out both training and inference of the artificial-intelligence model, which makes model splitting difficult: the split points must be designed with the computation and data volume of each partition in mind, while the changing computing power of the end-side devices is monitored. By constructing a dynamically matched model-splitting mechanism, the split points can be adaptively adjusted as that computing power changes, improving training efficiency and performance.
Example IV
The present invention further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor, so as to cause the at least one processor to perform the steps of the end-side hierarchical neural network model training method described above.
Implementing this embodiment yields the following beneficial effects:
first, part of the traditional convolution-layer operations are replaced with simple linear operations, reducing the large amount of redundancy in the intermediate feature maps computed by a traditional CNN and lowering the model's computational overhead;
second, the end-side hierarchical artificial-intelligence training architecture based on federated split learning trains the neural network model with distributed end-side computing-power network resources, improves the tiered utilization of those resources, and reduces the model's computational load on end-side devices. The training architecture changes from "cloud modeling, terminal use" to training and inference on the terminal, removing the communication overhead of long-distance data transmission to the cloud, raising the training and inference speed of the neural network model, and enabling deployment of the artificial-intelligence model on end-side devices;
and finally, a dynamic model-splitting mechanism based on the reinforcement-learning Q-Learning algorithm. End-side devices must carry out both training and inference of the artificial-intelligence model, which makes model splitting difficult: the split points must be designed with the computation and data volume of each partition in mind, while the changing computing power of the end-side devices is monitored. By constructing a dynamically matched model-splitting mechanism, the split points can be adaptively adjusted as that computing power changes, improving training efficiency and performance.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the embodiments above may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and comprising several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present invention.
The embodiments described above are only some, not all, embodiments of the present invention; the preferred embodiments are shown in the drawings, which do not limit the scope of the claims. The invention may be embodied in many different forms; these embodiments are provided so that this disclosure will be thorough and complete. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the content of the specification and drawings of the invention, applied directly or indirectly in other related technical fields, likewise falls within the scope of the invention.

Claims (10)

1. An end-side hierarchical neural network model training method, characterized by comprising the following steps:
acquiring a neural network;
improving the convolutional layers of the neural network to reduce redundancy of the convolutional-layer feature maps, obtaining improved convolutional layers;
performing dynamic model splitting on the neural network model to obtain dynamic model split points;
and designing a hierarchical training architecture based on distributed machine learning according to the improved convolutional layers and the dynamic model split points.
2. The end-side hierarchical neural network model training method according to claim 1, wherein the step of improving the convolutional layers of the neural network to reduce redundancy of the convolutional-layer feature maps and obtain the improved convolutional layers specifically comprises:
for each convolutional layer, outputting a preset percentage of original feature maps by a standard traditional convolution operation;
and outputting phantom feature maps by linear operations on the original feature maps, while retaining an identity mapping of the convolutional layer.
3. The end-side hierarchical neural network model training method according to claim 2, wherein the preset percentage is 35%–45%.
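A minimal PyTorch sketch of what claims 2–3 describe: a standard convolution produces a preset fraction (here 40%, inside the claimed 35%–45% band) of the output channels, and cheap operations generate the remaining phantom maps. Using a small convolution as the "linear operation" is an assumption of this sketch, not fixed by the claims:

```python
import torch
import torch.nn as nn

class PhantomConv(nn.Module):
    """Standard convolution for a preset fraction of the output channels;
    a cheap convolution generates the remaining phantom feature maps."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, ratio: float = 0.4):
        super().__init__()
        primary_ch = max(1, int(out_ch * ratio))   # original maps (e.g. 40%)
        phantom_ch = out_ch - primary_ch           # phantom maps (the rest)
        self.primary = nn.Conv2d(in_ch, primary_ch, kernel_size,
                                 padding=kernel_size // 2, bias=False)
        # Cheap op on the original maps; depthwise when channel counts align.
        groups = primary_ch if phantom_ch % primary_ch == 0 else 1
        self.cheap = nn.Conv2d(primary_ch, phantom_ch, 3, padding=1,
                               groups=groups, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        # Concatenation keeps the original maps untouched (the identity mapping).
        return torch.cat([y, self.cheap(y)], dim=1)

out = PhantomConv(16, 64)(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Only the `primary` branch performs a full convolution over all input channels; the `cheap` branch needs far fewer multiply-adds, which is where the feature-map redundancy savings come from.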
4. The end-side hierarchical neural network model training method according to claim 1, wherein the step of performing dynamic model splitting on the neural network model to obtain the dynamic model split points specifically comprises:
grading the computing power of the end-side devices according to the processor type, clock frequency and core count of each device;
calculating the computing power required by each layer of the neural network model;
determining the dynamic split points of the model using a reinforcement learning algorithm;
and splitting the neural network model $W$ into two partitions, a primary-compute network $W_p$ and an intermediate-compute network $W_i$, which are processed and computed on the primary and intermediate computing devices respectively.
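The split itself is straightforward once a cut index is known; a minimal sketch, assuming the model is a sequential stack (the cut index would come from the reinforcement-learning step of claim 7):

```python
import torch.nn as nn

def split_model(model: nn.Sequential, cut: int):
    """Split model W into W_p (layers 0..cut, for primary devices) and
    W_i (the remaining layers, for intermediate devices)."""
    layers = list(model.children())
    w_p = nn.Sequential(*layers[:cut + 1])
    w_i = nn.Sequential(*layers[cut + 1:])
    return w_p, w_i

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                      nn.Conv2d(8, 16, 3), nn.ReLU(),
                      nn.Flatten(), nn.LazyLinear(10))
w_p, w_i = split_model(model, cut=1)  # W_p ends at the first ReLU
```

At inference or training time, W_p's output activations would be forwarded over the network to the device holding W_i, which is what makes the choice of cut index sensitive to both computation and data volume.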
5. The end-side hierarchical neural network model training method according to claim 1, wherein the step of designing a hierarchical training architecture based on distributed machine learning according to the improved convolutional layers and the dynamic model split points specifically comprises:
based on federated learning, training a complete neural network model in parallel on distributed multi-layer clients using local data;
transmitting the locally trained complete neural network models to a server for aggregation into a global model;
sending the learned parameters of the global model back to all clients for the next round of training, until the algorithm converges;
based on split learning, splitting the complete neural network model into a plurality of parts;
and training the split neural network model parts on different devices respectively.
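One concrete piece of this architecture is the server-side aggregation in the federated branch. A minimal FedAvg-style sketch, assuming plain parameter averaging (the claim does not name the aggregation rule):

```python
import copy
import torch
import torch.nn as nn

def fedavg(client_states: list[dict]) -> dict:
    """Average client parameter dicts into the global model's state dict."""
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        stacked = torch.stack([s[key].float() for s in client_states])
        global_state[key] = stacked.mean(dim=0)
    return global_state

# Two toy "clients" that each trained a local copy on local data; the server
# aggregates, then the global parameters are sent back for the next round.
client_a, client_b = nn.Linear(4, 2), nn.Linear(4, 2)
global_model = nn.Linear(4, 2)
global_model.load_state_dict(fedavg([client_a.state_dict(), client_b.state_dict()]))
```

The split-learning branch then applies the same round structure, except that each client only holds and trains its assigned model partition rather than the complete model.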
6. The end-side hierarchical neural network model training method according to claim 4, wherein the step of grading the computing power of the end-side devices specifically comprises:
testing the number of floating-point operations the device executes per unit time by running a neural network model on the device, and calculating the device's per-second computing-power value $E_{TOPS} = \frac{\sum_{Layer} \mathrm{FLOPs}}{t}$, where $\sum_{Layer} \mathrm{FLOPs}$ is the total number of floating-point operations of the running model and $t$ is the average time of a single run;
and performing tiered grouping based on the per-second computing-power value $E_{TOPS}$, dividing the devices into high-level computing devices, intermediate computing devices and primary computing devices.
7. The end-side hierarchical neural network model training method according to claim 4, wherein the step of determining the dynamic model split points using a reinforcement learning algorithm specifically comprises:
defining a state-space vector $S_i = (a_p, a_i, b_p, b_i)$, where $a_p$ is the minimum remaining computing power among the primary computing devices, $a_i$ is the minimum remaining computing power among the intermediate computing devices, $b_p$ is the computing load of model partition $W_p$, and $b_i$ is the computing load of model partition $W_i$; the subscript $p$ denotes the device state of the primary computing devices and the model partition running on them, and $i$ denotes the device state of the intermediate computing devices and the model partition running on them; a device's remaining computing power is defined as $a = \delta \times E_{TOPS}$, where $\delta$ is the CPU occupancy rate in the device's current environment and $E_{TOPS}$ is the device's per-second computing-power value;
defining an action space: possible split points on the model are searched with the Grad-CAM algorithm; let the model $W$ have $C$ data classification results, data samples $j = 1, 2, \ldots, J$, and $I$ layers, $i = 1, 2, \ldots, I$; for the data in each class, the feature-map importance coefficient of each layer is computed with Grad-CAM, defined as $\alpha_{i,j}^{C} = \frac{1}{n \times m} \sum_{u=1}^{n} \sum_{v=1}^{m} \frac{\partial y^{C}}{\partial F_{i,j}(u,v)}$, where $\alpha_{i,j}^{C}$ is the feature-map importance coefficient of the $i$-th layer for the $j$-th data sample in the $C$-th class, $F_{i,j} \in \mathbb{R}^{n \times m \times z}$ is the $i$-th layer feature map of the neural network for data $j$, $n$, $m$ and $z$ are the height, width and channel number of the feature map, and $y^{C}$ is the probability score of the $C$-th class prediction result; a weighted summation is then performed with the selected layer's feature map $F_{i,j}$, and finally a ReLU activation function sets negative gradient values to zero, giving the weighted activation map of the $i$-th layer for class-$C$ data of sample $j$: $A_{i,j}^{C} = \mathrm{ReLU}\left(\sum_{k=1}^{z} \alpha_{i,j}^{C,k} F_{i,j}^{k}\right)$; from this the CUI value of the $i$-th layer of the $j$-th data sample in the $C$-th class is obtained, fitting these values forms a per-class CUI curve, and a linear fit across the $C$ class-wise curves yields the model's $CUI_i$ curve; the local maxima of the $CUI_i$ curve are defined as the possible split points on the model, i.e. the action space $D_i$, $i = 1, 2, \ldots, I$ (a sketch of this importance computation follows this claim);
defining a reward function whose coefficients $k_1, k_2, k_3, k_4, k_5$ are positive numbers that adjust how strongly each factor influences the reward; the Q-value table is initialized to 0, where a Q value represents the expected return obtained by taking a given action in a given state;
and performing iterative training: at each time step, the agent selects an action from the Q-value table according to the current state and executes it; after the action is executed, the agent observes the environment's feedback, i.e. the reward signal, and uses the Q-Learning algorithm to update the Q-value table according to this feedback so as to gradually learn the optimal policy, dynamically adjusting the split point according to that policy; the Bellman equation then updates the Q-value table from the current state, the executed action and the obtained reward, so that the Q value gradually approaches the optimal value: $Q(S_i, D_i) = Q(S_i, D_i) + \alpha \times [reward + \gamma \times \max(Q(S_{i+1}, D_{i+1})) - Q(S_i, D_i)]$, where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $\max(Q(S_{i+1}, D_{i+1}))$ is the maximum Q value over all possible actions $D$ in the next state $S$.
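As referenced in the action-space step above, here is a minimal sketch of the Grad-CAM-style per-layer importance computation for one sample. The toy classification head (global average pooling) and the scalar summary of each activation map are assumptions of the sketch, and the CUI fitting is omitted since its exact expression is not reproduced above:

```python
import torch
import torch.nn as nn

def layer_importance(layers: nn.ModuleList, x: torch.Tensor, class_idx: int) -> list[float]:
    """Per-layer importance: pooled gradients of the class score y^C weight
    each layer's feature map F, then ReLU drops negative contributions."""
    activations = []
    out = x
    for layer in layers:
        out = layer(out)
        out.retain_grad()                    # keep gradients on intermediate maps
        activations.append(out)
    logits = out.mean(dim=(2, 3))            # toy head: pooled channel scores
    logits[0, class_idx].backward()          # gradient of y^C
    scores = []
    for F in activations:
        alpha = F.grad.mean(dim=(2, 3), keepdim=True)   # importance coefficients
        cam = torch.relu((alpha * F).sum(dim=1))        # weighted activation map
        scores.append(cam.sum().item())      # scalar summary per layer (assumed)
    return scores

layers = nn.ModuleList([nn.Conv2d(3, 8, 3, padding=1),
                        nn.Conv2d(8, 10, 3, padding=1)])
print(layer_importance(layers, torch.randn(1, 3, 8, 8), class_idx=0))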
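And a tabular Q-Learning loop implementing the update above. The state discretisation, epsilon-greedy exploration, and the toy reward are stand-ins, since the claimed reward depends on coefficients $k_1$–$k_5$ whose expression is not reproduced here:

```python
import random
from collections import defaultdict

def q_learning_split(states, actions, reward_fn, episodes=500,
                     alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-Learning over candidate split points (the action space D_i)."""
    Q = defaultdict(float)                   # Q[(state, action)], initialized to 0
    state = random.choice(states)
    for _ in range(episodes):
        if random.random() < epsilon:        # epsilon-greedy exploration
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        reward, next_state = reward_fn(state, action)
        best_next = max(Q[(next_state, a)] for a in actions)
        # Bellman update: Q(S,D) += alpha * [reward + gamma * max Q(S',D') - Q(S,D)]
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

# Toy setup: 3 coarse compute states and split candidates after layers 2, 5, 8;
# the invented reward prefers earlier cuts when primary compute is scarce.
def toy_reward(state, action):
    return -abs(state - action / 4.0), random.choice([0, 1, 2])

Q = q_learning_split(states=[0, 1, 2], actions=[2, 5, 8], reward_fn=toy_reward)
best = {s: max([2, 5, 8], key=lambda a: Q[(s, a)]) for s in [0, 1, 2]}
print(best)  # e.g. {0: 2, 1: 5, 2: 8}
```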
8. An end-side hierarchical neural network model training device, characterized by comprising:
an acquisition module, configured to acquire a neural network;
an improvement module, configured to improve the convolutional layers of the neural network, reducing redundancy of the convolutional-layer feature maps to obtain improved convolutional layers;
a dynamic splitting module, configured to perform dynamic model splitting on the neural network model to obtain dynamic model split points;
and a layering module, configured to design a hierarchical training architecture based on distributed machine learning according to the improved convolutional layers and the dynamic model split points.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the end-side hierarchical neural network model training method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the end-side hierarchical neural network model training method according to any one of claims 1 to 7.
CN202310745026.8A 2023-06-21 2023-06-21 End-side layered neural network model training method, device and computer equipment Pending CN116958862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310745026.8A CN116958862A (en) 2023-06-21 2023-06-21 End-side layered neural network model training method, device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310745026.8A CN116958862A (en) 2023-06-21 2023-06-21 End-side layered neural network model training method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN116958862A true CN116958862A (en) 2023-10-27

Family

ID=88457385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310745026.8A Pending CN116958862A (en) 2023-06-21 2023-06-21 End-side layered neural network model training method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN116958862A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649631A (en) * 2024-01-29 2024-03-05 广州宇中网络科技有限公司 Client image processing method and system based on improved convolutional neural network
CN117649631B (en) * 2024-01-29 2024-04-05 广州宇中网络科技有限公司 Client image processing method and system based on improved convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination