WO2024032121A1 - Deep learning model reasoning acceleration method based on cloud-edge-end collaboration - Google Patents

Deep learning model reasoning acceleration method based on cloud-edge-end collaboration

Info

Publication number
WO2024032121A1
Authority
WO
WIPO (PCT)
Prior art keywords
delay
deep learning
learning model
computing
hierarchical
Prior art date
Application number
PCT/CN2023/098730
Other languages
French (fr)
Chinese (zh)
Inventor
郭永安
周金粮
王宇翱
钱琪杰
孙洪波
Original Assignee
南京邮电大学
Priority date
Filing date
Publication date
Application filed by 南京邮电大学
Publication of WO2024032121A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Definitions

  • the invention belongs to the field of cloud-edge-end collaborative computing, and specifically relates to a deep learning model inference acceleration method based on cloud-edge-end collaboration.
  • Intelligent applications based on deep learning models usually require substantial computation. There are currently two feasible solutions.
  • One is the End-only mode, which uses simple models and lightweight deep learning frameworks, such as TensorFlow Lite or Caffe for Android, to perform all computation on the physical side; the other is the Cloud-only mode, which offloads all computing tasks to a cloud center with powerful computing capability to perform complex deep learning model computation.
  • However, these approaches either reduce recognition accuracy, because only a simple model is deployed on the physical side, or incur excessive transmission delay, because the wide-area network link between the physical side and the cloud is unstable. It is therefore quite difficult to guarantee reasonable latency and recognition accuracy at the same time.
  • To overcome this tension between latency and recognition accuracy, a better solution is to leverage the edge computing paradigm.
  • However, existing edge computing execution frameworks and offloading mechanisms for deep learning model inference still have limitations, because they ignore the characteristics of deep learning applications and the dynamics of the edge environment.
  • The purpose of the invention is to minimize the response delay of computing tasks while meeting the required computation accuracy, by combining the edge computing paradigm with cloud computing and offloading the deep learning model layer by layer to different edge computing nodes.
  • To this end, the present invention provides the following technical solution: a deep learning model inference acceleration method based on cloud-edge-end collaboration, where cloud-edge-end collaboration refers to a cloud server, at least two edge computing nodes communicating with the cloud server, and at least one physical terminal.
  • The communication distance between the physical terminal and an edge computing node is smaller than the distance between the edge computing node and the cloud server.
  • the method includes the following steps:
  • Step S1: The physical terminal preprocesses the image data into image feature data D1 with the same resolution and equal data volume, feeds it into the partitioned DNN layers DNNz of the deep learning model to be offloaded, and uses the output of each layer as the input of the next layer, finally obtaining Dz;
  • Step S2: Offline learning phase: based on the preset load conditions of each edge computing node, take as input the process of each DNNz of the deep learning model to be offloaded processing the image feature data Dz on each edge computing node, and take as output the known computation delay of Dz passing through each DNNz on that node, and construct and train a hierarchical computation delay prediction model CT;
  • at the same time, based on the preset load condition of the cloud server, take as input the process of each DNNz of the deep learning model to be offloaded processing Dz on the cloud server, and take as output the known computation delay of Dz through each DNNz on the cloud server, and construct and train a cloud server computation delay prediction model CTc;
  • Step S3: According to the actual computing resource load of each edge computing node, the edge computing node responsible for the physical terminal's computing task applies the hierarchical computation delay prediction model CT, taking as input the process of each DNNz of the model to be offloaded processing Dz, and obtains as output the theoretical hierarchical computation delay CT′, i.e. the predicted delay of Dz passing through each DNNz on each edge computing node;
  • Step S4: Based on the known LAN bandwidth r of the edge computing nodes and the physical distance l between edge computing nodes, compute the data transmission delay T and propagation delay S required to transmit the image feature data Dz from the current edge computing node to other edge computing nodes; at the same time, based on the known cloud server network bandwidth rc and the physical distance lc between the task's edge computing node and the cloud server, compute the data transmission delay Tc and propagation delay Sc required to transmit the image feature data D1 from that edge computing node to the cloud server;
  • Step S5: Take the theoretical hierarchical computation delay CT′ of each edge computing node obtained in step S3 and the data transmission delay T and propagation delay S obtained in step S4 as input, and the corresponding response delay TIME as output, to construct the hierarchical offloading model of the deep learning model, TIME = F(CT′, T, S) + t; with the minimum response delay TIME as the optimization objective, obtain the hierarchical offloading model with the smallest response delay TIME, where t is the time from the edge computing node receiving the computing task sent by the physical terminal to generating the hierarchical offloading model;
  • Step S6: According to the cloud server computation delay prediction model CTc obtained in step S2 and the computing resource load of the cloud server, apply CTc, taking as input the process of each DNNz of the model to be offloaded processing Dz, to obtain as output the theoretical hierarchical computation delay CT′z of Dz passing through each DNNz on the cloud server; then compute the theoretical computation delay CT′c of processing the computing task on the cloud server alone, where CT′1 is the computation delay of passing D1 through DNN1, and the corresponding response delay TIMEc = F(CT′c, Tc, Sc);
  • Step S7: Dynamically compare the response delay TIMEc of using the cloud server alone with the minimum response delay TIME of the hierarchical offloading model. If TIME is less than TIMEc, take the hierarchical offloading model corresponding to the minimum response delay TIME as the hierarchical offloading strategy and offload the data to be computed with the goal of minimizing the response delay; otherwise, take processing the data to be computed on the cloud server alone, corresponding to the response delay TIMEc, as the final hierarchical offloading strategy;
  • Step S8 Based on the hierarchical offloading strategy obtained in step S7, each edge computing node that executes the hierarchical offloading strategy collects the computing load of the computing task, and then returns to step S2.
  • The partitioned DNN layers of the deep learning model to be offloaded are obtained as follows: the neurons of the input layer, hidden layers, and output layer of the model are arranged column by column and divided into n columns, giving separate neuron columns, each of which becomes one DNNz, where n is a positive integer.
  • step S1 is specifically:
  • Based on each DNNz of the partitioned deep learning model to be offloaded, take as input the process of each DNNz processing the image feature data Dz on each edge computing node, and as output the corresponding computation delay of Dz through each DNNz on that node, and construct for each edge computing node a hierarchical computation delay model CT = f(α, β, γ), where α is the preset CPU load, β the preset GPU load, and γ the preset cache load of the computing resources.
  • edge computing nodes include deep reinforcement networks, deep learning models, situational awareness centers, and decision-making transceiver centers;
  • the deep reinforcement network includes:
  • the hierarchical computation delay prediction module is used to calculate the theoretical hierarchical computation delays CT′ and CT′c, and to store the hierarchical computation delay prediction model CT and the cloud server computation delay prediction model CTc;
  • Transmission delay calculation module used to calculate data transmission delay T and propagation delay S;
  • the online decision-making delay statistics module is used to calculate the time t from the edge computing node receiving the computing task sent by the physical terminal to generating the deep learning model hierarchical offloading model;
  • the online learning module is used to collect and transmit the actual computing load and actual computing delay data during computing tasks to the hierarchical computing delay prediction module of the edge computing node;
  • the offline sample data storage module is used to store, under the preset load conditions of each edge computing node and the cloud server, the computation delay of the image feature data Dz passing through each DNNz of the deep learning model to be offloaded on each edge computing node, and the computation delay of Dz passing through each DNNz of the model to be offloaded on the cloud server;
  • the decision information generation module is used to pass the generated final hierarchical offloading strategy to the decision transceiver center;
  • the situation awareness center includes:
  • the edge computing node computing capability awareness module is used to calculate the computing resource load of each edge computing node
  • the cloud server computing capability awareness module is used to calculate the computing resource load of the cloud server
  • the network telemetry module is used to calculate the network bandwidth r of the local area network where each edge computing node is located, and is used to calculate the physical distance l between each edge computing node;
  • the decision-making transceiver center is used to send and receive the final hierarchical offloading strategy.
  • the aforementioned cloud server includes a deep learning model and a decision transceiver center; the deep learning model is a trained deep learning model; and the decision transceiver center is used to receive the final hierarchical offloading strategy.
  • the situation awareness center includes a computing capability awareness module and a network telemetry module.
  • This method combines the edge computing paradigm with cloud computing and offloads the deep learning model layer by layer to different edge computing nodes, fully exploiting the computing potential of the edge side and minimizing the response delay of computing tasks while satisfying the required computation accuracy.
  • This method starts from an offline learning phase; furthermore, it can update the hierarchical computation delay prediction model in real time, based on the computing resource load and computation delay actually measured for each task, to optimize the decision-making process for hierarchical offloading of the deep learning model.
  • The deep learning model is hierarchically offloaded to computing nodes such as edge computing nodes and cloud servers.
  • the collaborative reasoning method can effectively ensure the security of computing data and reduce network bandwidth occupancy.
  • Figure 1 is a technical principle diagram of the present invention.
  • Figure 2 is a schematic diagram of the module composition of the deep reinforcement network of the present invention.
  • Figure 3 is a schematic diagram of the hierarchical offloading of the deep learning model of the present invention.
  • Figure 4 is a flow chart of the method of the present invention.
  • At least two edge computing nodes are provided within the communication range of the cloud server c.
  • The edge computing nodes are deployed on WiFi access points or base stations, and at least one physical terminal is set up within the local area network where each edge computing node is located; the distance between each edge computing node and each physical terminal within its communication range is smaller than the distance between the edge computing node and the cloud server.
  • For any edge computing node i within the communication range of cloud server c, the total number of other edge computing nodes within the communication range of node i whose physical distance from it is less than a preset distance is recorded as N, with 1 ≤ j ≤ N, where j is the index of each such edge computing node.
  • These N edge computing nodes together with edge computing node i form an edge cluster; a deep learning model and a decision transceiver center are deployed on the cloud server c; a deep reinforcement network, a deep learning model, a situational awareness center, and a decision transceiver center are deployed on each edge computing node.
  • a deep reinforcement network is deployed on the edge computing node.
  • The deep reinforcement network includes a hierarchical computation delay prediction module, a transmission delay calculation module, an online decision delay statistics module, an online learning module, an offline sample data storage module, and a decision information generation module; with the goal of minimizing the computing task response delay TIME, it jointly considers the data transmission delay T, the data propagation delay S, the hierarchical computation delay CT of the deep learning model, and the decision delay t, to find the optimal strategy for hierarchically offloading the deep learning model to the individual computing nodes and so achieve rapid inference.
  • The hierarchical computation delay prediction module calculates the theoretical hierarchical computation delay; the transmission delay calculation module calculates the data transmission delay T and propagation delay S; the online decision delay statistics module calculates the time t from the edge computing node receiving the computing task sent by the physical terminal to generating the hierarchical offloading model; the online learning module collects the actual computing load and actual computation delay during task execution and passes them to the hierarchical computation delay prediction module of the edge computing node.
  • The actual computation delay refers to the computation delay of the image feature data Dz passing through each DNNz of the deep learning model to be offloaded on each edge computing node while that node executes the computing task.
  • The offline sample data storage module is used to store the hierarchical computation delay prediction model CT; the decision information generation module is used to pass the generated final hierarchical offloading strategy to the decision transceiver center;
  • The deep learning model is a trained deep learning model; the situational awareness center includes a computing capability awareness module and a network telemetry module;
  • the computing capability awareness module is used to calculate the computing resource load of each edge computing node;
  • the network telemetry module is used to calculate the network bandwidth r of the local area network where each edge computing node is located and the physical distance l between edge computing nodes;
  • the decision transceiver center is used to receive the final hierarchical offloading strategy.
  • Cloud server c includes a deep learning model and a decision transceiver center; the deep learning model is a trained deep learning model; the decision transceiver center is used to receive the final hierarchical offloading strategy.
  • the situation awareness center includes a computing capability awareness module and a network telemetry module.
  • The deep learning model has a multi-layer structure. The neurons of the input layer, hidden layers, and output layer of the model to be offloaded are arranged column by column and divided into n columns, giving separate neuron columns, each of which becomes one DNNz, where n is a positive integer.
  • For any edge computing node i within the communication range of cloud server c, assume the total number of other edge computing nodes within the communication range of node i whose distance from it is less than the preset distance is 2, and let I and II denote these two edge computing nodes.
  • These two edge computing nodes together with edge computing node i form an edge cluster, i.e. the edge cluster contains three edge computing nodes.
  • Assuming the deep learning model to be offloaded has three columns of neurons, it can be divided into a two-layer model to be offloaded (DNN1, DNN2), denoted 1 ≤ z ≤ 2.
  • In the offline learning phase, under different computing resource loads of each edge computing node i, I, II and of the cloud server c itself, a generic single image feature data D1 is used as input, and the hierarchical computation delays required by each edge computing node, and by cloud server c, to execute each layer of the deep learning model are measured separately.
  • The hierarchical computation delays of each of the above nodes under the different computing resource loads are recorded in the offline sample data storage module of the deep reinforcement network.
  • Computing resource load includes: CPU load ⁇ , GPU load ⁇ and cache load ⁇ .
  • the hierarchical computing delay prediction module uses the sample data in the offline sample data storage module to perform multivariate nonlinear function fitting to obtain the hierarchical computing delay prediction model:
  • CTiz = f(αi, βi, γi)
  • The above formula gives, for any edge computing node i among the three edge computing nodes in the edge cluster, the computation delay CTiz of the z-th layer of the deep learning model (DNNz) when its CPU load, GPU load, and cache load are αi, βi, and γi respectively.
  • The trained hierarchical computation delay prediction model is stored in the hierarchical computation delay prediction module; CTIz = f(αI, βI, γI) and CTIIz = f(αII, βII, γII) are defined analogously.
  • CTcz = f(αc, βc, γc)
  • The above formula gives, for the cloud server c of the edge cluster, the computation delay CTcz of the z-th layer of the deep learning model (DNNz) when its CPU load, GPU load, and cache load are αc, βc, and γc respectively.
  • the trained hierarchical computing delay prediction model is stored in the hierarchical computing delay prediction module of each edge computing node.
  • the physical terminal preprocesses the computing task (image data) into image feature data D 1 with the same resolution and equal data volume, and loads it to the edge computing node i located in the same local area network as the current physical terminal.
  • the online decision delay statistics module of edge computing node i starts timing and dynamically sends the decision delay t to the decision information generation module (the decision delay t is the time from when edge computing node i receives the computing task to when it generates the hierarchical offloading strategy for the deep learning model);
  • The computing capability awareness module under the situational awareness center of edge computing node i and the computing capability awareness module of cloud server c pass the dynamically sensed computing resource loads of the edge computing nodes (bi, bI, bII) and of cloud server c (bc) to the hierarchical computation delay prediction module;
  • the network telemetry module passes the dynamically measured network bandwidths (ri, rI, rII, rc) and physical distances (liI, liII, lic, lI II, lIc, lIIc) of the regions where the edge computing nodes and the cloud server are located to the transmission delay calculation module;
  • the hierarchical computation delay prediction module combines the computing resource load of each edge computing node and of cloud server c with the pre-stored hierarchical computation delay prediction model to predict the theoretical hierarchical computation delay each edge computing node needs to compute each layer DNNz of the deep learning model;
  • Ti = Dz/ri and SiI = liI/C denote the data transmission delay Ti and propagation delay SiI required to transmit the image feature data Dz from edge computing node i to edge computing node I: the data transmission delay Ti depends on the image feature data Dz to be transmitted and on the network bandwidth ri of edge computing node i, while the propagation delay SiI depends on the channel length from edge computing node i to edge computing node I (estimated by the physical distance liI) and on the propagation rate of electromagnetic waves on the channel (estimated by the speed of light C).
  • Based on deep reinforcement learning technology, the decision information generation module takes as its basis the theoretical hierarchical computation delays (CTiz′, CTIz′, CTIIz′) required by each edge computing node to process each layer DNNz of the deep learning model, the theoretical computation delay CT′c required to compute the whole deep learning model on cloud server c alone, the theoretical data transmission delays, and the theoretical propagation delays, and, with the minimum task response delay TIME as the optimization goal, determines the hierarchical offloading strategy of the optimal deep learning model (different hierarchical offloading strategies correspond to different task response delays TIME, and the goal is to find the optimal one):
  • To avoid the search for the minimum task response delay TIME falling into over-optimization, the response delay TIMEc of using cloud server c alone is dynamically compared with the smallest response delay TIME of the hierarchical offloading model: if TIME is less than TIMEc, the hierarchical offloading model corresponding to the smallest response delay TIME is used as the hierarchical offloading strategy, and the data to be computed is offloaded with the goal of minimizing the response delay; otherwise, processing the data to be computed on cloud server c alone, corresponding to the response delay TIMEc, is used as the final hierarchical offloading strategy.
  • the decision information generation module transmits the generated optimal deep learning model hierarchical offloading strategy to the decision transceiver center (the hierarchical offloading strategy information includes the edge computing nodes participating in this calculation and the number of deep learning model layers that need to be calculated by the edge computing nodes).
  • the policy information is sent to the decision-making transceiver center of each edge computing node that needs to participate in the task calculation through the decision-making transceiver center.
  • the edge computing node starts task calculation according to the strategy. Task calculation results are sent directly to the physical terminal.
  • The online learning module of each edge computing node participating in the task collects the computing resource load (CPU load, GPU load, and cache load) and the actual computation delay during task execution, and transfers all of the above sample data to the hierarchical computation delay prediction module of edge computing node i, which updates the hierarchical computation delay prediction model for the current deep learning model; furthermore, all edge computing nodes share the updated hierarchical computation delay prediction model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a deep learning model reasoning acceleration method based on cloud-edge-end collaboration, and in particular, relates to a deep learning model hierarchical offloading method. According to the method, theoretical modeling is performed on a computing delay, a data transmission delay, a data propagation delay, and a model hierarchical offloading policy generation delay in a whole deep learning model reasoning process, and a hierarchical offloading policy of an optimal deep learning model is determined by using the minimum computing task response delay as an optimization target. Compared with a deep learning model execution framework dominated by a physical end and a deep learning model execution framework dominated by a cloud computing center, according to the present method, an edge computing paradigm and cloud computing are combined, and a deep learning model is hierarchically offloaded to different edge computing nodes, so that the computing task response delay is minimized when the computing precision is met.

Description

A deep learning model inference acceleration method based on cloud-edge-end collaboration
Technical field
The invention belongs to the field of cloud-edge-end collaborative computing, and specifically relates to a deep learning model inference acceleration method based on cloud-edge-end collaboration.
Background art
Intelligent applications based on deep learning models usually require substantial computation. There are currently two feasible solutions. One is the End-only mode, which uses simple models and lightweight deep learning frameworks, such as TensorFlow Lite or Caffe for Android, to perform all computation on the physical side; the other is the Cloud-only mode, which offloads all computing tasks to a cloud center with powerful computing capability to perform complex deep learning model computation. However, these approaches either reduce recognition accuracy, because only a simple model is deployed on the physical side, or incur excessive transmission delay, because the wide-area network link between the physical side and the cloud is unstable. It is therefore quite difficult to guarantee reasonable latency and recognition accuracy at the same time.
To overcome this tension between latency and recognition accuracy, a better solution is to leverage the edge computing paradigm. However, existing edge computing execution frameworks and offloading mechanisms for deep learning model inference still have limitations, because they ignore the characteristics of deep learning applications and the dynamics of the edge environment.
Summary of the invention
The purpose of the invention is to minimize the response delay of computing tasks while meeting the required computation accuracy, by combining the edge computing paradigm with cloud computing and offloading the deep learning model layer by layer to different edge computing nodes.
To achieve the above purpose, the present invention provides the following technical solution: a deep learning model inference acceleration method based on cloud-edge-end collaboration, where cloud-edge-end collaboration refers to a cloud server, at least two edge computing nodes communicating with the cloud server, and at least one physical terminal, and the communication distance between the physical terminal and an edge computing node is smaller than the distance between the edge computing node and the cloud server. The method includes the following steps:
Step S1: The physical terminal preprocesses the image data into image feature data D1 with the same resolution and equal data volume, feeds it into the partitioned DNN layers DNNz of the deep learning model to be offloaded, and uses the output of each layer as the input of the next layer, finally obtaining Dz;
Step S2: Offline learning phase: based on the preset load conditions of each edge computing node, take as input the process of each DNNz of the deep learning model to be offloaded processing the image feature data Dz on each edge computing node, and take as output the known computation delay of Dz passing through each DNNz on that node, and construct and train a hierarchical computation delay prediction model CT;
At the same time, based on the preset load condition of the cloud server, take as input the process of each DNNz of the deep learning model to be offloaded processing Dz on the cloud server, and take as output the known computation delay of Dz through each DNNz on the cloud server, and construct and train a cloud server computation delay prediction model CTc;
Step S3: According to the actual computing resource load of each edge computing node, the edge computing node responsible for the physical terminal's computing task applies the hierarchical computation delay prediction model CT, taking as input the process of each DNNz of the model to be offloaded processing Dz, and obtains as output the theoretical hierarchical computation delay CT′, i.e. the predicted delay of Dz passing through each DNNz on each edge computing node;
Step S4: Based on the known LAN bandwidth r of the edge computing nodes and the physical distance l between edge computing nodes, compute the data transmission delay T and propagation delay S required to transmit the image feature data Dz from the current edge computing node to other edge computing nodes; at the same time, based on the known cloud server network bandwidth rc and the physical distance lc between the task's edge computing node and the cloud server, compute the data transmission delay Tc and propagation delay Sc required to transmit the image feature data D1 from that edge computing node to the cloud server;
Step S5: Take the theoretical hierarchical computation delay CT′ of each edge computing node obtained in step S3 and the data transmission delay T and propagation delay S obtained in step S4 as input, and the corresponding response delay TIME as output, to construct the hierarchical offloading model of the deep learning model as follows:
TIME = F(CT′, T, S) + t,
and, with the minimum response delay TIME as the optimization objective, obtain the hierarchical offloading model of the deep learning model with the smallest response delay TIME, where t is the time from the edge computing node receiving the computing task sent by the physical terminal to generating the hierarchical offloading model;
Step S6: According to the cloud server computation delay prediction model CTc obtained in step S2 and the computing resource load of the cloud server, apply CTc, taking as input the process of each DNNz of the model to be offloaded processing Dz, to obtain as output the theoretical hierarchical computation delay CT′z of Dz passing through each DNNz on the cloud server; from these, compute the theoretical computation delay CT′c of processing the computing task on the cloud server alone, where CT′1 is the computation delay of passing D1 through DNN1, and then compute the response delay TIMEc of processing the image feature data on the cloud server alone as:
TIMEc = F(CT′c, Tc, Sc);
Step S7: Dynamically compare the response delay TIMEc of using the cloud server alone with the minimum response delay TIME of the hierarchical offloading model. If TIME is less than TIMEc, take the hierarchical offloading model corresponding to the minimum response delay TIME as the hierarchical offloading strategy and offload the data to be computed with the goal of minimizing the response delay; otherwise, take processing the data to be computed on the cloud server alone, corresponding to the response delay TIMEc, as the final hierarchical offloading strategy;
Step S8: Based on the hierarchical offloading strategy obtained in step S7, each edge computing node that executes the hierarchical offloading strategy collects its computing load during the computing task, and the method then returns to step S2.
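To make the decision procedure of steps S3 to S7 concrete, the following is a minimal sketch in Python, not the patent's implementation: it assumes the per-layer predicted delays CT′, the pairwise transmission delays T and propagation delays S, the cloud-only response delay TIMEc, and the decision delay t are already available as plain numbers, and it only enumerates strategies that run a prefix of the layers on one edge node and the remaining layers on another.

```python
from itertools import product

def response_time(layers_on_a, node_a, node_b, ct, T, S, t_decision):
    """Response delay TIME for running layers 1..layers_on_a on node_a and the
    remaining layers on node_b: compute on A, transfer the intermediate data
    once, then compute on B (plus the decision delay t)."""
    n_layers = len(ct[node_a])
    compute = sum(ct[node_a][:layers_on_a]) + sum(ct[node_b][layers_on_a:])
    needs_transfer = node_a != node_b and 0 < layers_on_a < n_layers
    transfer = T[(node_a, node_b)] + S[(node_a, node_b)] if needs_transfer else 0.0
    return compute + transfer + t_decision

def choose_strategy(nodes, ct, T, S, time_cloud, t_decision):
    """Enumerate two-node split strategies, keep the one with the smallest TIME,
    and fall back to cloud-only processing if that is not faster (step S7)."""
    n_layers = len(ct[nodes[0]])
    best = None
    for node_a, node_b in product(nodes, repeat=2):
        for k in range(n_layers + 1):
            time_k = response_time(k, node_a, node_b, ct, T, S, t_decision)
            if best is None or time_k < best[0]:
                best = (time_k, node_a, node_b, k)
    if best[0] < time_cloud:
        return {"mode": "edge", "nodes": (best[1], best[2]),
                "split_after_layer": best[3], "TIME": best[0]}
    return {"mode": "cloud_only", "TIME": time_cloud}

# Hypothetical numbers: predicted per-layer delays CT' on nodes i, I, II (seconds),
# pairwise transmission delay T and propagation delay S, and cloud-only delay TIMEc.
ct = {"i": [0.030, 0.045], "I": [0.020, 0.030], "II": [0.025, 0.035]}
T = {(a, b): 0.004 for a in ct for b in ct}
S = {(a, b): 0.0001 for a in ct for b in ct}
print(choose_strategy(list(ct), ct, T, S, time_cloud=0.080, t_decision=0.002))
```

Richer strategy spaces (more than two nodes per task, arbitrary per-layer placement) would enlarge the enumeration but leave the comparison against TIMEc in step S7 unchanged.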
Further, the partitioned DNN layers of the deep learning model to be offloaded are obtained as follows: the neurons of the input layer, hidden layers, and output layer of the model are arranged column by column and divided into n columns, giving separate neuron columns, each of which becomes one DNNz, where n is a positive integer.
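Purely as an illustration of this partitioning, and assuming PyTorch is available, the sketch below cuts a small hypothetical network into per-layer sub-models DNN1..DNNn and chains them so that the output Dz of one partition becomes the input of the next, as in step S1; the layer sizes are invented for the example.

```python
import torch
import torch.nn as nn

# A hypothetical model with an input layer, hidden layers, and an output layer.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),   # column 1
    nn.Linear(64, 32), nn.ReLU(),    # column 2
    nn.Linear(32, 10),               # column 3
)

# Split the model into per-column sub-models DNN_z that could be placed on
# different edge computing nodes.
dnn_parts = [model[0:2], model[2:4], model[4:5]]

# Chain the partitions: the output of DNN_z becomes the input of DNN_{z+1}.
d = torch.randn(1, 128)              # preprocessed feature data D1
with torch.no_grad():
    for part in dnn_parts:
        d = part(d)                  # D2, D3, ... in turn
print(d.shape)                       # final output after the last partition
```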
Further, the aforementioned step S1 is specifically: based on each DNNz of the partitioned deep learning model to be offloaded, take as input the process of each DNNz processing the image feature data Dz on each edge computing node, and as output the corresponding computation delay of Dz through each DNNz on that node, and construct for each edge computing node a hierarchical computation delay model CT = f(α, β, γ), where α is the preset CPU load, β the preset GPU load, and γ the preset cache load of the computing resources.
Further, in the aforementioned step S3, based on the known LAN bandwidth r of the edge computing nodes and the physical distance l between edge computing nodes, the data transmission delay T and the propagation delay S required for each edge computing node to transmit the image feature data Dz to other edge computing nodes are calculated according to the following formulas:
T = Dz/r,
S = l/C;
where the speed of light C represents the propagation rate of electromagnetic waves on the channel.
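A minimal numeric sketch of these two formulas, with hypothetical values for the data volume, bandwidth, and distance, could look as follows:

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, propagation rate of electromagnetic waves on the channel

def transmission_delay(data_bits: float, bandwidth_bps: float) -> float:
    """T = Dz / r: time to push the image feature data onto the link."""
    return data_bits / bandwidth_bps

def propagation_delay(distance_m: float) -> float:
    """S = l / C: time for the signal to travel the physical distance."""
    return distance_m / SPEED_OF_LIGHT

# Hypothetical example: 2 MB of feature data over a 100 Mbit/s LAN, nodes 300 m apart.
d_z = 2 * 8 * 1e6          # bits
T = transmission_delay(d_z, 100e6)
S = propagation_delay(300.0)
print(f"T = {T*1e3:.2f} ms, S = {S*1e6:.2f} us")
```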
Further, each edge computing node includes a deep reinforcement network, a deep learning model, a situational awareness center, and a decision transceiver center.
The deep reinforcement network includes:
a hierarchical computation delay prediction module, used to calculate the theoretical hierarchical computation delays CT′ and CT′c and to store the hierarchical computation delay prediction model CT and the cloud server computation delay prediction model CTc;
a transmission delay calculation module, used to calculate the data transmission delay T and the propagation delay S;
an online decision delay statistics module, used to calculate the time t from the edge computing node receiving the computing task sent by the physical terminal to generating the hierarchical offloading model of the deep learning model;
an online learning module, used to collect the actual computing load and actual computation delay during computing tasks and pass them to the hierarchical computation delay prediction module of the edge computing node;
an offline sample data storage module, used to store, under the preset load conditions of each edge computing node and the cloud server, the computation delay of the image feature data Dz passing through each DNNz of the deep learning model to be offloaded on each edge computing node, and the computation delay of Dz passing through each DNNz of the model to be offloaded on the cloud server;
a decision information generation module, used to pass the generated final hierarchical offloading strategy to the decision transceiver center.
The situational awareness center includes:
an edge computing node computing capability awareness module, used to calculate the computing resource load of each edge computing node;
a cloud server computing capability awareness module, used to calculate the computing resource load of the cloud server;
a network telemetry module, used to calculate the network bandwidth r of the local area network where each edge computing node is located and the physical distance l between edge computing nodes.
The decision transceiver center is used to send and receive the final hierarchical offloading strategy.
Further, the cloud server includes a deep learning model and a decision transceiver center; the deep learning model is a trained deep learning model; the decision transceiver center is used to receive the final hierarchical offloading strategy. The situational awareness center includes a computing capability awareness module and a network telemetry module.
Compared with the prior art, the above technical solution of the present invention has the following beneficial effects:
(1) Unlike deep learning model execution frameworks dominated by the physical side or by a cloud computing center, this method combines the edge computing paradigm with cloud computing and offloads the deep learning model layer by layer to different edge computing nodes, fully exploiting the computing potential of the edge side and minimizing the response delay of computing tasks while satisfying the required computation accuracy.
(2) By theoretically modeling the computation delay, data transmission delay, data propagation delay, and hierarchical offloading strategy generation delay of the entire deep learning model inference process, and taking the minimum computing task response delay as the optimization objective, the method determines the hierarchical offloading strategy of the optimal deep learning model and ultimately accelerates inference of the deep learning model.
(3) The method starts from an offline learning phase; furthermore, it can update the hierarchical computation delay prediction model in real time, based on the computing resource load and computation delay actually measured for each task, to optimize the decision-making process for hierarchical offloading of the deep learning model.
(4) By hierarchically offloading the deep learning model to computing nodes such as edge computing nodes and cloud servers, the collaborative inference approach effectively ensures the security of the computed data and reduces network bandwidth occupancy.
Brief description of the drawings
Figure 1 is a technical schematic diagram of the present invention.
Figure 2 is a schematic diagram of the module composition of the deep reinforcement network of the present invention.
Figure 3 is a schematic diagram of the hierarchical offloading of the deep learning model of the present invention.
Figure 4 is a flow chart of the method of the present invention.
Detailed description of embodiments
In order to better understand the technical content of the present invention, specific embodiments are described below in conjunction with the accompanying drawings.
Aspects of the present invention are described herein with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. The embodiments of the present invention are not limited to those shown in the drawings. It should be understood that the present invention can be implemented through any of the concepts and embodiments introduced above and described in detail below, since the disclosed concepts and embodiments are not limited to any particular implementation. In addition, some aspects disclosed herein may be used alone or in any appropriate combination with other disclosed aspects.
As shown in Figure 1, at least two edge computing nodes are provided within the communication range of the cloud server c. The edge computing nodes are deployed on WiFi access points or base stations, and at least one physical terminal is set up within the local area network where each edge computing node is located; the distance between each edge computing node and each physical terminal within its communication range is smaller than the distance between the edge computing node and the cloud server. For any edge computing node i within the communication range of cloud server c, the total number of other edge computing nodes within the communication range of node i whose physical distance from it is less than a preset distance is recorded as N, with 1 ≤ j ≤ N, where j is the index of each such edge computing node; these N edge computing nodes together with edge computing node i form an edge cluster. A deep learning model and a decision transceiver center are deployed on the cloud server c; a deep reinforcement network, a deep learning model, a situational awareness center, and a decision transceiver center are deployed on each edge computing node.
As shown in Figure 2, a deep reinforcement network is deployed on each edge computing node. The deep reinforcement network includes a hierarchical computation delay prediction module, a transmission delay calculation module, an online decision delay statistics module, an online learning module, an offline sample data storage module, and a decision information generation module; with the goal of minimizing the computing task response delay TIME, it jointly considers the data transmission delay T, the data propagation delay S, the hierarchical computation delay CT of the deep learning model, and the decision delay t, to find the optimal strategy for hierarchically offloading the deep learning model to the individual computing nodes and so achieve rapid inference. The hierarchical computation delay prediction module calculates the theoretical hierarchical computation delay; the transmission delay calculation module calculates the data transmission delay T and propagation delay S; the online decision delay statistics module calculates the time t from the edge computing node receiving the computing task sent by the physical terminal to generating the hierarchical offloading model of the deep learning model; the online learning module collects the actual computing load and actual computation delay during task execution and passes them to the hierarchical computation delay prediction module of the edge computing node. The actual computation delay refers to the computation delay of the image feature data Dz passing through each DNNz of the deep learning model to be offloaded on each edge computing node while that node executes the computing task.
The offline sample data storage module is used to store the hierarchical computation delay prediction model CT; the decision information generation module is used to pass the generated final hierarchical offloading strategy to the decision transceiver center. The deep learning model is a trained deep learning model. The situational awareness center includes a computing capability awareness module and a network telemetry module; the computing capability awareness module is used to calculate the computing resource load of each edge computing node; the network telemetry module is used to calculate the network bandwidth r of the local area network where each edge computing node is located and the physical distance l between edge computing nodes. The decision transceiver center is used to receive the final hierarchical offloading strategy.
Cloud server c includes a deep learning model and a decision transceiver center; the deep learning model is a trained deep learning model; the decision transceiver center is used to receive the final hierarchical offloading strategy. The situational awareness center includes a computing capability awareness module and a network telemetry module.
As shown in Figure 3, the deep learning model has a multi-layer structure. The neurons of the input layer, hidden layers, and output layer of the model to be offloaded are arranged column by column and divided into n columns, giving separate neuron columns, each of which becomes one DNNz, where n is a positive integer.
As shown in Figure 4, for any edge computing node i within the communication range of cloud server c, assume the total number of other edge computing nodes within the communication range of node i whose physical distance from it is less than the preset distance is 2, and let I and II denote these two edge computing nodes. These two edge computing nodes together with edge computing node i form an edge cluster, i.e. the edge cluster contains three edge computing nodes.
Assuming the deep learning model to be offloaded has three columns of neurons, it can be divided into a two-layer model to be offloaded (DNN1, DNN2), denoted 1 ≤ z ≤ 2.
In the offline learning phase, under different computing resource loads of each edge computing node i, I, II and of the cloud server c itself, a generic single image feature data D1 is used as input, and the hierarchical computation delays CTiz, CTIz, CTIIz required by each edge computing node, and the hierarchical computation delay CTcz required by cloud server c, to execute each layer of the deep learning model are measured separately. The hierarchical computation delays of each of the above nodes under the different computing resource loads are recorded in the offline sample data storage module of the deep reinforcement network.
The computing resource load includes the CPU load α, the GPU load β, and the cache load γ.
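The offline sampling described above can be pictured with the hypothetical sketch below: it times each sub-model DNNz once on a single generic input D1 and records the result together with the load triple (α, β, γ). The load values are passed in rather than measured here, since how each node reports CPU, GPU, and cache load is deployment-specific, and dnn_parts refers to the partitioned sub-models from the earlier sketch.

```python
import time

def measure_layer_delays(dnn_parts, d1, load_triple):
    """Run D1 through DNN_1..DNN_n once and record (alpha, beta, gamma, z, CT_z)
    rows for the offline sample data storage module."""
    alpha, beta, gamma = load_triple      # CPU, GPU, and cache load at measurement time
    rows, d = [], d1
    for z, part in enumerate(dnn_parts, start=1):
        start = time.perf_counter()
        d = part(d)                       # layer-z inference
        ct_z = time.perf_counter() - start
        rows.append((alpha, beta, gamma, z, ct_z))
    return rows

# Usage sketch: samples = measure_layer_delays(dnn_parts, d1, (0.35, 0.10, 0.50))
# would be appended to the offline sample store and repeated under other loads.
```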
Next, based on deep reinforcement learning technology, the hierarchical computation delay prediction module uses the sample data in the offline sample data storage module to perform multivariate nonlinear function fitting and obtain the hierarchical computation delay prediction model:
CTiz = f(αi, βi, γi)
The above formula gives, for any edge computing node i among the three edge computing nodes in the edge cluster, the computation delay CTiz of the z-th layer of the deep learning model (DNNz) when its CPU load, GPU load, and cache load are αi, βi, and γi respectively. The trained hierarchical computation delay prediction model is stored in the hierarchical computation delay prediction module. CTIz = f(αI, βI, γI) and CTIIz = f(αII, βII, γII) are defined analogously.
CTcz = f(αc, βc, γc)
The above formula gives, for the cloud server c of the edge cluster, the computation delay CTcz of the z-th layer of the deep learning model (DNNz) when its CPU load, GPU load, and cache load are αc, βc, and γc respectively. The trained hierarchical computation delay prediction model is stored in the hierarchical computation delay prediction module of each edge computing node.
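A minimal sketch of such a multivariate nonlinear fit, assuming SciPy is available and assuming, purely for illustration, a linear form with one interaction term for f, is shown below; the patent does not fix a particular functional form, so the model family and the sample values here are placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

def delay_model(loads, c0, c1, c2, c3, c4):
    """Hypothetical form of CT_z = f(alpha, beta, gamma), used only for illustration."""
    alpha, beta, gamma = loads
    return c0 + c1 * alpha + c2 * beta + c3 * gamma + c4 * alpha * beta

# Offline samples for one layer z of one node: load triples and measured delays (seconds).
alpha = np.array([0.2, 0.4, 0.6, 0.8, 0.3, 0.7])
beta  = np.array([0.1, 0.3, 0.5, 0.7, 0.6, 0.2])
gamma = np.array([0.3, 0.2, 0.6, 0.4, 0.5, 0.8])
ct    = np.array([0.021, 0.034, 0.055, 0.071, 0.042, 0.049])

params, _ = curve_fit(delay_model, (alpha, beta, gamma), ct)

# Predict the theoretical hierarchical computation delay CT'_z under a new load.
ct_pred = delay_model((0.5, 0.4, 0.5), *params)
print(f"predicted CT'_z = {ct_pred*1e3:.1f} ms")
```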
After the offline learning phase, task computation can be performed.
Based on image compression and image segmentation techniques, the physical terminal preprocesses the computing task (image data) into image feature data D1 with the same resolution and equal data volume and loads it onto the edge computing node i located in the same local area network as the physical terminal; the online decision delay statistics module of edge computing node i starts timing and dynamically sends the decision delay t to the decision information generation module (the decision delay t is the time from when edge computing node i receives the computing task to when it generates the hierarchical offloading strategy for the deep learning model).
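The preprocessing into fixed-resolution, equal-volume feature data D1 could be sketched as follows, assuming Pillow and NumPy are available; the target resolution and the normalization are illustrative choices rather than values specified by the patent:

```python
import numpy as np
from PIL import Image

def preprocess(image_path, size=(224, 224)):
    """Turn raw image data into feature data D1 with a fixed resolution and a
    fixed data volume, so every task offloaded to node i has the same shape."""
    img = Image.open(image_path).convert("RGB").resize(size)
    d1 = np.asarray(img, dtype=np.float32) / 255.0   # normalize to [0, 1]
    return d1

# d1 = preprocess("frame_0001.jpg")  # shape (224, 224, 3), then sent to edge node i
```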
The computing capability awareness module under the situational awareness center of edge computing node i and the computing capability awareness module of cloud server c pass the dynamically sensed computing resource loads of the edge computing nodes (bi, bI, bII) and of cloud server c (bc) to the hierarchical computation delay prediction module; the network telemetry module passes the dynamically measured network bandwidths (ri, rI, rII, rc) and physical distances (liI, liII, lic, lI II, lIc, lIIc) of the regions where the edge computing nodes and the cloud server are located to the transmission delay calculation module.
The hierarchical computing delay prediction module combines the computing resource loads of the edge computing nodes and of cloud server c with the pre-stored hierarchical computing delay prediction model to predict the theoretical hierarchical computing delays needed by each edge computing node to compute each layer DNN_z of the deep learning model (CT_iz′, CT_Iz′, CT_IIz′) and the theoretical computing delay (CT_c′) needed to perform the entire deep learning model computation on cloud server c alone; these theoretical computing delay results are passed synchronously to the decision information generation module. The transmission delay calculation module takes the input image feature data D_1 as the reference to estimate the theoretical data transmission delays of each edge computing node (T_i, T_I, T_II) and the theoretical propagation delays (S_iI, S_iII, S_ic, S_I,II, S_Ic, S_IIc); these theoretical delay results are likewise passed synchronously to the decision information generation module:

T_i = D_z / r_i,  S_iI = l_iI / C

The above expressions give the data transmission delay T_i and the propagation delay S_iI required to transmit the image feature data D_z from edge computing node i to edge computing node I. The data transmission delay T_i depends on the image feature data D_z to be transmitted and on the network bandwidth r_i of edge computing node i; the propagation delay S_iI depends on the channel length from edge computing node i to edge computing node I (estimated by the physical distance l_iI) and on the propagation speed of electromagnetic waves over the channel (estimated by the speed of light C).
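A small numeric example of these two formulas is sketched below; the data volume, bandwidth, and distance values are assumed for illustration and are not measurements from the invention.

```python
# Minimal sketch: T_i = D_z / r_i (data transmission delay) and S_iI = l_iI / C
# (propagation delay). All numeric values below are illustrative assumptions.
C = 3.0e8  # propagation speed over the channel, estimated by the speed of light (m/s)

def transmission_delay(data_bits: float, bandwidth_bps: float) -> float:
    """Time needed to push the data onto the link, in seconds."""
    return data_bits / bandwidth_bps

def propagation_delay(distance_m: float, speed_mps: float = C) -> float:
    """Time for the signal to travel the channel length, in seconds."""
    return distance_m / speed_mps

D_z = 2.0e6      # 2 Mbit of image feature data (assumed)
r_i = 100.0e6    # 100 Mbit/s LAN bandwidth at edge node i (assumed)
l_iI = 1500.0    # 1.5 km between edge node i and edge node I (assumed)

T_i = transmission_delay(D_z, r_i)     # 0.02 s
S_iI = propagation_delay(l_iI)         # 5e-6 s
print(f"T_i = {T_i * 1e3:.2f} ms, S_iI = {S_iI * 1e6:.2f} us")
```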
The remaining transmission and propagation delays are obtained in the same way. Based on deep reinforcement learning techniques, the decision information generation module takes as its basis the theoretical hierarchical computing delays needed by each edge computing node to process each layer DNN_z of the deep learning model (CT_iz′, CT_Iz′, CT_IIz′), the theoretical computing delay needed to perform the entire deep learning model computation on cloud server c alone, the theoretical data transmission delays (T_i, T_I, T_II), and the theoretical propagation delays (S_iI, S_iII, S_ic, S_I,II, S_Ic, S_IIc); taking minimization of the task response delay TIME as the optimization objective, it determines the optimal hierarchical offloading strategy for the deep learning model (different hierarchical offloading strategies correspond to different task response delays TIME, and the goal is to find the optimal hierarchical offloading strategy). Further, during generation of the hierarchical offloading strategy of the deep learning model, to keep the search for the minimum task response delay TIME from falling into over-optimization, the response delay TIMEc obtained when cloud server c is used alone, i.e. (CT_c′ + T_i + S_ic), is dynamically compared with the smallest response delay TIME of the hierarchical offloading model of the deep learning model. If TIME is less than TIMEc, the hierarchical offloading model of the deep learning model corresponding to the minimum response delay TIME is adopted as the hierarchical offloading strategy, and the offloaded computation of the data to be computed is completed with the objective of minimizing the response delay; otherwise, processing the data to be computed on cloud server c alone, corresponding to the response delay TIMEc, is adopted as the final hierarchical offloading strategy, and the offloaded computation of the data to be computed is completed so as to minimize the response delay.
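To make this optimization concrete, the sketch below enumerates candidate layer split points between two edge computing nodes, picks the split with the smallest response delay TIME, and then applies the comparison with the cloud-only delay TIMEc described above. The exhaustive enumeration is a deliberately simplified stand-in for the deep reinforcement learning search of the invention, it considers only two edge nodes, and all delay values are assumed inputs.

```python
# Minimal sketch: pick the layer split point between edge node i and edge node I that
# minimises the response delay TIME, then compare TIME with the cloud-only delay TIMEc.
# Exhaustive enumeration is used here as a simplified stand-in for the deep
# reinforcement learning search described in the text; all delays are assumed inputs.

def best_two_node_split(ct_i, ct_I, T_i, S_iI, t_decision):
    """ct_i[z], ct_I[z]: theoretical per-layer delays (ms) on nodes i and I.
    Layers 0..k-1 run on node i, layers k..Z-1 run on node I (k = Z keeps all on i)."""
    Z = len(ct_i)
    best_time, best_k = float("inf"), 0
    for k in range(Z + 1):
        compute = sum(ct_i[:k]) + sum(ct_I[k:])
        transfer = 0.0 if k == Z else (T_i + S_iI)  # one i -> I handover if node I computes anything
        time_k = compute + transfer + t_decision
        if time_k < best_time:
            best_time, best_k = time_k, k
    return best_k, best_time

ct_i = [4.0, 6.0, 9.0, 12.0]        # assumed per-layer delays on node i (ms)
ct_I = [7.0, 7.0, 7.0, 7.0]         # assumed per-layer delays on node I (ms)
k, TIME = best_two_node_split(ct_i, ct_I, T_i=3.0, S_iI=0.01, t_decision=1.0)

TIMEc = 30.0 + 5.0 + 0.05           # assumed CT_c' + T_i + S_ic for the cloud-only case
strategy = f"split at layer {k}" if TIME < TIMEc else "cloud-only"
print(f"TIME = {TIME:.2f} ms, TIMEc = {TIMEc:.2f} ms -> strategy: {strategy}")
```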
The decision information generation module passes the generated optimal hierarchical offloading strategy of the deep learning model to the decision transceiver center (the hierarchical offloading strategy information includes the edge computing nodes participating in this computation and the number of deep learning model layers each of those edge computing nodes needs to compute), and the decision transceiver center sends the strategy information to the decision transceiver centers of the edge computing nodes that need to participate in this task computation; the edge computing nodes then begin the task computation according to the strategy. The task computation results are sent directly to the physical terminal.
The online learning module of each edge computing node participating in the task computation collects that node's computing resource load (CPU load, GPU load, and cache load) and actual computing delay during the task computation, and passes all of these sample data to the hierarchical computing delay prediction module of edge computing node i, which updates the hierarchical computing delay prediction model for the current deep learning model; further, all edge computing nodes share the updated hierarchical computing delay prediction model.
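As a sketch of this online refinement, the snippet below first fits a placeholder predictor (standing in for the offline-trained model) and then incrementally updates it with newly measured (load, actual delay) samples; the regressor choice and all sample values are assumptions of the sketch.

```python
# Minimal sketch: online update of a per-layer delay predictor with samples
# (CPU, GPU, cache load -> actual delay) collected while a real task was computed.
# The initial model and all sample values are illustrative placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
hist_loads = rng.uniform(0.0, 1.0, size=(200, 3))
hist_delay = 5.0 + 20.0 * hist_loads[:, 0] * hist_loads[:, 1] + 8.0 * hist_loads[:, 2] ** 2

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=1)
model.fit(hist_loads, hist_delay)           # stands in for the offline-trained predictor

# New samples reported by the online learning module after task execution.
new_loads = np.array([[0.72, 0.41, 0.55],
                      [0.30, 0.25, 0.80]])
new_delay = np.array([18.4, 9.7])           # actual measured per-layer delays (ms)

# Incremental refinement; the updated model is then shared with the other edge nodes.
model.partial_fit(new_loads, new_delay)
print("refreshed predictions:", np.round(model.predict(new_loads), 2))
```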
Although the present invention has been described above with preferred embodiments, they are not intended to limit the present invention. Those of ordinary skill in the technical field to which the present invention belongs may make various changes and refinements without departing from the spirit and scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the claims.

Claims (6)

  1. A deep learning model inference acceleration method based on cloud-edge-end collaboration, wherein the cloud-edge-end collaboration refers to a cloud server, at least two edge computing nodes communicating with the cloud server, and at least one physical terminal, the communication distance between the physical terminal and an edge computing node being smaller than the distance between the edge computing node and the cloud server, characterized in that the method comprises the following steps:
    Step S1: the physical terminal preprocesses the image data into image feature data D_1 with identical resolution and equal data volume, and feeds it into the divided DNN layers of the deep learning model to be offloaded, namely DNN_z, taking the output of each layer as the input of the next layer, finally obtaining D_z;
    Step S2: performing an offline learning phase: based on preset computing resource loads of each edge computing node, with the process of each DNN_z of the deep learning model to be offloaded on each edge computing node processing the image feature data D_z as input, and the known computing delay of the image feature data D_z through each DNN_z of the deep learning model to be offloaded on each edge computing node as output, constructing and training a hierarchical computing delay prediction model CT;
    meanwhile, based on preset computing resource loads of the cloud server, with the process of each DNN_z of the deep learning model to be offloaded on the cloud server processing the image feature data D_z as input, and the known computing delay of the image feature data D_z through each DNN_z of the deep learning model to be offloaded on the cloud server as output, constructing and training a cloud server computing delay prediction model CT_c;

    Step S3: according to the actual computing resource load of each edge computing node, the edge computing node corresponding to the computing task of the physical terminal applies the hierarchical computing delay prediction model CT, with the process of each DNN_z of the deep learning model to be offloaded processing the image feature data D_z as input, to obtain as output the theoretical hierarchical computing delay CT′, i.e. the computing delay of the image feature data D_z through each DNN_z of the deep learning model to be offloaded on each edge computing node;
    Step S4: based on the known local area network bandwidth r of the edge computing nodes and the physical distance l between the edge computing nodes, calculating the data transmission delay T and propagation delay S required to transmit the image feature data D_z from the current edge computing node to the other edge computing nodes; meanwhile, based on the known cloud server network bandwidth r_c and the physical distance l_c between the edge computing node handling the computing task and the cloud server, calculating the data transmission delay T_c and propagation delay S_c required to transmit the image feature data D_1 from the edge computing node handling the computing task to the cloud server;
    Step S5: with the theoretical hierarchical computing delay CT′ of each edge computing node obtained in step S3 and the data transmission delay T and propagation delay S obtained in step S4 as input, and the corresponding response delay TIME as output, constructing the hierarchical offloading model of the deep learning model as follows:

    TIME = F(CT′, T, S) + t,

    and, with the minimum response delay TIME as the optimization objective, obtaining the hierarchical offloading model of the deep learning model with the minimum response delay TIME, where t is the time from the edge computing node receiving the computing task sent by the physical terminal to its generating the hierarchical offloading model of the deep learning model;
    Step S6: according to the cloud server computing delay prediction model CT_c obtained in step S2 and the actual computing resource load of the cloud server, applying the hierarchical computing delay prediction model CT_c, with the process of each DNN_z of the deep learning model to be offloaded processing the image feature data D_z as input, to obtain as output the theoretical hierarchical computing delay CT_z′, i.e. the computing delay of the image feature data D_z through each DNN_z of the deep learning model to be offloaded on the cloud server; then, according to the following formula:

    calculating the theoretical computing delay CT_c′ incurred when the cloud server alone processes the computing task, where CT_1′ is the computing delay of passing D_1 through DNN_1; and then calculating, according to the following formula, the response delay TIMEc of processing the image feature data D_z when the cloud server is used alone:

    TIMEc = F(CT_c′, T_c, S_c);
    Step S7: dynamically comparing the response delay TIMEc when the cloud server is used alone with the minimum response delay TIME of the hierarchical offloading model of the deep learning model; if TIME is less than TIMEc, adopting the hierarchical offloading model of the deep learning model corresponding to the minimum response delay TIME as the hierarchical offloading strategy, and completing the offloaded computation of the data to be computed with the objective of minimizing the response delay; otherwise, adopting processing of the data to be computed on the cloud server alone, corresponding to the response delay TIMEc, as the final hierarchical offloading strategy, and completing the offloaded computation of the data to be computed so as to minimize the response delay;
    Step S8: based on the hierarchical offloading strategy obtained in step S7, each edge computing node executing the hierarchical offloading strategy collects its computing load and actual computing delay during the computing task, and the method then returns to step S2.
  2. The deep learning model inference acceleration method based on cloud-edge-end collaboration according to claim 1, characterized in that the divided DNN layers of the deep learning model to be offloaded are obtained as follows: the neurons contained in the hidden layers, the input layer, and the output layer of the deep learning model to be offloaded are divided into n columns, each column of neurons forming a separate layer, thereby obtaining DNN_z;

    n is a positive integer.
  3. The deep learning model inference acceleration method based on cloud-edge-end collaboration according to claim 2, characterized in that step S1 is specifically as follows:
    based on each divided DNN_z of the deep learning model to be offloaded, with the process of each DNN_z of the deep learning model to be offloaded on each edge computing node processing the image feature data D_z as input, and the computing delay corresponding to the image feature data D_z passing through each DNN_z of the deep learning model to be offloaded on each edge computing node as output, the hierarchical computing delay model of each edge computing node is constructed as follows: CT = f(α, β, γ); where α is the preset CPU load of the computing resource load condition, β is the preset GPU load of the computing resource load condition, and γ is the preset cache load of the computing resource load condition.
  4. The deep learning model inference acceleration method based on cloud-edge-end collaboration according to claim 3, characterized in that, in step S3, based on the known local area network bandwidth r of the edge computing nodes and the physical distance l between the edge computing nodes, according to the following formulas:

    T = D_z / r,
    S = l / C;

    the data transmission delay T and the propagation delay S required by each edge computing node to transmit the image feature data D_z to the other edge computing nodes are calculated respectively; where the speed of light C represents the propagation speed of electromagnetic waves over the channel.
  5. The deep learning model inference acceleration method based on cloud-edge-end collaboration according to claim 4, characterized in that the edge computing node comprises a deep reinforcement network, a situation awareness center, and a decision transceiver center;

    wherein the deep reinforcement network comprises:

    a hierarchical computing delay prediction module, configured to calculate the theoretical hierarchical computing delays CT′ and CT_c′, and to store the hierarchical computing delay prediction model CT and the cloud server computing delay prediction model CT_c;

    a transmission delay calculation module, configured to calculate the data transmission delay T and the propagation delay S;

    an online decision-making delay statistics module, configured to calculate the time t from the edge computing node receiving the computing task sent by the physical terminal to its generating the hierarchical offloading model of the deep learning model;

    an online learning module, configured to collect the actual computing load and actual computing delay data during the computing task and pass them to the hierarchical computing delay prediction module of the edge computing node;

    an offline sample data storage module, configured to store, for each edge computing node and the cloud server under preset load conditions, the computing delay corresponding to the image feature data D_z passing through each DNN_z of the deep learning model to be offloaded on each edge computing node, and the computing delay corresponding to the image feature data D_z passing through each DNN_z of the deep learning model to be offloaded on the cloud server;

    a decision information generation module, configured to pass the generated final hierarchical offloading strategy to the decision transceiver center;

    the situation awareness center comprises:

    an edge computing node computing capability awareness module, configured to calculate the computing resource load of each edge computing node;

    a cloud server computing capability awareness module, configured to calculate the computing resource load of the cloud server;

    a network telemetry module, configured to calculate the network bandwidth r of the local area network where each edge computing node is located, and to calculate the physical distance l between the edge computing nodes;

    and the decision transceiver center is configured to send and receive the final hierarchical offloading strategy.
  6. The deep learning model inference acceleration method based on cloud-edge-end collaboration according to claim 5, characterized in that the cloud server includes a deep learning model and a decision transceiver center; the deep learning model is a trained deep learning model; and the decision transceiver center is configured to receive the final hierarchical offloading strategy.
PCT/CN2023/098730 2022-08-11 2023-06-07 Deep learning model reasoning acceleration method based on cloud-edge-end collaboration WO2024032121A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210961978.9 2022-08-11
CN202210961978.9A CN115034390B (en) 2022-08-11 2022-08-11 Deep learning model reasoning acceleration method based on cloud edge-side cooperation

Publications (1)

Publication Number Publication Date
WO2024032121A1 true WO2024032121A1 (en) 2024-02-15

Family

ID=83130472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/098730 WO2024032121A1 (en) 2022-08-11 2023-06-07 Deep learning model reasoning acceleration method based on cloud-edge-end collaboration

Country Status (2)

Country Link
CN (1) CN115034390B (en)
WO (1) WO2024032121A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115034390B (en) * 2022-08-11 2022-11-18 南京邮电大学 Deep learning model reasoning acceleration method based on cloud edge-side cooperation
CN115562760B (en) * 2022-11-22 2023-05-30 南京邮电大学 Deep learning model layered unloading method based on edge computing node classification table
CN116894469B (en) * 2023-09-11 2023-12-15 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3489865B1 (en) * 2017-11-22 2021-01-06 Commissariat à l'énergie atomique et aux énergies alternatives A stdp-based learning method for a network having dual accumulator neurons
CN114153572A (en) * 2021-10-27 2022-03-08 中国电子科技集团公司第五十四研究所 Calculation unloading method for distributed deep learning in satellite-ground cooperative network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272896A1 (en) * 2019-02-25 2020-08-27 Alibaba Group Holding Limited System for deep learning training using edge devices
CN110309914A (en) * 2019-07-03 2019-10-08 中山大学 Deep learning model reasoning accelerated method based on Edge Server Yu mobile terminal equipment collaboration
CN111242282A (en) * 2020-01-09 2020-06-05 中山大学 Deep learning model training acceleration method based on end edge cloud cooperation
KR20220061827A (en) * 2020-11-06 2022-05-13 한국전자통신연구원 Adaptive deep learning inference apparatus and method in mobile edge computing
CN114422349A (en) * 2022-03-30 2022-04-29 南京邮电大学 Cloud-edge-end-collaboration-based deep learning model training and reasoning architecture deployment method
CN115034390A (en) * 2022-08-11 2022-09-09 南京邮电大学 Deep learning model reasoning acceleration method based on cloud edge-side cooperation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117834643A (en) * 2024-03-05 2024-04-05 南京邮电大学 Deep neural network collaborative reasoning method for industrial Internet of things
CN117834643B (en) * 2024-03-05 2024-05-03 南京邮电大学 Deep neural network collaborative reasoning method for industrial Internet of things

Also Published As

Publication number Publication date
CN115034390A (en) 2022-09-09
CN115034390B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
WO2024032121A1 (en) Deep learning model reasoning acceleration method based on cloud-edge-end collaboration
Liu et al. FedCPF: An efficient-communication federated learning approach for vehicular edge computing in 6G communication networks
CN112202928B (en) Credible unloading cooperative node selection system and method for sensing edge cloud block chain network
Sun et al. Dynamic digital twin and federated learning with incentives for air-ground networks
CN114584581B (en) Federal learning system and federal learning training method for intelligent city internet of things (IOT) letter fusion
WO2023109699A1 (en) Multi-agent communication learning method
Li et al. Joint optimization of computation cost and delay for task offloading in vehicular fog networks
He et al. Resource allocation based on digital twin-enabled federated learning framework in heterogeneous cellular network
Ma et al. Joint scheduling and resource allocation for efficiency-oriented distributed learning over vehicle platooning networks
CN114357676A (en) Aggregation frequency control method for hierarchical model training framework
CN116187429A (en) End Bian Yun collaborative synchronization federal learning training algorithm based on segmentation learning
Wang et al. A credibility-aware swarm-federated deep learning framework in internet of vehicles
Wang et al. Eidls: An edge-intelligence-based distributed learning system over internet of things
Cui et al. Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network
Ni et al. An effective hybrid V2V/V2I transmission latency method based on LSTM neural network
Zheng et al. A distributed learning architecture for semantic communication in autonomous driving networks for task offloading
CN114205251B (en) Switch link resource prediction method based on space-time characteristics
CN115150288B (en) Distributed communication system and method
CN116260821A (en) Distributed parallel computing unloading method based on deep reinforcement learning and blockchain
Deng et al. On dynamic resource allocation for blockchain assisted federated learning over wireless channels
Wang et al. Knowledge selection and local updating optimization for federated knowledge distillation with heterogeneous models
CN112910716B (en) Mobile fog calculation loss joint optimization system and method based on distributed DNN
CN114022731A (en) Federal learning node selection method based on DRL
Zhang et al. On-Device Intelligence for 5G RAN: Knowledge Transfer and Federated Learning Enabled UE-Centric Traffic Steering
Shi et al. Multi-UAV-assisted computation offloading in DT-based networks: A distributed deep reinforcement learning approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851349

Country of ref document: EP

Kind code of ref document: A1