CN114756371A - A method and system for optimal configuration of terminal edge joint resources - Google Patents
- Publication number
- CN114756371A (application CN202210455191.5A)
- Authority
- CN
- China
- Prior art keywords
- video
- edge
- inference
- mobile device
- user
- Prior art date: 2022-04-27
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5072—Grid computing (partitioning or combining of resources)
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/08—Neural networks; learning methods
- G06N5/04—Knowledge-based models; inference or reasoning models
Abstract
The invention discloses a method and system for the jointly optimized configuration of terminal and edge resources. The method includes: recording, at an edge controller, the number of multiply-accumulate operations required by a neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and completing control tasks according to the video-recognition requests of the mobile devices, the control tasks including: sending sampling-frame-count control information to the video sampling management module of each mobile device based on the number of frames that device samples; determining from each user's offloading decision whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; and determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission. This optimization method simultaneously reduces the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference.
Description
Technical Field
The invention relates generally to the field of computer technology, and in particular to a method and system for the optimal configuration of joint terminal-edge resources.
Background
The development of networking, cloud computing, edge computing, artificial intelligence, and related technologies has sparked boundless imagination about the metaverse. Augmented reality (AR) plays a vital role in enabling users to interact across the real and virtual worlds. At the same time, artificial intelligence, thanks to its learning and reasoning capabilities, plays an important role in automatic speech recognition, natural language processing, computer vision, and other fields. Assisted by AI, AR can achieve deeper scene understanding and more immersive interaction.
However, the computational complexity of AI algorithms, and of deep neural networks (DNNs) in particular, is usually very high. On mobile devices with limited computing power and energy capacity, it is difficult to complete neural-network inference promptly and reliably. Experiments show that even with mobile-GPU acceleration, a typical single-frame image-processing inference task takes about 600 ms, and executing such tasks continuously drains a commodity device within at most 2.5 hours. These problems explain why only a few AR applications currently use deep learning. One way to reduce DNN inference time is network pruning; however, pruning too many channels can damage the model, and fine-tuning may fail to recover satisfactory accuracy.
Mobile-edge-computing-assisted AI is another way to address these issues. The integration of mobile edge computing and AI has recently emerged as a promising paradigm for supporting computation-intensive tasks. Edge computing moves the inference and training of AI models to the network edge, close to the data source, thereby relieving network traffic load, latency, and privacy concerns. Nevertheless, edge-computing-assisted AI applications still face substantial challenges, specifically:
(1) Although the computing resources at the edge far exceed those of end users, they are still limited; relying blindly on the edge's computing power cannot properly solve the problem of insufficient terminal computing power.
(2) Offloading AI inference tasks to the edge reduces the impact of insufficient computing power to some extent, but it also introduces communication delay.
(3) Delay, energy consumption, and accuracy constrain one another; blindly improving any one of them inevitably degrades the other two.
Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art. The present invention provides a method and system for the optimal configuration of joint terminal-edge resources that simultaneously reduce the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference.
The present invention provides a method for the optimal configuration of joint terminal-edge resources, the method comprising:
recording, at an edge controller, the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and, upon receiving the video-recognition request of each mobile device, completing control tasks according to the request, the control tasks comprising:
determining the number of frames each mobile device samples, and sending sampling-frame-count control information to the video sampling management module of the corresponding mobile device;
determining each user's offloading decision, and determining from it whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module;
determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission;
determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency assigned to each offloading user; a minimal sketch of these per-device decisions follows this list.
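For illustration, the four decisions above can be bundled into a single per-device record. This is only a sketch; the field names are assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class DeviceDecision:
    """One mobile device's control decision as issued by the edge controller (hypothetical)."""
    frames: int          # M_n: number of video frames sampled for inference
    offload: bool        # x_n: True -> edge DNN inference, False -> local inference
    uplink_share: float  # t_n: fraction of uplink time allocated (meaningful only if offload)
    cpu_freq_hz: float   # f_n: CPU frequency granted at the edge or locally, per x_n
```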
The method further comprises:
The edge DNN inference module obtains the uploaded videos from the devices that offload, completes inference using the computing resources allocated by the edge controller, and sends the inference results to the corresponding mobile devices.
The method further comprises:
The video sampling management module obtains the sampling-frame-count control information from the edge, controls the sampling frame count of its mobile device based on this information, and thereby determines the number of frames of the input video used for neural-network inference.
The method further comprises:
The local controller obtains the video from the video sampling management module and, according to the user offloading decision obtained from the edge controller, decides whether to transmit the video to the edge server.
The local controller obtaining the video from the video sampling management module and deciding, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server comprises:
if the user's offloading decision is 1, transmitting the video to the edge server for inference, the communication resources for the transmission being configured by the base station;
if the user's offloading decision is 0, allowing the inference on the video to be completed locally, with local CPU computing resources allocated according to the local device information.
When a mobile device needs to complete DNN inference locally, the local DNN inference module obtains the video and completes the inference using the allocated local computing resources; a sketch of this dispatch logic follows.
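A minimal sketch of the local controller's dispatch, reusing the DeviceDecision record above and assuming hypothetical helpers upload_to_edge and run_local_inference (neither is named in the patent):

```python
def dispatch(decision: DeviceDecision, video):
    """Route one sampled clip according to the offloading decision x_n."""
    if decision.offload:
        # x_n = 1: transmit to the edge server; the base station configures the
        # communication resources (here, the airtime fraction t_n).
        return upload_to_edge(video, uplink_share=decision.uplink_share)
    # x_n = 0: run the DNN locally at the granted local CPU frequency.
    return run_local_inference(video, cpu_freq_hz=decision.cpu_freq_hz)
```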
Correspondingly, the present invention further provides an edge-terminal collaborative video AI inference system, the system comprising:
an edge controller, configured to record the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it, and, upon receiving the video-recognition request of each mobile device, to complete control tasks according to the request;
an edge DNN inference module, configured to obtain the uploaded videos from the devices that offload, complete inference using the computing resources allocated by the edge controller, and send the inference results to the corresponding mobile devices;
a video sampling management module, configured to obtain the sampling-frame-count control information from the edge, control the sampling frame count of its mobile device based on this information, and determine the number of frames of the input video used for neural-network inference;
a local controller, configured to obtain the video from the video sampling management module and decide, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server;
a local DNN inference module, configured to obtain the video and complete DNN inference using the allocated local computing resources when the mobile device needs to perform inference locally.
Completing the control tasks according to the video-recognition requests of the mobile devices comprises:
determining the number of frames each mobile device samples, and sending sampling-frame-count control information to the video sampling management module of the corresponding mobile device;
determining each user's offloading decision, and determining from it whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module;
determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission;
determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency assigned to each offloading user.
If the user's offloading decision is 1, the local controller transmits the video to the edge server for inference, the communication resources for the transmission being configured by the base station.
If the user's offloading decision is 0, the local controller allows the inference to be completed locally and allocates local CPU computing resources according to the local device information.
The embodiments of the present invention have the following beneficial effects:
(1) Based on an edge-terminal collaborative offloading architecture for AI inference algorithms, the present invention provides the architecture of an edge-terminal collaborative video AI inference system: according to the number of requesting users, the edge server can determine the number of video frames each user uses for detection and produce the users' offloading strategy together with their communication- and computing-resource allocation scheme.
(2) Multi-dimensional performance optimization. Under the given system architecture, inference delay, terminal energy consumption, and recognition accuracy are considered jointly, and an effective algorithm is proposed that raises the neural network's recognition accuracy while reducing delay and energy consumption.
(3) Performance trade-off analysis. The invention establishes the trade-off relationship among inference delay, terminal energy consumption, and recognition accuracy; using this relationship, targeted system optimization can be carried out to improve the system performance of AI inference in different applications.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the edge-terminal collaborative video AI inference system in an embodiment of the present invention;
Fig. 2 is a schematic comparison of the performance of different offloading strategies in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the trade-off among delay, energy consumption, and recognition accuracy in an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The main technical problem addressed by the present invention is, for deep-learning-based video-recognition tasks, to propose an architecture for an edge-terminal collaborative video AI inference system and, based on this architecture, a method that simultaneously optimizes the delay, energy consumption, and accuracy of executing deep-learning inference tasks. The invention also derives the trade-off relationship among delay, energy consumption, and accuracy, which can serve as a reference for system design.
The present invention proposes an architecture for an edge-terminal collaborative video AI inference system. In a scenario where one edge server serves multiple mobile devices over wireless access, and all of these devices need to perform AI-based video-recognition tasks, the system reduces the inference delay and energy consumption of the neural network while improving its inference accuracy by appropriately adjusting the number of video frames used for recognition and jointly configuring the wireless and computing resources.
Specifically, Fig. 1 shows a schematic structural diagram of the edge-terminal collaborative video AI inference system in an embodiment of the present invention. The system comprises:
an edge controller, configured to record the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it, and, upon receiving the video-recognition request of each mobile device, to complete control tasks according to the request;

an edge DNN inference module, configured to obtain the uploaded videos from the devices that offload, complete inference using the computing resources allocated by the edge controller, and send the inference results to the corresponding mobile devices;

a video sampling management module, configured to obtain the sampling-frame-count control information from the edge, control the sampling frame count of its mobile device based on this information, and determine the number of frames of the input video used for neural-network inference;

a local controller, configured to obtain the video from the video sampling management module and decide, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server;

a local DNN inference module, configured to obtain the video and complete DNN inference using the allocated local computing resources when the mobile device needs to perform inference locally.
Completing the control tasks according to the video-recognition requests of the mobile devices comprises: determining the number of frames each mobile device samples and sending sampling-frame-count control information to the video sampling management module of the corresponding device; determining each user's offloading decision and, from it, whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission; and determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency of each offloading user.
If the user's offloading decision is 1, the local controller transmits the video to the edge server for inference, with the communication resources configured by the base station; if the decision is 0, it allows the inference to be completed locally and allocates local CPU computing resources according to the local device information.
Based on the system structure shown in Fig. 1, the method for the optimal configuration of joint terminal-edge resources in an embodiment of the present invention comprises: recording, at the edge controller, the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and, upon receiving the video-recognition request of each mobile device, completing control tasks according to the request, the control tasks comprising: determining the number of frames each mobile device samples and sending sampling-frame-count control information to the video sampling management module of the corresponding device; determining each user's offloading decision and, from it, whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission; and determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency of each offloading user.
Further, the edge DNN inference module obtains the uploaded videos from the devices that offload, completes inference using the computing resources allocated by the edge controller, and sends the inference results to the corresponding mobile devices.

Further, the video sampling management module obtains the sampling-frame-count control information from the edge, controls the sampling frame count of its mobile device based on this information, and determines the number of frames of the input video used for neural-network inference.

Further, the local controller obtains the video from the video sampling management module and, according to the user offloading decision obtained from the edge controller, decides whether to transmit the video to the edge server.

Further, this deciding comprises: if the user's offloading decision is 1, transmitting the video to the edge server for inference, the communication resources being configured by the base station; if the user's offloading decision is 0, allowing the inference to be completed locally and allocating local CPU computing resources according to the local device information.

Further, when a mobile device needs to complete DNN inference locally, the local DNN inference module obtains the video and completes the inference using the allocated local computing resources.
It should be noted that, based on the above system architecture, the present invention proposes a corresponding optimization scheme that simultaneously reduces the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference. The algorithm is as follows:
First, the number of multiply-accumulate (MAC) operations C(M_n) required to complete inference with the neural-network model is obtained from the network's MAC model. Since, for each layer of the network, the number of MACs needed for inference is proportional to the input size, the total MAC count can, after derivation, be roughly expressed as a linear function of the number of input video frames:

C(M_n) = m_{c,0}·M_n + m_{c,1},

where m_{c,0} and m_{c,1} are parameters obtained by fitting and determined by the model architecture, and M_n is the sampled frame count. From the MAC count, the computation delay D_n and the energy consumption E_n of the neural network are expressed as

D_n = (1 − x_n)·ρ·C(M_n)/f_n^l + x_n·(M_n·d/(t_n·R_n) + ρ·C(M_n)/f_n^e),

E_n = (1 − x_n)·κ·(f_n^l)^2·ρ·C(M_n) + x_n·p_n·M_n·d/(t_n·R_n),

where ρ is the number of CPU cycles required by each MAC operation, d is the size of one video frame, R_n is the communication rate of the n-th mobile device, t_n is the communication-time fraction of the n-th mobile device (i.e., its communication-resource share), κ is the energy-consumption coefficient, p_n is the mobile device's transmit power, x_n is the user's offloading decision (x_n = 1 means the task is offloaded to the edge, otherwise it is computed locally), f_n^l is the CPU computing frequency allocated locally, and f_n^e is the edge computing frequency allocated by the edge controller.
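A small sketch that evaluates this delay/energy model for one device under either placement; the function name and parameter names are illustrative assumptions, not taken from the patent:

```python
def delay_energy(M, offload, *, mc0, mc1, rho, d, R, t, kappa, p, f_local, f_edge):
    """Return (D_n, E_n) for an M-frame input.

    C(M) = mc0*M + mc1 is the fitted multiply-accumulate (MAC) count.
    """
    C = mc0 * M + mc1
    if offload:
        # x_n = 1: uplink M frames of size d at rate R over time share t, then infer at the edge.
        delay = M * d / (t * R) + rho * C / f_edge
        energy = p * M * d / (t * R)  # the device pays only the transmit energy
    else:
        # x_n = 0: infer locally at frequency f_local.
        delay = rho * C / f_local
        energy = kappa * f_local**2 * rho * C
    return delay, energy
```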
As for accuracy, the more frames the input video contains, the higher the model's prediction accuracy; however, the gain in prediction accuracy diminishes as the number of input frames grows. The accuracy Φ(M_n) can therefore be expressed as a saturating, monotonically increasing function of the frame count, for instance

Φ(M_n) = m_{a,0} − m_{a,1}·e^{−m_{a,2}·M_n},

where m_{a,0}, m_{a,1}, and m_{a,2} are parameters obtained by fitting, whose values are determined by the neural-network model and the task.
In summary, the optimization objective is

min over {M_n, x_n, t_n, f_n^l, f_n^e} of Σ_n ( β_1·D_n + β_2·E_n − β_3·Φ(M_n) ),

where β_1, β_2, and β_3 are weight coefficients; the goal is to reduce the total delay D_n and total energy consumption E_n while raising the user recognition accuracy Φ(M_n). The constraints are: (1) each user's frame count lies within the given range; (2) the communication-time fractions of the devices participating in offloading sum to less than 1; (3) the computing frequencies allocated at the edge sum to less than the edge's upper limit; (4) the allocated communication time and edge computing frequencies are greater than 0; (5) each mobile device's computing frequency is allocated a value greater than 0 and less than its maximum; and (6) x_n indicates offloading or not, i.e., x_n ∈ {0, 1}.
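Combining the pieces, the per-device weighted cost the controller minimizes can be sketched as follows; device_cost builds on the delay_energy sketch above, and the accuracy parameters are illustrative assumptions:

```python
import math

def device_cost(M, offload, beta=(0.2, 0.2, 0.6), acc=(0.95, 0.5, 0.1), **model):
    """Weighted cost b1*D + b2*E - b3*Phi for one device.

    beta = (b1, b2, b3); acc = (ma0, ma1, ma2) for an assumed saturating accuracy curve.
    """
    D, E = delay_energy(M, offload, **model)
    phi = acc[0] - acc[1] * math.exp(-acc[2] * M)  # assumed form of Phi(M_n)
    return beta[0] * D + beta[1] * E - beta[2] * phi
```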
To solve this optimization problem, the offloading decisions x_n are first taken as given, and the problem is decomposed into two subproblems: the resource-optimization problem of the mobile devices that complete inference locally and that of the mobile devices that complete inference at the edge. Let N_0 be the set of users completing inference locally and N_1 the set completing inference at the edge; the optimization problem then splits into two sub-optimization problems.
For the resource-optimization problem over N_0, the subproblem minimizes the total cost of local computation over the variables {M_n, f_n^l}, where the cost of each device is its local-computation cost function. The constraints to be satisfied are: (1) each user's frame count lies within the given range; and (2) each mobile device's computing frequency is allocated a value greater than 0 and less than its maximum. Through derivation, a closed-form solution of this subproblem can be obtained.
For the resource-optimization problem over N_1, the subproblem minimizes the total cost of offloading to the edge over the variables {M_n, t_n, f_n^e}, where the cost of each device is its offloading cost function. The constraints to be satisfied are: (1) each user's frame count lies within the given range; (2) the communication-time fractions of the devices participating in offloading sum to less than 1; (3) the computing frequencies allocated at the edge sum to less than the edge's upper limit; and (4) the allocated communication time and edge computing frequencies are greater than 0. Through derivation, t_n and f_n^e can be expressed as functions of M_n, after which the problem can be solved by convex optimization.
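As a numerical stand-in for that derivation, the N_1 subproblem can be handed to a generic solver; this sketch uses SciPy's SLSQP, which is an assumption of this illustration and not the patent's method. Each entry of devices is assumed to carry the keyword parameters that delay_energy expects (mc0, mc1, rho, d, R, kappa, p, f_local):

```python
import numpy as np
from scipy.optimize import minimize

def solve_edge_subproblem(devices, F_edge, M_max):
    """Jointly pick (M_n, t_n, f_n^e) for every offloading device."""
    k = len(devices)

    def total_cost(z):
        z = z.reshape(k, 3)  # columns: M_n, t_n, f_n^e
        return sum(device_cost(M, True, t=t, f_edge=f, **dev)
                   for (M, t, f), dev in zip(z, devices))

    cons = [
        {"type": "ineq", "fun": lambda z: 1.0 - z.reshape(k, 3)[:, 1].sum()},     # sum t_n < 1
        {"type": "ineq", "fun": lambda z: F_edge - z.reshape(k, 3)[:, 2].sum()},  # sum f_n^e < F
    ]
    bounds = [(1, M_max), (1e-6, 1.0), (1e-3 * F_edge, F_edge)] * k
    z0 = np.tile([M_max / 2, 1.0 / k, F_edge / k], k)
    res = minimize(total_cost, z0, bounds=bounds, constraints=cons, method="SLSQP")
    return res.x.reshape(k, 3)  # M_n is continuous here; round it to an integer frame count
```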
The offloading decisions x_n are solved with a greedy iterative algorithm. Observe that when inference is executed locally, a device's cost function and optimization variables M_n and f_n^l depend only on that device's own parameters and are unaffected by other devices; for the edge set N_1, however, the cost function depends on the number of devices in the set and on their parameters. The algorithm works as follows. First, compute the cost of each device's task when executed locally. Then assume all devices offload to the edge server for inference (N_1 contains all devices and N_0 is empty). In each iteration, compute the cost of every device in N_1, compare it with that device's local cost, take the difference between the two, and select the device with the largest difference, denoted device y. Tentatively move device y from N_1 into N_0 and compute the total cost of the new partition. If the total cost decreases, keep the move and continue to the next iteration; otherwise, put device y back into N_1 and terminate the algorithm.
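A compact sketch of this greedy partition; local_cost, per_device_edge_cost, and edge_partition_cost are assumed helpers wrapping the two subproblems above:

```python
def greedy_offloading(devices):
    """Split devices into a local set N0 and an edge set N1, moving devices out of
    N1 one at a time while the total cost keeps decreasing."""
    N0, N1 = set(), set(range(len(devices)))   # start with everyone offloaded
    best = edge_partition_cost(devices, N1)     # N0 is empty, so no local cost yet
    while N1:
        edge_costs = per_device_edge_cost(devices, N1)
        # device y gains the most from going local
        y = max(N1, key=lambda n: edge_costs[n] - local_cost(devices[n]))
        trial0, trial1 = N0 | {y}, N1 - {y}
        cost = (sum(local_cost(devices[n]) for n in trial0)
                + edge_partition_cost(devices, trial1))
        if cost < best:
            N0, N1, best = trial0, trial1, cost  # keep the move, iterate again
        else:
            break                                # revert (y stays in N1) and stop
    return N0, N1
```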
In the experiments, the downlink bandwidth is set to 5 MHz and the path loss is modeled as PL = 128.1 + 37.6·log10(D), where D is the distance between the device and the wireless access point in kilometers. Devices are randomly distributed over a 500 m × 500 m area. The computing resources of the devices and of the MEC server are set to 1.8 GHz and 22 GHz, respectively. The recognition-accuracy requirement and the maximum number of input video frames are fixed in advance, and the energy coefficient κ = 10^-28 is determined by the corresponding device. The size of the input video is 112 × 112 × M_n. Furthermore, the computational-complexity coefficient is set to ρ = 12, a value obtained through repeated experiments. The weights β_1, β_2, β_3 are set to 0.2, 0.2, and 0.6, respectively.
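A quick numeric check of the path-loss model (the 0.25 km distance is an arbitrary example):

```python
import math

def path_loss_db(D_km: float) -> float:
    """PL = 128.1 + 37.6*log10(D), with D in kilometers."""
    return 128.1 + 37.6 * math.log10(D_km)

print(round(path_loss_db(0.25), 1))  # 105.5 dB at 250 m
```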
The proposed offloading scheme is compared with a purely local inference scheme (Local), a purely edge inference scheme (Edge), and a random offloading scheme (Random); the experimental results are shown in Fig. 2. When the number of devices is below 10, the cost of executing tasks only at the edge is almost equal to that of the proposed offloading scheme, because with few devices every device can benefit from performing inference on the edge server. If inference tasks are executed only locally, the average device cost does not change with the number of devices, since the devices' local resources do not affect one another. The curve of the Edge scheme is linear because all users run the same AI model in the experiment.
This experiment uses different weights β_1, β_2, β_3 to analyze the trade-off among average delay, energy consumption, and accuracy. The trade-off surface is obtained with the proposed offloading and allocation scheme under the constraint β_1 + β_2 + β_3 = 1. As shown in Fig. 3, delay, energy consumption, and accuracy limit one another, and the three must be traded off. When the delay is held constant, higher recognition accuracy requires higher energy consumption; viewed the other way, improving accuracy sacrifices delay and energy performance. In addition, at the same accuracy, higher energy consumption makes devices more inclined to execute inference tasks locally, which lowers the delay.
The embodiments of the present invention have the following beneficial effects:
(1) Based on an edge-terminal collaborative offloading architecture for AI inference algorithms, the present invention provides the architecture of an edge-terminal collaborative video AI inference system: according to the number of requesting users, the edge server can determine the number of video frames each user uses for detection and produce the users' offloading strategy together with their communication- and computing-resource allocation scheme.
(2) Multi-dimensional performance optimization. Under the given system architecture, inference delay, terminal energy consumption, and recognition accuracy are considered jointly, and an effective algorithm is proposed that raises the neural network's recognition accuracy while reducing delay and energy consumption.
(3) Performance trade-off analysis. The invention establishes the trade-off relationship among inference delay, terminal energy consumption, and recognition accuracy; using this relationship, targeted system optimization can be carried out to improve the system performance of AI inference in different applications.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
In addition, the embodiments of the present invention have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present invention; the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes can be made to the specific implementations and the scope of application according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210455191.5A CN114756371A (en) | 2022-04-27 | 2022-04-27 | A method and system for optimal configuration of terminal edge joint resources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114756371A true CN114756371A (en) | 2022-07-15 |
Family
ID=82333211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210455191.5A Pending CN114756371A (en) | 2022-04-27 | 2022-04-27 | A method and system for optimal configuration of terminal edge joint resources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114756371A (en) |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |