CN114756371A - A method and system for optimal configuration of terminal edge joint resources - Google Patents
- Publication number
- CN114756371A (application CN202210455191.5A)
- Authority
- CN
- China
- Prior art keywords
- video
- edge
- inference
- mobile device
- user
- Prior art date: 2022-04-27
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5072—Grid computing (partitioning or combining of resources)
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/08—Neural networks; learning methods
- G06N5/04—Knowledge-based models; inference or reasoning models
Abstract
The invention discloses a method and system for the jointly optimized configuration of terminal and edge resources. The method includes: recording, at an edge controller, the number of multiply-accumulate operations required by a neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and completing control tasks according to the video-recognition requests of the mobile devices, the control tasks including: sending sampling-frame-count control information to the video sampling management module of each mobile device based on the number of frames that device samples; determining from each user's offloading decision whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; and determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission. This optimization method simultaneously reduces the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference.
Description
Technical Field
The invention relates generally to the field of computer technology, and in particular to a method and system for the optimal configuration of joint terminal-edge resources.
Background
The development of networking, cloud computing, edge computing, artificial intelligence, and related technologies has sparked boundless imagination about the metaverse. Augmented reality (AR) plays a vital role in enabling users to interact across the real and virtual worlds. At the same time, artificial intelligence, thanks to its learning and reasoning capabilities, plays an important role in automatic speech recognition, natural language processing, computer vision, and other fields. Assisted by AI, AR can achieve deeper scene understanding and more immersive interaction.
However, the computational complexity of AI algorithms, and of deep neural networks (DNNs) in particular, is usually very high. On mobile devices with limited computing power and energy capacity, it is difficult to complete neural-network inference promptly and reliably. Experiments show that even with mobile-GPU acceleration, a typical single-frame image-processing inference task takes about 600 ms, and executing such tasks continuously drains a commodity device within at most 2.5 hours. These problems explain why only a few AR applications currently use deep learning. One way to reduce DNN inference time is network pruning; however, pruning too many channels can damage the model, and fine-tuning may fail to recover satisfactory accuracy.
Mobile-edge-computing-assisted AI is another way to address these issues. The integration of mobile edge computing and AI has recently emerged as a promising paradigm for supporting computation-intensive tasks. Edge computing moves the inference and training of AI models to the network edge, close to the data source, thereby relieving network traffic load, latency, and privacy concerns. Nevertheless, edge-computing-assisted AI applications still face substantial challenges, specifically:
(1) Although the computing resources at the edge far exceed those of end users, they are still limited; relying blindly on the edge's computing power cannot properly solve the problem of insufficient terminal computing power.
(2) Offloading AI inference tasks to the edge reduces the impact of insufficient computing power to some extent, but it also introduces communication delay.
(3) Delay, energy consumption, and accuracy constrain one another; blindly improving any one of them inevitably degrades the other two.
Summary of the Invention
The purpose of the present invention is to overcome the deficiencies of the prior art. The present invention provides a method and system for the optimal configuration of joint terminal-edge resources that simultaneously reduce the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference.
The present invention provides a method for the optimal configuration of joint terminal-edge resources, the method comprising:
recording, at an edge controller, the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and, upon receiving the video-recognition request of each mobile device, completing control tasks according to the request, the control tasks comprising:
determining the number of frames each mobile device samples, and sending sampling-frame-count control information to the video sampling management module of the corresponding mobile device;
determining each user's offloading decision, and determining from it whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module;
determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission;
determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency assigned to each offloading user; a minimal sketch of these per-device decisions follows this list.
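For illustration, the four decisions above can be bundled into a single per-device record. This is only a sketch; the field names are assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class DeviceDecision:
    """One mobile device's control decision as issued by the edge controller (hypothetical)."""
    frames: int          # M_n: number of video frames sampled for inference
    offload: bool        # x_n: True -> edge DNN inference, False -> local inference
    uplink_share: float  # t_n: fraction of uplink time allocated (meaningful only if offload)
    cpu_freq_hz: float   # f_n: CPU frequency granted at the edge or locally, per x_n
```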
The method further comprises:
The edge DNN inference module obtains the uploaded videos from the devices that offload, completes inference using the computing resources allocated by the edge controller, and sends the inference results to the corresponding mobile devices.
The method further comprises:
The video sampling management module obtains the sampling-frame-count control information from the edge, controls the sampling frame count of its mobile device based on this information, and thereby determines the number of frames of the input video used for neural-network inference.
The method further comprises:
The local controller obtains the video from the video sampling management module and, according to the user offloading decision obtained from the edge controller, decides whether to transmit the video to the edge server.
The local controller obtaining the video from the video sampling management module and deciding, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server comprises:
if the user's offloading decision is 1, transmitting the video to the edge server for inference, the communication resources for the transmission being configured by the base station;
if the user's offloading decision is 0, allowing the inference on the video to be completed locally, with local CPU computing resources allocated according to the local device information.
When a mobile device needs to complete DNN inference locally, the local DNN inference module obtains the video and completes the inference using the allocated local computing resources; a sketch of this dispatch logic follows.
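A minimal sketch of the local controller's dispatch, reusing the DeviceDecision record above and assuming hypothetical helpers upload_to_edge and run_local_inference (neither is named in the patent):

```python
def dispatch(decision: DeviceDecision, video):
    """Route one sampled clip according to the offloading decision x_n."""
    if decision.offload:
        # x_n = 1: transmit to the edge server; the base station configures the
        # communication resources (here, the airtime fraction t_n).
        return upload_to_edge(video, uplink_share=decision.uplink_share)
    # x_n = 0: run the DNN locally at the granted local CPU frequency.
    return run_local_inference(video, cpu_freq_hz=decision.cpu_freq_hz)
```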
Correspondingly, the present invention further provides an edge-terminal collaborative video AI inference system, the system comprising:
an edge controller, configured to record the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it, and, upon receiving the video-recognition request of each mobile device, to complete control tasks according to the request;
an edge DNN inference module, configured to obtain the uploaded videos from the devices that offload, complete inference using the computing resources allocated by the edge controller, and send the inference results to the corresponding mobile devices;
a video sampling management module, configured to obtain the sampling-frame-count control information from the edge, control the sampling frame count of its mobile device based on this information, and determine the number of frames of the input video used for neural-network inference;
a local controller, configured to obtain the video from the video sampling management module and decide, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server;
a local DNN inference module, configured to obtain the video and complete DNN inference using the allocated local computing resources when the mobile device needs to perform inference locally.
Completing the control tasks according to the video-recognition requests of the mobile devices comprises:
determining the number of frames each mobile device samples, and sending sampling-frame-count control information to the video sampling management module of the corresponding mobile device;
determining each user's offloading decision, and determining from it whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module;
determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission;
determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency assigned to each offloading user.
If the user's offloading decision is 1, the local controller transmits the video to the edge server for inference, the communication resources for the transmission being configured by the base station.
If the user's offloading decision is 0, the local controller allows the inference to be completed locally and allocates local CPU computing resources according to the local device information.
The embodiments of the present invention have the following beneficial effects:
(1) Based on an edge-terminal collaborative offloading architecture for AI inference algorithms, the present invention provides the architecture of an edge-terminal collaborative video AI inference system: according to the number of requesting users, the edge server can determine the number of video frames each user uses for detection and produce the users' offloading strategy together with their communication- and computing-resource allocation scheme.
(2) Multi-dimensional performance optimization. Under the given system architecture, inference delay, terminal energy consumption, and recognition accuracy are considered jointly, and an effective algorithm is proposed that raises the neural network's recognition accuracy while reducing delay and energy consumption.
(3) Performance trade-off analysis. The invention establishes the trade-off relationship among inference delay, terminal energy consumption, and recognition accuracy; using this relationship, targeted system optimization can be carried out to improve the system performance of AI inference in different applications.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the edge-terminal collaborative video AI inference system in an embodiment of the present invention;
Fig. 2 is a schematic comparison of the performance of different offloading strategies in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the trade-off among delay, energy consumption, and recognition accuracy in an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The main technical problem addressed by the present invention is, for deep-learning-based video-recognition tasks, to propose an architecture for an edge-terminal collaborative video AI inference system and, based on this architecture, a method that simultaneously optimizes the delay, energy consumption, and accuracy of executing deep-learning inference tasks. The invention also derives the trade-off relationship among delay, energy consumption, and accuracy, which can serve as a reference for system design.
The present invention proposes an architecture for an edge-terminal collaborative video AI inference system. In a scenario where one edge server serves multiple mobile devices over wireless access, and all of these devices need to perform AI-based video-recognition tasks, the system reduces the inference delay and energy consumption of the neural network while improving its inference accuracy by appropriately adjusting the number of video frames used for recognition and jointly configuring the wireless and computing resources.
Specifically, Fig. 1 shows a schematic structural diagram of the edge-terminal collaborative video AI inference system in an embodiment of the present invention. The system comprises:
an edge controller, configured to record the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it, and, upon receiving the video-recognition request of each mobile device, to complete control tasks according to the request;

an edge DNN inference module, configured to obtain the uploaded videos from the devices that offload, complete inference using the computing resources allocated by the edge controller, and send the inference results to the corresponding mobile devices;

a video sampling management module, configured to obtain the sampling-frame-count control information from the edge, control the sampling frame count of its mobile device based on this information, and determine the number of frames of the input video used for neural-network inference;

a local controller, configured to obtain the video from the video sampling management module and decide, according to the user offloading decision obtained from the edge controller, whether to transmit the video to the edge server;

a local DNN inference module, configured to obtain the video and complete DNN inference using the allocated local computing resources when the mobile device needs to perform inference locally.
Completing the control tasks according to the video-recognition requests of the mobile devices comprises: determining the number of frames each mobile device samples and sending sampling-frame-count control information to the video sampling management module of the corresponding device; determining each user's offloading decision and, from it, whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission; and determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency of each offloading user.
If the user's offloading decision is 1, the local controller transmits the video to the edge server for inference, with the communication resources configured by the base station; if the decision is 0, it allows the inference to be completed locally and allocates local CPU computing resources according to the local device information.
Based on the system structure shown in Fig. 1, the method for the optimal configuration of joint terminal-edge resources in an embodiment of the present invention comprises: recording, at the edge controller, the number of multiply-accumulate operations required by the neural network and the network's recognition accuracy when videos with different numbers of frames are input to it; and, upon receiving the video-recognition request of each mobile device, completing control tasks according to the request, the control tasks comprising: determining the number of frames each mobile device samples and sending sampling-frame-count control information to the video sampling management module of the corresponding device; determining each user's offloading decision and, from it, whether the user's inference task is completed in the local DNN inference module or in the edge DNN inference module; determining, based on the communication-resource allocation strategy for the mobile video devices, the fraction of time each mobile device spends on uplink video transmission; and determining the resource-allocation strategy of the edge DNN inference module, which fixes the CPU computing frequency of each offloading user.
Further, the edge DNN inference module obtains the uploaded videos from the devices that offload, completes inference using the computing resources allocated by the edge controller, and sends the inference results to the corresponding mobile devices.

Further, the video sampling management module obtains the sampling-frame-count control information from the edge, controls the sampling frame count of its mobile device based on this information, and determines the number of frames of the input video used for neural-network inference.

Further, the local controller obtains the video from the video sampling management module and, according to the user offloading decision obtained from the edge controller, decides whether to transmit the video to the edge server.

Further, this deciding comprises: if the user's offloading decision is 1, transmitting the video to the edge server for inference, the communication resources being configured by the base station; if the user's offloading decision is 0, allowing the inference to be completed locally and allocating local CPU computing resources according to the local device information.

Further, when a mobile device needs to complete DNN inference locally, the local DNN inference module obtains the video and completes the inference using the allocated local computing resources.
It should be noted that, based on the above system architecture, the present invention proposes a corresponding optimization scheme that simultaneously reduces the system's average inference delay and the mobile devices' average energy consumption while improving the accuracy of model inference. The algorithm is as follows:
First, the number of multiply-accumulate (MAC) operations C(M_n) required to complete inference with the neural-network model is obtained from the network's MAC model. Since, for each layer of the network, the number of MACs needed for inference is proportional to the input size, the total MAC count can, after derivation, be roughly expressed as a linear function of the number of input video frames:

C(M_n) = m_{c,0}·M_n + m_{c,1},

where m_{c,0} and m_{c,1} are parameters obtained by fitting and determined by the model architecture, and M_n is the sampled frame count. From the MAC count, the computation delay D_n and the energy consumption E_n of the neural network are expressed as

D_n = (1 − x_n)·ρ·C(M_n)/f_n^l + x_n·(M_n·d/(t_n·R_n) + ρ·C(M_n)/f_n^e),

E_n = (1 − x_n)·κ·(f_n^l)^2·ρ·C(M_n) + x_n·p_n·M_n·d/(t_n·R_n),

where ρ is the number of CPU cycles required by each MAC operation, d is the size of one video frame, R_n is the communication rate of the n-th mobile device, t_n is the communication-time fraction of the n-th mobile device (i.e., its communication-resource share), κ is the energy-consumption coefficient, p_n is the mobile device's transmit power, x_n is the user's offloading decision (x_n = 1 means the task is offloaded to the edge, otherwise it is computed locally), f_n^l is the CPU computing frequency allocated locally, and f_n^e is the edge computing frequency allocated by the edge controller.
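A small sketch that evaluates this delay/energy model for one device under either placement; the function name and parameter names are illustrative assumptions, not taken from the patent:

```python
def delay_energy(M, offload, *, mc0, mc1, rho, d, R, t, kappa, p, f_local, f_edge):
    """Return (D_n, E_n) for an M-frame input.

    C(M) = mc0*M + mc1 is the fitted multiply-accumulate (MAC) count.
    """
    C = mc0 * M + mc1
    if offload:
        # x_n = 1: uplink M frames of size d at rate R over time share t, then infer at the edge.
        delay = M * d / (t * R) + rho * C / f_edge
        energy = p * M * d / (t * R)  # the device pays only the transmit energy
    else:
        # x_n = 0: infer locally at frequency f_local.
        delay = rho * C / f_local
        energy = kappa * f_local**2 * rho * C
    return delay, energy
```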
As for accuracy, the more frames the input video contains, the higher the model's prediction accuracy; however, the gain in prediction accuracy diminishes as the number of input frames grows. The accuracy Φ(M_n) can therefore be expressed as a saturating, monotonically increasing function of the frame count, for instance

Φ(M_n) = m_{a,0} − m_{a,1}·e^{−m_{a,2}·M_n},

where m_{a,0}, m_{a,1}, and m_{a,2} are parameters obtained by fitting, whose values are determined by the neural-network model and the task.
In summary, the optimization objective is

min over {M_n, x_n, t_n, f_n^l, f_n^e} of Σ_n ( β_1·D_n + β_2·E_n − β_3·Φ(M_n) ),

where β_1, β_2, and β_3 are weight coefficients; the goal is to reduce the total delay D_n and total energy consumption E_n while raising the user recognition accuracy Φ(M_n). The constraints are: (1) each user's frame count lies within the given range; (2) the communication-time fractions of the devices participating in offloading sum to less than 1; (3) the computing frequencies allocated at the edge sum to less than the edge's upper limit; (4) the allocated communication time and edge computing frequencies are greater than 0; (5) each mobile device's computing frequency is allocated a value greater than 0 and less than its maximum; and (6) x_n indicates offloading or not, i.e., x_n ∈ {0, 1}.
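Combining the pieces, the per-device weighted cost the controller minimizes can be sketched as follows; device_cost builds on the delay_energy sketch above, and the accuracy parameters are illustrative assumptions:

```python
import math

def device_cost(M, offload, beta=(0.2, 0.2, 0.6), acc=(0.95, 0.5, 0.1), **model):
    """Weighted cost b1*D + b2*E - b3*Phi for one device.

    beta = (b1, b2, b3); acc = (ma0, ma1, ma2) for an assumed saturating accuracy curve.
    """
    D, E = delay_energy(M, offload, **model)
    phi = acc[0] - acc[1] * math.exp(-acc[2] * M)  # assumed form of Phi(M_n)
    return beta[0] * D + beta[1] * E - beta[2] * phi
```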
To solve this optimization problem, the offloading decisions x_n are first taken as given, and the problem is decomposed into two subproblems: the resource-optimization problem of the mobile devices that complete inference locally and that of the mobile devices that complete inference at the edge. Let N_0 be the set of users completing inference locally and N_1 the set completing inference at the edge; the optimization problem then splits into two sub-optimization problems.
For the resource-optimization problem over N_0, the subproblem minimizes the total cost of local computation over the variables {M_n, f_n^l}, where the cost of each device is its local-computation cost function. The constraints to be satisfied are: (1) each user's frame count lies within the given range; and (2) each mobile device's computing frequency is allocated a value greater than 0 and less than its maximum. Through derivation, a closed-form solution of this subproblem can be obtained.
For the resource-optimization problem over N_1, the subproblem minimizes the total cost of offloading to the edge over the variables {M_n, t_n, f_n^e}, where the cost of each device is its offloading cost function. The constraints to be satisfied are: (1) each user's frame count lies within the given range; (2) the communication-time fractions of the devices participating in offloading sum to less than 1; (3) the computing frequencies allocated at the edge sum to less than the edge's upper limit; and (4) the allocated communication time and edge computing frequencies are greater than 0. Through derivation, t_n and f_n^e can be expressed as functions of M_n, after which the problem can be solved by convex optimization.
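As a numerical stand-in for that derivation, the N_1 subproblem can be handed to a generic solver; this sketch uses SciPy's SLSQP, which is an assumption of this illustration and not the patent's method. Each entry of devices is assumed to carry the keyword parameters that delay_energy expects (mc0, mc1, rho, d, R, kappa, p, f_local):

```python
import numpy as np
from scipy.optimize import minimize

def solve_edge_subproblem(devices, F_edge, M_max):
    """Jointly pick (M_n, t_n, f_n^e) for every offloading device."""
    k = len(devices)

    def total_cost(z):
        z = z.reshape(k, 3)  # columns: M_n, t_n, f_n^e
        return sum(device_cost(M, True, t=t, f_edge=f, **dev)
                   for (M, t, f), dev in zip(z, devices))

    cons = [
        {"type": "ineq", "fun": lambda z: 1.0 - z.reshape(k, 3)[:, 1].sum()},     # sum t_n < 1
        {"type": "ineq", "fun": lambda z: F_edge - z.reshape(k, 3)[:, 2].sum()},  # sum f_n^e < F
    ]
    bounds = [(1, M_max), (1e-6, 1.0), (1e-3 * F_edge, F_edge)] * k
    z0 = np.tile([M_max / 2, 1.0 / k, F_edge / k], k)
    res = minimize(total_cost, z0, bounds=bounds, constraints=cons, method="SLSQP")
    return res.x.reshape(k, 3)  # M_n is continuous here; round it to an integer frame count
```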
The offloading decisions x_n are solved with a greedy iterative algorithm. Observe that when inference is executed locally, a device's cost function and optimization variables M_n and f_n^l depend only on that device's own parameters and are unaffected by other devices; for the edge set N_1, however, the cost function depends on the number of devices in the set and on their parameters. The algorithm works as follows. First, compute the cost of each device's task when executed locally. Then assume all devices offload to the edge server for inference (N_1 contains all devices and N_0 is empty). In each iteration, compute the cost of every device in N_1, compare it with that device's local cost, take the difference between the two, and select the device with the largest difference, denoted device y. Tentatively move device y from N_1 into N_0 and compute the total cost of the new partition. If the total cost decreases, keep the move and continue to the next iteration; otherwise, put device y back into N_1 and terminate the algorithm.
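A compact sketch of this greedy partition; local_cost, per_device_edge_cost, and edge_partition_cost are assumed helpers wrapping the two subproblems above:

```python
def greedy_offloading(devices):
    """Split devices into a local set N0 and an edge set N1, moving devices out of
    N1 one at a time while the total cost keeps decreasing."""
    N0, N1 = set(), set(range(len(devices)))   # start with everyone offloaded
    best = edge_partition_cost(devices, N1)     # N0 is empty, so no local cost yet
    while N1:
        edge_costs = per_device_edge_cost(devices, N1)
        # device y gains the most from going local
        y = max(N1, key=lambda n: edge_costs[n] - local_cost(devices[n]))
        trial0, trial1 = N0 | {y}, N1 - {y}
        cost = (sum(local_cost(devices[n]) for n in trial0)
                + edge_partition_cost(devices, trial1))
        if cost < best:
            N0, N1, best = trial0, trial1, cost  # keep the move, iterate again
        else:
            break                                # revert (y stays in N1) and stop
    return N0, N1
```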
In the experiments, the downlink bandwidth is set to 5 MHz and the path loss is modeled as PL = 128.1 + 37.6·log10(D), where D is the distance between the device and the wireless access point in kilometers. Devices are randomly distributed over a 500 m × 500 m area. The computing resources of the devices and of the MEC server are set to 1.8 GHz and 22 GHz, respectively. The recognition-accuracy requirement and the maximum number of input video frames are fixed in advance, and the energy coefficient κ = 10^-28 is determined by the corresponding device. The size of the input video is 112 × 112 × M_n. Furthermore, the computational-complexity coefficient is set to ρ = 12, a value obtained through repeated experiments. The weights β_1, β_2, β_3 are set to 0.2, 0.2, and 0.6, respectively.
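A quick numeric check of the path-loss model (the 0.25 km distance is an arbitrary example):

```python
import math

def path_loss_db(D_km: float) -> float:
    """PL = 128.1 + 37.6*log10(D), with D in kilometers."""
    return 128.1 + 37.6 * math.log10(D_km)

print(round(path_loss_db(0.25), 1))  # 105.5 dB at 250 m
```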
The proposed offloading scheme is compared with a purely local inference scheme (Local), a purely edge inference scheme (Edge), and a random offloading scheme (Random); the experimental results are shown in Fig. 2. When the number of devices is below 10, the cost of executing tasks only at the edge is almost equal to that of the proposed offloading scheme, because with few devices every device can benefit from performing inference on the edge server. If inference tasks are executed only locally, the average device cost does not change with the number of devices, since the devices' local resources do not affect one another. The curve of the Edge scheme is linear because all users run the same AI model in the experiment.
This experiment uses different weights β_1, β_2, β_3 to analyze the trade-off among average delay, energy consumption, and accuracy. The trade-off surface is obtained with the proposed offloading and allocation scheme under the constraint β_1 + β_2 + β_3 = 1. As shown in Fig. 3, delay, energy consumption, and accuracy limit one another, and the three must be traded off. When the delay is held constant, higher recognition accuracy requires higher energy consumption; viewed the other way, improving accuracy sacrifices delay and energy performance. In addition, at the same accuracy, higher energy consumption makes devices more inclined to execute inference tasks locally, which lowers the delay.
The embodiments of the present invention have the following beneficial effects:
(1) Based on an edge-terminal collaborative offloading architecture for AI inference algorithms, the present invention provides the architecture of an edge-terminal collaborative video AI inference system: according to the number of requesting users, the edge server can determine the number of video frames each user uses for detection and produce the users' offloading strategy together with their communication- and computing-resource allocation scheme.
(2) Multi-dimensional performance optimization. Under the given system architecture, inference delay, terminal energy consumption, and recognition accuracy are considered jointly, and an effective algorithm is proposed that raises the neural network's recognition accuracy while reducing delay and energy consumption.
(3) Performance trade-off analysis. The invention establishes the trade-off relationship among inference delay, terminal energy consumption, and recognition accuracy; using this relationship, targeted system optimization can be carried out to improve the system performance of AI inference in different applications.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
In addition, the embodiments of the present invention have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present invention; the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes can be made to the specific implementations and the scope of application according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210455191.5A CN114756371A (en) | 2022-04-27 | 2022-04-27 | A method and system for optimal configuration of terminal edge joint resources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114756371A true CN114756371A (en) | 2022-07-15 |
Family
ID=82333211
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210455191.5A Pending CN114756371A (en) | 2022-04-27 | 2022-04-27 | A method and system for optimal configuration of terminal edge joint resources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114756371A (en) |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |