CN112686374A - Deep neural network model collaborative reasoning method based on adaptive load distribution - Google Patents

Deep neural network model collaborative reasoning method based on adaptive load distribution

Info

Publication number
CN112686374A
Authority
CN
China
Prior art keywords
neural network
deep neural
network model
reasoning
load distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638571.XA
Other languages
Chinese (zh)
Inventor
陈旭
曾烈康
周知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011638571.XA priority Critical patent/CN112686374A/en
Publication of CN112686374A publication Critical patent/CN112686374A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a deep neural network model collaborative reasoning method based on adaptive load distribution. By modeling the execution process of the deep neural network model and the influence of dynamically changing communication bandwidth on distributed computation across multiple terminal devices, the invention formulates an optimization problem that jointly considers inference latency and system energy consumption, and proposes an integer-linear-programming-based adaptive allocation algorithm for deep neural network inference load. Compared with conventional local inference of deep neural network models and with state-of-the-art collaborative computing schemes, the method not only provides a brand-new collaborative inference paradigm over multiple terminal devices, but also accounts for the heterogeneity of computing resources and the dynamics of the network under a given execution-latency requirement of the intelligent application, achieving deep neural network collaborative inference with minimized energy consumption overhead.

Description

Deep neural network model collaborative reasoning method based on adaptive load distribution
Technical Field
The invention relates to the technical fields of deep learning, edge computing and distributed computing, and in particular to a deep neural network model collaborative reasoning method based on adaptive load distribution.
Background
Various intelligent services based on deep learning technology have developed rapidly in recent years and are now deeply integrated into people's daily lives. For example, in the widely deployed smart home scenario, when a smart camera captures a face image, deep learning inference can be initiated immediately to recognize the face information. However, for reasons of cost and space utilization, the terminal devices deployed by users in smart homes are usually small devices with limited computing capability, such as smart speakers, smart gateways, and small home hosts. When processing deep learning inference tasks that demand substantial computing resources, such terminal devices often struggle to meet user requirements in terms of real-time performance, security, and environmental friendliness.
To address this pain point, the traditional solution generally sends the sensing data and computation requests of smart home terminal devices to the cloud, following the cloud computing paradigm, and relies on the powerful computing capability of data center servers to rapidly compute the inference results. However, this approach presents a number of potential problems: on one hand, the performance of the traditional cloud computing scheme is limited by the unstable long-distance communication between the user and the cloud, making it difficult to guarantee that the deep neural network inference results computed in the cloud return to the terminal device within a latency range acceptable to the user; on the other hand, data that may contain private user activity information is sent to cloud servers owned by commercial companies, inevitably raising user concerns about privacy disclosure.
Disclosure of Invention
The invention provides a deep neural network model cooperative reasoning method based on adaptive load distribution to overcome at least one defect in the prior art, realizing low-latency and energy-efficient cooperative inference of a deep learning model across multiple terminal devices.
In order to solve the technical problems, the invention adopts the technical scheme that: a deep neural network model collaborative reasoning method based on adaptive load distribution comprises the following steps:
s1, deep neural network model installation: each terminal device installs the trained deep neural network model to a local computing environment before the cooperative reasoning system operates, and records corresponding configuration parameters according to the structural design of the deep neural network model; the method comprises the steps that a main device collects deep neural network model configuration parameter information of all available terminal devices, wherein the main device is a device for receiving input;
s2, acquiring computing capability information of the terminal equipment: before the cooperative reasoning system operates, each terminal device utilizes a local historical input sample to execute a deep neural network model reasoning task in an off-line mode, records related operation data and estimates local computing capability parameters; the method comprises the steps that a main device collects calculation capacity parameter information of all available terminal devices;
s3, a deep neural network model collaborative reasoning modeling step: the main equipment collects network bandwidth information of the terminal equipment and time delay requirement parameters defined by a user when the collaborative reasoning system runs, and performs performance optimization problem modeling on the collaborative reasoning process by combining the deep neural network model configuration parameters collected in the steps S1 and S2 and the computing capability parameters of each terminal equipment; wherein the performance optimization problem is to minimize the energy consumption overhead of the system while meeting user-defined latency requirements;
s4, deep neural network model collaborative reasoning load distribution: the main equipment converts the optimization problem modeled in the step S3 into a plurality of linear programming subproblems, and dynamically adjusts the collaborative reasoning load distribution aiming at the current input through a self-adaptive load distribution algorithm, so that the optimization of the system energy consumption expense under the condition of meeting the time delay requirement defined by a user is realized;
s5, performing deep neural network model collaborative reasoning: after receiving the input picture, the main device segments the input picture according to the deep neural network model collaborative reasoning load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device; after each terminal device receives the current input load blocks, cooperatively executing a deep neural network model reasoning task to finally obtain a cooperative reasoning result.
Further, the step S1 specifically includes:
s11, defining topological structure parameters of a deep neural network model: define a layer l of the deep neural network model to represent one convolution calculation operation or one fully-connected-layer calculation operation in deep neural network model inference; given a deep neural network model, define L = [1, 2, …, L] to represent the successive L layers of computation of the model;
s12, defining calculation configuration parameters of each layer of the deep neural network model: define a configuration parameter group ⟨k, c_in, c_out, s, p⟩_l to express the layer-l computation, where k denotes the convolution kernel size, c_in the number of input channels, c_out the number of output channels, s the convolution stride, and p the convolution padding size; one convolution calculation operation or one fully-connected-layer calculation operation of step S11 can be expressed by this configuration parameter group;
s13, collecting deep neural network model configuration parameters: according to the design of the deep neural network model structure, record the model topological structure parameters L = [1, 2, …, L] of S11 and the per-layer configuration parameter groups ⟨k, c_in, c_out, s, p⟩_l of S12;
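For illustration only (not part of the claimed method), the per-layer configuration group of S12 could be recorded with a simple data structure such as the following Python sketch; the class name, field names, and the AlexNet-like example values are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LayerConfig:
    """Configuration group <k, c_in, c_out, s, p> of one layer l (step S12)."""
    k: int      # convolution kernel size
    c_in: int   # number of input channels
    c_out: int  # number of output channels
    s: int      # convolution stride
    p: int      # convolution padding size

# Hypothetical example: the first two convolutional layers of an AlexNet-like model.
model_layers: List[LayerConfig] = [
    LayerConfig(k=11, c_in=3, c_out=96, s=4, p=0),
    LayerConfig(k=5, c_in=96, c_out=256, s=1, p=2),
]
```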
S14, installing a deep neural network model in a local computing environment: and each terminal device downloads and stores the trained deep neural network model file to the local, and loads parameters required by deep neural network model inference to a memory before the cooperative inference system operates, so that the deep neural network model inference task can be immediately executed when input arrives.
Further, the step S2 specifically includes:
s21, defining load distribution parameters of the terminal equipment: define N = [1, 2, …, N] to represent the available device group containing the N terminal devices participating in the collaborative reasoning task; for any device i, define the size of the calculation load received by terminal device i as a_i, and let π = [a_1, a_2, …, a_N] represent a collaborative inference load distribution scheme over the available terminal device group N; given the current layer l of the deep neural network of step S11 and the load size a_i, define the calculation load data amount as r_{l,i}, obtained as the data volume of the portion of the original input picture whose size is a_i;
s22, defining the calculation capacity parameters of the terminal equipment: define a computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i to express the i-th terminal device, where ρ represents the number of CPU revolutions required per 1 KB of input data processed, f represents the CPU frequency, m represents the available memory space, P^c represents the computing power, and P^x represents the transmission power; the available memory space m represents the memory space available to the collaborative reasoning system beyond the basic system services;
s23, collecting the computing capability parameters on each terminal device: before the cooperative reasoning system operates, each terminal device executes the deep neural network reasoning task offline in its local computing environment using historical input samples, and records the computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i of step S22; the CPU frequency f can be obtained from the terminal device specification; the number of CPU revolutions required to execute one deep neural network inference task can be obtained by multiplying the CPU frequency f by the execution latency of a single inference task; the number of CPU revolutions ρ required per 1 KB of input data can then be obtained by dividing that number of revolutions by the input data size; the computing power P^c and the transmission power P^x can be obtained by dividing the computation energy consumption and the transmission energy consumption, respectively, by the execution latency of a single inference task;
s24, collecting all available terminal device computing capability parameters: the master device collects the computing capability parameter groups ⟨ρ, f, m, P^c, P^x⟩_i of all available terminal devices, records the available device group N = [1, 2, …, N] of S21 according to the number of available terminal devices, and randomly initializes the cooperative inference load distribution scheme π = [a_1, a_2, …, a_N].
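As a minimal sketch of the offline profiling described in S23 and S24 (the function name, argument layout, and the example numbers are illustrative assumptions, not the invention's interface), the capability group ⟨ρ, f, m, P^c, P^x⟩ of one device could be estimated as follows:

```python
def estimate_capability(cpu_freq_hz: float, avg_latency_s: float,
                        input_size_kb: float, compute_energy_j: float,
                        transmit_energy_j: float, available_mem_kb: float) -> dict:
    """Estimate the capability group <rho, f, m, Pc, Px> of one device (step S23)."""
    # total CPU revolutions for one inference = CPU frequency x execution latency
    total_revolutions = cpu_freq_hz * avg_latency_s
    # rho: CPU revolutions needed per 1 KB of input data
    rho = total_revolutions / input_size_kb
    # average computing and transmission power over one inference
    p_c = compute_energy_j / avg_latency_s
    p_x = transmit_energy_j / avg_latency_s
    return {"rho": rho, "f": cpu_freq_hz, "m": available_mem_kb,
            "Pc": p_c, "Px": p_x}

# Hypothetical profiling numbers for a Raspberry-Pi-class device (illustration only).
params = estimate_capability(cpu_freq_hz=1.2e9, avg_latency_s=0.8,
                             input_size_kb=150.0, compute_energy_j=2.4,
                             transmit_energy_j=0.5, available_mem_kb=5.0e5)
```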
Further, the step S3 specifically includes:
s31, defining a collaborative reasoning time delay requirement parameter: defining an inference delay requirement parameter D by a user or a service provider; the parameter D represents that the cooperative reasoning task needs to be executed and completed within the time delay requirement D;
s32, defining communication bandwidth parameters of each terminal device: define the communication bandwidth parameter b_{i,j}, where i and j represent any two device indices in the available device group N, i.e., i, j ∈ N; b_{i,i} represents the internal communication bandwidth of device i;
s33, collaborative reasoning task constraint modeling: the load a_i assigned to each terminal device needs to satisfy the following constraints:
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N, (1)
a_i ≥ 0, a_i ∈ Z, i ∈ N, (2)
Σ_{i∈N} a_i = H, (3)
where Z represents the set of integers, H represents the height of the input picture, and 1{a_i > 0} denotes the indicator function of the condition a_i > 0, which equals 1 if a_i > 0 and 0 otherwise;
wherein formula (1) indicates that the load size distributed to each terminal device should be no smaller than the convolution padding size required on the neighbor device, or else be exactly 0; formula (2) indicates that the load size distributed to each terminal device should be a non-negative integer; formula (3) indicates that the load sizes distributed to all terminal devices should sum exactly to the height of the complete input picture;
the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device needs to satisfy the following constraint:
r_{l,i} ≤ m_i, i ∈ N, l ∈ L, (4)
where formula (4) indicates that the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device must respect the available memory space limit;
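A small illustrative check of constraints (1)–(4) as reconstructed above can be written as follows; the function name and argument layout are assumptions made for this sketch, not part of the claimed method.

```python
from typing import List

def satisfies_constraints(a: List[int], padding_next: List[int], H: int,
                          r: List[List[float]], mem: List[float]) -> bool:
    """Check constraints (1)-(4) for a candidate allocation a = [a_1, ..., a_N].

    a[i]            -- rows of the input picture assigned to device i
    padding_next[i] -- convolution padding size required by the neighbour device
    H               -- height of the complete input picture
    r[l][i]         -- calculation load data amount of layer l on device i
    mem[i]          -- available memory space m_i of device i
    """
    num_devices = len(a)
    for i in range(num_devices):
        # (2) every load must be a non-negative integer
        if not isinstance(a[i], int) or a[i] < 0:
            return False
        # (1) a positive load must cover the neighbour's padding requirement
        if 0 < a[i] < padding_next[i]:
            return False
    # (3) the partitions together cover the whole picture height
    if sum(a) != H:
        return False
    # (4) per-layer memory limit on every device
    for layer in r:
        for i in range(num_devices):
            if layer[i] > mem[i]:
                return False
    return True
```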
s34, collaborative reasoning task execution performance modeling: for single-layer deep neural network model inference, the computation delay and computation energy consumption are modeled as follows:
t^c_{l,i} = ρ_i · r_{l,i} / f_i, (5)
e^c_{l,i} = P^c_i · t^c_{l,i}, (6)
where t^c_{l,i} and e^c_{l,i} respectively represent the computation delay and computation energy consumption of terminal device i at layer l of the deep neural network model;
for single-layer deep neural network model inference, the communication delay and communication energy consumption are modeled as follows:
t^x_{l,i} = r_{l,i} / b_{i,j}, (7)
e^x_{l,i} = P^x_i · t^x_{l,i}, (8)
where t^x_{l,i} and e^x_{l,i} respectively represent the communication delay and communication energy consumption of terminal device i at layer l of the deep neural network model;
for multi-layer deep neural network model inference, the total execution delay and execution energy consumption are modeled as follows:
E_c = Σ_{i∈N} Σ_{l∈L} e^c_{l,i}, (9)
E_x = Σ_{i∈N} Σ_{l∈L} e^x_{l,i}, (10)
T = Σ_{l∈L} max_{i∈N} ( t^c_{l,i} + t^x_{l,i} ), (11)
where E_c and E_x respectively represent the computation energy consumption and communication energy consumption of the collaborative reasoning system while executing the collaborative inference, and T represents the total delay of the collaborative reasoning system while executing the collaborative inference;
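The performance model of S34 can be illustrated by the following sketch. It assumes, since the original equation images are not reproduced here, that the per-layer transmitted data volume is approximated by r_{l,i} and that layers execute in sequence while devices run in parallel within each layer; all names are illustrative.

```python
def evaluate_performance(r, rho, f, Pc, Px, bw):
    """Sketch of the performance model (5)-(11) for one load allocation.

    r[l][i]      -- calculation load data amount r_{l,i} of layer l on device i
    rho[i]       -- CPU revolutions per KB of data on device i
    f[i]         -- CPU frequency of device i
    Pc[i], Px[i] -- computing and transmission power of device i
    bw[i]        -- communication bandwidth used by device i (assumption: one
                    effective bandwidth value per device)
    """
    num_layers, num_devices = len(r), len(rho)
    T, Ec, Ex = 0.0, 0.0, 0.0
    for l in range(num_layers):
        layer_times = []
        for i in range(num_devices):
            t_c = rho[i] * r[l][i] / f[i]      # computation delay, eq. (5)
            t_x = r[l][i] / bw[i]              # communication delay, eq. (7)
            Ec += Pc[i] * t_c                  # computation energy, eqs. (6), (9)
            Ex += Px[i] * t_x                  # communication energy, eqs. (8), (10)
            layer_times.append(t_c + t_x)
        T += max(layer_times)                  # layers run in sequence, eq. (11)
    return T, Ec, Ex
```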
s35, modeling of the collaborative reasoning performance optimization problem: according to the modeling of S33 and S34, the system collaborative reasoning performance optimization problem can be expressed as minimizing the total energy consumption overhead of the system while satisfying the user-defined latency requirement D; the P1 problem is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L.
further, the step S4 specifically includes:
s41, converting the collaborative reasoning performance optimization problem: problem P1 is an integer linear programming problem, which belongs to the class of NP-hard problems, and its optimal solution is difficult to obtain efficiently in polynomial time; therefore, problem P1 is approximated by a linear programming problem, and an adaptive load distribution algorithm is proposed to obtain a near-optimal solution in polynomial time;
define a continuous variable λ_i to approximately express a_i, where the relationship between λ_i and a_i is:
a_i = λ_i · H, (12)
so that formulas (1), (2) and (3) become:
λ_i · H ≥ p_{i+1} · 1{λ_i > 0}, i ∈ N, (13)
λ_i ≥ 0, i ∈ N, (14)
Σ_{i∈N} λ_i = 1, (15)
formula (13) is equivalent to λ_i · H ≥ p_{i+1} or λ_i = 0; relaxing the constraint expressed by formula (13) and simplifying it to λ_i ≥ 0, the following problem P2 can be obtained:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
the problem P2 is a linear programming problem, a widely studied class of problems for which many mature and efficient polynomial-time solution algorithms exist; by iteratively modifying the constraint range of problem P2 and re-solving it, the optimal solution of problem P1 can be approached step by step;
s42, solving the collaborative reasoning performance optimization problem: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]; substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again; repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty; an empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
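The iterative procedure of S42 can be sketched in Python as follows, using scipy.optimize.linprog as an illustrative stand-in for the linear programming solving tool named below; the callables build_lp, to_rows and feasible_p1, which encode problem P2 and the P1 feasibility check for a concrete model, are assumptions supplied by the caller and are not specified by the patent text.

```python
from scipy.optimize import linprog  # illustrative stand-in for the LP solving tool

def adaptive_allocation(devices, build_lp, to_rows, feasible_p1):
    """Sketch of the iterative device-removal procedure of S42.

    devices           -- initial available device set N (list of device indices)
    build_lp(d)       -- returns (c, A_ub, b_ub, A_eq, b_eq) encoding P2 for d
    to_rows(x, d)     -- converts the continuous solution lambda into integer loads a_i
    feasible_p1(a, d) -- checks the original problem P1 (constraints (1)-(4), T <= D)
    """
    active = list(devices)
    while active:
        c, A_ub, b_ub, A_eq, b_eq = build_lp(active)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * len(active), method="highs")
        if not res.success:
            break
        lam = list(res.x)
        a = to_rows(lam, active)
        if feasible_p1(a, active):
            return a, active                       # feasible allocation for P1
        nonzero = [i for i, v in enumerate(lam) if v > 1e-9]
        if not nonzero:
            break
        # drop zero-load devices and the device with the smallest non-zero share
        drop = {i for i, v in enumerate(lam) if v <= 1e-9}
        drop.add(min(nonzero, key=lambda i: lam[i]))
        active = [d for i, d in enumerate(active) if i not in drop]
    return None, []  # latency requirement too strict: no feasible allocation found
```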
Further, the linear programming solver tool comprises the IBM CPLEX toolkit.
Further, the step S5 specifically includes:
s51, collaborative inference load distribution deployment: the device that receives the input is defined as the main device, and the devices participating in the cooperative reasoning are defined as auxiliary devices; when the main device receives the input picture, it runs the collaborative inference performance optimization problem solving algorithm of S42 to obtain the optimal load distribution result for the current input; the main device then divides the input picture along its height according to the load distribution proportions to obtain multiple input blocks, and sends each block of data to the corresponding terminal device according to the device index in the load distribution result;
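A minimal sketch of the height-wise segmentation in S51, assuming the input picture is a NumPy array of shape (H, W, C); the function name and the example numbers are illustrative only.

```python
import numpy as np
from typing import List

def split_by_height(picture: np.ndarray, rows: List[int]) -> List[np.ndarray]:
    """Split an input picture of shape (H, W, C) along its height (step S51).

    rows -- the allocation [a_1, ..., a_N]; sum(rows) must equal H (constraint (3)).
    Devices with a_i = 0 simply receive an empty slice.
    """
    assert sum(rows) == picture.shape[0], "allocation must cover the full height"
    partitions, top = [], 0
    for a_i in rows:
        partitions.append(picture[top:top + a_i])
        top += a_i
    return partitions

# Usage with hypothetical numbers: a 224x224 RGB picture split over three devices.
img = np.zeros((224, 224, 3), dtype=np.float32)
parts = split_by_height(img, [112, 80, 32])
```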
s52, collaborative reasoning execution: and after the main equipment transmits the input load blocks to each terminal equipment, all the terminal equipment starts the deep neural network model cooperative reasoning calculation by using the deep learning framework. Each terminal device utilizes a remote procedure calling tool to communicate and exchange intermediate calculation data required in the calculation process.
Further, the terminal device includes a main device and an auxiliary device.
Further, the deep learning framework comprises TensorFlow.
Further, the remote procedure call tool includes a gRPC.
Compared with the prior art, the beneficial effects are:
1. the invention realizes the low-delay and high-energy-efficiency cooperative reasoning of the deep learning model on the multi-terminal equipment by means of the computing resources of the multi-terminal equipment under the edge computing environment and through self-adaptive load distribution scheduling;
2. compared with the traditional cloud computing paradigm, the method and the device have the advantages that from the perspective of edge computing, processing and computing of user data are kept in terminal equipment owned by a user in an edge computing environment, so that not only is higher processing speed realized, but also privacy of the user is protected;
3. compared with the traditional edge end deep neural network collaborative reasoning load distribution algorithm, the method comprehensively considers time delay and energy consumption optimization, realizes the minimization of energy consumption overhead under the requirement of time delay given by a user, and has stronger practicability and economic value;
4. compared with the traditional edge end deep neural network collaborative reasoning load distribution algorithm, the method comprehensively considers the heterogeneity of computing resources and the dynamic property of network communication resources, provides an efficient optimization algorithm of polynomial time complexity, and has better adaptability and reliability aiming at the resource characteristics of the edge computing environment.
Drawings
FIG. 1 is a block diagram of an adaptive load distribution algorithm suitable for deep neural network model cooperative reasoning disclosed by the invention.
FIG. 2 is a schematic diagram of the execution flow of the deep neural network model collaborative inference based on load distribution disclosed by the invention.
Fig. 3 is a schematic diagram of a time delay result in an AlexNet model collaborative inference image classification task experiment under the same condition by different methods in the embodiment of the present invention.
FIG. 4 is a schematic diagram of energy consumption results in an AlexNet model collaborative reasoning image classification task experiment under the same conditions by different methods in the embodiment of the present invention.
Fig. 5 is a schematic diagram of a time delay result in an AlexNet model collaborative inference image classification task experiment under different time delay requirements of the same available equipment group by different methods in the embodiment of the present invention.
FIG. 6 is a schematic diagram of a time delay and energy consumption result in an AlexNet model collaborative reasoning image classification task experiment under the same time delay requirement of different available equipment quantities by the method in the embodiment of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
The invention discloses a deep neural network model collaborative reasoning method based on self-adaptive load distribution. By modeling the execution process of the deep neural network model and the influence of dynamic change of communication bandwidth on distributed computation of the multi-terminal equipment, the invention constructs an optimization problem comprehensively considering inference time delay and system energy consumption, and provides an integer linear programming-based inference load self-adaptive distribution algorithm of the deep neural network model.
The experimental environment of this example is as follows. This example constructed an experimental cluster containing four Raspberry Pi 3 boards, an NVIDIA Jetson TX2, and a Dell commercial desktop computer. The Raspberry Pi 3 represents an Internet-of-Things device with limited computing capability, the NVIDIA Jetson TX2 represents a mobile artificial intelligence platform, and the Dell commercial desktop computer represents a common small edge server in a smart home scenario. One of the Raspberry Pi 3 boards is designated as the main device for receiving input pictures, and the other devices are auxiliary devices. A Monsoon high-voltage power monitor is used to measure energy consumption, and the network traffic control tool tc is used to control the communication bandwidth between terminal devices. The classical deep neural network model AlexNet was implemented using TensorFlow Lite. The deep neural network model is trained prior to deployment. The deep neural network collaborative reasoning task in the experiment is to classify images from the ImageNet dataset. To avoid interference from external programs, all unnecessary system services and applications were closed during the experiments.
The experimental method of this example is as follows:
Method 1: all load is retained and executed on the terminal device that receives the input pictures;
the method 2 comprises the following steps: and (4) multi-device collaborative reasoning. The difference between the method 2 and the method of the present invention is that the input load is segmented according to a greedy algorithm, that is, the load is distributed according to the computing power of each device in the available device group: the more computationally powerful the device, the more load is allocated. The method does not consider network bandwidth resources among terminal devices.
Method 3: multi-device collaborative reasoning. Method 3 differs from the method of the present invention in that the input load is split in equal proportions, i.e., the load is distributed according to the total number of devices in the available device group: each terminal device receives an equal share of the load. This method considers neither the heterogeneous resources of each terminal device nor the network bandwidth resources between devices.
Method 4: the deep neural network model collaborative reasoning method based on adaptive load distribution provided by the invention.
A deep neural network model collaborative reasoning method based on adaptive load distribution specifically comprises the following implementation steps:
s1, deep neural network model installation step: each terminal device installs the trained AlexNet model into its local computing environment before the cooperative reasoning system operates, and records the configuration parameters, including the AlexNet model topological structure parameters L and the per-layer calculation configuration parameters ⟨k, c_in, c_out, s, p⟩_l, according to the structural design of the AlexNet model. The Raspberry Pi that receives the input pictures, acting as the main device, collects the AlexNet model configuration parameter information of all available terminal devices.
S2, acquiring computing capability information of the terminal devices: before the cooperative reasoning system operates, each terminal device uses local historical input samples to execute the AlexNet model image classification task offline, records the average latency and average energy consumption of a single task execution, and estimates the local device's computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i. The main device collects the computing capability parameter information of all available terminal devices, records the available device group parameter N, and randomly initializes the load distribution parameter π.
S3, deep neural network model collaborative reasoning modeling: when the collaborative reasoning system runs, the main device collects the network bandwidth information b_{i,j} of the terminal devices and the user-defined latency requirement parameter D, and, combining the deep neural network model configuration parameters and the computing capability parameters of each terminal device collected in steps S1 and S2, models the performance optimization problem of the collaborative reasoning process. The performance optimization problem is to minimize the energy consumption overhead of the system while satisfying the user-defined latency requirement; it is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
s4, a deep neural network model collaborative reasoning load distribution step: the master device converts the optimization problem modeled in step S3 into a plurality of linear programming sub-problems, each of which is formally expressed as follows:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
Through the adaptive load distribution algorithm shown in fig. 1, the collaborative inference load distribution scheme for the current input is solved, realizing the optimization of the system energy consumption overhead while satisfying the user-defined latency requirement. The adaptive load distribution algorithm shown in fig. 1 specifically executes the following steps: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool (e.g., the IBM CPLEX toolkit) to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]. Substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again. Repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty. An empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
S5, performing deep neural network model collaborative reasoning: as shown in fig. 2, after receiving the input picture, the master device segments the input picture according to the AlexNet model collaborative inference load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device. After each terminal device receives the currently input load blocks, the AlexNet model image classification task is cooperatively executed, and finally an image classification result is obtained.
Fig. 3 and 4 are schematic diagrams of a time delay result and an energy consumption result in an AlexNet model collaborative reasoning image classification task experiment under the same condition by different methods in the embodiment of the present invention, respectively. In fig. 3, the dashed line represents the 100 ms delay requirement set by the user, indicating that each AlexNet model inference must be completed within 100 ms. As can be seen from fig. 3, since the method 1 only performs inference locally, it takes the longest time, and fails to meet the delay requirement set by the user. The methods 2, 3 and 4 all meet the time delay requirement. As can be seen from fig. 4, method 4 (the method disclosed by the present invention) achieves the lowest energy consumption overhead compared to other methods. The comparison highlights the advantages that the method disclosed by the invention comprehensively considers the resource heterogeneity and the network dynamics of a plurality of available terminal devices and comprehensively optimizes the execution delay and the execution energy consumption.
Fig. 5 is a schematic diagram of latency results in the AlexNet model collaborative inference image classification task experiment for different methods under different latency requirements with the same available device group in the embodiment of the present invention. To highlight the importance of the latency requirement, under each latency requirement, if the model inference execution latency does not meet the requirement, the execution energy consumption is recorded as zero; if the model inference execution latency meets the requirement, it is recorded as an effective inference and its execution energy consumption is recorded. According to fig. 5, from the perspective of the latency requirement, method 4 (the method disclosed by the present invention) and method 2 can satisfy the strictest latency requirement for one effective inference compared with the other methods: when the latency requirement is 75 milliseconds, only method 4 and method 2 perform effective inference, while the other methods fail to meet the requirement. From the energy consumption perspective, when the latency requirement is 75 milliseconds, the energy consumption overhead of method 4 is smaller than that of method 2, and as the latency requirement is relaxed, the energy consumption overhead of method 4 remains consistently smaller than that of method 2. This comparison highlights the overall advantage of the method disclosed by the invention in terms of both latency and energy consumption performance.
FIG. 6 is a schematic diagram of latency and energy consumption results in the AlexNet model collaborative inference image classification task experiment by the method, under the same latency requirement and with different numbers of available devices, in the embodiment of the present invention. As the number of devices in the available device set increases from 1 to 6, the devices added are, respectively, a Raspberry Pi, the Dell business desktop, a Raspberry Pi, and the NVIDIA Jetson TX2. As can be seen from fig. 6, as the number of available devices increases, the latency and energy consumption overhead of the cooperative reasoning decreases, and the stronger the computing capability of the added device, the larger the reduction in latency and energy overhead. This experiment illustrates the advantages of the method disclosed in the present invention in terms of system scalability and utilization of available computing resources.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (10)

1. A deep neural network model collaborative reasoning method based on adaptive load distribution is characterized by comprising the following steps:
s1, deep neural network model installation: each terminal device installs the trained deep neural network model to a local computing environment before the cooperative reasoning system operates, and records corresponding configuration parameters according to the structural design of the deep neural network model; the method comprises the steps that a main device collects deep neural network model configuration parameter information of all available terminal devices, wherein the main device is a device for receiving input;
s2, acquiring computing capability information of the terminal equipment: before the cooperative reasoning system operates, each terminal device utilizes a local historical input sample to execute a deep neural network model reasoning task in an off-line mode, records related operation data and estimates local computing capability parameters; the method comprises the steps that a main device collects calculation capacity parameter information of all available terminal devices;
s3, a deep neural network model collaborative reasoning modeling step: the main equipment collects network bandwidth information of the terminal equipment and time delay requirement parameters defined by a user when the collaborative reasoning system runs, and performs performance optimization problem modeling on the collaborative reasoning process by combining the deep neural network model configuration parameters collected in the steps S1 and S2 and the computing capability parameters of each terminal equipment; wherein the performance optimization problem is to minimize the energy consumption overhead of the system while meeting user-defined latency requirements;
s4, deep neural network model collaborative reasoning load distribution: the main equipment converts the optimization problem modeled in the step S3 into a plurality of linear programming subproblems, and dynamically adjusts the collaborative reasoning load distribution aiming at the current input through a self-adaptive load distribution algorithm, so that the optimization of the system energy consumption expense under the condition of meeting the time delay requirement defined by a user is realized;
s5, performing deep neural network model collaborative reasoning: after receiving the input picture, the main device segments the input picture according to the deep neural network model collaborative reasoning load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device; after each terminal device receives the current input load blocks, cooperatively executing a deep neural network model reasoning task to finally obtain a cooperative reasoning result.
2. The adaptive load distribution based deep neural network model cooperative inference method according to claim 1, wherein the step S1 specifically includes:
s11, defining topological structure parameters of a deep neural network model: define a layer l of the deep neural network model to represent one convolution calculation operation or one fully-connected-layer calculation operation in deep neural network model inference; given a deep neural network model, define L = [1, 2, …, L] to represent the successive L layers of computation of the model;
s12, defining calculation configuration parameters of each layer of the deep neural network model: define a configuration parameter group ⟨k, c_in, c_out, s, p⟩_l to express the layer-l computation, where k denotes the convolution kernel size, c_in the number of input channels, c_out the number of output channels, s the convolution stride, and p the convolution padding size; one convolution calculation operation or one fully-connected-layer calculation operation of step S11 can be expressed by this configuration parameter group;
s13, collecting deep neural network model configuration parameters: according to the design of the deep neural network model structure, record the model topological structure parameters L = [1, 2, …, L] of S11 and the per-layer configuration parameter groups ⟨k, c_in, c_out, s, p⟩_l of S12;
S14, installing a deep neural network model in a local computing environment: and each terminal device downloads and stores the trained deep neural network model file to the local, and loads parameters required by deep neural network model inference to a memory before the cooperative inference system operates, so that the deep neural network model inference task can be immediately executed when input arrives.
3. The adaptive load distribution based deep neural network model cooperative inference method according to claim 2, wherein the step S2 specifically includes:
s21, defining load distribution parameters of the terminal equipment: define N = [1, 2, …, N] to represent the available device group containing the N terminal devices participating in the collaborative reasoning task; for any device i, define the size of the calculation load received by terminal device i as a_i, and let π = [a_1, a_2, …, a_N] represent a collaborative inference load distribution scheme over the available terminal device group N; given the current layer l of the deep neural network of step S11 and the load size a_i, define the calculation load data amount as r_{l,i}, obtained as the data volume of the portion of the original input picture whose size is a_i;
s22, defining the calculation capacity parameters of the terminal equipment: define a computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i to express the i-th terminal device, where ρ represents the number of CPU revolutions required per 1 KB of input data processed, f represents the CPU frequency, m represents the available memory space, P^c represents the computing power, and P^x represents the transmission power; the available memory space m represents the memory space available to the collaborative reasoning system beyond the basic system services;
s23, collecting the computing capability parameters on each terminal device: before the cooperative reasoning system operates, each terminal device executes the deep neural network reasoning task offline in its local computing environment using historical input samples, and records the computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i of step S22; the CPU frequency f can be obtained from the terminal device specification; the number of CPU revolutions required to execute one deep neural network inference task can be obtained by multiplying the CPU frequency f by the execution latency of a single inference task; the number of CPU revolutions ρ required per 1 KB of input data can then be obtained by dividing that number of revolutions by the input data size; the computing power P^c and the transmission power P^x can be obtained by dividing the computation energy consumption and the transmission energy consumption, respectively, by the execution latency of a single inference task;
s24, collecting all available terminal device computing capability parameters: the master device collects the computing capability parameter groups ⟨ρ, f, m, P^c, P^x⟩_i of all available terminal devices, records the available device group N = [1, 2, …, N] of S21 according to the number of available terminal devices, and randomly initializes the cooperative inference load distribution scheme π = [a_1, a_2, …, a_N].
4. The adaptive load distribution based deep neural network model cooperative inference method according to claim 3, wherein the step S3 specifically includes:
s31, defining a collaborative reasoning time delay requirement parameter: defining an inference delay requirement parameter D by a user or a service provider; the parameter D represents that the cooperative reasoning task needs to be executed and completed within the time delay requirement D;
s32, defining communication bandwidth parameters of each terminal device: define the communication bandwidth parameter b_{i,j}, where i and j represent any two device indices in the available device group N, i.e., i, j ∈ N; b_{i,i} represents the internal communication bandwidth of device i;
s33, collaborative reasoning task constraint modeling: the load a_i assigned to each terminal device needs to satisfy the following constraints:
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N (1)
a_i ≥ 0, a_i ∈ Z, i ∈ N (2)
Σ_{i∈N} a_i = H (3)
where Z represents the set of integers, H represents the height of the input picture, and 1{a_i > 0} denotes the indicator function of the condition a_i > 0, which equals 1 if a_i > 0 and 0 otherwise;
wherein formula (1) indicates that the load size distributed to each terminal device should be no smaller than the convolution padding size required on the neighbor device, or else be exactly 0; formula (2) indicates that the load size distributed to each terminal device should be a non-negative integer; formula (3) indicates that the load sizes distributed to all terminal devices should sum exactly to the height of the complete input picture;
the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device needs to satisfy the following constraint:
r_{l,i} ≤ m_i, i ∈ N, l ∈ L (4)
where formula (4) indicates that the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device must respect the available memory space limit;
s34, collaborative reasoning task execution performance modeling: for single-layer deep neural network model inference, the computation delay and computation energy consumption are modeled as follows:
t^c_{l,i} = ρ_i · r_{l,i} / f_i (5)
e^c_{l,i} = P^c_i · t^c_{l,i} (6)
where t^c_{l,i} and e^c_{l,i} respectively represent the computation delay and computation energy consumption of terminal device i at layer l of the deep neural network model;
for single-layer deep neural network model inference, the communication delay and communication energy consumption are modeled as follows:
t^x_{l,i} = r_{l,i} / b_{i,j} (7)
e^x_{l,i} = P^x_i · t^x_{l,i} (8)
where t^x_{l,i} and e^x_{l,i} respectively represent the communication delay and communication energy consumption of terminal device i at layer l of the deep neural network model;
for multi-layer deep neural network model inference, the total execution delay and execution energy consumption are modeled as follows:
E_c = Σ_{i∈N} Σ_{l∈L} e^c_{l,i} (9)
E_x = Σ_{i∈N} Σ_{l∈L} e^x_{l,i} (10)
T = Σ_{l∈L} max_{i∈N} ( t^c_{l,i} + t^x_{l,i} ) (11)
where E_c and E_x respectively represent the computation energy consumption and communication energy consumption of the collaborative reasoning system while executing the collaborative inference, and T represents the total delay of the collaborative reasoning system while executing the collaborative inference;
s35, modeling of the collaborative reasoning performance optimization problem: according to the modeling of S33 and S34, the system collaborative reasoning performance optimization problem can be expressed as minimizing the total energy consumption overhead of the system while satisfying the user-defined latency requirement D; the P1 problem is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L.
5. the adaptive load distribution based deep neural network model cooperative inference method according to claim 4, wherein the step S4 specifically includes:
s41, converting the collaborative reasoning performance optimization problem: problem P1 is an integer linear programming problem, which belongs to the class of NP-hard problems, and its optimal solution is difficult to obtain efficiently in polynomial time; therefore, problem P1 is approximated by a linear programming problem, and an adaptive load distribution algorithm is proposed to obtain a near-optimal solution in polynomial time;
define a continuous variable λ_i to approximately express a_i, where the relationship between λ_i and a_i is:
a_i = λ_i · H (12)
so that formulas (1), (2) and (3) become:
λ_i · H ≥ p_{i+1} · 1{λ_i > 0}, i ∈ N (13)
λ_i ≥ 0, i ∈ N (14)
Σ_{i∈N} λ_i = 1 (15)
formula (13) is equivalent to λ_i · H ≥ p_{i+1} or λ_i = 0; relaxing the constraint expressed by formula (13) and simplifying it to λ_i ≥ 0, the following problem P2 can be obtained:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
s42, solving the collaborative reasoning performance optimization problem: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]; substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again; repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty; an empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
6. The adaptive load distribution based deep neural network model cooperative inference method according to claim 5, characterized in that the linear programming solving tool comprises IBM CPLEX toolkit.
7. The adaptive load distribution based deep neural network model cooperative inference method according to claim 5, wherein the step S5 specifically includes:
s51, collaborative inference load distribution deployment: the device that receives the input is defined as the main device, and the devices participating in the cooperative reasoning are defined as auxiliary devices; when the main device receives the input picture, it runs the collaborative inference performance optimization problem solving algorithm of S42 to obtain the optimal load distribution result for the current input; the main device then divides the input picture along its height according to the load distribution proportions to obtain multiple input blocks, and sends each block of data to the corresponding terminal device according to the device index in the load distribution result;
s52, collaborative reasoning execution: and after the main equipment transmits the input load blocks to each terminal equipment, all the terminal equipment starts the deep neural network model cooperative reasoning calculation by using the deep learning framework. Each terminal device utilizes a remote procedure calling tool to communicate and exchange intermediate calculation data required in the calculation process.
8. The adaptive load distribution based deep neural network model cooperative inference method according to claim 7, wherein the terminal device comprises a main device and an auxiliary device.
9. The adaptive load distribution-based deep neural network model cooperative inference method according to claim 7, wherein the deep learning framework comprises TensorFlow.
10. The adaptive load distribution based deep neural network model cooperative inference method of claim 7, wherein the remote procedure call tool comprises a gRPC.
CN202011638571.XA 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution Pending CN112686374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638571.XA CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638571.XA CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Publications (1)

Publication Number Publication Date
CN112686374A true CN112686374A (en) 2021-04-20

Family

ID=75456615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638571.XA Pending CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Country Status (1)

Country Link
CN (1) CN112686374A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000605A1 (en) * 2022-07-01 2024-01-04 北京小米移动软件有限公司 Ai model reasoning method and apparatus
WO2024082550A1 (en) * 2023-03-24 2024-04-25 Lenovo (Beijing) Limited Methods and apparatuses for ue-server co-inference in wireless system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020132593A1 (en) * 2018-12-21 2020-06-25 Waymo Llc Neural network processor
CN111832720A (en) * 2020-09-21 2020-10-27 电子科技大学 Configurable neural network reasoning and online learning fusion calculation circuit

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020132593A1 (en) * 2018-12-21 2020-06-25 Waymo Llc Neural network processor
CN111832720A (en) * 2020-09-21 2020-10-27 电子科技大学 Configurable neural network reasoning and online learning fusion calculation circuit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIEKANG ZENG ET AL: "CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices", 《ARXIV.ORG》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000605A1 (en) * 2022-07-01 2024-01-04 北京小米移动软件有限公司 Ai model reasoning method and apparatus
WO2024082550A1 (en) * 2023-03-24 2024-04-25 Lenovo (Beijing) Limited Methods and apparatuses for ue-server co-inference in wireless system

Similar Documents

Publication Publication Date Title
Santos et al. Towards network-aware resource provisioning in kubernetes for fog computing applications
Samie et al. Computation offloading and resource allocation for low-power IoT edge devices
JP7162385B2 (en) Multi-User Multi-MEC Task Unload Resource Scheduling Method Based on Edge-Terminal Collaboration
CN110519370B (en) Edge computing resource allocation method based on facility site selection problem
CN113778677B (en) SLA-oriented intelligent optimization method for cloud-edge cooperative resource arrangement and request scheduling
CN110069341B (en) Method for scheduling tasks with dependency relationship configured according to needs by combining functions in edge computing
CN103761309A (en) Operation data processing method and system
CN110968366B (en) Task unloading method, device and equipment based on limited MEC resources
CN112686374A (en) Deep neural network model collaborative reasoning method based on adaptive load distribution
CN114520768B (en) AI unloading optimization method for random tasks in industrial Internet of things
CN113615137B (en) CDN optimization platform
WO2022001941A1 (en) Network element management method, network management system, independent computing node, computer device, and storage medium
CN112822701A (en) Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
CN116385857B (en) Calculation power distribution method based on AI intelligent scheduling
Alam et al. Communication-efficient distributed multi-resource allocation
CN113315669B (en) Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
Majeed et al. Modelling fog offloading performance
US11829799B2 (en) Distributed resource-aware training of machine learning pipelines
Chen et al. Joint data collection and resource allocation for distributed machine learning at the edge
CN112822055A (en) DQN-based edge computing node deployment algorithm
CN116684472A (en) Service deployment system and service deployment method for terminal-side computing network
CN116109058A (en) Substation inspection management method and device based on deep reinforcement learning
Luo et al. A Two‐Stage Service Replica Strategy for Business Process Efficiency Optimization in Community Cloud
CN115955479A (en) Task rapid scheduling and resource management method in cloud edge cooperation system
Kolomvatsos et al. Scheduling the execution of tasks at the edge

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210420