CN112686374A - Deep neural network model collaborative reasoning method based on adaptive load distribution - Google Patents

Deep neural network model collaborative reasoning method based on adaptive load distribution

Info

Publication number
CN112686374A
Authority
CN
China
Prior art keywords
neural network
deep neural
network model
reasoning
load distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638571.XA
Other languages
Chinese (zh)
Inventor
陈旭
曾烈康
周知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011638571.XA priority Critical patent/CN112686374A/en
Publication of CN112686374A publication Critical patent/CN112686374A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a deep neural network model collaborative reasoning method based on adaptive load distribution. By modeling the execution process of the deep neural network model and the influence of dynamically changing communication bandwidth on distributed computation across multiple terminal devices, the invention formulates an optimization problem that jointly considers inference latency and system energy consumption, and proposes an integer-linear-programming-based adaptive allocation algorithm for deep neural network inference load. Compared with conventional local inference of deep neural network models and with state-of-the-art collaborative computing schemes, the method not only provides a brand-new collaborative inference paradigm over multiple terminal devices, but also accounts for the heterogeneity of computing resources and the dynamics of the network under a given execution-latency requirement of the intelligent application, achieving deep neural network collaborative inference with minimized energy consumption overhead.

Description

Deep neural network model collaborative reasoning method based on adaptive load distribution
Technical Field
The invention relates to the technical fields of deep learning, edge computing and distributed computing, and in particular to a deep neural network model collaborative reasoning method based on adaptive load distribution.
Background
Various intelligent services based on deep learning technology have developed rapidly in recent years and are now deeply integrated into people's daily lives. For example, in the widely deployed smart home scenario, when a smart camera captures a face image, deep learning inference can be initiated immediately to recognize the face information. However, for reasons of cost and space utilization, the terminal devices deployed by users in smart homes are usually small devices with limited computing capability, such as smart speakers, smart gateways, and small home hosts. When processing deep learning inference tasks that demand substantial computing resources, such terminal devices often struggle to meet user requirements in terms of real-time performance, security, and environmental friendliness.
To address this pain point, the traditional solution generally sends the sensing data and computation requests of smart home terminal devices to the cloud, following the cloud computing paradigm, and relies on the powerful computing capability of data center servers to rapidly compute the inference results. However, this approach presents a number of potential problems: on one hand, the performance of the traditional cloud computing scheme is limited by the unstable long-distance communication between the user and the cloud, making it difficult to guarantee that the deep neural network inference results computed in the cloud return to the terminal device within a latency range acceptable to the user; on the other hand, data that may contain private user activity information is sent to cloud servers owned by commercial companies, inevitably raising user concerns about privacy disclosure.
Disclosure of Invention
The invention provides a deep neural network model cooperative reasoning method based on adaptive load distribution to overcome at least one defect in the prior art, realizing low-latency and energy-efficient cooperative inference of a deep learning model across multiple terminal devices.
In order to solve the technical problems, the invention adopts the technical scheme that: a deep neural network model collaborative reasoning method based on adaptive load distribution comprises the following steps:
s1, deep neural network model installation: each terminal device installs the trained deep neural network model to a local computing environment before the cooperative reasoning system operates, and records corresponding configuration parameters according to the structural design of the deep neural network model; the method comprises the steps that a main device collects deep neural network model configuration parameter information of all available terminal devices, wherein the main device is a device for receiving input;
s2, acquiring computing capability information of the terminal equipment: before the cooperative reasoning system operates, each terminal device utilizes a local historical input sample to execute a deep neural network model reasoning task in an off-line mode, records related operation data and estimates local computing capability parameters; the method comprises the steps that a main device collects calculation capacity parameter information of all available terminal devices;
s3, a deep neural network model collaborative reasoning modeling step: the main equipment collects network bandwidth information of the terminal equipment and time delay requirement parameters defined by a user when the collaborative reasoning system runs, and performs performance optimization problem modeling on the collaborative reasoning process by combining the deep neural network model configuration parameters collected in the steps S1 and S2 and the computing capability parameters of each terminal equipment; wherein the performance optimization problem is to minimize the energy consumption overhead of the system while meeting user-defined latency requirements;
s4, deep neural network model collaborative reasoning load distribution: the main equipment converts the optimization problem modeled in the step S3 into a plurality of linear programming subproblems, and dynamically adjusts the collaborative reasoning load distribution aiming at the current input through a self-adaptive load distribution algorithm, so that the optimization of the system energy consumption expense under the condition of meeting the time delay requirement defined by a user is realized;
s5, performing deep neural network model collaborative reasoning: after receiving the input picture, the main device segments the input picture according to the deep neural network model collaborative reasoning load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device; after each terminal device receives the current input load blocks, cooperatively executing a deep neural network model reasoning task to finally obtain a cooperative reasoning result.
Further, the step S1 specifically includes:
s11, defining topological structure parameters of a deep neural network model: define a layer l of the deep neural network model to represent one convolution calculation operation or one fully-connected-layer calculation operation in deep neural network model inference; given a deep neural network model, define L = [1, 2, …, L] to represent the successive L layers of computation of the model;
s12, defining calculation configuration parameters of each layer of the deep neural network model: define a configuration parameter group ⟨k, c_in, c_out, s, p⟩_l to express the layer-l computation, where k denotes the convolution kernel size, c_in the number of input channels, c_out the number of output channels, s the convolution stride, and p the convolution padding size; one convolution calculation operation or one fully-connected-layer calculation operation of step S11 can be expressed by this configuration parameter group;
s13, collecting deep neural network model configuration parameters: according to the design of the deep neural network model structure, record the model topological structure parameters L = [1, 2, …, L] of S11 and the per-layer configuration parameter groups ⟨k, c_in, c_out, s, p⟩_l of S12;
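For illustration only (not part of the claimed method), the per-layer configuration group of S12 could be recorded with a simple data structure such as the following Python sketch; the class name, field names, and the AlexNet-like example values are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LayerConfig:
    """Configuration group <k, c_in, c_out, s, p> of one layer l (step S12)."""
    k: int      # convolution kernel size
    c_in: int   # number of input channels
    c_out: int  # number of output channels
    s: int      # convolution stride
    p: int      # convolution padding size

# Hypothetical example: the first two convolutional layers of an AlexNet-like model.
model_layers: List[LayerConfig] = [
    LayerConfig(k=11, c_in=3, c_out=96, s=4, p=0),
    LayerConfig(k=5, c_in=96, c_out=256, s=1, p=2),
]
```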
S14, installing a deep neural network model in a local computing environment: and each terminal device downloads and stores the trained deep neural network model file to the local, and loads parameters required by deep neural network model inference to a memory before the cooperative inference system operates, so that the deep neural network model inference task can be immediately executed when input arrives.
Further, the step S2 specifically includes:
s21, defining load distribution parameters of the terminal equipment: define N = [1, 2, …, N] to represent the available device group containing the N terminal devices participating in the collaborative reasoning task; for any device i, define the size of the calculation load received by terminal device i as a_i, and let π = [a_1, a_2, …, a_N] represent a collaborative inference load distribution scheme over the available terminal device group N; given the current layer l of the deep neural network of step S11 and the load size a_i, define the calculation load data amount as r_{l,i}, obtained as the data volume of the portion of the original input picture whose size is a_i;
s22, defining the calculation capacity parameters of the terminal equipment: define a computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i to express the i-th terminal device, where ρ represents the number of CPU revolutions required per 1 KB of input data processed, f represents the CPU frequency, m represents the available memory space, P^c represents the computing power, and P^x represents the transmission power; the available memory space m represents the memory space available to the collaborative reasoning system beyond the basic system services;
s23, collecting the computing capability parameters on each terminal device: before the cooperative reasoning system operates, each terminal device executes the deep neural network reasoning task offline in its local computing environment using historical input samples, and records the computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i of step S22; the CPU frequency f can be obtained from the terminal device specification; the number of CPU revolutions required to execute one deep neural network inference task can be obtained by multiplying the CPU frequency f by the execution latency of a single inference task; the number of CPU revolutions ρ required per 1 KB of input data can then be obtained by dividing that number of revolutions by the input data size; the computing power P^c and the transmission power P^x can be obtained by dividing the computation energy consumption and the transmission energy consumption, respectively, by the execution latency of a single inference task;
s24, collecting all available terminal device computing capability parameters: the master device collects the computing capability parameter groups ⟨ρ, f, m, P^c, P^x⟩_i of all available terminal devices, records the available device group N = [1, 2, …, N] of S21 according to the number of available terminal devices, and randomly initializes the cooperative inference load distribution scheme π = [a_1, a_2, …, a_N].
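As a minimal sketch of the offline profiling described in S23 and S24 (the function name, argument layout, and the example numbers are illustrative assumptions, not the invention's interface), the capability group ⟨ρ, f, m, P^c, P^x⟩ of one device could be estimated as follows:

```python
def estimate_capability(cpu_freq_hz: float, avg_latency_s: float,
                        input_size_kb: float, compute_energy_j: float,
                        transmit_energy_j: float, available_mem_kb: float) -> dict:
    """Estimate the capability group <rho, f, m, Pc, Px> of one device (step S23)."""
    # total CPU revolutions for one inference = CPU frequency x execution latency
    total_revolutions = cpu_freq_hz * avg_latency_s
    # rho: CPU revolutions needed per 1 KB of input data
    rho = total_revolutions / input_size_kb
    # average computing and transmission power over one inference
    p_c = compute_energy_j / avg_latency_s
    p_x = transmit_energy_j / avg_latency_s
    return {"rho": rho, "f": cpu_freq_hz, "m": available_mem_kb,
            "Pc": p_c, "Px": p_x}

# Hypothetical profiling numbers for a Raspberry-Pi-class device (illustration only).
params = estimate_capability(cpu_freq_hz=1.2e9, avg_latency_s=0.8,
                             input_size_kb=150.0, compute_energy_j=2.4,
                             transmit_energy_j=0.5, available_mem_kb=5.0e5)
```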
Further, the step S3 specifically includes:
s31, defining a collaborative reasoning time delay requirement parameter: defining an inference delay requirement parameter D by a user or a service provider; the parameter D represents that the cooperative reasoning task needs to be executed and completed within the time delay requirement D;
s32, defining communication bandwidth parameters of each terminal device: define the communication bandwidth parameter b_{i,j}, where i and j represent any two device indices in the available device group N, i.e., i, j ∈ N; b_{i,i} represents the internal communication bandwidth of device i;
s33, collaborative reasoning task constraint modeling: the load a_i assigned to each terminal device needs to satisfy the following constraints:
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N, (1)
a_i ≥ 0, a_i ∈ Z, i ∈ N, (2)
Σ_{i∈N} a_i = H, (3)
where Z represents the set of integers, H represents the height of the input picture, and 1{a_i > 0} denotes the indicator function of the condition a_i > 0, which equals 1 if a_i > 0 and 0 otherwise;
wherein formula (1) indicates that the load size distributed to each terminal device should be no smaller than the convolution padding size required on the neighbor device, or else be exactly 0; formula (2) indicates that the load size distributed to each terminal device should be a non-negative integer; formula (3) indicates that the load sizes distributed to all terminal devices should sum exactly to the height of the complete input picture;
the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device needs to satisfy the following constraint:
r_{l,i} ≤ m_i, i ∈ N, l ∈ L, (4)
where formula (4) indicates that the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device must respect the available memory space limit;
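A small illustrative check of constraints (1)–(4) as reconstructed above can be written as follows; the function name and argument layout are assumptions made for this sketch, not part of the claimed method.

```python
from typing import List

def satisfies_constraints(a: List[int], padding_next: List[int], H: int,
                          r: List[List[float]], mem: List[float]) -> bool:
    """Check constraints (1)-(4) for a candidate allocation a = [a_1, ..., a_N].

    a[i]            -- rows of the input picture assigned to device i
    padding_next[i] -- convolution padding size required by the neighbour device
    H               -- height of the complete input picture
    r[l][i]         -- calculation load data amount of layer l on device i
    mem[i]          -- available memory space m_i of device i
    """
    num_devices = len(a)
    for i in range(num_devices):
        # (2) every load must be a non-negative integer
        if not isinstance(a[i], int) or a[i] < 0:
            return False
        # (1) a positive load must cover the neighbour's padding requirement
        if 0 < a[i] < padding_next[i]:
            return False
    # (3) the partitions together cover the whole picture height
    if sum(a) != H:
        return False
    # (4) per-layer memory limit on every device
    for layer in r:
        for i in range(num_devices):
            if layer[i] > mem[i]:
                return False
    return True
```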
s34, collaborative reasoning task execution performance modeling: for single-layer deep neural network model inference, the computation delay and computation energy consumption are modeled as follows:
t^c_{l,i} = ρ_i · r_{l,i} / f_i, (5)
e^c_{l,i} = P^c_i · t^c_{l,i}, (6)
where t^c_{l,i} and e^c_{l,i} respectively represent the computation delay and computation energy consumption of terminal device i at layer l of the deep neural network model;
for single-layer deep neural network model inference, the communication delay and communication energy consumption are modeled as follows:
t^x_{l,i} = r_{l,i} / b_{i,j}, (7)
e^x_{l,i} = P^x_i · t^x_{l,i}, (8)
where t^x_{l,i} and e^x_{l,i} respectively represent the communication delay and communication energy consumption of terminal device i at layer l of the deep neural network model;
for multi-layer deep neural network model inference, the total execution delay and execution energy consumption are modeled as follows:
E_c = Σ_{i∈N} Σ_{l∈L} e^c_{l,i}, (9)
E_x = Σ_{i∈N} Σ_{l∈L} e^x_{l,i}, (10)
T = Σ_{l∈L} max_{i∈N} ( t^c_{l,i} + t^x_{l,i} ), (11)
where E_c and E_x respectively represent the computation energy consumption and communication energy consumption of the collaborative reasoning system while executing the collaborative inference, and T represents the total delay of the collaborative reasoning system while executing the collaborative inference;
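The performance model of S34 can be illustrated by the following sketch. It assumes, since the original equation images are not reproduced here, that the per-layer transmitted data volume is approximated by r_{l,i} and that layers execute in sequence while devices run in parallel within each layer; all names are illustrative.

```python
def evaluate_performance(r, rho, f, Pc, Px, bw):
    """Sketch of the performance model (5)-(11) for one load allocation.

    r[l][i]      -- calculation load data amount r_{l,i} of layer l on device i
    rho[i]       -- CPU revolutions per KB of data on device i
    f[i]         -- CPU frequency of device i
    Pc[i], Px[i] -- computing and transmission power of device i
    bw[i]        -- communication bandwidth used by device i (assumption: one
                    effective bandwidth value per device)
    """
    num_layers, num_devices = len(r), len(rho)
    T, Ec, Ex = 0.0, 0.0, 0.0
    for l in range(num_layers):
        layer_times = []
        for i in range(num_devices):
            t_c = rho[i] * r[l][i] / f[i]      # computation delay, eq. (5)
            t_x = r[l][i] / bw[i]              # communication delay, eq. (7)
            Ec += Pc[i] * t_c                  # computation energy, eqs. (6), (9)
            Ex += Px[i] * t_x                  # communication energy, eqs. (8), (10)
            layer_times.append(t_c + t_x)
        T += max(layer_times)                  # layers run in sequence, eq. (11)
    return T, Ec, Ex
```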
s35, modeling of the collaborative reasoning performance optimization problem: according to the modeling of S33 and S34, the system collaborative reasoning performance optimization problem can be expressed as minimizing the total energy consumption overhead of the system while satisfying the user-defined latency requirement D; the P1 problem is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L.
further, the step S4 specifically includes:
s41, converting the collaborative reasoning performance optimization problem: problem P1 is an integer linear programming problem, which belongs to the class of NP-hard problems, and its optimal solution is difficult to obtain efficiently in polynomial time; therefore, problem P1 is approximated by a linear programming problem, and an adaptive load distribution algorithm is proposed to obtain a near-optimal solution in polynomial time;
define a continuous variable λ_i to approximately express a_i, where the relationship between λ_i and a_i is:
a_i = λ_i · H, (12)
so that formulas (1), (2) and (3) become:
λ_i · H ≥ p_{i+1} · 1{λ_i > 0}, i ∈ N, (13)
λ_i ≥ 0, i ∈ N, (14)
Σ_{i∈N} λ_i = 1, (15)
formula (13) is equivalent to λ_i · H ≥ p_{i+1} or λ_i = 0; relaxing the constraint expressed by formula (13) and simplifying it to λ_i ≥ 0, the following problem P2 can be obtained:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
the problem P2 is a linear programming problem, a widely studied class of problems for which many mature and efficient polynomial-time solution algorithms exist; by iteratively modifying the constraint range of problem P2 and re-solving it, the optimal solution of problem P1 can be approached step by step;
s42, solving the collaborative reasoning performance optimization problem: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]; substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again; repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty; an empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
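The iterative procedure of S42 can be sketched in Python as follows, using scipy.optimize.linprog as an illustrative stand-in for the linear programming solving tool named below; the callables build_lp, to_rows and feasible_p1, which encode problem P2 and the P1 feasibility check for a concrete model, are assumptions supplied by the caller and are not specified by the patent text.

```python
from scipy.optimize import linprog  # illustrative stand-in for the LP solving tool

def adaptive_allocation(devices, build_lp, to_rows, feasible_p1):
    """Sketch of the iterative device-removal procedure of S42.

    devices           -- initial available device set N (list of device indices)
    build_lp(d)       -- returns (c, A_ub, b_ub, A_eq, b_eq) encoding P2 for d
    to_rows(x, d)     -- converts the continuous solution lambda into integer loads a_i
    feasible_p1(a, d) -- checks the original problem P1 (constraints (1)-(4), T <= D)
    """
    active = list(devices)
    while active:
        c, A_ub, b_ub, A_eq, b_eq = build_lp(active)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * len(active), method="highs")
        if not res.success:
            break
        lam = list(res.x)
        a = to_rows(lam, active)
        if feasible_p1(a, active):
            return a, active                       # feasible allocation for P1
        nonzero = [i for i, v in enumerate(lam) if v > 1e-9]
        if not nonzero:
            break
        # drop zero-load devices and the device with the smallest non-zero share
        drop = {i for i, v in enumerate(lam) if v <= 1e-9}
        drop.add(min(nonzero, key=lambda i: lam[i]))
        active = [d for i, d in enumerate(active) if i not in drop]
    return None, []  # latency requirement too strict: no feasible allocation found
```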
Further, the linear programming solver tool comprises the IBM CPLEX toolkit.
Further, the step S5 specifically includes:
s51, collaborative inference load distribution deployment: the device that receives the input is defined as the main device, and the devices participating in the cooperative reasoning are defined as auxiliary devices; when the main device receives the input picture, it runs the collaborative inference performance optimization problem solving algorithm of S42 to obtain the optimal load distribution result for the current input; the main device then divides the input picture along its height according to the load distribution proportions to obtain multiple input blocks, and sends each block of data to the corresponding terminal device according to the device index in the load distribution result;
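A minimal sketch of the height-wise segmentation in S51, assuming the input picture is a NumPy array of shape (H, W, C); the function name and the example numbers are illustrative only.

```python
import numpy as np
from typing import List

def split_by_height(picture: np.ndarray, rows: List[int]) -> List[np.ndarray]:
    """Split an input picture of shape (H, W, C) along its height (step S51).

    rows -- the allocation [a_1, ..., a_N]; sum(rows) must equal H (constraint (3)).
    Devices with a_i = 0 simply receive an empty slice.
    """
    assert sum(rows) == picture.shape[0], "allocation must cover the full height"
    partitions, top = [], 0
    for a_i in rows:
        partitions.append(picture[top:top + a_i])
        top += a_i
    return partitions

# Usage with hypothetical numbers: a 224x224 RGB picture split over three devices.
img = np.zeros((224, 224, 3), dtype=np.float32)
parts = split_by_height(img, [112, 80, 32])
```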
s52, collaborative reasoning execution: and after the main equipment transmits the input load blocks to each terminal equipment, all the terminal equipment starts the deep neural network model cooperative reasoning calculation by using the deep learning framework. Each terminal device utilizes a remote procedure calling tool to communicate and exchange intermediate calculation data required in the calculation process.
Further, the terminal device includes a main device and an auxiliary device.
Further, the deep learning framework comprises TensorFlow.
Further, the remote procedure call tool includes a gRPC.
Compared with the prior art, the beneficial effects are:
1. the invention realizes the low-delay and high-energy-efficiency cooperative reasoning of the deep learning model on the multi-terminal equipment by means of the computing resources of the multi-terminal equipment under the edge computing environment and through self-adaptive load distribution scheduling;
2. compared with the traditional cloud computing paradigm, the method and the device have the advantages that from the perspective of edge computing, processing and computing of user data are kept in terminal equipment owned by a user in an edge computing environment, so that not only is higher processing speed realized, but also privacy of the user is protected;
3. compared with the traditional edge end deep neural network collaborative reasoning load distribution algorithm, the method comprehensively considers time delay and energy consumption optimization, realizes the minimization of energy consumption overhead under the requirement of time delay given by a user, and has stronger practicability and economic value;
4. compared with the traditional edge end deep neural network collaborative reasoning load distribution algorithm, the method comprehensively considers the heterogeneity of computing resources and the dynamic property of network communication resources, provides an efficient optimization algorithm of polynomial time complexity, and has better adaptability and reliability aiming at the resource characteristics of the edge computing environment.
Drawings
FIG. 1 is a block diagram of an adaptive load distribution algorithm suitable for deep neural network model cooperative reasoning disclosed by the invention.
FIG. 2 is a schematic diagram of the execution flow of the deep neural network model collaborative inference based on load distribution disclosed by the invention.
Fig. 3 is a schematic diagram of a time delay result in an AlexNet model collaborative inference image classification task experiment under the same condition by different methods in the embodiment of the present invention.
FIG. 4 is a schematic diagram of energy consumption results in an AlexNet model collaborative reasoning image classification task experiment under the same conditions by different methods in the embodiment of the present invention.
Fig. 5 is a schematic diagram of a time delay result in an AlexNet model collaborative inference image classification task experiment under different time delay requirements of the same available equipment group by different methods in the embodiment of the present invention.
FIG. 6 is a schematic diagram of a time delay and energy consumption result in an AlexNet model collaborative reasoning image classification task experiment under the same time delay requirement of different available equipment quantities by the method in the embodiment of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
The invention discloses a deep neural network model collaborative reasoning method based on self-adaptive load distribution. By modeling the execution process of the deep neural network model and the influence of dynamic change of communication bandwidth on distributed computation of the multi-terminal equipment, the invention constructs an optimization problem comprehensively considering inference time delay and system energy consumption, and provides an integer linear programming-based inference load self-adaptive distribution algorithm of the deep neural network model.
The experimental environment of this example is as follows. This example constructed an experimental cluster containing four Raspberry Pi 3 boards, an NVIDIA Jetson TX2, and a Dell commercial desktop computer. The Raspberry Pi 3 represents an Internet-of-Things device with limited computing capability, the NVIDIA Jetson TX2 represents a mobile artificial intelligence platform, and the Dell commercial desktop computer represents a common small edge server in a smart home scenario. One of the Raspberry Pi 3 boards is designated as the main device for receiving input pictures, and the other devices are auxiliary devices. A Monsoon high-voltage power monitor is used to measure energy consumption, and the network traffic control tool tc is used to control the communication bandwidth between terminal devices. The classical deep neural network model AlexNet was implemented using TensorFlow Lite. The deep neural network model is trained prior to deployment. The deep neural network collaborative reasoning task in the experiment is to classify images from the ImageNet dataset. To avoid interference from external programs, all unnecessary system services and applications were closed during the experiments.
The experimental method of this example is as follows:
Method 1: all load is retained and executed on the terminal device that receives the input pictures;
the method 2 comprises the following steps: and (4) multi-device collaborative reasoning. The difference between the method 2 and the method of the present invention is that the input load is segmented according to a greedy algorithm, that is, the load is distributed according to the computing power of each device in the available device group: the more computationally powerful the device, the more load is allocated. The method does not consider network bandwidth resources among terminal devices.
Method 3: multi-device collaborative reasoning. Method 3 differs from the method of the present invention in that the input load is split in equal proportions, i.e., the load is distributed according to the total number of devices in the available device group: each terminal device receives an equal share of the load. This method considers neither the heterogeneous resources of each terminal device nor the network bandwidth resources between devices.
Method 4: the deep neural network model collaborative reasoning method based on adaptive load distribution provided by the invention.
A deep neural network model collaborative reasoning method based on adaptive load distribution specifically comprises the following implementation steps:
s1, deep neural network model installation step: each terminal device installs the trained AlexNet model into its local computing environment before the cooperative reasoning system operates, and records the configuration parameters, including the AlexNet model topological structure parameters L and the per-layer calculation configuration parameters ⟨k, c_in, c_out, s, p⟩_l, according to the structural design of the AlexNet model. The Raspberry Pi that receives the input pictures, acting as the main device, collects the AlexNet model configuration parameter information of all available terminal devices.
S2, acquiring computing capability information of the terminal devices: before the cooperative reasoning system operates, each terminal device uses local historical input samples to execute the AlexNet model image classification task offline, records the average latency and average energy consumption of a single task execution, and estimates the local device's computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i. The main device collects the computing capability parameter information of all available terminal devices, records the available device group parameter N, and randomly initializes the load distribution parameter π.
S3, deep neural network model collaborative reasoning modeling: when the collaborative reasoning system runs, the main device collects the network bandwidth information b_{i,j} of the terminal devices and the user-defined latency requirement parameter D, and, combining the deep neural network model configuration parameters and the computing capability parameters of each terminal device collected in steps S1 and S2, models the performance optimization problem of the collaborative reasoning process. The performance optimization problem is to minimize the energy consumption overhead of the system while satisfying the user-defined latency requirement; it is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
s4, a deep neural network model collaborative reasoning load distribution step: the master device converts the optimization problem modeled in step S3 into a plurality of linear programming sub-problems, each of which is formally expressed as follows:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
Through the adaptive load distribution algorithm shown in fig. 1, the collaborative inference load distribution scheme for the current input is solved, realizing the optimization of the system energy consumption overhead while satisfying the user-defined latency requirement. The adaptive load distribution algorithm shown in fig. 1 specifically executes the following steps: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool (e.g., the IBM CPLEX toolkit) to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]. Substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again. Repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty. An empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
S5, performing deep neural network model collaborative reasoning: as shown in fig. 2, after receiving the input picture, the master device segments the input picture according to the AlexNet model collaborative inference load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device. After each terminal device receives the currently input load blocks, the AlexNet model image classification task is cooperatively executed, and finally an image classification result is obtained.
Fig. 3 and 4 are schematic diagrams of a time delay result and an energy consumption result in an AlexNet model collaborative reasoning image classification task experiment under the same condition by different methods in the embodiment of the present invention, respectively. In fig. 3, the dashed line represents the 100 ms delay requirement set by the user, indicating that each AlexNet model inference must be completed within 100 ms. As can be seen from fig. 3, since the method 1 only performs inference locally, it takes the longest time, and fails to meet the delay requirement set by the user. The methods 2, 3 and 4 all meet the time delay requirement. As can be seen from fig. 4, method 4 (the method disclosed by the present invention) achieves the lowest energy consumption overhead compared to other methods. The comparison highlights the advantages that the method disclosed by the invention comprehensively considers the resource heterogeneity and the network dynamics of a plurality of available terminal devices and comprehensively optimizes the execution delay and the execution energy consumption.
Fig. 5 is a schematic diagram of latency results in the AlexNet model collaborative inference image classification task experiment for different methods under different latency requirements with the same available device group in the embodiment of the present invention. To highlight the importance of the latency requirement, under each latency requirement, if the model inference execution latency does not meet the requirement, the execution energy consumption is recorded as zero; if the model inference execution latency meets the requirement, it is recorded as an effective inference and its execution energy consumption is recorded. According to fig. 5, from the perspective of the latency requirement, method 4 (the method disclosed by the present invention) and method 2 can satisfy the strictest latency requirement for one effective inference compared with the other methods: when the latency requirement is 75 milliseconds, only method 4 and method 2 perform effective inference, while the other methods fail to meet the requirement. From the energy consumption perspective, when the latency requirement is 75 milliseconds, the energy consumption overhead of method 4 is smaller than that of method 2, and as the latency requirement is relaxed, the energy consumption overhead of method 4 remains consistently smaller than that of method 2. This comparison highlights the overall advantage of the method disclosed by the invention in terms of both latency and energy consumption performance.
FIG. 6 is a schematic diagram of latency and energy consumption results in the AlexNet model collaborative inference image classification task experiment by the method, under the same latency requirement and with different numbers of available devices, in the embodiment of the present invention. As the number of devices in the available device set increases from 1 to 6, the devices added are, respectively, a Raspberry Pi, the Dell business desktop, a Raspberry Pi, and the NVIDIA Jetson TX2. As can be seen from fig. 6, as the number of available devices increases, the latency and energy consumption overhead of the cooperative reasoning decreases, and the stronger the computing capability of the added device, the larger the reduction in latency and energy overhead. This experiment illustrates the advantages of the method disclosed in the present invention in terms of system scalability and utilization of available computing resources.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (10)

1. A deep neural network model collaborative reasoning method based on adaptive load distribution is characterized by comprising the following steps:
s1, deep neural network model installation: each terminal device installs the trained deep neural network model to a local computing environment before the cooperative reasoning system operates, and records corresponding configuration parameters according to the structural design of the deep neural network model; the method comprises the steps that a main device collects deep neural network model configuration parameter information of all available terminal devices, wherein the main device is a device for receiving input;
s2, acquiring computing capability information of the terminal equipment: before the cooperative reasoning system operates, each terminal device utilizes a local historical input sample to execute a deep neural network model reasoning task in an off-line mode, records related operation data and estimates local computing capability parameters; the method comprises the steps that a main device collects calculation capacity parameter information of all available terminal devices;
s3, a deep neural network model collaborative reasoning modeling step: the main equipment collects network bandwidth information of the terminal equipment and time delay requirement parameters defined by a user when the collaborative reasoning system runs, and performs performance optimization problem modeling on the collaborative reasoning process by combining the deep neural network model configuration parameters collected in the steps S1 and S2 and the computing capability parameters of each terminal equipment; wherein the performance optimization problem is to minimize the energy consumption overhead of the system while meeting user-defined latency requirements;
s4, deep neural network model collaborative reasoning load distribution: the main equipment converts the optimization problem modeled in the step S3 into a plurality of linear programming subproblems, and dynamically adjusts the collaborative reasoning load distribution aiming at the current input through a self-adaptive load distribution algorithm, so that the optimization of the system energy consumption expense under the condition of meeting the time delay requirement defined by a user is realized;
s5, performing deep neural network model collaborative reasoning: after receiving the input picture, the main device segments the input picture according to the deep neural network model collaborative reasoning load distribution scheme generated in step S4, and then sends each load segment to the corresponding terminal device; after each terminal device receives the current input load blocks, cooperatively executing a deep neural network model reasoning task to finally obtain a cooperative reasoning result.
2. The adaptive load distribution based deep neural network model cooperative inference method according to claim 1, wherein the step S1 specifically includes:
s11, defining topological structure parameters of a deep neural network model: define a layer l of the deep neural network model to represent one convolution calculation operation or one fully-connected-layer calculation operation in deep neural network model inference; given a deep neural network model, define L = [1, 2, …, L] to represent the successive L layers of computation of the model;
s12, defining calculation configuration parameters of each layer of the deep neural network model: define a configuration parameter group ⟨k, c_in, c_out, s, p⟩_l to express the layer-l computation, where k denotes the convolution kernel size, c_in the number of input channels, c_out the number of output channels, s the convolution stride, and p the convolution padding size; one convolution calculation operation or one fully-connected-layer calculation operation of step S11 can be expressed by this configuration parameter group;
s13, collecting deep neural network model configuration parameters: according to the design of the deep neural network model structure, record the model topological structure parameters L = [1, 2, …, L] of S11 and the per-layer configuration parameter groups ⟨k, c_in, c_out, s, p⟩_l of S12;
S14, installing a deep neural network model in a local computing environment: and each terminal device downloads and stores the trained deep neural network model file to the local, and loads parameters required by deep neural network model inference to a memory before the cooperative inference system operates, so that the deep neural network model inference task can be immediately executed when input arrives.
3. The adaptive load distribution based deep neural network model cooperative inference method according to claim 2, wherein the step S2 specifically includes:
s21, defining load distribution parameters of the terminal equipment: define N = [1, 2, …, N] to represent the available device group containing the N terminal devices participating in the collaborative reasoning task; for any device i, define the size of the calculation load received by terminal device i as a_i, and let π = [a_1, a_2, …, a_N] represent a collaborative inference load distribution scheme over the available terminal device group N; given the current layer l of the deep neural network of step S11 and the load size a_i, define the calculation load data amount as r_{l,i}, obtained as the data volume of the portion of the original input picture whose size is a_i;
s22, defining the calculation capacity parameters of the terminal equipment: define a computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i to express the i-th terminal device, where ρ represents the number of CPU revolutions required per 1 KB of input data processed, f represents the CPU frequency, m represents the available memory space, P^c represents the computing power, and P^x represents the transmission power; the available memory space m represents the memory space available to the collaborative reasoning system beyond the basic system services;
s23, collecting the computing capability parameters on each terminal device: before the cooperative reasoning system operates, each terminal device executes the deep neural network reasoning task offline in its local computing environment using historical input samples, and records the computing capability parameter group ⟨ρ, f, m, P^c, P^x⟩_i of step S22; the CPU frequency f can be obtained from the terminal device specification; the number of CPU revolutions required to execute one deep neural network inference task can be obtained by multiplying the CPU frequency f by the execution latency of a single inference task; the number of CPU revolutions ρ required per 1 KB of input data can then be obtained by dividing that number of revolutions by the input data size; the computing power P^c and the transmission power P^x can be obtained by dividing the computation energy consumption and the transmission energy consumption, respectively, by the execution latency of a single inference task;
s24, collecting all available terminal device computing capability parameters: the master device collects the computing capability parameter groups ⟨ρ, f, m, P^c, P^x⟩_i of all available terminal devices, records the available device group N = [1, 2, …, N] of S21 according to the number of available terminal devices, and randomly initializes the cooperative inference load distribution scheme π = [a_1, a_2, …, a_N].
4. The adaptive load distribution based deep neural network model cooperative inference method according to claim 3, wherein the step S3 specifically includes:
s31, defining a collaborative reasoning time delay requirement parameter: defining an inference delay requirement parameter D by a user or a service provider; the parameter D represents that the cooperative reasoning task needs to be executed and completed within the time delay requirement D;
s32, defining communication bandwidth parameters of each terminal device: define the communication bandwidth parameter b_{i,j}, where i and j represent any two device indices in the available device group N, i.e., i, j ∈ N; b_{i,i} represents the internal communication bandwidth of device i;
s33, collaborative reasoning task constraint modeling: the load a_i assigned to each terminal device needs to satisfy the following constraints:
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N (1)
a_i ≥ 0, a_i ∈ Z, i ∈ N (2)
Σ_{i∈N} a_i = H (3)
where Z represents the set of integers, H represents the height of the input picture, and 1{a_i > 0} denotes the indicator function of the condition a_i > 0, which equals 1 if a_i > 0 and 0 otherwise;
wherein formula (1) indicates that the load size distributed to each terminal device should be no smaller than the convolution padding size required on the neighbor device, or else be exactly 0; formula (2) indicates that the load size distributed to each terminal device should be a non-negative integer; formula (3) indicates that the load sizes distributed to all terminal devices should sum exactly to the height of the complete input picture;
the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device needs to satisfy the following constraint:
r_{l,i} ≤ m_i, i ∈ N, l ∈ L (4)
where formula (4) indicates that the calculation load data amount r_{l,i} of each deep neural network layer on each terminal device must respect the available memory space limit;
s34, collaborative reasoning task execution performance modeling: for single-layer deep neural network model inference, the computation delay and computation energy consumption are modeled as follows:
t^c_{l,i} = ρ_i · r_{l,i} / f_i (5)
e^c_{l,i} = P^c_i · t^c_{l,i} (6)
where t^c_{l,i} and e^c_{l,i} respectively represent the computation delay and computation energy consumption of terminal device i at layer l of the deep neural network model;
for single-layer deep neural network model inference, the communication delay and communication energy consumption are modeled as follows:
t^x_{l,i} = r_{l,i} / b_{i,j} (7)
e^x_{l,i} = P^x_i · t^x_{l,i} (8)
where t^x_{l,i} and e^x_{l,i} respectively represent the communication delay and communication energy consumption of terminal device i at layer l of the deep neural network model;
for multi-layer deep neural network model inference, the total execution delay and execution energy consumption are modeled as follows:
E_c = Σ_{i∈N} Σ_{l∈L} e^c_{l,i} (9)
E_x = Σ_{i∈N} Σ_{l∈L} e^x_{l,i} (10)
T = Σ_{l∈L} max_{i∈N} ( t^c_{l,i} + t^x_{l,i} ) (11)
where E_c and E_x respectively represent the computation energy consumption and communication energy consumption of the collaborative reasoning system while executing the collaborative inference, and T represents the total delay of the collaborative reasoning system while executing the collaborative inference;
s35, modeling of the collaborative reasoning performance optimization problem: according to the modeling of S33 and S34, the system collaborative reasoning performance optimization problem can be expressed as minimizing the total energy consumption overhead of the system while satisfying the user-defined latency requirement D; the P1 problem is formally expressed as follows:
P1: min. E_c + E_x
s.t. T ≤ D,
a_i ≥ p_{i+1} · 1{a_i > 0}, i ∈ N,
a_i ≥ 0, a_i ∈ Z, i ∈ N,
Σ_{i∈N} a_i = H,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L.
5. the adaptive load distribution based deep neural network model cooperative inference method according to claim 4, wherein the step S4 specifically includes:
s41, converting the collaborative reasoning performance optimization problem: problem P1 is an integer linear programming problem, which belongs to the class of NP-hard problems, and its optimal solution is difficult to obtain efficiently in polynomial time; therefore, problem P1 is approximated by a linear programming problem, and an adaptive load distribution algorithm is proposed to obtain a near-optimal solution in polynomial time;
define a continuous variable λ_i to approximately express a_i, where the relationship between λ_i and a_i is:
a_i = λ_i · H (12)
so that formulas (1), (2) and (3) become:
λ_i · H ≥ p_{i+1} · 1{λ_i > 0}, i ∈ N (13)
λ_i ≥ 0, i ∈ N (14)
Σ_{i∈N} λ_i = 1 (15)
formula (13) is equivalent to λ_i · H ≥ p_{i+1} or λ_i = 0; relaxing the constraint expressed by formula (13) and simplifying it to λ_i ≥ 0, the following problem P2 can be obtained:
P2: min. E_c + E_x
s.t. T ≤ D,
λ_i ≥ 0, i ∈ N,
Σ_{i∈N} λ_i = 1,
r_{l,i} ≤ m_i, i ∈ N, l ∈ L;
s42, solving the collaborative reasoning performance optimization problem: given the initial available device set N, substitute it into problem P2 and solve it with an existing linear programming solving tool to obtain a load distribution scheme π = [λ_1, λ_2, …, λ_N]; substitute π into P1 to verify the feasibility of the solution: if π is a feasible solution of P1, return π as the load distribution scheme for the current input; otherwise, remove from N the devices whose elements in π are zero together with the device index corresponding to the smallest non-zero element, obtaining a new available device set N, and substitute N into problem P2 to solve again; repeat the above steps until the solved π is a feasible solution, or the available device set N becomes empty; an empty available device set N means that the currently defined execution latency requirement is too strict to obtain an effective and feasible load distribution scheme.
6. The adaptive load distribution based deep neural network model cooperative inference method according to claim 5, characterized in that the linear programming solving tool comprises IBM CPLEX toolkit.
7. The adaptive load distribution based deep neural network model cooperative inference method according to claim 5, wherein the step S5 specifically includes:
s51, collaborative inference load distribution deployment: the device that receives the input is defined as the main device, and the devices participating in the cooperative reasoning are defined as auxiliary devices; when the main device receives the input picture, it runs the collaborative inference performance optimization problem solving algorithm of S42 to obtain the optimal load distribution result for the current input; the main device then divides the input picture along its height according to the load distribution proportions to obtain multiple input blocks, and sends each block of data to the corresponding terminal device according to the device index in the load distribution result;
s52, collaborative reasoning execution: and after the main equipment transmits the input load blocks to each terminal equipment, all the terminal equipment starts the deep neural network model cooperative reasoning calculation by using the deep learning framework. Each terminal device utilizes a remote procedure calling tool to communicate and exchange intermediate calculation data required in the calculation process.
8. The adaptive load distribution based deep neural network model cooperative inference method according to claim 7, wherein the terminal device comprises a main device and an auxiliary device.
9. The adaptive load distribution-based deep neural network model cooperative inference method according to claim 7, wherein the deep learning framework comprises TensorFlow.
10. The adaptive load distribution based deep neural network model cooperative inference method of claim 7, wherein the remote procedure call tool comprises a gRPC.
CN202011638571.XA 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution Pending CN112686374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638571.XA CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011638571.XA CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Publications (1)

Publication Number Publication Date
CN112686374A true CN112686374A (en) 2021-04-20

Family

ID=75456615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638571.XA Pending CN112686374A (en) 2020-12-31 2020-12-31 Deep neural network model collaborative reasoning method based on adaptive load distribution

Country Status (1)

Country Link
CN (1) CN112686374A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000605A1 (en) * 2022-07-01 2024-01-04 北京小米移动软件有限公司 Ai model reasoning method and apparatus
WO2024082550A1 (en) * 2023-03-24 2024-04-25 Lenovo (Beijing) Limited Methods and apparatuses for ue-server co-inference in wireless system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020132593A1 (en) * 2018-12-21 2020-06-25 Waymo Llc Neural network processor
CN111832720A (en) * 2020-09-21 2020-10-27 电子科技大学 Configurable neural network reasoning and online learning fusion calculation circuit

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020132593A1 (en) * 2018-12-21 2020-06-25 Waymo Llc Neural network processor
CN111832720A (en) * 2020-09-21 2020-10-27 电子科技大学 Configurable neural network reasoning and online learning fusion calculation circuit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIEKANG ZENG ET AL: "CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices", 《ARXIV.ORG》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024000605A1 (en) * 2022-07-01 2024-01-04 北京小米移动软件有限公司 Ai model reasoning method and apparatus
WO2024082550A1 (en) * 2023-03-24 2024-04-25 Lenovo (Beijing) Limited Methods and apparatuses for ue-server co-inference in wireless system

Similar Documents

Publication Publication Date Title
Santos et al. Towards network-aware resource provisioning in kubernetes for fog computing applications
Samie et al. Computation offloading and resource allocation for low-power IoT edge devices
JP7162385B2 (en) Multi-User Multi-MEC Task Unload Resource Scheduling Method Based on Edge-Terminal Collaboration
CN110519370B (en) Edge computing resource allocation method based on facility site selection problem
CN113778677B (en) SLA-oriented intelligent optimization method for cloud-edge cooperative resource arrangement and request scheduling
CN110069341B (en) Method for scheduling tasks with dependency relationship configured according to needs by combining functions in edge computing
CN103761309A (en) Operation data processing method and system
CN110968366B (en) Task unloading method, device and equipment based on limited MEC resources
CN112686374A (en) Deep neural network model collaborative reasoning method based on adaptive load distribution
CN114520768B (en) AI unloading optimization method for random tasks in industrial Internet of things
CN113615137B (en) CDN optimization platform
WO2022001941A1 (en) Network element management method, network management system, independent computing node, computer device, and storage medium
CN112822701A (en) Multi-user deep neural network model segmentation and resource allocation optimization method in edge computing scene
CN116385857B (en) Calculation power distribution method based on AI intelligent scheduling
Alam et al. Communication-efficient distributed multi-resource allocation
CN113315669B (en) Cloud edge cooperation-based throughput optimization machine learning inference task deployment method
Majeed et al. Modelling fog offloading performance
US11829799B2 (en) Distributed resource-aware training of machine learning pipelines
Chen et al. Joint data collection and resource allocation for distributed machine learning at the edge
CN112822055A (en) DQN-based edge computing node deployment algorithm
CN116684472A (en) Service deployment system and service deployment method for terminal-side computing network
CN116109058A (en) Substation inspection management method and device based on deep reinforcement learning
Luo et al. A Two‐Stage Service Replica Strategy for Business Process Efficiency Optimization in Community Cloud
CN115955479A (en) Task rapid scheduling and resource management method in cloud edge cooperation system
Kolomvatsos et al. Scheduling the execution of tasks at the edge

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210420