CN115357402B - Intelligent edge optimization method and device - Google Patents


Info

Publication number
CN115357402B
CN115357402B
Authority
CN
China
Prior art keywords
model
training
edge
central
round
Prior art date
Legal status
Active
Application number
CN202211282973.XA
Other languages
Chinese (zh)
Other versions
CN115357402A (en)
Inventor
詹玉峰
王家盛
齐天宇
翟弟华
张元�
吴楚格
夏元清
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202211282973.XA
Publication of CN115357402A
Application granted
Publication of CN115357402B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5094 Allocation of resources where the allocation takes into account power or heat criteria
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/502 Proximity


Abstract

The invention relates to an edge intelligence optimization method and device. In the edge intelligence optimization method provided by the invention, the current-round state of the environment is constructed from the model parameters, the numbers of training rounds, the communication times, the idle CPU occupancies and the training energy consumptions. Each edge device participates in federated training according to the corresponding round number in the current-round state, collects information such as its local model parameters, communication time, idle CPU utilization and training energy consumption, and updates the current-round state, so that the environment transitions to the next state. The edge devices interact with the environment continuously, generating a large amount of trajectory information that is used to update the strategy model until the strategy model converges. Different numbers of federated training rounds are allocated according to each device's computation speed, training energy consumption and communication time, thereby achieving the purposes of balancing computational heterogeneity and reducing energy consumption overhead.

Description

Intelligent edge optimization method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an edge intelligent optimization method and device based on deep reinforcement learning.
Background
Federated learning is a mechanism in which multiple parties jointly participate in model training; it has developed together with artificial intelligence technology in the big-data era. Users do not need to upload their local data to the central server: under the coordination of the central server, each user trains a model with its own local data and uploads only the trained model to the central server for aggregation. This breaks down data silos while preserving the users' control over their data and protecting privacy, so the approach can replace traditional centralized training and has been widely applied.
Federated training also faces a number of practical problems: the first is the computational heterogeneity of the devices, and the second is the limited resource budget of edge devices, for example energy consumption. The user-side devices participating in federated training may be edge devices such as smartphones, computers, Raspberry Pis and even enterprise surveillance cameras, and these devices differ markedly in computation speed; moreover, owing to the complexity of real usage scenarios, other programs may run in the foreground and occupy computing resources, so the computing power available for background federated training varies over time. The computation speed of the edge devices is closely related to the performance of federated training, and selecting different edge devices to participate may lead to significant differences in training time. Traditional methods select the participating devices randomly from the edge, which easily causes the straggler problem: the device with the slowest computation speed constrains the aggregation time of every round of the federated model and greatly slows down federated training. Therefore, selecting the participants of each round of federated training according to the devices' computation speeds and allocating them an appropriate number of training rounds is the key to solving the computational heterogeneity problem. In addition, most edge devices participating in federated training have limited network bandwidth and battery power; reducing budget overheads such as energy consumption while maintaining the accuracy of federated training is likewise an important research direction in federated learning. Conventional solutions assume that these devices are located near communication base stations and participate in federated training only when connected to mains power, which greatly limits the application scenarios of federated training. Therefore, balancing training accuracy against energy consumption overhead and saving the cost of federated training is also key to optimizing edge intelligence.
Data-driven modeling methods offer high accuracy and computational efficiency. Applying the data-driven idea to the field of edge intelligence, analyzing accumulated training data with effective methods, and extracting relevant knowledge to guide federated training is an important direction for research on the edge intelligence optimization problem.
Deep reinforcement learning is an effective method for data-driven modeling: the computer interacts with the environment automatically and can learn strategies from past experience, which makes it suitable for scenarios in which mathematical models are difficult to establish. In recent years, thanks to rapidly growing computing resources, reinforcement learning has developed substantially; it has been applied successfully to fields such as robot locomotion control, cloud workflow scheduling and intelligent transportation, and has even achieved performance far beyond the human level in computer games.
The edge intelligence optimization problem is multi-constraint and multi-objective. Some work has already applied deep reinforcement learning to edge intelligence optimization, and it shows great potential. This work falls roughly into two categories. One category optimizes from the angle of computational heterogeneity, using reinforcement learning to select devices with higher computation speeds; this shortens each round of federated training but usually incurs a large energy consumption overhead. The other category starts from saving limited resources such as energy, using reinforcement learning to select energy-efficient participation schemes; this reduces the total budget overhead but ignores the computational heterogeneity of edge intelligence and often requires a long training time. At present only a few leading-edge works consider computational heterogeneity and energy consumption together, and they still leave considerable room for improvement in the utilization of computing resources. Therefore, designing a method that accounts for both computational heterogeneity and energy consumption overhead while fully exploiting the computing power of the edge devices, so as to improve federated training performance, is of great significance for optimizing the performance of edge intelligence.
Disclosure of Invention
The invention aims to provide an edge intelligence optimization method and device that account for both computational heterogeneity and energy consumption overhead, make full use of the computing power of the edge devices, and improve the performance of federated training.
In order to achieve the purpose, the invention provides the following scheme:
an edge intelligent optimization method comprises the following steps:
step 100: acquiring a central model and a strategy model, and appointing a global training parameter; the central model and the policy model are hosted in a central server; the global training parameters include: total number of edge devices, threshold time, batch size, and training rounds;
step 101: determining edge equipment participating in the current round of training based on the number of the training rounds to obtain a participating equipment set;
step 102: obtaining a local data sample;
step 103: the edge devices in the participating device set receive the central model and the training round number, and update parameters of a local model by the batch size by using the local data samples under the condition that the threshold time is met; the local model is implanted in the edge device;
step 104: collecting local information, and constructing the current round state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
step 105: updating the current round state of the environment, and aggregating the central model based on the parameters of the local model in the current round state of the updated environment and the local data samples to obtain an aggregated central model;
step 106: determining an accuracy of the aggregated central model;
step 107: determining a return value of the strategy model according to the accuracy of the aggregation central model, the communication time in the current state of the updated environment and the training energy consumption in the current state of the updated environment;
step 108: generating a normal distribution for each edge device participating in the training of the current round by using the strategy model according to the updated current round state of the environment;
step 109: sampling the normal distribution to obtain new training round number distribution information, and returning to the step 103 until the threshold time is exceeded, and obtaining decision trajectory information; the decision trajectory information comprises a plurality of decision trajectories; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
step 110: and updating the strategy model by using the decision track information, and returning to the execution step 100 to obtain the optimization model of the federal training until the updated strategy model converges to the optimal solution.
Preferably, the determining, based on the number of training rounds, the edge devices participating in the current round of training to obtain a participating device set specifically includes:
distributing corresponding training rounds to the edge equipment based on the training rounds;
when the number of training rounds allocated to the edge device is 0, the edge device does not participate in the training round; when the number of training rounds distributed to the edge equipment is not 0, the edge equipment participates in the training of the current round according to the distributed number of training rounds;
and acquiring edge devices participating in the current round of training to generate the participating device set.
Preferably, after obtaining the central model and the policy model, the method further comprises: initializing the central model and the strategy model.
Preferably, the determining the accuracy of the aggregated central model specifically includes:
acquiring a test set;
determining the accuracy of the aggregated central model using a test set.
Preferably, the aggregate central model is:
$$w_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}\,w_t^i$$

where $w_{t+1}$ is the aggregated central model of round $t+1$; $D_i$ is the data sample set of the $i$-th edge device and $|D_i|$ the number of its data samples; $D$ is the sum of the numbers of data samples of all edge devices, i.e. $D=\sum_{i=1}^{N}|D_i|$, with $N$ denoting the total number of edge devices; $w_t^i$ are the parameters of the local model of the $i$-th edge device in round $t$; and $Q_t$ is the number of edge devices in the participating device set of round $t$ (the sum runs over the participating devices).
Preferably, the return value of the policy model is:
$$r_t=\alpha\,(v_t-v_{t-1})-\beta\sum_{i=1}^{Q_t}c_t^i-\gamma\sum_{i=1}^{Q_t}p_t^i$$

where $r_t$ is the return value of the strategy model in round $t$; $v_t$ and $v_{t-1}$ are the accuracies of the aggregated central model in rounds $t$ and $t-1$; $c_t^i$ is the communication time of the $i$-th edge device in round $t$; $p_t^i$ is the training energy consumption of the $i$-th edge device in round $t$; $\alpha$, $\beta$ and $\gamma$ are the first, second and third weight coefficients; and $Q_t$ is the number of edge devices in the participating device set of round $t$.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the edge intelligent optimization method provided by the invention, the local round state of the environment is constructed based on model parameters, the number of rounds of training, communication time, idle CPU occupancy rate and training energy consumption, each edge device participates in federal training according to the corresponding round number information in the local round state, acquires the information of local model parameters, communication time, idle CPU utilization rate, training energy consumption and the like, and updates the local round state, so that the environment is transferred to the next state. The edge equipment continuously interacts with the environment, a large amount of track information is generated and used for updating the strategy model until the strategy model converges, different federal training rounds are distributed according to the calculation speed, the training energy consumption and the communication time of each equipment, and therefore the purposes of balancing calculation of isomerism and reduction of energy consumption overhead are achieved.
The invention also provides an edge intelligent optimization device, which comprises: a central server and an edge device;
the central server and the edge equipment perform information interaction;
a central model and a strategy model are implanted into the central server; the central server is used for appointing global training parameters, determining edge equipment participating in the current round of training based on the number of training rounds, and obtaining a participating equipment set; the global training parameters include: the total number of edge devices, threshold time, batch size, and training round number;
a local model is implanted in the edge device; the edge devices in the participating device set receive the central model and the training round number in the central server, and update parameters of a local model in the batch size by using local data samples under the condition that the threshold time is met;
the central server is used for acquiring local information and constructing the current state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
the central server is used for updating the current state of the environment and aggregating the central model based on the parameters of the local model in the current state of the updated environment and the local data samples to obtain an aggregated central model;
the central server is used for acquiring a test set and determining the precision of the aggregation central model by adopting the test set;
the central server is used for determining a return value of the strategy model according to the precision of the aggregation central model, the communication time in the current state of the environment after updating and the training energy consumption in the current state of the environment after updating;
the central server is used for generating a normal distribution for each edge device participating in the current training by utilizing the strategy model according to the updated current state of the environment;
the central server is used for sampling the normal distribution to obtain new training round number distribution information and sending the obtained new training round number distribution information to the edge equipment in the participating equipment set, and after the edge equipment in the participating equipment set receives the central model and the new training round number, parameters of a local model are updated by the local data samples in batch size under the condition of meeting the threshold time until the threshold time is exceeded, and decision trajectory information is obtained; the decision track information comprises a plurality of decision tracks; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
and the central server is used for updating the strategy model by using the decision track information, training the updated strategy model as a new strategy model, and obtaining an optimized model of federal training until the updated strategy model converges to an optimal solution.
Preferably, the edge device is a Raspberry Pi, a smartphone, a computer, or a surveillance camera.
Since the technical effect achieved by the edge intelligent optimization device provided by the invention is the same as that achieved by the edge intelligent optimization method provided by the invention, the details are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a diagram illustrating the steps of an edge intelligent optimization method provided by the present invention;
fig. 2 is an implementation schematic diagram of the edge intelligent optimization device provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an edge intelligence optimization method and device that account for both computational heterogeneity and energy consumption overhead, make full use of the computing power of the edge devices, and improve the performance of federated training.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
As shown in fig. 1, the edge intelligent optimization method provided by the present invention includes:
step 100: and acquiring a central model and a strategy model, and appointing a global training parameter. The central model and the policy model are hosted in a central server. The global training parameters include: total number of edge devices, threshold time, batch size, and training rounds.
Step 101: and determining the edge equipment participating in the current round of training based on the number of the training rounds to obtain a participating equipment set. Specifically, the method comprises the following steps:
and distributing corresponding training round numbers for the edge equipment based on the training round numbers.
When the number of training rounds allocated to an edge device is 0, the edge device does not participate in the training round. When the number of training rounds allocated to the edge device is not 0, the edge device participates in the training round according to the number of the allocated training rounds.
And acquiring the edge devices participating in the current round of training to generate the participating device set.
Step 102: local data samples are obtained.
Step 103: and the edge devices in the participating device set receive the central model and the training round number, and update the parameters of the local model by the batch size by using the local data samples under the condition that the threshold time is met. The local model is implanted in the edge device.
Step 104: local information is collected, and the current round state of the environment is constructed based on the local information. The current round of states of the environment include: parameters of the local model, communication time, CPU utilization, and training energy consumption.
Step 105: update the current round state of the environment, and aggregate the central model based on the parameters of the local models in the updated current round state of the environment and the local data samples, to obtain an aggregated central model. The aggregated central model is:
$$w_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}\,w_t^i$$

where $w_{t+1}$ is the aggregated central model of round $t+1$; $D_i$ is the data sample set of the $i$-th edge device and $|D_i|$ the number of its data samples; $D$ is the sum of the numbers of data samples of all edge devices, i.e. $D=\sum_{i=1}^{N}|D_i|$, with $N$ denoting the total number of edge devices; $w_t^i$ are the parameters of the local model of the $i$-th edge device in round $t$; and $Q_t$ is the number of edge devices in the participating device set of round $t$ (the sum runs over the participating devices).
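As a concrete illustration, the weighted aggregation above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's reference implementation; the function name and the flat-vector parameter layout are illustrative.

```python
import numpy as np

def aggregate_central_model(local_params, sample_counts):
    """Sample-weighted aggregation: w_{t+1} = sum_i (|D_i| / D) * w_t^i.

    local_params  : list of 1-D numpy arrays, one per participating device
    sample_counts : list of per-device sample counts |D_i|
    (Here D is taken over the devices passed in; the patent defines D over
    all edge devices, which only changes the normalization constant.)
    """
    total = float(sum(sample_counts))              # D
    aggregated = np.zeros_like(local_params[0])
    for w_i, d_i in zip(local_params, sample_counts):
        aggregated += (d_i / total) * w_i          # weight by data share
    return aggregated
```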
Step 106: determine the accuracy of the aggregated central model. Specifically:
Acquire a test set.
Determine the accuracy of the aggregated central model using the test set.
Step 107: determine the return value of the strategy model according to the accuracy of the aggregated central model, the communication times in the updated current round state of the environment, and the training energy consumptions in the updated current round state of the environment. The return value of the strategy model is:
$$r_t=\alpha\,(v_t-v_{t-1})-\beta\sum_{i=1}^{Q_t}c_t^i-\gamma\sum_{i=1}^{Q_t}p_t^i$$

where $r_t$ is the return value of the strategy model in round $t$; $v_t$ and $v_{t-1}$ are the accuracies of the aggregated central model in rounds $t$ and $t-1$; $c_t^i$ is the communication time of the $i$-th edge device in round $t$; $p_t^i$ is the training energy consumption of the $i$-th edge device in round $t$; $\alpha$, $\beta$ and $\gamma$ are the first, second and third weight coefficients; and $Q_t$ is the number of edge devices in the participating device set of round $t$.
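A corresponding sketch of the return-value computation, under the same caveat: the default weight values below are placeholders, since the patent does not fix the weight coefficients.

```python
def return_value(acc_t, acc_prev, comm_times, energies,
                 alpha=1.0, beta=0.01, gamma=0.01):
    """Return value r_t of round t: reward the accuracy gain of the
    aggregated central model, penalize the summed communication time and
    training energy of the participating devices (equation above)."""
    return (alpha * (acc_t - acc_prev)
            - beta * sum(comm_times)
            - gamma * sum(energies))
```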
Step 108: according to the updated current round state of the environment, use the strategy model to generate a normal distribution for each edge device participating in this round of training.
Step 109: sample the normal distributions to obtain new training round number allocation information, and return to step 103 until the threshold time is exceeded, obtaining decision trajectory information. The decision trajectory information comprises a plurality of decision trajectories; each decision trajectory includes the current round state of the environment, the return value of the strategy model, and the number of training rounds.
Step 110: update the strategy model using the decision trajectory information, and return to step 100 until the updated strategy model converges to the optimal solution, thereby obtaining the optimized model of federated training.
To further improve the training accuracy, after the central model and the strategy model are obtained in step 100, the edge intelligence optimization method provided by the invention further comprises: initializing the central model and the strategy model.
The present invention also provides an edge intelligent optimization device, as shown in fig. 2, the device includes: a central server and edge devices.
The central server performs information interaction with the edge devices.
A central model and a strategy model are implanted in the central server. The central server is used for specifying the global training parameters and determining, based on the training round numbers, the edge devices participating in the current round of training, to obtain a participating device set. The global training parameters include: the total number of edge devices, the threshold time, the batch size, and the training round numbers.
A local model is implanted in each edge device. The edge devices in the participating device set receive the central model and the training round numbers from the central server and, under the condition that the threshold time is satisfied, use local data samples to update the parameters of the local model with the batch size.
The central server is used for collecting local information and constructing the current round state of the environment based on it. The current round state of the environment includes: the parameters of the local models, the communication times, the CPU utilizations, and the training energy consumptions.
The central server is used for updating the current round state of the environment and aggregating the central model based on the parameters of the local models in the updated current round state of the environment and the local data samples, to obtain an aggregated central model.
The central server is used for acquiring a test set and determining the accuracy of the aggregated central model using the test set.
The central server is used for determining the return value of the strategy model according to the accuracy of the aggregated central model, the communication times in the updated current round state of the environment, and the training energy consumptions in the updated current round state of the environment.
The central server is used for generating, using the strategy model and according to the updated current round state of the environment, a normal distribution for each edge device participating in the current round of training.
The central server is used for sampling the normal distributions to obtain new training round number allocation information and sending it to the edge devices in the participating device set; after receiving the central model and the new training round numbers, those edge devices update the parameters of their local models with the batch size using the local data samples under the condition that the threshold time is satisfied, until the threshold time is exceeded, whereby decision trajectory information is obtained. The decision trajectory information comprises a plurality of decision trajectories; each decision trajectory includes the current round state of the environment, the return value of the strategy model, and the number of training rounds.
The central server is used for updating the strategy model using the decision trajectory information and training with the updated strategy model as the new strategy model, until the updated strategy model converges to the optimal solution, thereby obtaining the optimized model of federated training.
The edge devices employed may be Raspberry Pis, smartphones, computers or surveillance cameras.
The following describes a specific implementation of the above edge intelligence optimization method and apparatus, taking Raspberry Pis as the edge devices.
As shown in fig. 2, the edge intelligence optimization apparatus provided in this embodiment is divided into two parts: the central server, located on the left side of fig. 2 and served by a desktop computer, and the edge devices on the right, composed of a plurality of Raspberry Pis. The meaning of each symbol in fig. 2 is as follows:
$N$: the total number of edge devices (Raspberry Pis) participating in federated learning. $B$: the batch size used for federated training. $T_{th}$: the threshold time. $E$: the vector formed by the training round numbers allocated to the different Raspberry Pis, satisfying $E=(e^1,e^2,\ldots,e^N)$, where $e^i$ denotes the training round number of the $i$-th Raspberry Pi and is a natural number whose value does not exceed a threshold $M$. $W$: the model parameter matrix, satisfying $W=(w^1,w^2,\ldots,w^N)$, where $w^i$ denotes the model parameters of the $i$-th Raspberry Pi. $C$: the communication time vector, satisfying $C=(c^1,c^2,\ldots,c^N)$, where $c^i$ denotes the time taken by the $i$-th Raspberry Pi to communicate, namely the sum of the upload and download times. $U$: the vector formed by the CPU utilizations when idle, defined as $U=(u^1,u^2,\ldots,u^N)$, where $u^i$ denotes the CPU utilization of the $i$-th Raspberry Pi when it is not participating in federated training (idle utilization). $P$: the training energy consumption vector, satisfying $P=(p^1,p^2,\ldots,p^N)$, where $p^i$ denotes the total training energy consumption of the $i$-th Raspberry Pi, comprising computation energy consumption and communication energy consumption. $v$: the test accuracy of the central model on the test set. In addition, to denote information of different rounds, a subscript $t$ is introduced to distinguish them; e.g. $W_t$, $p_t^i$ and $v_t$ respectively denote the model parameter matrix of round $t$, the energy consumption of the $i$-th Raspberry Pi in round $t$, and the accuracy of the central model in round $t$.
The basic idea of this embodiment is as follows: a reinforcement learning model is built on the central server side, a deep reinforcement learning environment is built on the edge device side, and the model and the environment interact continuously to learn the optimal training-round allocation scheme. Specifically, the model parameters $W_t$, training round numbers $E_t$, communication times $C_t$, idle CPU occupancies $U_t$ and training energy consumptions $P_t$ collected from the Raspberry Pis in the current round are modeled as the current state of the environment $s_t$, i.e. $s_t=(W_t,E_t,C_t,U_t,P_t)$. The numbers of training rounds allocated to the devices are defined as the action $a_t$. The accuracies of the central model in two adjacent rounds, $v_t$ and $v_{t-1}$, together with the communication times $c_t^i$ and the energy consumptions $p_t^i$ of the current round, are used to construct a merit function (i.e. the return value) $r_t$ fed back to the Raspberry Pis, satisfying $r_t=\alpha\,(v_t-v_{t-1})-\beta\sum_{i}c_t^i-\gamma\sum_{i}p_t^i$. The strategy model $\pi_\theta$ takes the state information $s_t$ as input and outputs the training round numbers $E_{t+1}$. Each Raspberry Pi participates in federated training according to the corresponding round number in $E_t$, collects its local model parameters $w_t^i$, communication time $c_t^i$, idle CPU utilization $u_t^i$ and training energy consumption $p_t^i$, and uploads this information to the central server; the server updates the model parameters $W_t$, communication times $C_t$, energy consumptions $P_t$ and idle CPU occupancies $U_t$, so that the environment transitions to the next state $s_{t+1}$. The Raspberry Pis interact with the environment continuously, generating a large amount of trajectory information $\tau=\{(s_t,a_t,r_t)\}$ that is used to update the strategy model $\pi_\theta$ until the strategy model converges.
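To make the state layout concrete, the tuple $s_t=(W_t,E_t,C_t,U_t,P_t)$ can be represented as a small container; the field names below are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RoundState:
    """Current-round environment state s_t = (W_t, E_t, C_t, U_t, P_t)."""
    model_params: np.ndarray  # W_t: one local parameter vector per device
    rounds: np.ndarray        # E_t: training rounds allocated per device
    comm_time: np.ndarray     # C_t: upload + download time per device
    idle_cpu: np.ndarray      # U_t: idle CPU utilization per device
    energy: np.ndarray        # P_t: training energy per device
```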
The optimization method provided by the embodiment specifically includes the following steps:
step 1, initializing a central model
Figure 95563DEST_PATH_IMAGE058
And a policy model
Figure 408864DEST_PATH_IMAGE059
Specifying Total number of Raspberry pies for Federal learning of Global training parametersNBatch size for federal trainingBTime of threshold
Figure 891798DEST_PATH_IMAGE060
Vector formed by training rounds with different raspberry groups
Figure 735339DEST_PATH_IMAGE061
Step 2: according to the training round number vector $E_t$, allocate a corresponding training round number to each Raspberry Pi. If the allocated training round number $e_t^i>0$, the $i$-th Raspberry Pi participates in this round of training and performs $e_t^i$ iterations; if $e_t^i=0$, the $i$-th Raspberry Pi does not participate in this round of federated training. The participating device set of this round is thus determined; the number of devices in it is denoted $Q_t$.
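Deriving the participating device set from the round-number vector is a one-liner; a minimal sketch (device indices stand in for device identities):

```python
def participating_devices(rounds_alloc):
    """Participating set of this round: device i takes part iff its
    allocated training round number e_i is greater than 0."""
    return [i for i, e in enumerate(rounds_alloc) if e > 0]

# e.g. E_t = [3, 0, 1, 2] -> devices 0, 2 and 3 participate, Q_t = 3
```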
Step 3: during the $t$-th round of training, the Raspberry Pis in the participating device set receive the central model $w_t$ and the round number information $e_t^i$; under the condition that the threshold time $T_{th}$ is satisfied, each uses its local data samples $D_i$ to update its local model $w_t^i$ with batch size $B$, collects the local information $(w_t^i, c_t^i, u_t^i, p_t^i)$ and uploads it to the central server. The local model is updated using equation (1):

$$w^i \leftarrow w^i-\eta\,\frac{1}{B}\sum_{b=1}^{B}\nabla \ell\!\left(w^i; x_b\right) \qquad (1)$$

where $x_1,\ldots,x_B$ are samples drawn from the local data set $D_i$ and $B$ is their number, $w^i$ are the parameters of the local model, $\ell(w^i;x_b)$ is the value of the loss function on sample $x_b$, and $\eta$ is the learning rate.
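A sketch of the local update of equation (1) in Python; grad_fn is an assumed stand-in for the gradient of the real model's loss, since the patent does not fix a model architecture.

```python
import numpy as np

def local_update(w, data_x, data_y, grad_fn, local_rounds, batch_size, lr):
    """Local training on one device per equation (1): for each of the
    allocated e_t^i rounds, draw a mini-batch of size B and take one SGD
    step. grad_fn(w, xb, yb) is assumed to return the mean loss gradient
    over the mini-batch; batch_size must not exceed len(data_x)."""
    n = len(data_x)
    for _ in range(local_rounds):                 # e_t^i iterations
        idx = np.random.choice(n, size=batch_size, replace=False)
        g = grad_fn(w, data_x[idx], data_y[idx])  # mean gradient over batch
        w = w - lr * g                            # SGD step of eq. (1)
    return w
```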
Step 4: the central server receives the information uploaded by the Raspberry Pis, updates $(W_t, C_t, U_t, P_t)$, obtains the aggregated central model $w_{t+1}$ by aggregating the central model using equation (2), evaluates the accuracy $v_t$ of the aggregated central model on the test set, and computes the return value $r_t$ according to equation (3) for evaluating the quality of the strategy model $\pi_\theta$:

$$w_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}\,w_t^i \qquad (2)$$

$$r_t=\alpha\,(v_t-v_{t-1})-\beta\sum_{i=1}^{Q_t}c_t^i-\gamma\sum_{i=1}^{Q_t}p_t^i \qquad (3)$$

where $|D_i|$ denotes the number of data samples $D_i$ held by the $i$-th Raspberry Pi, $D$ denotes the total number of samples on all Raspberry Pis, and $\alpha$, $\beta$ and $\gamma$ are all weight coefficients.
Step 5: according to the state $s_t$, a normal distribution is generated for each device using the strategy model $\pi_\theta$, and the new round number allocation information $E_{t+1}$ is generated by sampling each normal distribution. Steps 2~5 are repeated a number of times until the time threshold $T_{th}$ is exceeded, and the decision trajectories $\tau=\{(s_t,a_t,r_t)\}$ of the Raspberry Pis are saved.
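A sketch of the sampling in step 5. The rounding and clipping rule is an assumption; the patent only requires each round number to be a natural number not exceeding the threshold $M$.

```python
import numpy as np

def sample_round_numbers(means, stds, max_rounds):
    """Sample E_{t+1}: the strategy model emits one normal distribution
    N(mu_i, sigma_i) per device; one draw per device is rounded and
    clipped into {0, 1, ..., M}."""
    raw = np.random.normal(means, stds)           # one sample per device
    return np.clip(np.rint(raw), 0, max_rounds).astype(int)
```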
Step 6: the strategy model $\pi_\theta$ is updated using the pieces of trajectory information $\tau$ according to equation (4):

$$\theta' \leftarrow \theta+\eta_\pi\,\nabla_\theta\,\frac{1}{n}\sum_{j=1}^{n}\sum_{t=1}^{L}\log \pi_\theta\!\left(a_t^j \mid s_t^j\right)\left(R_t^j-b_t\right) \qquad (4)$$

$$R_t^j=\sum_{x=t}^{L}\lambda^{\,x-t}\,r_x^j, \qquad b_t=\frac{1}{n}\sum_{j=1}^{n}R_t^j$$

where $\theta'$ denotes the parameters of the updated strategy model, $\theta$ denotes the parameters of the strategy model $\pi_\theta$, $L$ and $n$ respectively denote the length and the number of the trajectories, with $t=1,2,\ldots,L$ and $j=1,2,\ldots,n$, $\lambda$ denotes the discount factor, $x$ indexes the rounds from round $t$ to the end of the trajectory, $s_t^j$, $a_t^j$ and $r_t^j$ respectively denote the state, action and reward of round $t$ on the $j$-th trajectory, $R_t^j$ denotes the corresponding cumulative discounted return, the baseline $b_t$ denotes the average discounted return of the $n$ trajectories at round $t$, $\leftarrow$ denotes the assignment operation, $\nabla_\theta$ is the gradient operator, and $\eta_\pi$ is the learning rate of the strategy model.
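Step 6 is a REINFORCE-style update with an average-return baseline; the sketch below assumes trajectories of equal length and a grad_logp helper standing in for automatic differentiation of the real policy network.

```python
import numpy as np

def discounted_returns(rewards, lam):
    """R_t = sum_{x >= t} lam^(x - t) * r_x, computed back to front."""
    acc, out = 0.0, []
    for r in reversed(rewards):
        acc = r + lam * acc
        out.append(acc)
    return out[::-1]

def policy_gradient_step(theta, trajectories, grad_logp, lr, lam):
    """One update in the spirit of equation (4). Each trajectory is a
    list of (state, action, reward) triples of the same length L;
    grad_logp(theta, s, a) is assumed to return grad_theta log pi(a|s)."""
    returns = [discounted_returns([r for _, _, r in tau], lam)
               for tau in trajectories]           # R_t^j per trajectory
    L, n = len(returns[0]), len(trajectories)
    baselines = [np.mean([R[t] for R in returns]) for t in range(L)]
    grad = np.zeros_like(theta)
    for tau, R in zip(trajectories, returns):
        for t, (s, a, _) in enumerate(tau):
            grad += grad_logp(theta, s, a) * (R[t] - baselines[t])
    return theta + lr * grad / n                  # ascend the objective
```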
All of the above steps are repeated until the strategy model of the Raspberry Pis converges to the optimal solution, and the optimized model of federated training is obtained.
Based on the above description, compared with the prior art, the edge intelligence optimization method and apparatus provided by the invention have the following further advantages:
1. The invention solves the multi-objective, multi-constraint optimization problem using deep reinforcement learning. Deep reinforcement learning interacts with the edge intelligence environment automatically and can learn to generate an optimal scheme without a complex mathematical modeling process, providing a new idea and a new approach for optimizing the federated training process.
2. The invention allocates different numbers of training rounds to devices with different computation speeds, deftly balancing the computational heterogeneity among the devices; it can make full use of the devices' computing power and improve the training speed of the global model, a new attempt at deploying federated learning in practical environments.
3. The method can save the energy consumption overhead of the edge devices without affecting the training speed and accuracy of the model, and can improve the economic benefit and the sustainability of federated training, thereby further meeting the requirements of multi-objective optimization of edge intelligence.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. An edge intelligent optimization method is characterized by comprising the following steps:
step 100: acquiring a central model and a strategy model, and specifying a global training parameter; the central model and the policy model are hosted in a central server; the global training parameters include: total number of edge devices, threshold time, batch size, and training rounds;
step 101: determining edge equipment participating in the current round of training based on the number of the training rounds to obtain a participating equipment set;
step 102: obtaining a local data sample;
step 103: the edge devices in the participating device set receive the central model and the training round number, and update parameters of a local model by the batch size by using the local data samples under the condition that the threshold time is met; the local model is implanted in the edge device;
step 104: collecting local information, and constructing the current round state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
step 105: updating the current round state of the environment, and aggregating the central model based on the parameters of the local model in the current round state of the updated environment and the local data samples to obtain an aggregated central model;
step 106: determining an accuracy of the aggregated central model;
step 107: determining a return value of the strategy model according to the accuracy of the aggregation central model, the communication time in the current state of the updated environment and the training energy consumption in the current state of the updated environment;
step 108: generating a normal distribution for each edge device participating in the training of the current round by using the strategy model according to the updated current round state of the environment;
step 109: sampling the normal distribution to obtain new training round number distribution information, and returning to the step 103 until the threshold time is exceeded, and obtaining decision trajectory information; the decision track information comprises a plurality of decision tracks; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
step 110: and updating the strategy model by using the decision track information, and returning to the step 100 until the updated strategy model converges to the optimal solution, thereby obtaining the optimized model of the federal training.
2. The edge intelligent optimization method according to claim 1, wherein the determining the edge devices participating in the current round of training based on the number of training rounds to obtain a participating device set specifically includes:
distributing corresponding training rounds to the edge equipment based on the training rounds;
when the number of training rounds allocated to the edge device is 0, the edge device does not participate in the training round; when the number of training rounds distributed to the edge equipment is not 0, the edge equipment participates in the training of the current round according to the distributed number of training rounds;
and acquiring edge devices participating in the current round of training to generate the participating device set.
3. The edge intelligent optimization method according to claim 1, further comprising, after obtaining the central model and the strategy model: initializing the central model and the strategy model.
4. The edge intelligent optimization method according to claim 1, wherein the determining the accuracy of the aggregated central model specifically comprises:
acquiring a test set;
determining the accuracy of the aggregated central model using a test set.
5. The edge intelligent optimization method according to claim 1, wherein the aggregate central model is:
$$w_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}\,w_t^i$$

where $w_{t+1}$ is the aggregated central model of round $t+1$; $D_i$ is the data sample set of the $i$-th edge device and $|D_i|$ the number of its data samples; $D$ is the sum of the numbers of data samples of all edge devices, i.e. $D=\sum_{i=1}^{N}|D_i|$, with $N$ denoting the total number of edge devices; $w_t^i$ are the parameters of the local model of the $i$-th edge device in round $t$; and $Q_t$ is the number of edge devices in the participating device set of round $t$.
6. The edge intelligent optimization method according to claim 1, wherein the return values of the policy model are:
$$r_t=\alpha\,(v_t-v_{t-1})-\beta\sum_{i=1}^{Q_t}c_t^i-\gamma\sum_{i=1}^{Q_t}p_t^i$$

where $r_t$ is the return value of the strategy model in round $t$; $v_t$ and $v_{t-1}$ are the accuracies of the aggregated central model in rounds $t$ and $t-1$; $c_t^i$ is the communication time of the $i$-th edge device in round $t$; $p_t^i$ is the training energy consumption of the $i$-th edge device in round $t$; $\alpha$ is the first weight coefficient, $\beta$ is the second weight coefficient and $\gamma$ is the third weight coefficient; and $Q_t$ is the number of edge devices in the participating device set of round $t$.
7. An edge intelligence optimization device, comprising: a central server and an edge device;
the central server and the edge equipment perform information interaction;
a central model and a strategy model are implanted into the central server; the central server is used for appointing global training parameters, determining edge equipment participating in the current round of training based on the number of training rounds, and obtaining a participating equipment set; the global training parameters include: the total number of edge devices, threshold time, batch size, and training round number;
a local model is implanted in the edge device; the edge devices in the participating device set receive the central model and the number of training rounds in the central server, and update parameters of the local model in the batch size by using local data samples under the condition that the threshold time is met;
the central server is used for collecting local information and constructing the current round state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
the central server is used for updating the current state of the environment and aggregating the central model based on the parameters of the local model in the current state of the updated environment and the local data samples to obtain an aggregated central model;
the central server is used for acquiring a test set and determining the precision of the aggregation central model by adopting the test set;
the central server is used for determining a return value of the strategy model according to the precision of the aggregation central model, the communication time in the current state of the environment after updating and the training energy consumption in the current state of the environment after updating;
the central server is used for generating a normal distribution for each edge device participating in the current training by utilizing the strategy model according to the updated current state of the environment;
the central server is used for sampling the normal distribution to obtain new training round number distribution information and sending the obtained new training round number distribution information to the edge equipment in the participating equipment set, and after the edge equipment in the participating equipment set receives the central model and the new training round number, parameters of a local model are updated by the local data samples in batch size under the condition of meeting the threshold time until the threshold time is exceeded, and decision trajectory information is obtained; the decision track information comprises a plurality of decision tracks; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
and the central server is used for updating the strategy model by using the decision track information, training the updated strategy model as a new strategy model, and obtaining an optimized model of federal training until the updated strategy model converges to an optimal solution.
8. The intelligent edge optimization device of claim 7, wherein the edge device is a Raspberry Pi, a smartphone, a computer, or a surveillance camera.
CN202211282973.XA 2022-10-20 2022-10-20 Intelligent edge optimization method and device Active CN115357402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211282973.XA CN115357402B (en) 2022-10-20 2022-10-20 Intelligent edge optimization method and device


Publications (2)

Publication Number Publication Date
CN115357402A CN115357402A (en) 2022-11-18
CN115357402B (en) 2023-01-24

Family

ID=84008718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211282973.XA Active CN115357402B (en) 2022-10-20 2022-10-20 Intelligent edge optimization method and device

Country Status (1)

Country Link
CN (1) CN115357402B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168328A (en) * 2021-12-06 2022-03-11 北京邮电大学 Mobile edge node calculation task scheduling method and system based on federal learning
CN113887748A (en) * 2021-12-07 2022-01-04 浙江师范大学 Online federal learning task allocation method and device, and federal learning method and system
CN114546608A (en) * 2022-01-06 2022-05-27 上海交通大学 Task scheduling method based on edge calculation
CN114528304A (en) * 2022-02-18 2022-05-24 安徽工业大学 Federal learning method, system and storage medium for updating self-adaptive client parameters

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Nanliang Shan et al., "DRL + FL: An intelligent resource allocation model based on deep reinforcement learning for Mobile Edge Computing", Computer Communications, 2020-05-28 *
Jiasheng Wang et al., "Allo: Optimizing Federated Learning via Guided Epoch Allocation", State Intellectual Property Office of China, 2022-07-27 *
Yufeng Zhan et al., "Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020 *
Lu Xiaofeng et al., "An efficient asynchronous federated learning mechanism for edge computing" (in Chinese), Journal of Computer Research and Development, 2020 *

Also Published As

Publication number Publication date
CN115357402A (en) 2022-11-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant