CN115357402B - Intelligent edge optimization method and device - Google Patents
- Publication number
- CN115357402B CN115357402B CN202211282973.XA CN202211282973A CN115357402B CN 115357402 B CN115357402 B CN 115357402B CN 202211282973 A CN202211282973 A CN 202211282973A CN 115357402 B CN115357402 B CN 115357402B
- Authority
- CN
- China
- Prior art keywords
- model
- training
- edge
- central
- round
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F9/5072—Allocation of resources: grid computing
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
- G06F9/5094—Allocation of resources taking into account power or heat criteria
- G06N20/00—Machine learning
- G06N3/08—Neural networks: learning methods
- G06N3/10—Neural networks: interfaces, programming languages or software development kits, e.g. for simulating neural networks
- G06F2209/502—Indexing scheme relating to resource allocation: proximity
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an edge intelligence optimization method and device. In the method, the current-round state of the environment is constructed from model parameters, training round numbers, communication times, idle CPU utilizations, and training energy consumptions. Each edge device participates in federated training according to the round-number information in the current-round state, collects its local model parameters, communication time, idle CPU utilization, training energy consumption and other information, and the current-round state is updated so that the environment transitions to the next state. The edge devices interact with the environment continuously, generating a large amount of trajectory information that is used to update the policy model until it converges. Different numbers of federated training rounds are then allocated according to each device's computation speed, training energy consumption, and communication time, thereby balancing computational heterogeneity against energy consumption overhead.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an edge intelligence optimization method and device based on deep reinforcement learning.
Background
Federated learning is a mechanism in which multiple parties jointly participate in model training; it emerged with the development of artificial intelligence in the big-data era. Users do not need to upload local data to a central server. Instead, under the coordination of the central server, each user trains a model on its own local data and uploads only the trained model for aggregation. This breaks data silos while preserving users' control over their data and protecting privacy, so federated learning can replace traditional centralized training and has been widely applied.
Federated training also faces a number of practical problems: the first is the computational heterogeneity of devices, and the second is the limited resource budget of edge devices, for example energy. The user-side devices participating in federated training may be edge devices such as smartphones, computers, Raspberry Pis, or even enterprise surveillance cameras, whose computation speeds differ markedly. Moreover, because real usage scenarios are complex, other programs may run in the foreground and occupy computing resources, so the computing power available for background federated training varies over time. The computation speed of edge devices is closely tied to the performance of federated training, and selecting different edge devices to participate can lead to large differences in training time. Traditional methods select participating devices at random from the edge, which easily creates stragglers: the slowest device constrains the aggregation time of every federated round and greatly slows training. How to select the participants of each federated round according to device computation speed, and how to allocate an appropriate number of training rounds to each participant, is therefore key to solving the computational-heterogeneity problem. In addition, most edge devices participating in federated training have limited network bandwidth and battery power; reducing budget overheads such as energy consumption while maintaining federated training accuracy is another important research direction in federated learning.
Conventional solutions assume that these devices are located near communication base stations and participate in federated training only while connected to mains power, which greatly limits the application scenarios of federated training. Balancing training accuracy against energy overhead, and thereby reducing the cost of federated training, is therefore also key to optimizing edge intelligence.
Data-driven modeling methods offer high accuracy and computational efficiency. Applying the data-driven idea to edge intelligence, analyzing the accumulated training data with effective methods, and extracting knowledge to guide federated training is an important direction in research on edge intelligence optimization.
Deep reinforcement learning is an effective data-driven modeling method: a computer interacts automatically with its environment and learns a policy from past experience, which makes it well suited to scenarios where mathematical models are hard to establish. In recent years, thanks to rapidly growing computing resources, reinforcement learning has developed considerably and has been applied successfully to robot locomotion control, cloud workflow scheduling, intelligent transportation, and other fields; on computer games it even performs far beyond human level.
Edge intelligence optimization is a multi-constraint, multi-objective problem, and some existing work already applies deep reinforcement learning to it, with great potential. This work falls roughly into two categories. One category optimizes from the angle of computational heterogeneity, using reinforcement learning to select devices with higher computation speed; this shortens each federated round but usually incurs a large energy overhead. The other category starts from saving limited resources such as energy, using reinforcement learning to select energy-efficient participation schemes; this reduces the total budget overhead but ignores the computational heterogeneity of edge intelligence and often requires long training times. So far only a few leading-edge works consider computational heterogeneity and energy consumption jointly, and they still leave substantial room for improvement in the utilization of computing resources. Designing a method that balances computational heterogeneity against energy overhead while fully exploiting the computing power of edge devices, thereby improving federated training performance, is of great significance for optimizing edge intelligence.
Disclosure of Invention
The invention aims to provide an edge intelligence optimization method and device that balance computational heterogeneity against energy overhead, fully exploit the computing power of edge devices, and improve the performance of federated training.
In order to achieve this purpose, the invention provides the following scheme:
an edge intelligence optimization method comprising the following steps:
step 100: acquiring a central model and a policy model, and specifying global training parameters; the central model and the policy model are hosted in a central server; the global training parameters include: the total number of edge devices, a threshold time, a batch size, and training round numbers;
step 101: determining the edge devices participating in the current round of training based on the training round numbers, to obtain a participating-device set;
step 102: obtaining local data samples;
step 103: the edge devices in the participating-device set receive the central model and their training round numbers and, within the threshold time, update the parameters of a local model with the batch size using the local data samples; the local model is embedded in the edge device;
step 104: collecting local information and constructing the current-round state of the environment from the local information; the current-round state of the environment includes: the parameters of the local models, communication times, CPU utilizations, and training energy consumptions;
step 105: updating the current-round state of the environment, and aggregating the central model based on the local-model parameters in the updated current-round state and the local data samples, to obtain an aggregated central model;
step 106: determining the accuracy of the aggregated central model;
step 107: determining a return value of the policy model from the accuracy of the aggregated central model, the communication times in the updated current-round state, and the training energy consumptions in the updated current-round state;
step 108: using the policy model to generate a normal distribution for each edge device participating in the current round of training, according to the updated current-round state of the environment;
step 109: sampling the normal distributions to obtain new training-round allocation information, and returning to step 103 until the threshold time is exceeded, obtaining decision trajectory information; the decision trajectory information comprises a plurality of decision trajectories, each of which includes: the current-round state of the environment, the return value of the policy model, and the training round numbers;
step 110: updating the policy model with the decision trajectory information and returning to step 100 until the updated policy model converges to the optimal solution, obtaining the optimized model of federated training.
Preferably, determining the edge devices participating in the current round of training based on the training round numbers, to obtain a participating-device set, specifically comprises:
allocating corresponding training round numbers to the edge devices based on the training round numbers;
when the number of training rounds allocated to an edge device is 0, that edge device does not participate in the current round of training; when the number is not 0, the edge device participates in the current round according to its allocated number of training rounds;
and collecting the edge devices participating in the current round of training to generate the participating-device set.
Preferably, after the central model and the policy model are acquired, the method further comprises: initializing the central model and the policy model.
Preferably, determining the accuracy of the aggregated central model specifically comprises:
acquiring a test set;
and determining the accuracy of the aggregated central model using the test set.
Preferably, the aggregated central model is:

W_{t+1} = \sum_{i=1}^{Q_t} \frac{|D_i|}{|D|} w_i^t

where W_{t+1} is the aggregated central model of round t+1; D_i is the data-sample set of the i-th edge device; |D_i| is the number of data samples of the i-th edge device; |D| = \sum_{i=1}^{N} |D_i| is the total number of data samples over all edge devices; N denotes the total number of edge devices; w_i^t is the local-model parameter vector of the i-th edge device in round t; and Q_t is the number of edge devices in the participating-device set in round t.
Preferably, the return value of the policy model is:

r_t = \alpha (v_t - v_{t-1}) - \beta \sum_{i=1}^{Q_t} T_i^t - \gamma \sum_{i=1}^{Q_t} P_i^t

where r_t is the return value of the policy model in round t; v_t and v_{t-1} are the accuracies of the aggregated central model in rounds t and t-1; T_i^t is the communication time of the i-th edge device in round t; P_i^t is the training energy consumption of the i-th edge device in round t; \alpha, \beta, and \gamma are the first, second, and third weight coefficients; and Q_t is the number of edge devices in the participating-device set in round t.
According to the specific embodiments provided herein, the invention discloses the following technical effects:
in the edge intelligence optimization method provided by the invention, the current-round state of the environment is constructed from model parameters, training round numbers, communication times, idle CPU utilizations, and training energy consumptions. Each edge device participates in federated training according to the round-number information in the current-round state, collects its local model parameters, communication time, idle CPU utilization, training energy consumption and other information, and the current-round state is updated so that the environment transitions to the next state. The edge devices interact with the environment continuously, generating a large amount of trajectory information that is used to update the policy model until it converges, after which different numbers of federated training rounds are allocated according to each device's computation speed, training energy consumption, and communication time, thereby balancing computational heterogeneity and reducing energy overhead.
The invention also provides an edge intelligence optimization device, comprising: a central server and edge devices;
the central server exchanges information with the edge devices;
a central model and a policy model are embedded in the central server; the central server is configured to specify global training parameters and to determine, based on the training round numbers, the edge devices participating in the current round of training, obtaining a participating-device set; the global training parameters include: the total number of edge devices, a threshold time, a batch size, and training round numbers;
a local model is embedded in each edge device; the edge devices in the participating-device set receive the central model and the training round numbers from the central server and, within the threshold time, update the parameters of the local model with the batch size using local data samples;
the central server is configured to collect local information and construct the current-round state of the environment from it; the current-round state of the environment includes: the parameters of the local models, communication times, CPU utilizations, and training energy consumptions;
the central server is configured to update the current-round state of the environment and aggregate the central model based on the local-model parameters in the updated current-round state and the local data samples, obtaining an aggregated central model;
the central server is configured to acquire a test set and determine the accuracy of the aggregated central model using it;
the central server is configured to determine the return value of the policy model from the accuracy of the aggregated central model, the communication times in the updated current-round state, and the training energy consumptions in the updated current-round state;
the central server is configured to generate, using the policy model and according to the updated current-round state of the environment, a normal distribution for each edge device participating in the current round of training;
the central server is configured to sample the normal distributions to obtain new training-round allocation information and send it to the edge devices in the participating-device set; after receiving the central model and the new training round numbers, those edge devices update the parameters of their local models with the batch size using the local data samples within the threshold time, until the threshold time is exceeded, yielding decision trajectory information; the decision trajectory information comprises a plurality of decision trajectories, each of which includes: the current-round state of the environment, the return value of the policy model, and the training round numbers;
and the central server is configured to update the policy model with the decision trajectory information and train with the updated policy model as the new policy model until the updated policy model converges to the optimal solution, obtaining the optimized model of federated training.
Preferably, the edge devices are Raspberry Pis, smartphones, computers, or surveillance cameras.
Since the technical effects achieved by the edge intelligence optimization device provided by the invention are the same as those achieved by the edge intelligence optimization method provided by the invention, they are not repeated here.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a diagram illustrating the steps of the edge intelligence optimization method provided by the invention;
FIG. 2 is a schematic diagram of an implementation of the edge intelligence optimization device provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
The invention aims to provide an edge intelligence optimization method and device that balance computational heterogeneity against energy overhead, fully exploit the computing power of edge devices, and improve the performance of federated training.
In order to make the above objects, features and advantages of the present invention more comprehensible, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the edge intelligence optimization method provided by the invention includes:
Step 100: acquiring a central model and a policy model, and specifying global training parameters. The central model and the policy model are hosted in a central server. The global training parameters include: the total number of edge devices, a threshold time, a batch size, and training round numbers.
Step 101: determining the edge devices participating in the current round of training based on the training round numbers, to obtain a participating-device set. Specifically:
corresponding training round numbers are allocated to the edge devices based on the training round numbers;
when the number of training rounds allocated to an edge device is 0, that edge device does not participate in the current round of training; when the number is not 0, the edge device participates according to its allocated number of training rounds;
and the edge devices participating in the current round of training are collected to generate the participating-device set.
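As a concrete illustration, the allocation rule of step 101 can be sketched as follows. The function name and the dictionary representation are assumptions made for this example, not part of the invention:

```python
# Hypothetical sketch of step 101: an edge device whose allocated training
# round number E_i is 0 sits out the current round; all others participate.
def select_participants(round_allocation):
    """round_allocation maps a device id to its allocated training rounds E_i."""
    return {dev: rounds for dev, rounds in round_allocation.items() if rounds > 0}

allocation = {0: 3, 1: 0, 2: 5, 3: 1}            # example E_i for N = 4 devices
participants = select_participants(allocation)    # device 1 does not participate
```

The participating-device set is then exactly the devices with non-zero allocations, which matches the rule stated above.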
Step 102: obtaining local data samples.
Step 103: the edge devices in the participating-device set receive the central model and their training round numbers and, within the threshold time, update the parameters of the local model with the batch size using the local data samples. The local model is embedded in the edge device.
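The local update of step 103 can be sketched as mini-batch gradient descent bounded by both the allocated round number and the threshold time. The 1-D linear model, learning rate, and deadline mechanism below are illustrative assumptions, not the invention's actual local model:

```python
import random
import time

def local_update(w, data, rounds, batch_size, deadline, lr=0.1):
    """Sketch of step 103: up to `rounds` local epochs of mini-batch SGD on a
    toy 1-D linear model w*x, stopping early once the wall-clock deadline
    (the threshold time) is exceeded. `data` is a list of (x, y) pairs."""
    for _ in range(rounds):
        if time.monotonic() > deadline:
            break                      # respect the threshold time
        random.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error (w*x - y)^2 with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]   # ground truth: w = 2
w = local_update(0.0, data, rounds=50, batch_size=2,
                 deadline=time.monotonic() + 5)
```

The deadline check is what prevents a slow device from becoming a straggler: it returns whatever parameters it has once the threshold time runs out.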
Step 104: local information is collected, and the current round state of the environment is constructed based on the local information. The current round of states of the environment include: parameters of the local model, communication time, CPU utilization, and training energy consumption.
Step 105: and updating the current round state of the environment, and aggregating the central model based on the parameters of the local model in the current round state of the updated environment and the local data samples to obtain an aggregated central model. Wherein the aggregate central model is:
in the formula (I), the compound is shown in the specification,the aggregated central model for round t + 1,is as followsiThe data samples of the individual edge devices are,is as followsiThe number of data samples of each edge device, D is the sum of the number of data samples of all edge devices,and N represents the total number of edge devices,is the t-th wheeliParameters of a local model of the edge device.
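The sample-weighted aggregation of step 105 can be sketched as follows (a FedAvg-style weighted mean; the dictionary layout is an assumption for the example):

```python
# Minimal sketch of step 105: the new central model is the average of the
# participants' local parameters, weighted by their data shares |D_i| / |D|.
def aggregate(local_params, sample_counts):
    """local_params: {device: list of parameter values w_i^t};
    sample_counts: {device: |D_i|}."""
    total = sum(sample_counts[d] for d in local_params)   # |D| over participants
    dim = len(next(iter(local_params.values())))
    agg = [0.0] * dim
    for dev, params in local_params.items():
        weight = sample_counts[dev] / total
        for j, p in enumerate(params):
            agg[j] += weight * p
    return agg

central = aggregate({0: [1.0, 2.0], 1: [3.0, 4.0]}, {0: 1, 1: 3})
# device 1 holds 3/4 of the data, so it dominates the weighted average
```

Weighting by data counts keeps the aggregated model unbiased with respect to the pooled training data, rather than over-representing devices with few samples.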
Step 106: determining an accuracy of the aggregated central model. Specifically, the method comprises the following steps:
and acquiring a test set.
Determining the accuracy of the aggregated central model using a test set.
Step 107: determining a return value of the policy model according to the accuracy of the aggregated central model, the updated communication time in the current state of the environment, and the updated training energy consumption in the current state of the environment. Wherein, the return value of the strategy model is as follows:
in the formula (I), the compound is shown in the specification,is the t-th wheelThe value of the return of the policy model,aggregating the accuracy of the central model for the t-th round,to aggregate the accuracy of the central model for round t-1,is the t-th wheeliThe communication time of the individual edge devices,is the t-th wheeliThe training energy consumption of each edge device,is a first weight coefficient of the first weight coefficient,is a second weight coefficient, and is,is a third weight coefficient, and is,Q t the number of edge devices in the participating device set for the t-th round.
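The return value of step 107 is the accuracy gain minus weighted penalties for total communication time and total training energy over the Q_t participants. The weight values below are placeholders; the patent does not fix them:

```python
# Sketch of step 107's return value:
#   r_t = alpha*(v_t - v_{t-1}) - beta*sum(T_i^t) - gamma*sum(P_i^t)
def return_value(acc_t, acc_prev, comm_times, energies,
                 alpha=1.0, beta=0.01, gamma=0.01):
    return (alpha * (acc_t - acc_prev)
            - beta * sum(comm_times)
            - gamma * sum(energies))

r = return_value(0.80, 0.75, comm_times=[1.0, 2.0], energies=[3.0, 1.0])
# accuracy gain 0.05, minus 0.01*3 for time and 0.01*4 for energy
```

A positive r thus requires the accuracy improvement to outweigh the round's time and energy costs, which is exactly the trade-off the policy is trained to optimize.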
Step 108: and generating a normal distribution for each edge device participating in the training of the current round by using the strategy model according to the updated current round state of the environment.
Step 109: and sampling the normal distribution to obtain new training round number distribution information, and returning to the step 103 until the threshold time is exceeded, and obtaining decision trajectory information. The decision trajectory information comprises a plurality of decision trajectories. Each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds.
Step 110: and updating the strategy model by using the decision track information, and returning to the step 100 until the updated strategy model converges to the optimal solution, thereby obtaining the optimized model of the federal training.
To further improve training accuracy, after the central model and the policy model are obtained in step 100, the edge intelligence optimization method provided by the invention further comprises: initializing the central model and the policy model.
The invention also provides an edge intelligence optimization device. As shown in FIG. 2, the device includes: a central server and edge devices.
The central server exchanges information with the edge devices.
A central model and a policy model are embedded in the central server. The central server is configured to specify global training parameters and to determine, based on the training round numbers, the edge devices participating in the current round of training, obtaining a participating-device set. The global training parameters include: the total number of edge devices, a threshold time, a batch size, and training round numbers.
A local model is embedded in each edge device. The edge devices in the participating-device set receive the central model and the training round numbers from the central server and, within the threshold time, update the parameters of the local model with the batch size using local data samples.
The central server is configured to collect local information and construct the current-round state of the environment from it. The current-round state of the environment includes: the parameters of the local models, communication times, CPU utilizations, and training energy consumptions.
The central server is configured to update the current-round state of the environment and aggregate the central model based on the local-model parameters in the updated current-round state and the local data samples, obtaining an aggregated central model.
The central server is configured to acquire a test set and determine the accuracy of the aggregated central model using it.
The central server is configured to determine the return value of the policy model from the accuracy of the aggregated central model, the communication times in the updated current-round state, and the training energy consumptions in the updated current-round state.
The central server is configured to generate, with the policy model and according to the updated current-round state of the environment, a normal distribution for each edge device participating in the current round of training.
The central server is configured to sample the normal distributions to obtain new training-round allocation information and send it to the edge devices in the participating-device set. After receiving the central model and the new training round numbers, those edge devices update the parameters of their local models with the batch size using the local data samples within the threshold time, until the threshold time is exceeded, yielding decision trajectory information. The decision trajectory information includes a plurality of decision trajectories, each of which includes: the current-round state of the environment, the return value of the policy model, and the training round numbers.
The central server is configured to update the policy model with the decision trajectory information and train with the updated policy model as the new policy model until the updated policy model converges to the optimal solution, obtaining the optimized model of federated training.
The edge devices may be Raspberry Pis, smartphones, computers, or surveillance cameras.
A specific implementation of the above edge intelligence optimization method and device is described below, taking Raspberry Pis as the edge devices.
As shown in FIG. 2, the edge intelligence optimization device provided in this embodiment is divided into two parts: a central server, on the left of FIG. 2, served by a desktop computer; and the edge devices, on the right, consisting of several Raspberry Pis. The symbols in FIG. 2 have the following meanings:
$N$ is the total number of edge devices (e.g., Raspberry Pis) participating in federated learning. $B$ is the batch size used for federated training. $T$ is the threshold time. $E$ is the vector of training rounds allocated to the different Raspberry Pis, satisfying $E=(E_1,\dots,E_N)$, where $E_i$ denotes the number of training rounds of the $i$-th Raspberry Pi, a natural number whose value does not exceed a threshold $M$. $W$ is the model parameter matrix, satisfying $W=(W_1,\dots,W_N)$, where $W_i$ denotes the model parameters of the $i$-th Raspberry Pi. $C$ is the communication time vector, satisfying $C=(C_1,\dots,C_N)$, where $C_i$ is the time the $i$-th Raspberry Pi spends communicating, i.e., the sum of uplink and downlink times. $U$ is the vector of idle-time CPU utilizations, defined as $U=(U_1,\dots,U_N)$, where $U_i$ is the CPU utilization of the $i$-th Raspberry Pi when it is not participating in federated training (idle utilization). $P$ is the training energy-consumption vector, satisfying $P=(P_1,\dots,P_N)$, where $P_i$ denotes the total training energy of the $i$-th Raspberry Pi, comprising computation energy and communication energy. $v$ is the test accuracy of the central model on the test set. In addition, to distinguish quantities of different rounds, a subscript $t$ is introduced: for example, $W_t$, $P_{t,i}$, and $v_t$ respectively denote the model parameter matrix of round $t$, the energy consumption of the $i$-th Raspberry Pi in round $t$, and the accuracy of the central model in round $t$.
The basic idea of this embodiment is as follows: a reinforcement-learning model is built on the central server, a deep-reinforcement-learning environment is built on the edge devices, and the model and the environment interact continuously to learn an optimal allocation of training rounds. Specifically, the model parameters $W_t$, training rounds $E_t$, communication times $C_t$, idle CPU utilizations $U_t$, and training energy consumptions $P_t$ collected from the Raspberry Pis in round $t$ are modeled as the current round state of the environment, i.e., $s_t=(W_t,E_t,C_t,U_t,P_t)$. The number of training rounds allocated to a device is defined as the action $a_t$ of the Raspberry Pi. The accuracies $v_t$ and $v_{t-1}$ of two adjacent central models, the local communication times $C_{t,i}$, and the energy consumptions $P_{t,i}$ are used to construct the merit function (i.e., the return value) $r_t$ fed back to the Raspberry Pis. The strategy model $\pi$ takes the state information $s_t$ as input and outputs the numbers of training rounds $E_{t+1}$. Each Raspberry Pi participates in the federated training according to its round number in $E_{t+1}$ and uploads the collected local model parameters $W_{t+1,i}$, communication time $C_{t+1,i}$, idle CPU utilization $U_{t+1,i}$, and training energy consumption $P_{t+1,i}$ to the central server; updating these quantities moves the environment to the next state $s_{t+1}$. The Raspberry Pis interact with the environment continuously, generating a large amount of trajectory information $\tau$ that is used to update the strategy model $\pi$ until it converges.
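The state and return-value construction above can be sketched in code. The structure, field names, and weight parameters below are illustrative assumptions, not prescribed by the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RoundState:
    """Round state s_t of the environment (one list entry per edge device)."""
    model_params: List[List[float]]  # W_t: local model parameters
    rounds: List[int]                # E_t: training rounds allocated
    comm_time: List[float]           # C_t: uplink + downlink communication time
    idle_cpu: List[float]            # U_t: idle-time CPU utilization
    energy: List[float]              # P_t: training energy consumption

def return_value(acc_t: float, acc_prev: float,
                 comm_time: List[float], energy: List[float],
                 w_acc: float, w_comm: float, w_energy: float) -> float:
    """Return value r_t: weighted accuracy gain of two adjacent central models,
    minus weighted communication time and training energy of the round."""
    return (w_acc * (acc_t - acc_prev)
            - w_comm * sum(comm_time)
            - w_energy * sum(energy))
```

For example, with unit accuracy weight and cost weights of 0.01, an accuracy gain of 0.1 against 1 s of communication and 2 J of energy yields a return value of about 0.07, so accuracy improvements are rewarded while communication and energy costs are penalized.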
The optimization method provided by the embodiment specifically includes the following steps:
Step 2: according to the training round vector $E_t$, allocate a corresponding number of training rounds to each Raspberry Pi. If the allocated number of training rounds $E_{t,i}>0$, the $i$-th Raspberry Pi participates in this round of training and performs $E_{t,i}$ iterations; if $E_{t,i}=0$, the $i$-th Raspberry Pi does not participate in this round of federated training. The participating device set $Q_t$ of this round is thereby determined.
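The participation rule of step 2 can be sketched as follows (the function name is illustrative):

```python
def participating_set(rounds):
    """Step-2 rule: a device allocated a positive number of training rounds
    joins this round of federated training; a device allocated 0 sits it out."""
    return [i for i, e in enumerate(rounds) if e > 0]
```

For example, `participating_set([3, 0, 1])` returns `[0, 2]`: devices 0 and 2 join the round and train for 3 and 1 local iterations respectively, while device 1 skips it.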
Step 3: during round $t$, each Raspberry Pi in the participating device set $Q_t$ receives the central model $W_t$ and its round-number information $E_{t,i}$. While the threshold time $T$ is not exceeded, it updates its local model with batch size $B$ using samples of its local data $D_i$ according to formula (1), collects the local information $(W_{t,i}, C_{t,i}, U_{t,i}, P_{t,i})$, and uploads it to the central server. The local model is updated by formula (1):

$$W_{t,i} \leftarrow W_{t,i} - \frac{\eta}{B}\sum_{b=1}^{B}\nabla \ell\!\left(W_{t,i}; x_b\right) \tag{1}$$

where the $x_b$ are samples drawn from the local data set, $B$ is the number of samples (the batch size), $W_{t,i}$ are the parameters of the local model, $\ell(W_{t,i}; x_b)$ is the value of the loss function on sample $x_b$, $\eta$ is the learning rate, and $b=1,2,\dots,B$.
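Formula (1) is a mini-batch gradient step. The sketch below applies it to a one-parameter least-squares model; the model and loss are placeholder assumptions used only to make the update concrete:

```python
def local_update(w, batch, lr):
    """One application of formula (1): w <- w - (lr / B) * sum of per-sample
    loss gradients. Each sample is (x, y) with squared-error loss
    0.5 * (w * x - y)**2, whose gradient w.r.t. w is (w * x - y) * x."""
    B = len(batch)
    grad = sum((w * x - y) * x for x, y in batch)
    return w - (lr / B) * grad
```

Starting from `w = 0.0` with the single sample `(1.0, 2.0)` and learning rate 0.5, the gradient is −2 and the updated parameter is 1.0.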
Step 4: the central server receives the information uploaded by the Raspberry Pis, updates the state $s_{t+1}$, aggregates the central model using formula (2) to obtain the aggregated central model $W_{t+1}$, evaluates the accuracy $v_t$ of the aggregated central model $W_{t+1}$ on the test set, and computes the return value $r_t$ according to formula (3), which is used to evaluate the quality of the strategy model $\pi$.

$$W_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}W_{t,i} \tag{2}$$

$$r_t=\lambda_1\,(v_t-v_{t-1})-\lambda_2\sum_{i=1}^{Q_t}C_{t,i}-\lambda_3\sum_{i=1}^{Q_t}P_{t,i} \tag{3}$$

where $|D_i|$ denotes the number of data samples on the $i$-th Raspberry Pi, $D=\sum_{i=1}^{N}|D_i|$ is the total number of samples on all Raspberry Pis, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are all weight coefficients.
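The data-size-weighted aggregation of formula (2) can be sketched as below (a FedAvg-style weighted average; representing each device's parameters as a flat list is an assumption):

```python
def aggregate(local_params, sample_counts):
    """Formula (2): weight each participating device's local model parameters
    by its share of the total number of local data samples, then sum."""
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [sum(n * w[k] for w, n in zip(local_params, sample_counts)) / total
            for k in range(dim)]
```

For example, two devices holding 3 and 1 samples with parameters `[0.0]` and `[4.0]` aggregate to `[1.0]`, so the device with more data pulls the central model toward its parameters.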
Step 5: according to the state $s_{t+1}$, the reinforcement-learning agent generates a normal distribution for each device using the strategy model $\pi$, and generates the new round-allocation information $E_{t+1}$ by sampling each normal distribution.
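Step 5's per-device sampling can be sketched as follows; rounding to an integer and clipping to $[0, M]$ are assumptions consistent with the constraint that round counts are natural numbers not exceeding a threshold:

```python
import random

def sample_rounds(means, stds, max_rounds):
    """Draw one training-round count per device from N(mu_i, sigma_i^2)
    produced by the strategy model, then round and clip to [0, max_rounds]."""
    return [min(max_rounds, max(0, round(random.gauss(mu, sd))))
            for mu, sd in zip(means, stds)]
```

A device whose distribution is centered near 0 will frequently be assigned 0 rounds and thus skip the round, which is how the policy learns to exclude slow or costly devices.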
Steps 2~5 are repeated until the time threshold $T$ is exceeded, and the decision trajectories $\tau=(s_t,a_t,r_t)$ of the Raspberry Pis are saved.
Step 6: using the algorithm of formula (4), the strategy model $\pi$ is updated with the collected trajectory information $\tau$:

$$\theta' \leftarrow \theta + \eta_\pi \nabla_\theta \frac{1}{n}\sum_{j=1}^{n}\sum_{t=1}^{L}\log \pi_\theta\!\left(a_t^{j}\mid s_t^{j}\right)\left(R_t^{j}-b_t\right), \qquad R_t^{j}=\sum_{x=t}^{L}\gamma^{\,x-t}r_x^{j}, \qquad b_t=\frac{1}{n}\sum_{j=1}^{n}R_t^{j} \tag{4}$$

where $\theta'$ denotes the parameters of the updated strategy model $\pi'$, $\theta$ denotes the parameters of the strategy model $\pi$, $L$ and $n$ respectively denote the length and number of trajectories, $\gamma$ denotes the discount factor, $x$ indexes positions along a trajectory of round $t$, $s_t^{j}$, $a_t^{j}$, $r_t^{j}$ respectively denote the state, action, and reward of round $t$ on the $j$-th trajectory, $R_t^{j}$ denotes the corresponding cumulative discounted return, the baseline $b_t$ denotes the average discounted return of round $t$ over the $n$ trajectories, $\leftarrow$ denotes the assignment operation, and $\nabla$ is the gradient operator.
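The cumulative discounted return $R_t$ and baseline $b_t$ used in formula (4) can be sketched as follows (function names are illustrative):

```python
def discounted_returns(rewards, gamma):
    """R_t = sum over x >= t of gamma^(x-t) * r_x, computed backwards
    over one trajectory's reward sequence."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def advantage_terms(reward_trajectories, gamma):
    """R_t^j - b_t for each trajectory j and step t, where the baseline b_t is
    the mean discounted return at step t across the n trajectories; these terms
    weight grad log pi in the policy-gradient update of formula (4)."""
    all_R = [discounted_returns(tr, gamma) for tr in reward_trajectories]
    n, L = len(all_R), len(all_R[0])
    baseline = [sum(R[t] for R in all_R) / n for t in range(L)]
    return [[R[t] - baseline[t] for t in range(L)] for R in all_R]
```

With two one-step trajectories rewarded 1.0 and 3.0, the baseline is 2.0 and the advantages are −1.0 and +1.0, so the better-than-average trajectory is reinforced and the worse one is discouraged.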
All of the above steps are repeated until the strategy model of the Raspberry Pis converges to the optimal solution, yielding the optimized model of the federated training.
Based on the above description, compared with the prior art, the edge intelligent optimization method and apparatus provided by the present invention further have the following advantages:
1. The invention solves a multi-objective, constrained optimization problem using deep reinforcement learning. The deep-reinforcement-learning agent interacts with the edge-intelligence system automatically and learns an optimal scheme on its own, without a complex mathematical modeling process, providing a new idea and approach for optimizing the federated training process.
2. The invention allocates different numbers of training rounds to devices with different computation speeds, deftly balancing the computational heterogeneity among devices. This makes full use of the devices' computing power, accelerates the training of the global model, and represents a new attempt to deploy federated learning in practical environments.
3. The method reduces the energy expenditure of the edge devices without affecting the training speed or accuracy of the model, improving the economic benefit and sustainability of federated training, and thereby further meeting the requirements of multi-objective edge-intelligence optimization.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (8)
1. An edge intelligent optimization method is characterized by comprising the following steps:
step 100: acquiring a central model and a strategy model, and specifying a global training parameter; the central model and the policy model are hosted in a central server; the global training parameters include: total number of edge devices, threshold time, batch size, and training rounds;
step 101: determining edge equipment participating in the current round of training based on the number of the training rounds to obtain a participating equipment set;
step 102: obtaining a local data sample;
step 103: the edge devices in the participating device set receive the central model and the training round number, and update parameters of a local model by the batch size by using the local data samples under the condition that the threshold time is met; the local model is implanted in the edge device;
step 104: collecting local information, and constructing the current round state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
step 105: updating the current round state of the environment, and aggregating the central model based on the parameters of the local model in the current round state of the updated environment and the local data samples to obtain an aggregated central model;
step 106: determining an accuracy of the aggregated central model;
step 107: determining a return value of the strategy model according to the accuracy of the aggregation central model, the communication time in the current state of the updated environment and the training energy consumption in the current state of the updated environment;
step 108: generating a normal distribution for each edge device participating in the training of the current round by using the strategy model according to the updated current round state of the environment;
step 109: sampling the normal distribution to obtain new training round number distribution information, and returning to the step 103 until the threshold time is exceeded, and obtaining decision trajectory information; the decision track information comprises a plurality of decision tracks; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
step 110: and updating the strategy model by using the decision track information, and returning to the step 100 until the updated strategy model converges to the optimal solution, thereby obtaining the optimized model of the federal training.
2. The edge intelligent optimization method according to claim 1, wherein the determining the edge devices participating in the current round of training based on the number of training rounds to obtain a participating device set specifically includes:
distributing corresponding training rounds to the edge equipment based on the training rounds;
when the number of training rounds allocated to the edge device is 0, the edge device does not participate in the training round; when the number of training rounds distributed to the edge equipment is not 0, the edge equipment participates in the training of the current round according to the distributed number of training rounds;
and acquiring edge devices participating in the current round of training to generate the participating device set.
3. The edge intelligent optimization method according to claim 1, further comprising, after obtaining the central model and the strategic model: and initializing the central model and the strategy model.
4. The edge intelligent optimization method according to claim 1, wherein the determining the accuracy of the aggregated central model specifically comprises:
acquiring a test set;
determining the accuracy of the aggregated central model using a test set.
5. The edge intelligent optimization method according to claim 1, wherein the aggregate central model is:
$$W_{t+1}=\sum_{i=1}^{Q_t}\frac{|D_i|}{D}W_{t,i}, \qquad D=\sum_{i=1}^{N}|D_i|$$
in the formula, $W_{t+1}$ is the aggregated central model of round $t+1$, $D_i$ is the data samples of the $i$-th edge device, $|D_i|$ is the number of data samples of the $i$-th edge device, $D$ is the sum of the amounts of data samples of all edge devices, $N$ represents the total number of edge devices, $W_{t,i}$ are the parameters of the local model of the $i$-th edge device in round $t$, and $Q_t$ is the number of edge devices in the participating device set of round $t$.
6. The edge intelligent optimization method according to claim 1, wherein the return values of the policy model are:
$$r_t=\lambda_1\,(v_t-v_{t-1})-\lambda_2\sum_{i=1}^{Q_t}C_{t,i}-\lambda_3\sum_{i=1}^{Q_t}P_{t,i}$$
in the formula, $r_t$ is the return value of the strategy model in round $t$, $v_t$ is the accuracy of the aggregated central model in round $t$, $v_{t-1}$ is the accuracy of the aggregated central model in round $t-1$, $C_{t,i}$ is the communication time of the $i$-th edge device in round $t$, $P_{t,i}$ is the training energy consumption of the $i$-th edge device in round $t$, $\lambda_1$ is the first weight coefficient, $\lambda_2$ is the second weight coefficient, $\lambda_3$ is the third weight coefficient, and $Q_t$ is the number of edge devices in the participating device set of round $t$.
7. An edge intelligence optimization device, comprising: a central server and an edge device;
the central server and the edge equipment perform information interaction;
a central model and a strategy model are implanted into the central server; the central server is used for appointing global training parameters, determining edge equipment participating in the current round of training based on the number of training rounds, and obtaining a participating equipment set; the global training parameters include: the total number of edge devices, threshold time, batch size, and training round number;
a local model is implanted in the edge device; the edge devices in the participating device set receive the central model and the number of training rounds in the central server, and update parameters of the local model in the batch size by using local data samples under the condition that the threshold time is met;
the central server is used for collecting local information and constructing the current round state of the environment based on the local information; the current round of states of the environment include: parameters of a local model, communication time, CPU utilization rate and training energy consumption;
the central server is used for updating the current state of the environment and aggregating the central model based on the parameters of the local model in the current state of the updated environment and the local data samples to obtain an aggregated central model;
the central server is used for acquiring a test set and determining the precision of the aggregation central model by adopting the test set;
the central server is used for determining a return value of the strategy model according to the precision of the aggregation central model, the communication time in the current state of the environment after updating and the training energy consumption in the current state of the environment after updating;
the central server is used for generating a normal distribution for each edge device participating in the current training by utilizing the strategy model according to the updated current state of the environment;
the central server is used for sampling the normal distribution to obtain new training round number distribution information and sending the obtained new training round number distribution information to the edge equipment in the participating equipment set, and after the edge equipment in the participating equipment set receives the central model and the new training round number, parameters of a local model are updated by the local data samples in batch size under the condition of meeting the threshold time until the threshold time is exceeded, and decision trajectory information is obtained; the decision track information comprises a plurality of decision tracks; each of the decision trajectories includes: the current round state of the environment, the return value of the strategy model and the number of training rounds;
and the central server is used for updating the strategy model by using the decision track information, training the updated strategy model as a new strategy model, and obtaining an optimized model of federal training until the updated strategy model converges to an optimal solution.
8. The intelligent edge optimization device of claim 7, wherein the edge device is a raspberry pi, a smartphone, a computer, or a surveillance camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211282973.XA CN115357402B (en) | 2022-10-20 | 2022-10-20 | Intelligent edge optimization method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115357402A CN115357402A (en) | 2022-11-18 |
CN115357402B true CN115357402B (en) | 2023-01-24 |
Family
ID=84008718
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113887748A (en) * | 2021-12-07 | 2022-01-04 | 浙江师范大学 | Online federal learning task allocation method and device, and federal learning method and system |
CN114168328A (en) * | 2021-12-06 | 2022-03-11 | 北京邮电大学 | Mobile edge node calculation task scheduling method and system based on federal learning |
CN114528304A (en) * | 2022-02-18 | 2022-05-24 | 安徽工业大学 | Federal learning method, system and storage medium for updating self-adaptive client parameters |
CN114546608A (en) * | 2022-01-06 | 2022-05-27 | 上海交通大学 | Task scheduling method based on edge calculation |
Non-Patent Citations (4)
Title |
---|
"DRL + FL": An intelligent resource allocation model based on deep reinforcement learning for Mobile Edge Computing;Nanliang Shan 等;《Computer Communications》;20200528;全文 * |
Allo: Optimizing Federated Learning via Guided Epoch Allocation;Jiasheng Wang 等;《State Intellectual Property Office of China》;20220727;全文 * |
Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning;Yufeng Zhan 等;《2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)》;20201231;全文 * |
一种面向边缘计算的高效异步联邦学习机制;芦效峰 等;《计算机研究与发展》;20201231;全文 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113191484B (en) | Federal learning client intelligent selection method and system based on deep reinforcement learning | |
CN113435472A (en) | Vehicle-mounted computing power network user demand prediction method, system, device and medium | |
Tun et al. | Federated learning based energy demand prediction with clustered aggregation | |
CN113467952A (en) | Distributed federated learning collaborative computing method and system | |
CN112650581A (en) | Cloud-side cooperative task scheduling method for intelligent building | |
CN113781002B (en) | Low-cost workflow application migration method based on agent model and multiple group optimization in cloud edge cooperative network | |
CN114585006B (en) | Edge computing task unloading and resource allocation method based on deep learning | |
CN112381113A (en) | HK model-based industrial internet big data collaborative decision-making method | |
CN116489226A (en) | Online resource scheduling method for guaranteeing service quality | |
Cui et al. | Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network | |
CN115357402B (en) | Intelligent edge optimization method and device | |
CN117436485A (en) | Multi-exit point end-edge-cloud cooperative system and method based on trade-off time delay and precision | |
Lou et al. | Cooperation emergence of manufacturing services in cloud manufacturing with agent-based modeling and simulating | |
CN115118591B (en) | Cluster federation learning method based on alliance game | |
CN115883371A (en) | Virtual network function placement method based on learning optimization method in edge-cloud collaborative system | |
Mays et al. | Decentralized data allocation via local benchmarking for parallelized mobile edge learning | |
CN114022731A (en) | Federal learning node selection method based on DRL | |
Sang et al. | RALaaS: Resource-aware learning-as-a-service in edge-cloud collaborative smart connected communities | |
CN117539640B (en) | Heterogeneous reasoning task-oriented side-end cooperative system and resource allocation method | |
Ma | Multi-Task Offloading via Graph Neural Networks in Heterogeneous Multi-access Edge Computing | |
Chen et al. | Container cluster placement in edge computing based on reinforcement learning incorporating graph convolutional networks scheme | |
CN117687762B (en) | Multi-data center cooperative scheduling method and system considering privacy constraint | |
CN117541025B (en) | Edge calculation method for intensive transmission line inspection | |
Lu et al. | Resource Allocation Method of Industrial Terminal Edge Computing Based on Reinforcement Learning Algorithm | |
Lajeunesse et al. | A Cooperative Optimal Mining Model for Bitcoin |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||