CN115190135A - Distributed storage system and copy selection method thereof - Google Patents


Info

Publication number
CN115190135A
CN115190135A
Authority
CN
China
Prior art keywords
network
edge
edge server
actor
server
Prior art date
Legal status
Granted
Application number
CN202210768871.2A
Other languages
Chinese (zh)
Other versions
CN115190135B (en)
Inventor
党曼玉
洪旺
施展
廖子逸
李一泠
张望
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210768871.2A
Publication of CN115190135A
Application granted
Publication of CN115190135B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/1023 Server selection for load balancing based on a hash applied to IP addresses or costs

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a distributed storage system and a copy selection method thereof, belonging to the technical field of distributed storage. An Actor network is arranged in each edge server to quickly calculate the score of that edge server, and Critic networks are deployed in the cloud to comprehensively consider the information of all Actor networks for joint action evaluation. Each Actor network is trained on the basis of the evaluation result output by its corresponding Critic network, and each Critic network is trained on data randomly sampled from an experience pool. The training processes of the Actor networks and the Critic networks are mutually independent and carried out continuously, so that the quality of service of each edge server can be accurately scored at every moment; a server ranking is maintained among the servers and distributed to the clients, so that copy selection has complete server state information and no forwarding delay overhead. The method can better adapt to copy selection in the edge environment, reduces request processing delay in the edge environment, and achieves a balance between performance and reliability.

Description

Distributed storage system and copy selection method thereof
Technical Field
The invention belongs to the technical field of distributed storage, and particularly relates to a distributed storage system and a copy selection method thereof.
Background
With the popularization of mobile phones, wearable devices and various sensors, the number of Internet of Things devices is increasing rapidly. According to the Ericsson Mobility Report 2021, 14.6 billion Internet of Things connections were established worldwide in 2021, and the number is expected to grow to 30.2 billion by 2027. These devices support a variety of applications, such as road safety services, real-time video analytics, gaming, augmented reality, and virtual reality. However, due to computational, storage, and energy constraints, these applications can only collect data and then move it to a cloud data center with powerful processing capability for processing. With the support of cloud computing, users can run these applications on less powerful devices.
However, in the cloud computing mode, data is sent from the edge to the cloud through multiple hops, which introduces substantial delay in request processing. Moreover, the vast number of Internet of Things devices produce large amounts of data at every moment, and forwarding all of it to the cloud for processing occupies a large amount of network bandwidth. For this reason, a new computing paradigm, edge computing, has emerged. Edge computing provides computing and storage services by deploying edge servers at the edge of the network, so that user data can be processed directly at the edge, reducing request latency and saving network bandwidth between the edge and the cloud. Furthermore, as the transmission path is shortened, the reliability of transmission is also improved.
Deploying the storage service at the edge allows terminal devices to access data at high speed and reduces the response delay of data access, which is very important for delay-sensitive applications. However, due to many sources of variability, performance fluctuations often occur on the nodes of a distributed storage system, thereby affecting the quality of service of the system. In the edge environment, the quality of service of the system also changes with the position of the user and the time-varying, dynamic network. The copy selection strategy, a widely used request scheduling method for improving the quality of service of the system, can effectively reduce the processing delay of each request by selecting the edge server with the lowest delay for that request. Compared with other methods (e.g., redundant requests, reissued requests), copy selection does not increase the load on the system. Moreover, copy selection is an indispensable part of a distributed storage system: when a request arrives, a server must always be selected to service it. Therefore, it is necessary to study the copy selection policy in the edge environment to guarantee the quality of service of the system. However, conventional copy selection policies adapt poorly to changes in edge server state; existing strategies mainly comprise client-based copy selection and central-node-based copy selection. A client-based copy selection strategy lacks complete server state information, so its estimation of server delay is inaccurate; moreover, multiple selection nodes are difficult to coordinate, so load oscillation easily occurs and request delay increases. A central-node-based copy selection strategy executes the copy selection task for all clients through an additional central node; in an edge scenario, the cloud data center serves as the copy selection node, requests are first sent to the cloud data center, and the cloud data center selects the edge server with the best service capability for each request. This introduces request forwarding and therefore additional response delay, and the delay generated by request forwarding is even larger in the geographically distributed edge environment.
In order to reduce request processing delay in the edge environment, guarantee the quality of service of the system, and achieve a balance between performance and reliability, how to design a distributed storage system convenient for copy selection, and how to optimize the copy selection method in such a distributed storage system, have become problems that urgently need to be solved.
Disclosure of Invention
In view of the above drawbacks and needs of the prior art, the present invention provides a distributed storage system and a copy selection method thereof, which are used to solve the technical problem of high response delay in the prior art.
To achieve the above object, in a first aspect, the present invention provides a distributed storage system, including: the cloud terminal and the server terminal; wherein, the server end includes: a plurality of distributed edge servers; an Actor network is deployed in each edge server; the cloud end is provided with a plurality of Critic networks, the number of the Critic networks is the same as that of the edge servers, and one Critic network corresponds to one Actor network;
the operation process of the distributed storage system comprises the following steps:
at each time t, each edge server performs the following operations: the edge server collects the current state data of the network environment in which it is located as its state information, and inputs the state information into its internal Actor network, which scores the quality of service of that edge server, to obtain the score of the edge server; after the state information and scores of all edge servers have been sent to the corresponding Critic network in the cloud to obtain an evaluation result, the Actor network inside the edge server is trained with the goal of maximizing the evaluation result;
at each time t, the cloud performs the following operations: it collects the information sent by all edge servers; after the information sent by all edge servers at time t has been collected, it calculates the reward value r_{t-1} at time t-1 and stores the corresponding tuple information into the experience pool; when the experience pool is full of data, tuple information data are randomly sampled from the experience pool to train all Critic networks simultaneously; the tuple information comprises: the state information of all edge servers at time t-1, the scores of all edge servers at time t-1, the reward value at time t-1, and the state information of all edge servers at time t.
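For illustration, the per-timestep routine on one edge server described above can be sketched as follows (a minimal sketch; the message format and the helper callables collect_state, send_to_cloud, wait_for_evaluation and update_actor are assumptions introduced here, not part of the claims):

```python
def edge_server_step(t, collect_state, actor_score, send_to_cloud,
                     wait_for_evaluation, update_actor):
    """One time step t on an edge server (all callables are assumed helpers)."""
    state = collect_state()                    # observe the local network environment
    score = actor_score(state)                 # Actor network outputs this server's QoS score
    send_to_cloud({"t": t, "state": state, "score": score})
    q = wait_for_evaluation(t)                 # evaluation from the corresponding Critic network
    update_actor(q)                            # adjust Actor weights toward a larger evaluation
    return score
```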
Further preferably, the reward value r_{t-1} at time t-1 is calculated from the number of requests processed by each edge server, weighted according to how each edge server's average delay compares with the overall average delay; wherein N is the number of edge servers; d̄_i is the average delay of the i-th edge server; d̄ is the mean of the average delays of all edge servers; c_i is the number of requests processed by the i-th edge server; and c̄ is the mean of the numbers of requests processed by the edge servers.
Further preferably, during the operations performed at each time t, after the experience pool is found not yet full of data or after the Critic network training is completed, the cloud judges whether the time elapsed since time t exceeds a preset time period; if so, it obtains the scores of each edge server at different times from the experience pool and calculates the mean score of each edge server; taking the median of the mean scores of the edge servers as the dividing point, the edge servers are divided into low-delay edge servers (whose mean score is greater than or equal to the dividing point) and high-delay edge servers (whose mean score is smaller than the dividing point); the edge servers are partitioned using two root bucket structures, denoted the Low bucket and the High bucket respectively: the ⌈N/2⌉ low-delay edge servers are placed in the Low bucket and the N/2 high-delay edge servers in the High bucket; ⌈M/2⌉ low-delay edge servers are selected from the Low bucket and M/2 high-delay edge servers are selected from the High bucket to place the copies. Otherwise, the operation of the cloud at time t ends. Here N is the number of edge servers and M is the number of copies.
Further preferably, the Actor network includes: an Actor online network and an Actor target network; the Critic network comprises a Critic online network and a Critic target network;
the operation process of the distributed storage system comprises the following steps:
at each time t, each edge server performs the following operations: the method comprises the steps that an edge server collects current state data of a network environment where the edge server is located to serve as state information of the edge server, and the current state data are respectively input into an Actor online network and an Actor target network inside the edge server to obtain a score output by the Actor online network and a score output by the Actor target network; after the edge server sends the state information and the scores output by the Actor online networks of all edge servers to a corresponding Critic online network in the cloud to obtain an evaluation result, the edge server trains the Actor online network in the edge server by taking the maximum evaluation result as a target; after each training cycle, updating an Actor target network based on the parameters of the Actor online network;
at each time t, the cloud performs the following operations: it collects the information sent by all edge servers; after the information sent by all edge servers at time t has been collected, it calculates the reward value at time t-1 and stores the corresponding tuple information into the experience pool; when the experience pool is full of data, tuple information data are randomly sampled from the experience pool to train all Critic networks simultaneously; the tuple information comprises: the state information s_{t-1} of all edge servers at time t-1, the scores a_{t-1} output by the Actor online networks of all edge servers at time t-1, the reward value r_{t-1} at time t-1, the state information s_t of all edge servers at time t, and the scores a'_t output by the Actor target networks of all edge servers at time t; wherein s_{t-1} = (s_{t-1}^(1), ..., s_{t-1}^(N)), a_{t-1} = (a_{t-1}^(1), ..., a_{t-1}^(N)), s_t = (s_t^(1), ..., s_t^(N)), a'_t = (a'_t^(1), ..., a'_t^(N)); s_{t-1}^(i) is the state information of the i-th edge server at time t-1; a_{t-1}^(i) is the score output by the Actor online network of the i-th edge server at time t-1; s_t^(i) is the state information of the i-th edge server at time t; a'_t^(i) is the score output by the Actor target network of the i-th edge server at time t; N is the number of edge servers.
Further preferably, the method for training each Critic network on tuple information data randomly sampled from the experience pool comprises:
the j-th sampled tuple information data is recorded as (s_b, a_b, r_b, s_{b+1}, a'_{b+1}); wherein s_b = (s_b^(1), ..., s_b^(N)), a_b = (a_b^(1), ..., a_b^(N)); s_b^(i) is the state information of the i-th edge server at time b; a_b^(i) is the score output by the Actor online network of the i-th edge server at time b; a'_{b+1}^(i) is the score output by the Actor target network of the i-th edge server at time b+1;
the evaluation result and the corresponding evaluation label of each edge server are obtained from the sampled tuple information data; the evaluation result of the i-th edge server obtained from the j-th tuple information data is q_b^(i), obtained by inputting s_b^(i) and a_b into the i-th Critic online network; the evaluation label of the i-th edge server obtained from the j-th tuple information data is y_b^(i) = r_b + γ·q'_{b+1}^(i); r_b is the reward value at time b; γ is the reward discount rate; q'_{b+1}^(i) is the evaluation result obtained by inputting s_{b+1}^(i) and a'_{b+1} into the i-th Critic target network;
each Critic online network is trained by minimizing the difference between the evaluation result of each edge server and the corresponding evaluation label; and, after each training cycle, the corresponding Critic target network is updated based on the parameters of the Critic online network.
In a second aspect, the present invention provides a copy selection method based on the above distributed storage system, including: during the operation of the distributed storage system, when the server end receives a copy access request, ranking the edge servers based on their scores, and selecting the highest-ranked edge server that holds a data copy as the node for copy selection to perform data access.
Further preferably, all edge servers in the distributed storage system form a Ceph system; the Ceph system normalizes the score of each edge server holding a data copy, uses the normalized score as the primary-affinity parameter value of that edge server, and selects the edge server for data access based on the primary-affinity parameter value.
Further preferably, the Ceph system normalizes the scores of each edge server for which a data copy exists using a max-min normalization method.
In a third aspect, the present invention provides a system for selecting a copy, including: a memory storing a computer program and a processor executing the computer program to perform the copy selection method provided by the second aspect of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program is executed by a processor, the computer program controls an apparatus where the storage medium is located to execute the copy selection method provided in the second aspect of the present invention.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. The invention provides a distributed storage system in which different network structures are deployed in the cloud and on the edge servers. Because the delay from the edge to the cloud is large, multiple factors affect the quality of service of the system in the edge environment, and the score value of an edge server is a continuous value, the invention places an Actor network inside each edge server to quickly calculate the score (ranking) of that edge server, rather than calculating the scores uniformly in the cloud and then distributing them; in addition, Critic networks are deployed in the cloud to comprehensively consider the information of all Actor networks for joint action evaluation. Each Actor network is trained based on the evaluation result output by its corresponding Critic network, and each Critic network is trained based on data randomly sampled from the experience pool; the training processes of the Actor networks and the Critic networks are mutually independent and carried out continuously, so that the quality of service of each edge server can be scored accurately at every moment. A server ranking is maintained among the servers and distributed to the clients, so that copy selection has complete server state information and no forwarding delay overhead, and the transmission overhead of cloud-edge data is greatly reduced. The system can therefore better adapt to copy selection in the edge environment, reduces request processing delay in the edge environment, and achieves a balance between performance and reliability.
2. According to the distributed storage system provided by the invention, the Actor network and the Critic network are both of a double-network structure, so that the learning stability is greatly improved, and the accuracy of copy selection is further improved.
3. The distributed storage system provided by the invention considers that the data access service is a state data access service, the data access request can only select the copy among the servers with data copies, and the placement position of the copy can influence the effectiveness of the copy selection strategy.
4. Because the selection of modifying the copy by an intrusion system relates to a large number of existing mechanisms in the system, and the copy selection mechanism is difficult to be embedded into the existing system perfectly, the copy selection method provided by the invention designs an additional processing flow aiming at the existing internal mechanisms of the Ceph system to change the selection of the copy, after the score of an edge server is obtained once, the score of the edge server is normalized to be used as an affinity-primary parameter value of an OSD node of the edge server, and the Ceph system selects a main OSD node thereof as a node selected by the copy to perform data access based on the affinity-primary parameter value, namely the edge server with the highest rank and the data copy exists.
Drawings
Fig. 1 is a schematic structural diagram of a distributed storage system according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of an Actor network structure provided in embodiment 1 of the present invention;
fig. 3 is a structural diagram of a Critic network provided in embodiment 1 of the present invention;
fig. 4 is a flow chart of multi-agent reinforcement learning data in an edge environment according to embodiment 1 of the present invention;
FIG. 5 is a schematic view of a double "root bucket" structure provided in embodiment 1 of the present invention;
FIG. 6 is a rule implementation in a double "root bucket" structure provided in embodiment 1 of the present invention;
FIG. 7 is a schematic diagram of average delay results of different replica selection strategies under three loads of Read-only, read-assist and Update-assist according to embodiment 2 of the present invention;
fig. 8 is a schematic diagram of an average response delay result of each node at each time point under Read-only load using 3 different strategies according to embodiment 2 of the present invention;
fig. 9 is a schematic diagram of the delay influence of different numbers of clients on three copy selection policies under Read-only load according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
A distributed storage system, as shown in fig. 1, comprising: the cloud terminal and the server terminal; wherein, the server end includes: a plurality of distributed edge servers; an Actor network is deployed in each edge server; a plurality of Critic networks are deployed at the cloud end, the number of the Critic networks is the same as that of the edge servers, and one Critic network corresponds to one Actor network;
the operation process of the distributed storage system comprises the following steps:
the operation process of the server side comprises the following steps:
at each time t, each edge server performs the following operations: the method comprises the following steps that an edge server collects current state data of a network environment where the edge server is located as state information of the edge server, and inputs the current state data into an Actor network used for carrying out service quality scoring on the edge server to obtain scoring of the edge server; after the edge server sends the state information and the scores of all the edge servers to a corresponding Critic network in the cloud to obtain the evaluation result, training an Actor network in the edge server by taking the maximum evaluation result as a target;
it should be noted that each edge server runs an OSD process, including an OSD node. All the edge servers form a Ceph system, after the scores of the edge servers are normalized, the normalized scores serve as an affinity-primary parameter value of the OSD node of each edge server, and the Ceph system can select a main OSD node of the edge servers as a node selected by a copy to perform data access based on the affinity-primary parameter value, namely the edge server with the highest rank and a data copy. Specifically, normalization methods such as tanh normalization, sigmoid normalization, max-min normalization, and the like can be employed. Preferably, the max-min normalization method is adopted to normalize the score of the edge server, so that the original data information can be more completely retained compared with other normalization methods (such as tanh normalization and sigmoid normalization).
In an optional implementation manner, the edge server mainly comprises a Ceph system module, an information acquisition module, a scoring module and an adapter module;
a scoring module: the scoring module mainly comprises an Actor network in Deep Deterministic Policy Gradient (DDPG) reinforcement learning, outputs actions (scores) for the edge server according to information acquired by the edge server independently, and sends a group of information such as state information, actions and performance to the cloud.
An adapter module: since intrusively changing copy selection according to the scores involves a large number of system-internal processes, this module is dedicated to interfacing with the specific system and converts the scores into the mechanism by which that system changes copy selection. Specifically, an additional processing flow is designed around the existing internal mechanisms of the Ceph system to change the selection of the copy. Because the OSD nodes on which objects are placed in the Ceph system are computed directly by the CRUSH algorithm, and the read and write operations of an object are completed through its primary OSD node, the primary OSD carries much of the system's processing logic. Directly intruding into the system to change the target OSD node of a request according to the scores (ranking) would involve a large number of existing mechanisms in the system. Therefore, the invention considers the selection of the primary OSD node, and changes the node selected for the copy by changing the primary OSD node corresponding to the object. The selection of OSD nodes is performed by a "straw drawing" algorithm in Ceph, and the three OSD nodes (three copies) with the longest straws are selected from all the nodes as the placement nodes of the data; in this initial OSD order, the primary OSD node is the node with the longest straw. A processing flow is then designed to select the primary OSD node more dynamically: the Ceph system provides a primary-affinity parameter to control the probability of each OSD node becoming the primary OSD node. The range of the primary-affinity value in the Ceph system is [0, 1], and the score value output by the neural network obviously exceeds this range, so the value output by the neural network needs to be mapped into this interval. This embodiment uses the max-min normalization method for the mapping; compared with other normalization methods (such as tanh normalization and sigmoid normalization), it retains the original data information more completely, as shown in the following formula:
score' = (score − score_min) / (score_max − score_min)
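As an illustration, the mapping from raw Actor scores to primary-affinity values could look like the following (a minimal sketch under the assumption that primary affinity is set per OSD via the standard `ceph osd primary-affinity` command; the helper names are introduced here for illustration only):

```python
import subprocess

def scores_to_primary_affinity(scores):
    """Map raw Actor scores (one per OSD id) into [0, 1] by max-min normalization."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                      # avoid division by zero when all scores are equal
    return {osd: (s - lo) / span for osd, s in scores.items()}

def apply_primary_affinity(affinities):
    """Push normalized scores to Ceph as primary-affinity values (assumed deployment detail)."""
    for osd_id, value in affinities.items():
        subprocess.run(["ceph", "osd", "primary-affinity", f"osd.{osd_id}", f"{value:.3f}"],
                       check=True)

# Example: scores output by the Actor networks of OSDs 0..2
affinities = scores_to_primary_affinity({0: 1.7, 1: -0.3, 2: 0.9})
# apply_primary_affinity(affinities)   # would set osd.0 -> 1.000, osd.1 -> 0.000, osd.2 -> 0.600
```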
the operation process of the cloud:
at each time t, the cloud performs the following operations: it collects the information sent by all edge servers; after the information sent by all edge servers at time t has been collected, it calculates the reward value r_{t-1} at time t-1 and stores the corresponding tuple information into the experience pool; when the experience pool is full of data, tuple information data are randomly sampled from the experience pool to train all Critic networks simultaneously; the tuple information comprises: the state information of all edge servers at time t-1, the scores of all edge servers at time t-1, the reward value at time t-1, and the state information of all edge servers at time t.
In this embodiment, the reward value r_{t-1} at time t-1 is calculated from the number of requests processed by each edge server, weighted according to how each edge server's average delay compares with the overall average delay; wherein N is the number of edge servers; d̄_i is the average delay of the i-th edge server; d̄ is the mean of the average delays of all edge servers; c_i is the number of requests processed by the i-th edge server; and c̄ is the mean of the numbers of requests processed by the edge servers.
Under an optional implementation mode, the cloud end mainly comprises a reward calculation module, an experience pool module, an evaluation module and a copy placement optimization module.
A reward calculation module: the reward calculation module maintains the previous state and action information, receives the current state and action information, calculates the overall reward value of the system, and stores the tuple <previous state, previous action, reward value, current state, action output this time by the Actor target network> into the experience pool. Note that the reward calculation module needs to maintain all information at time t-1 (i.e., the state s_{t-1} and the action information a_{t-1}) and to collect the information of all edge servers at time t (i.e., the state s_t and the action information a'_t). The reward value r_{t-1} at time t-1 can then be calculated from the information at time t, and the tuple (s_{t-1}, a_{t-1}, r_{t-1}, s_t, a'_t) is stored in the experience pool. Specifically, how the reward value is calculated is important for reinforcement learning. This embodiment measures the reward value by the number of requests processed by each node: a node whose average delay d̄_i^t is less than the overall average delay d̄^t receives positive reward feedback for the requests it processed, and the number of requests processed by node i at time t is denoted c_i^t. Meanwhile, considering that the lower the latency, the more reward a node should receive for processing requests, the request counts of different nodes should carry different reward weights. With weight information, not all requests carry reward feedback; instead, the deviation of each node's processed request count c_i^t from the mean c̄^t carries reward feedback, representing over- or under-processing of requests (treating the per-node request counts equally), and this number of requests with reward feedback per node is defined by the corresponding formula. Since the reward weight of each node is related to its delay d̄_i^t, the per-node average delay and the overall average delay are used directly to form the weight parameter. The final reward value is then calculated from the weighted per-node request counts.
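A plausible Python sketch of the reward described above follows. The specific weighting (overall average delay divided by per-node average delay) and the averaging over nodes are assumptions made here for illustration, not the patent's exact formulas:

```python
def reward(delays, counts):
    """Per-step system reward from per-node average delays and processed-request counts.

    delays[i]: average delay of edge server i in this period
    counts[i]: number of requests processed by edge server i in this period
    """
    n = len(delays)
    mean_delay = sum(delays) / n
    mean_count = sum(counts) / n
    total = 0.0
    for d_i, c_i in zip(delays, counts):
        feedback = c_i - mean_count          # requests carrying reward feedback (over/under-processing)
        weight = mean_delay / d_i            # assumed weight: lower-delay nodes weigh more
        total += weight * feedback
    return total / n

# Example: a low-delay node that processed more requests than average yields a positive reward.
print(reward([2.0, 8.0], [120, 80]))
```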
an evaluation module: the evaluation module consists of a criticic network in DDPG reinforcement learning and evaluates the action information of the Actor network. The evaluation value output by the Critic network is used as 'supervision information' of Actor network learning, and historical data is sampled from an experience pool to train and learn the Critic network.
A copy placement optimization module: considering that the storage system provides a stateful data access service, the placement of the copy will affect the optional nodes of the selection policy. The data migration is considered, and the placement position of the copy is optimized, so that the copy selection is better carried out.
Note that the input of the Actor network is defined as the state information s observed by each edge server itself, and the output is defined as the score (action) a. In an alternative embodiment, a specific implementation structure of the Actor network is shown in fig. 2: the entire Actor network consists of two fully connected layers (Linear layers) and one ReLU activation layer. Considering the limited resources of the edge server, MARLRS defines the output (and input) dimension of the middle hidden layer as 50 to keep the computational overhead low. The weight matrices of the two fully connected layers of the Actor network are defined as w_a1 and w_a2, with dimensions len(s) × 50 and 50 × 1 respectively, where len(s) is the dimension of the state. The calculation represented by the Actor network is then: a = Relu(s · w_a1) · w_a2.
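A minimal PyTorch sketch of this Actor structure (two linear layers with a 50-dimensional hidden layer and a ReLU in between) might look as follows; the class and variable names are illustrative, and bias terms are included here although the formula above only shows the weight matrices:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor network: state -> score, as described for fig. 2."""
    def __init__(self, state_dim: int, hidden_dim: int = 50):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)   # w_a1: len(s) x 50
        self.fc2 = nn.Linear(hidden_dim, 1)           # w_a2: 50 x 1

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(s)))      # a = Relu(s * w_a1) * w_a2

# Example: score a 10-dimensional state observation
actor = Actor(state_dim=10)
score = actor(torch.randn(1, 10))
```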
The role of the Critic network is to evaluate the calculation result of the Actor network, that is, the output of the Critic network is the "supervision information" from which the Actor network learns. The better the result of the Actor network, the larger the output of the Critic network; the worse the result of the Actor network, the more negative and smaller the output of the Critic network. The Critic network takes as input both the input s of the Actor network and the output a of the Actor network. In an alternative embodiment, the specific implementation structure of the Critic network is shown in fig. 3: the inputs s and a are each first processed by a fully connected layer, and the intermediate results are defined as mid_s and mid_a respectively. The weight matrices of these two fully connected layers are defined as w_cs and w_ca, with dimensions len(s) × 200 and N × 200 respectively, where N is the number of edge servers, so that mid_s = s · w_cs and mid_a = a · w_ca. The intermediate outputs are then combined by a linear summation: mid = mid_s + mid_a + b, where b is a bias (noise) matrix. Then, as in the Actor network, the evaluation result q is calculated through an activation function and a fully connected layer. Defining the weight matrix of the last fully connected layer as w_c, the calculation represented by the Critic network is: q = Relu(mid) · w_c.
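A matching PyTorch sketch of this Critic structure (separate 200-dimensional projections of the state and the joint action, summed and passed through a ReLU and a final linear layer) is given below; as above, the names and bias handling are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Critic network: (state, joint action) -> evaluation q, as described for fig. 3."""
    def __init__(self, state_dim: int, num_servers: int, hidden_dim: int = 200):
        super().__init__()
        self.fc_s = nn.Linear(state_dim, hidden_dim)    # w_cs: len(s) x 200
        self.fc_a = nn.Linear(num_servers, hidden_dim)  # w_ca: N x 200
        self.fc_q = nn.Linear(hidden_dim, 1)            # w_c: 200 x 1

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        mid = self.fc_s(s) + self.fc_a(a)               # mid = mid_s + mid_a (+ bias terms b)
        return self.fc_q(torch.relu(mid))               # q = Relu(mid) * w_c

# Example: evaluate the joint action of 5 edge servers given one server's 10-dim state
critic = Critic(state_dim=10, num_servers=5)
q = critic(torch.randn(1, 10), torch.randn(1, 5))
```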
Further, because the delay from the edge to the cloud is large, the invention places an Actor network inside each edge server to quickly calculate the score (ranking) of that node, rather than calculating the scores uniformly in the cloud and then distributing them. The Critic networks need to comprehensively consider the information of all Actor networks to perform joint action evaluation, and, in order to train the networks better, a batch of data needs to be randomly sampled from the experience pool and learned from simultaneously, so all Critic networks are deployed in the cloud (one Actor network corresponds to one Critic network, i.e., the i-th Actor network corresponds to the i-th Critic network, i = 1, 2, ..., N). In order to improve the stability of learning, in an alternative embodiment both the Actor network and the Critic network adopt a dual-network structure. Specifically, the Actor network comprises an Actor online network and an Actor target network; the Critic network comprises a Critic online network and a Critic target network; at each moment, the action of the Actor online network is evaluated through the Critic online network.
The operation process of the distributed storage system comprises the following steps:
at each time t, each edge server performs the following operations: the method comprises the steps that an edge server collects current state data of a network environment where the edge server is located as state information of the edge server, and the current state data are respectively input into an Actor on-line network and an Actor target network inside the edge server to obtain scores output by the Actor on-line network and scores output by the Actor target network; the method comprises the steps that after an edge server sends state information of the edge server and scores output by Actor online networks of all edge servers to a corresponding criticic online network in a cloud end to obtain an evaluation result, the Actor online network in the edge server is trained by taking the maximum evaluation result as a target; after each training cycle, updating the target network of the Actor based on the parameters of the on-line network of the Actor;
at each time t, the cloud performs the following operations: it collects the information sent by all edge servers; after the information sent by all edge servers at time t has been collected, it calculates the reward value at time t-1 and stores the corresponding tuple information into the experience pool; when the experience pool is full of data, tuple information data are randomly sampled from the experience pool to train all Critic networks simultaneously; the tuple information comprises: the state information s_{t-1} of all edge servers at time t-1, the scores a_{t-1} output by the Actor online networks of all edge servers at time t-1, the reward value r_{t-1} at time t-1, the state information s_t of all edge servers at time t, and the scores a'_t output by the Actor target networks of all edge servers at time t; wherein s_{t-1} = (s_{t-1}^(1), ..., s_{t-1}^(N)), a_{t-1} = (a_{t-1}^(1), ..., a_{t-1}^(N)), s_t = (s_t^(1), ..., s_t^(N)), a'_t = (a'_t^(1), ..., a'_t^(N)); s_{t-1}^(i) is the state information of the i-th edge server at time t-1; a_{t-1}^(i) is the score output by the Actor online network of the i-th edge server at time t-1; s_t^(i) is the state information of the i-th edge server at time t; a'_t^(i) is the score output by the Actor target network of the i-th edge server at time t; N is the number of edge servers.
Specifically, the method for training the Critic networks by randomly sampling tuple information data from the experience pool comprises the following steps:
the total number of tuple information data sampled from the experience pool is B; the j-th sampled tuple information data is recorded as (s_b, a_b, r_b, s_{b+1}, a'_{b+1}); wherein s_b = (s_b^(1), ..., s_b^(N)), a_b = (a_b^(1), ..., a_b^(N)); s_b^(i) is the state information of the i-th edge server at time b; a_b^(i) is the score output by the Actor online network of the i-th edge server at time b; a'_{b+1}^(i) is the score output by the Actor target network of the i-th edge server at time b+1;
the evaluation result and the corresponding evaluation label of each edge server are obtained from the sampled tuple information data; the evaluation result of the i-th edge server obtained from the j-th tuple information data is q_b^(i), obtained by inputting s_b^(i) and a_b into the i-th Critic online network; the evaluation label of the i-th edge server obtained from the j-th tuple information data is y_b^(i) = r_b + γ·q'_{b+1}^(i); r_b is the reward value at time b; γ is the reward discount rate; q'_{b+1}^(i) is the evaluation result obtained by inputting s_{b+1}^(i) and a'_{b+1} into the i-th Critic target network;
each Critic online network is trained by minimizing the difference between the evaluation result of each edge server and the corresponding evaluation label; and, after each training cycle, the corresponding Critic target network is updated based on the parameters of the Critic online network.
It should be noted that, in the dual-network structure of Actor and Critic, the online network and the target network have the same network model setting, but the weighting parameters between the networks are different; specifically, the structure of the Actor online network and the Actor target network are the same and are the same as the structure of the Actor network, which is not described herein. The structure of the Critic online network is the same as that of the Critic target network, and is not described herein again. The online network weight is updated in real time (single step), and the target network weight is updated according to the online network weight after the online network is updated by n steps.
Specifically, the neural network calculations of the Actor online network, the Actor target network, the Critic online network and the Critic target network are defined as the functions μ^(i), μ'^(i), Q^(i) and Q'^(i) respectively, and their overall network parameters are defined as θ_μ^(i), θ_μ'^(i), θ_Q^(i) and θ_Q'^(i) respectively, where i denotes the number of the edge server. To further explain the operation process of the distributed storage system, the complete data flow of the Actor networks and the Critic networks in the edge environment is described below, taking the multi-agent reinforcement learning data flow diagram in the edge environment shown in fig. 4 as an example:
1) First, there is a clock synchronization process between the edge servers. When time t is reached, every edge server observes its own environment state information, defined as s_t^(i).
2) Then the state information s_t^(i) is used as the input of the Actor online network, and the action at time t, a_t^(i) = μ^(i)(s_t^(i)), is calculated by the neural network. Each edge server then directly executes the action a_t^(i).
3) The tuple (s_t^(i), a_t^(i)) and the additional information needed for reward calculation are sent to the reward calculation module in the cloud. Considering that the input of the Critic target network depends on the output of the Actor target network, s_t^(i) is also input into the Actor target network for calculation at this stage, and the output is defined as a'_t^(i) = μ'^(i)(s_t^(i)). If a'_t^(i) were not calculated at this stage, then every time the Critic target network performed a calculation, data would have to be sent from the cloud to the edge, and after the calculation the Actor target network at the edge would have to send the corresponding data back to the cloud; completing the corresponding calculation at this stage therefore saves unnecessary overhead.
4) The reward calculation module aggregates the information of all edge servers and maintains the information at time t-1. Therefore, the global reward r_{t-1} of the system at time t-1 can be calculated from the information at time t. The tuple (s_{t-1}, a_{t-1}, r_{t-1}, s_t, a'_t) is then stored into the experience pool for the random sampling and learning of the Critic networks.
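A simple experience-pool sketch matching stage 4) is shown below (a minimal fixed-capacity buffer with uniform random sampling; the capacity value and class name are illustrative assumptions):

```python
import random
from collections import deque

class ExperiencePool:
    """Stores (s_{t-1}, a_{t-1}, r_{t-1}, s_t, a'_t) tuples and samples mini-batches uniformly."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s_prev, a_prev, r_prev, s_cur, a_target_cur):
        self.buffer.append((s_prev, a_prev, r_prev, s_cur, a_target_cur))

    def is_full(self) -> bool:
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size: int):
        return random.sample(list(self.buffer), batch_size)   # B tuples for Critic training
```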
5) The Critic networks randomly sample B tuples of data from the experience pool, with the tuple format given in stage 4); specifically, the j-th sampled tuple information data is recorded as (s_b, a_b, r_b, s_{b+1}, a'_{b+1}).
6) Stage 6) and stage 5) are completely parallel processes that do not interfere with each other. The corresponding Critic online network is used to evaluate the action of the Actor online network at time t, and the evaluation result is defined as q_t^(i) = Q^(i)(s_t^(i), a_t); the inputs of the Critic online network are the state s_t^(i) of the corresponding Actor online network and the joint action a_t = (a_t^(1), ..., a_t^(N)).
7) In the j-th tuple information data, s_{b+1}^(i) and a'_{b+1} are input into the i-th Critic target network to obtain q'_{b+1}^(i) = Q'^(i)(s_{b+1}^(i), a'_{b+1}). Using the reward value r_b and q'_{b+1}^(i), the "supervision information" required for the learning of the Critic online network is calculated (unlike the labels of supervised learning, the label of the Critic online network depends on the Critic target network, which is itself learned within the system). The evaluation label of the i-th edge server obtained from the j-th tuple information data is y_b^(i) = r_b + γ·q'_{b+1}^(i), where γ is the reward discount rate.
8) The online networks perform forward propagation and calculate gradients, which is the first step of the online network training and learning process. Both the Actor networks and the Critic networks perform this process, but not at the same time (they run on different machines) and with different training data. In the original design of the DDPG network, the Actor network and the Critic network both train on the same batch of sampled data, but here the Actor networks and the Critic networks reside on different machines and the experience pool is placed in the cloud; reusing the original scheme would generate additional overhead (and time delay). Therefore, in the invention the Actor network learns only from the data at time t, while the Critic networks randomly sample a batch of size B from the experience pool and train simultaneously.
9) At this stage the loss value of the corresponding network is calculated, and back-propagation is performed to update the online network parameters. The Critic online network performs forward propagation as in the usage stage: the evaluation result obtained by inputting s_b^(i) and a_b of the j-th tuple information data into the i-th Critic online network is recorded as q_b^(i) = Q^(i)(s_b^(i), a_b). Together with the label y_b^(i), the loss value of the i-th Critic online network is calculated as L_Q^(i) = (1/B)·Σ_b (y_b^(i) − q_b^(i))², where the sum runs over the B sampled tuples and B is the size of the batch of sampled data.
Further, the Actor online network directly uses the evaluation information q_t^(i) of the Critic online network as the criterion for judging its own performance: the larger q_t^(i) is, the better the decision made by the Actor network, so the Actor online network modifies its weight parameters toward a higher probability of obtaining a larger q_t^(i). The loss function of the Actor online network is accordingly defined as the negative of the Critic online network's evaluation, L_μ^(i) = −q_t^(i). Back-propagation is then performed to update the parameters of the Actor online network and of the Critic online network, μ^(i) and Q^(i), respectively.
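Under the assumption that the losses take these standard DDPG forms (mean squared error for the Critic, negative evaluation for the Actor), one training step could be sketched as follows; optimizer setup, tensor shapes, and the colocation of the Actor with its Critic (in the real system the evaluation is returned from the cloud) are simplifications made here:

```python
import torch
import torch.nn.functional as F

def critic_train_step(critic_online, critic_target, critic_opt, batch, gamma=0.99):
    """One update of the i-th Critic online network on B sampled tuples (stage 9)."""
    s_b, a_b, r_b, s_next, a_target_next = batch        # tensors with leading batch dim B
    with torch.no_grad():
        y = r_b + gamma * critic_target(s_next, a_target_next)   # label from Critic target network
    q = critic_online(s_b, a_b)
    loss = F.mse_loss(q, y)                              # (1/B) * sum (y - q)^2
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()
    return loss.item()

def actor_train_step(actor_online, critic_online, actor_opt, s_t, other_scores):
    """One update of the Actor online network using the Critic's evaluation at time t.
    other_scores: scores of the other edge servers at time t (treated as constants here)."""
    my_score = actor_online(s_t)                              # shape (1, 1)
    joint_a_t = torch.cat([my_score, other_scores], dim=1)    # assume this server is slot 0
    loss = -critic_online(s_t, joint_a_t).mean()              # maximize the evaluation q_t
    actor_opt.zero_grad(); loss.backward(); actor_opt.step()
    return loss.item()
```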
10) The target networks update their network weights depending on the weight information of the online networks after the online networks have been updated in real time for n steps. However, instead of directly copying the weight parameters of the online networks completely, a learning rate τ is defined, and the target network learns only a portion from the online network each time; this process is called a soft update. The target network parameter update formulas are:
θ_μ'^(i) ← τ·θ_μ^(i) + (1 − τ)·θ_μ'^(i)
θ_Q'^(i) ← τ·θ_Q^(i) + (1 − τ)·θ_Q'^(i)
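A soft-update helper corresponding to these formulas (a minimal sketch of standard DDPG-style Polyak averaging with rate τ):

```python
import torch

@torch.no_grad()
def soft_update(online_net, target_net, tau: float = 0.01):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```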
further, in an optional implementation manner, in the process of executing the operation by the cloud at each time t, after the experience pool is not full of data or the Critic network training is completed, it is determined that the operation has passed from the time tWhether the elapsed time is longer than a preset time period (the value is 600s in the embodiment), if yes, obtaining scores of each edge server at different moments from an experience pool, and calculating to obtain a score average value of each edge server; dividing the edge servers into low-delay edge servers and high-delay edge servers by taking the median of the score average of each edge server as a dividing point; the average value of the scores of the low-delay edge servers is greater than or equal to the dividing point, and the average value of the scores of the high-delay edge servers is smaller than the dividing point; partitioning the edge server by adopting two root bucket structures, wherein the two root bucket structures are respectively marked as a Low bucket and a High bucket; will be provided with
Figure BDA0003723162000000144
Placing N/2 High-latency edge servers in a High bucket; select in Low bucket
Figure BDA0003723162000000145
Selecting M/2 High delay edge servers in a High bucket to place copies by the low delay edge servers; otherwise, the operation of the cloud end at the moment t is finished; wherein N is the number of edge servers; m is the number of copies.
Specifically, in the above optional implementation manner, the overall process of the distributed storage system includes:
edge portion: at each time t, the edge server starts to collect current status data, and then calculates the action using the Actor network. And then carrying out adaptation operation on the action, executing the adaptation action, and simultaneously sending information such as state, action and performance to the cloud. And finally, waiting for the evaluation result of the cloud to train and learn the Actor network.
Cloud part: and after the cloud end collects the information of all the edge servers at the time t, calculating the reward value at the time t-1, and storing the corresponding tuple information into an experience pool for Critic network sampling learning. The cloud then evaluates the behavior of all edge servers using the Critic network. And then sending the evaluation result to each edge server, and simultaneously judging whether the experience pool is full of data or not. If enough data exists, the Critic network randomly samples data from the experience pool and trains and learns the Critic network. If not, directly judging whether the time period of the copy placement adjustment is passed. If yes, the scoring data is directly obtained from the experience pool, and the service performance expectation of each server is calculated. And finally partitioning the server according to the expected value, and changing the placement position of the copy. Otherwise, ending the flow.
It should be noted that the storage system provides a stateful data access service, which means that a data access request can only be served by one of the edge servers holding a copy of the data; therefore, the placement of the copies influences the selection decision. If data are placed randomly, it may happen at time t that all copies of some data reside on servers with higher delay; requests accessing that part of the data then suffer higher delay overhead, and the response delay of those requests cannot be well optimized by a copy selection strategy alone. For example, assume there are 8 edge servers with response delays in the range of 2-9 ms and 8 files to be put into the storage system. If the storage system uses a 3-copy policy, each edge server stores 3 files (assuming the data are evenly distributed); with random placement, it may happen that all copies of some data are stored on servers with higher response delays. To solve this problem, files could simply be exchanged so that every piece of data has a copy on an edge server with low response delay, and the copy selection strategy would then ensure that access requests for all data obtain low response delays. However, data migration has overhead and requires a certain amount of time to complete, and in the edge scenario there is transmission delay between servers, so the data migration task takes even more time. Therefore, copy placement cannot be adjusted in real time the way copy selection is: the strategy update period for copy placement is longer than for copy selection, and a way of measuring server performance over a longer time period is needed.
To address the possible placements of data, the invention designs a ranking-expectation-based replica placement optimization strategy (named RDRP), which migrates data to optimize the placement of copies and places the corresponding data onto servers with lower delay. Considering that in the invention the edge servers are ranked once at every time t, and that the purpose of RDRP is to enable better copy selection, RDRP uses the expectation of the server ranking over this time period to measure the performance ranking each edge server can provide when the copy placement is optimized. The score of each server at every time t is the output a_t^(i) of its Actor network; defining a long time period containing m time instants in total, the ranking expectation of each server is E^(i) = (1/m)·Σ_{t=1}^{m} a_t^(i).
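A small sketch of this ranking expectation and the resulting Low/High partition (median split as described earlier) follows; the function names are illustrative:

```python
from statistics import median

def ranking_expectation(score_history):
    """score_history[i] is the list of Actor scores of server i over the m time instants."""
    return {i: sum(scores) / len(scores) for i, scores in score_history.items()}

def partition_servers(expectations):
    """Split servers into Low (lower delay, expectation >= median) and High buckets."""
    cut = median(expectations.values())
    low = [i for i, e in expectations.items() if e >= cut]
    high = [i for i, e in expectations.items() if e < cut]
    return low, high

# Example with 5 servers over m = 3 instants
exp = ranking_expectation({0: [0.9, 1.0, 1.1], 1: [0.2, 0.1, 0.3],
                           2: [0.7, 0.8, 0.6], 3: [0.4, 0.5, 0.3], 4: [0.5, 0.6, 0.4]})
low_bucket, high_bucket = partition_servers(exp)   # low = [0, 2, 4], high = [1, 3]
```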
Meanwhile, if all copies of the data were placed on the highest-ranked node, the goal of the data placement strategy would be defeated: this would not only cause data imbalance, but the node would also suffer so many requests that the response delay would rise. Therefore, RDRP divides the edge servers into a lower-delay part and a higher-delay part according to the ranking expectation, and ensures that all data are evenly placed with at least one copy on the lower-delay edge servers. Specifically, the concrete implementation of RDRP is designed in combination with the built-in rules of the Ceph system: to achieve flexible placement, the Ceph system provides bucket and rule structures in the cluster topology, and various flexible data placement strategies can be realized by combining buckets and rules.
According to the lower-delay/higher-delay partition, two buckets are defined to hold OSD nodes with different predicted scores; this only solves the partitioning of the OSD nodes, while the concrete choice of data placement positions is controlled by a rule. The invention therefore designs two "root buckets" and defines the corresponding rule flow, so that the placement of data in the Ceph system is changed in a non-intrusive manner by combining buckets with rules. As shown in fig. 5, the two "root buckets" are defined as a Low bucket and a High bucket, where the Low bucket holds the nodes with lower latency and the High bucket holds the nodes with higher latency. The Low bucket holds ⌈N/2⌉ hosts (i.e., edge servers) and the High bucket holds the remaining ⌊N/2⌋ hosts. With the number of data copies defined as M, the selection rule picks ⌈M/2⌉ hosts from the Low bucket and ⌊M/2⌋ hosts from the High bucket, which guarantees that every piece of data has at least one copy on a lower-delay node.
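A minimal sketch of the Low/High partition and the per-bucket copy counts described above (the exact rounding of N/2 and M/2 is an assumption, chosen to match the 3-copy, 5-node example that follows):

```python
import math

def partition_by_expectation(expectations):
    """Split server ids into a low-latency half (higher ranking expectation)
    and a high-latency half, mirroring RDRP's Low/High root buckets."""
    order = sorted(range(len(expectations)), key=lambda i: expectations[i],
                   reverse=True)               # best-scoring servers first
    n_low = math.ceil(len(order) / 2)          # ceil(N/2) hosts go to Low
    return order[:n_low], order[n_low:]        # (Low bucket, High bucket)

def copies_per_bucket(m_copies):
    """How many of the M replicas the rule picks from each bucket."""
    return math.ceil(m_copies / 2), m_copies // 2   # (from Low, from High)

low, high = partition_by_expectation([0.82, 0.40, 0.75, 0.55, 0.68])
print(low, high)              # 3 hosts in Low, 2 in High for N = 5
print(copies_per_bucket(3))   # (2, 1): each object keeps a copy on a low-delay host
```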
This embodiment shows the implementation of a concrete bucket structure and rule definition with 3 copies and 5 OSD nodes. Table 1 gives the specific bucket definitions.

TABLE 1

(Table 1 is given as an image in the original; it lists the seven bucket definitions with their type, id, alg, hash and item fields and is not reproduced here.)
As shown in table 1, seven buckets are implemented; the first field indicates the bucket type. The remaining four fields hold the bucket's definition: id is the unique identification number of the bucket (in Ceph, buckets are numbered downward from -1, while OSD nodes are numbered upward from 0); alg is the algorithm used to select a sub-bucket or OSD node inside the bucket (since changing the bucket structure triggers data movement, the invention uses the upgraded selection algorithm straw2 to reduce the amount of data transferred); hash is the hash function used during the calculation (0 denotes the default function jenkins1); item lists the sub-buckets or OSD nodes placed in the bucket.
The rule definition is shown in fig. 6. Here ruleset is the unique identifier within the rule set; type indicates how the multiple copies are kept (replication or erasure code); the steps describe the concrete selection procedure. A step is one of three operations: take, choose, and emit. take obtains a "root bucket"; choose selects a sub-bucket or OSD node; emit ends the selection within the current "root bucket". In a choose operation, the first parameter is the selection mode (here firstn, a depth-first traversal); the second parameter is the number of items to select; the third parameter is a category identifier; and the fourth parameter is the concrete category (a bucket type or OSD).
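The take/choose/emit flow over the two root buckets can be mimicked with a small model; this is only an illustration under the Low/High design, not Ceph's CRUSH syntax, and the host names and hash-seeded selection below are assumptions:

```python
import math
import random

# Toy model of the two "root buckets" and of the selection rule's flow.
buckets = {
    "low":  {"alg": "straw2", "items": ["host1", "host2", "host3"]},  # lower-delay hosts
    "high": {"alg": "straw2", "items": ["host4", "host5"]},           # higher-delay hosts
}

def select_replicas(object_key, m_copies=3):
    """Emulate: take low -> choose ceil(M/2) hosts -> emit;
                take high -> choose floor(M/2) hosts -> emit."""
    rng = random.Random(hash(object_key))   # seeded per object key, a stand-in for a placement hash
    from_low, from_high = math.ceil(m_copies / 2), m_copies // 2
    chosen = rng.sample(buckets["low"]["items"], from_low)     # hosts from the Low root bucket
    chosen += rng.sample(buckets["high"]["items"], from_high)  # hosts from the High root bucket
    return chosen

print(select_replicas("object-42"))   # e.g. ['host2', 'host1', 'host5']
```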
Changing the bucket structure while the system is running changes the placement of the copies. Algorithm 1, shown in table 2, gives the pseudo code of the bucket replacement algorithm: it first empties both the Low and High "root buckets", and then adds each Host bucket back into the appropriate "root bucket" according to its ranking expectation.
TABLE 2

(Table 2 is given as an image in the original; it contains the pseudo code of the bucket replacement algorithm and is not reproduced here.)
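Since the pseudo code of table 2 is not reproduced here, the following is only a sketch of the bucket replacement step as described above — empty the Low and High root buckets, then re-attach each Host bucket according to its ranking expectation (the function signature and data layout are assumptions, not a Ceph API):

```python
import math

def rebuild_root_buckets(expectations, hosts):
    """Re-derive the Low/High root buckets from the latest ranking expectations,
    in the spirit of the bucket replacement algorithm (Algorithm 1).
    expectations: {host_name: ranking expectation}; hosts: Host bucket names."""
    low_bucket, high_bucket = [], []                         # step 1: empty both root buckets
    ranked = sorted(hosts, key=lambda h: expectations[h], reverse=True)
    n_low = math.ceil(len(ranked) / 2)
    for i, host in enumerate(ranked):                        # step 2: re-add hosts by rank
        (low_bucket if i < n_low else high_bucket).append(host)
    return {"Low": low_bucket, "High": high_bucket}

new_map = rebuild_root_buckets(
    {"host1": 0.81, "host2": 0.44, "host3": 0.77, "host4": 0.52, "host5": 0.69},
    ["host1", "host2", "host3", "host4", "host5"])
print(new_map)   # {'Low': ['host1', 'host3', 'host5'], 'High': ['host4', 'host2']}
```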
In summary, the invention maintains a server ranking on the server side and distributes it to clients, so that copy selection has complete server state information and no forwarding delay overhead. To cope with the many factors that affect service quality in the edge environment, a performance modeling method based on multi-agent reinforcement learning is designed, using neural networks to build a high-dimensional performance model. By adjusting the structure and data flow of the basic model, different network structures are deployed in the cloud and at the edge, which speeds up the adjustment of the copy selection strategy and reduces the transmission overhead of cloud-edge data. Finally, since the placement of copies affects copy selection, a ranking-expectation-based copy placement optimization method is designed: the copy placement is adjusted according to the expectation of the server ranking, so that requests can choose servers with lower delay and request processing delay is reduced. The invention thus adapts better to copy selection in the edge environment and achieves a balance between performance and reliability.
Embodiment 2
A copy selection method for a distributed storage system according to embodiment 1 includes:
during the operation of the distributed storage system, when the server side receives a copy access request, ranking the edge servers according to their scores, and selecting the highest-ranked edge server that holds a copy of the requested data as the node for copy selection and data access.
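A minimal sketch of this server-side selection step, assuming the scores and replica locations are available as simple mappings:

```python
def select_replica_node(scores, replica_holders):
    """scores: {server_id: latest Actor score}; replica_holders: servers that
    hold a copy of the requested object. Pick the highest-ranked (i.e. highest
    scoring) server among those that actually hold the data."""
    candidates = [s for s in replica_holders if s in scores]
    if not candidates:
        raise ValueError("no scored server holds a replica of this object")
    return max(candidates, key=lambda s: scores[s])

# Example: servers 0..4 scored at time t, the object lives on servers 1, 3 and 4.
print(select_replica_node({0: 0.9, 1: 0.3, 2: 0.8, 3: 0.7, 4: 0.5}, [1, 3, 4]))  # -> 3
```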
Preferably, all edge servers in the distributed storage system form a Ceph system; the Ceph system normalizes the score of each edge server where a data copy exists, uses the normalized score as the primary-affinity parameter value of the corresponding edge server, and selects the edge server for data access based on the primary-affinity values.
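A minimal sketch of the max–min normalization of the scores into primary-affinity values (max–min normalization is named in claim 8; feeding the result back through the `ceph osd primary-affinity` command is an assumption about the integration and should be checked against the Ceph release in use):

```python
def primary_affinities(scores):
    """Max-min normalize the scores of the OSDs that hold a replica, so the
    best-scoring OSD gets affinity 1.0 and the worst gets 0.0 (all equal -> 1.0)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {osd: 1.0 for osd in scores}
    return {osd: (s - lo) / (hi - lo) for osd, s in scores.items()}

aff = primary_affinities({"osd.0": 0.42, "osd.1": 0.77, "osd.3": 0.58})
for osd, a in aff.items():
    # One assumed way to push the value into Ceph (verify the command for your release):
    #   ceph osd primary-affinity <osd-id> <affinity in [0, 1]>
    print(f"ceph osd primary-affinity {osd} {a:.2f}")
```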
The related technical scheme is the same as embodiment 1, and is not described herein.
To illustrate the performance of the copy selection method provided by the invention, performance tests were run for three copy selection methods under three workloads, with the number of clients set to 10. Fig. 7 shows the average delay of the different copy selection strategies under the Read-only, Read-heavy and Update-heavy workloads; the abscissa is the copy selection strategy and the ordinate is the corresponding performance metric (average delay, in ms). MARLRS is the copy selection strategy provided by the invention, while the decentralized On-Off method and the centralized DRS-RT method are two existing copy selection strategies. As fig. 7 shows, the centralized DRS-RT method has a higher average delay than the decentralized On-Off method, because in the edge environment (where there is transmission delay between nodes) DRS-RT pays an extra request-forwarding delay. The MARLRS method has lower response delay than the other two methods under all three workloads, because it builds a high-dimensional model with multi-agent reinforcement learning and uses a centralized, server-side-ranking copy selection mechanism. Comparing the workloads, however, MARLRS shows its smallest average-delay reduction over the other two methods on the workload with the higher write ratio, because write operations incur synchronization overhead and MARLRS has no control over which nodes the synchronization copies are written to.
Further, table 3 shows the mean delay reduction ratio of the MARLRS provided by the present invention compared to the other two methods at three loads. Specifically, the average delay is reduced by 8.89%, 8.55%, and 2.47% compared to the On-Off method, respectively. The mean delay was reduced by 11.78%, 13.72% and 10.07% compared to the DRS-RT method, respectively.
TABLE 3
Workload       Reduction vs. On-Off   Reduction vs. DRS-RT
Read-only      8.89%                  11.78%
Read-heavy     8.55%                  13.72%
Update-heavy   2.47%                  10.07%
Further, many factors make the performance of a distributed storage system unstable, and the response delay of each node varies from request to request; at the edge, user mobility also means that different requests served by different servers see different response delays. The invention observes the stability of the system service by collecting the system's average response delay at every time instant over a long period, and uses this to verify the effectiveness of MARLRS; each time interval is 1 second long. Specifically, fig. 8 shows the average response delay at each time instant under the Read-only workload for the three strategies MARLRS, On-Off and DRS-RT, from top to bottom; the abscissa is the time instant (average delay data was collected for 1000 instants in total) and the ordinate is the delay (in ms). The figure shows more load-oscillation instants for the On-Off method: with the client acting as the selection node, each client has only a partial view, and multiple selection nodes find it hard to coordinate their strategies, which easily causes oscillation. From the overall trend of each subgraph, the overall average delay of the system fluctuates strongly under the On-Off and DRS-RT methods, showing that these two methods do not allocate requests well. Over this long observation, MARLRS is more effective than the On-Off and DRS-RT copy selection strategies, keeping the system's response delay more stable and providing a more stable quality of service.
Further, the change in average delay of the different copy selection strategies is observed as the number of clients (i.e., the overall system load) increases. The number of clients is set to 10, 20, 30, 40 and 50 in turn, and the average delay of each copy selection strategy is measured under the Read-only workload. Fig. 9 shows the impact of the number of clients on the delay of the three copy selection policies under the Read-only workload. Although the average response delay of all three policies rises as the number of clients increases (the system load grows), the average delay of MARLRS is lower than that of the On-Off method by 8.89%, 10.02%, 11.34%, 12.76% and 14.43% as the number of clients increases, and lower than that of the DRS-RT method by 11.78%, 12.04%, 12.12%, 12.15% and 11.88%, respectively. The average-delay reduction of MARLRS at the different numbers of clients under the Read-only workload is shown in table 4:
TABLE 4
Number of clients   Reduction vs. On-Off   Reduction vs. DRS-RT
10                  8.89%                  11.78%
20                  10.02%                 12.04%
30                  11.34%                 12.12%
40                  12.76%                 12.15%
50                  14.43%                 11.88%
As the data in table 4 show, the delay-reduction effect of MARLRS over On-Off grows as the number of clients increases. This indicates that as the number of clients (i.e., the level of concurrency) grows, the On-Off switching strategy selects less efficiently and finds it harder to coordinate decisions to suppress load oscillation, resulting in higher delay; at 40 clients its delay even exceeds that of the DRS-RT method, which relies on a forwarding mechanism. Meanwhile, the delay reduction of MARLRS relative to DRS-RT changes little, because both MARLRS and DRS-RT make centralized decisions. As concurrency increases, MARLRS, which makes a decision once per time interval, may see its delay rise when concurrency is high within an interval, but DRS-RT likewise suffers from the concurrency of single-point centralized decisions, leading to higher delay. Overall, as the number of clients increases, the MARLRS method outperforms the other two methods in average response delay.
In summary, the invention discloses a copy selection method for a distributed storage system that maintains a server ranking on the server side and distributes it to clients, so that copy selection has complete server state information and no forwarding delay overhead. To cope with the many factors that affect service quality in the edge environment, a performance modeling method based on multi-agent reinforcement learning is designed, using neural networks to build a high-dimensional performance model. By adjusting the structure and data flow of the basic model, different network structures are deployed in the cloud and at the edge, which speeds up the adjustment of the copy selection strategy and reduces the transmission overhead of cloud-edge data. Finally, since the placement of copies affects copy selection, a ranking-expectation-based copy placement optimization method is designed: the copy placement is adjusted according to the expectation of the server ranking, so that requests can choose servers with lower delay and request processing delay is reduced. The invention thus adapts better to copy selection in the edge environment and achieves a balance between performance and reliability.
Embodiment 3
A copy selection system, comprising a memory and a processor, wherein the memory stores a computer program and the processor executes the computer program to perform the copy selection method provided in embodiment 2 of the invention.
The related technical scheme is the same as embodiment 2, and is not described herein.
Embodiment 4
A computer-readable storage medium, which includes a stored computer program, wherein when the computer program is executed by a processor, the apparatus in which the storage medium is located is controlled to execute the copy selection method provided in embodiment 2 of the present invention.
The related technical scheme is the same as embodiment 2, and is not described herein.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A distributed storage system, comprising: the cloud terminal and the server terminal; the server side includes: a plurality of distributed edge servers; an Actor network is deployed in each edge server; the cloud end is provided with a plurality of Critic networks, the number of the Critic networks is the same as that of the edge servers, and one Critic network corresponds to one Actor network;
the operation process of the distributed storage system comprises the following steps:
at each time t, each edge server performs the following operations: the method comprises the steps that an edge server collects current state data of a network environment where the edge server is located as state information of the edge server, and the current state data are input into an Actor network used for scoring the service quality of the edge server to obtain a score of the edge server; after the edge server sends the state information and the scores of all the edge servers to a corresponding Critic network in the cloud to obtain the evaluation result, training an Actor network in the edge server by taking the maximum evaluation result as a target;
at each time t, the cloud performs the following operations: collecting the information sent by all edge servers, and after collecting the information sent by all edge servers at time t, calculating the reward value r_{t-1} at time t-1 and storing the corresponding tuple information into an experience pool; when the experience pool is full of data, randomly sampling tuple information data from the experience pool to train the Critic network; wherein the tuple information comprises: the state information of all edge servers at time t-1, the scores of all edge servers at time t-1, the reward value at time t-1, and the state information of all edge servers at time t.
2. The distributed storage system of claim 1, wherein the reward value r_{t-1} at time t-1 is computed from latency and request-count statistics of the edge servers; the defining formulas appear only as images in the original claim and are not reproduced here. The quantities they use are: the number of edge servers N; the average latency of each edge server; the mean of the average latencies over all edge servers; the number of requests processed by each edge server; and the average of the number of requests processed by each edge server.
3. The distributed storage system according to claim 1, wherein, in the operations performed at each time t, when the experience pool is not full of data or when Critic network training has completed, the cloud determines whether the elapsed time duration at time t is greater than a preset time period; if so, the scores of each edge server at different times are obtained from the experience pool and the score average of each edge server is calculated; taking the median of the score averages as the partition point, the edge servers are divided into low-latency edge servers and high-latency edge servers, where the score average of a low-latency edge server is greater than or equal to the partition point and that of a high-latency edge server is less than the partition point; the two partitions are represented by two root-bucket structures, denoted the Low bucket and the High bucket respectively; ⌈N/2⌉ low-latency edge servers are placed in the Low bucket and ⌊N/2⌋ high-latency edge servers are placed in the High bucket; ⌈M/2⌉ low-latency edge servers are selected from the Low bucket and ⌊M/2⌋ high-latency edge servers are selected from the High bucket to place the copies; otherwise, the operation of the cloud at time t ends; wherein N is the number of edge servers and M is the number of copies.
4. The distributed storage system according to any one of claims 1-3, wherein the Actor network comprises: an Actor online network and an Actor target network; the Critic network comprises a Critic online network and a Critic target network;
the operation process of the distributed storage system comprises the following steps:
at each time t, each edge server performs the following operations: the method comprises the steps that an edge server collects current state data of a network environment where the edge server is located to serve as state information of the edge server, and the current state data are respectively input into an Actor online network and an Actor target network inside the edge server to obtain a score output by the Actor online network and a score output by the Actor target network; the method comprises the steps that after an edge server sends state information of the edge server and scores output by Actor online networks of all edge servers to a corresponding criticic online network in a cloud end to obtain an evaluation result, the Actor online network in the edge server is trained by taking the maximum evaluation result as a target; after each training cycle, updating the target network of the Actor based on the parameters of the on-line network of the Actor;
at each time t, the cloud performs the following operations: collecting the information sent by all edge servers; after collecting the information sent by all edge servers at time t, calculating the reward value at time t-1 and storing the corresponding tuple information into an experience pool; when the experience pool is full of data, randomly sampling tuple information data from the experience pool to train each Critic network; the tuple information comprises: the state information s_{t-1} of all edge servers at time t-1, the scores a_{t-1} output by the Actor online networks of all edge servers at time t-1, the reward value r_{t-1} at time t-1, the state information s_t of all edge servers at time t, and the scores a'_t output by the Actor target networks of all edge servers at time t; wherein s_{t-1} collects the state information s_{t-1}^i of the i-th edge server at time t-1, a_{t-1} collects the scores a_{t-1}^i output by the Actor online networks at time t-1, s_t collects the state information s_t^i of the i-th edge server at time t, a'_t collects the scores a'_t^i output by the Actor target networks at time t, and N is the number of edge servers.
5. The distributed storage system according to claim 4, wherein the method for training the Critic networks from the tuple information data randomly sampled from the experience pool comprises the following steps:

recording the j-th sampled tuple information data as (s_b, a_b, r_b, s_{b+1}, a'_{b+1}), wherein s_b and a_b collect, over all edge servers, the state information s_b^i of the i-th edge server at time b and the score a_b^i output by the Actor online network of the i-th edge server at time b, and a'_{b+1} collects the scores a'^i_{b+1} output by the Actor target networks of the edge servers at time b+1;

acquiring an evaluation result and a corresponding evaluation label for each edge server based on the sampled tuple information data; wherein the evaluation result of the i-th edge server obtained from the j-th tuple information data is the output obtained by inputting s_b and a_b into the i-th Critic online network; the evaluation label of the i-th edge server obtained from the j-th tuple information data is the reward value r_b at time b plus the reward discount rate γ multiplied by the output obtained by inputting s_{b+1} and a'_{b+1} into the i-th Critic target network;
training each Critic online network by minimizing the difference between the evaluation result of each edge server and the corresponding evaluation label; and after each training cycle, updating the corresponding Critic target network based on the parameters of the Critic online network.
6. A copy selection method for a distributed storage system according to any one of claims 1 to 5, comprising: during the operation of the distributed storage system, when the server side receives a copy access request, ranking the edge servers according to their scores, and selecting the highest-ranked edge server that holds a copy of the requested data as the node for copy selection and data access.
7. The copy selection method of claim 6, wherein all edge servers in the distributed storage system form a Ceph system; the Ceph system normalizes the score of each edge server where a data copy exists, uses the normalized score as the primary-affinity parameter value of the corresponding edge server, and selects the edge server for data access based on the primary-affinity values.
8. The replica selection method of claim 7 wherein the Ceph system normalizes the scores of each edge server where a data replica exists using a max-min normalization method.
9. A copy selection system, comprising: a memory storing a computer program and a processor executing the computer program to perform the copy selection method of any of claims 6-8.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed by a processor, controls an apparatus in which the storage medium is located to perform the copy selection method of any of claims 6-8.
CN202210768871.2A 2022-06-30 2022-06-30 Distributed storage system and copy selection method thereof Active CN115190135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210768871.2A CN115190135B (en) 2022-06-30 2022-06-30 Distributed storage system and copy selection method thereof

Publications (2)

Publication Number Publication Date
CN115190135A true CN115190135A (en) 2022-10-14
CN115190135B CN115190135B (en) 2024-05-14

Family

ID=83515750


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362426A (en) * 2019-06-21 2019-10-22 华中科技大学 A kind of selective copy realization method and system towards sudden load
US20200351344A1 (en) * 2019-04-30 2020-11-05 EMC IP Holding Company LLC Data tiering for edge computers, hubs and central systems
CN112511336A (en) * 2020-11-05 2021-03-16 上海大学 Online service placement method in edge computing system
CN113014968A (en) * 2021-02-24 2021-06-22 南京大学 Multi-user dynamic code rate video transmission method and system based on reinforcement learning
CN113114756A (en) * 2021-04-08 2021-07-13 广西师范大学 Video cache updating method for self-adaptive code rate selection in mobile edge calculation
US11206221B1 (en) * 2021-06-04 2021-12-21 National University Of Defense Technology Online task dispatching and scheduling system and method thereof
CN113873022A (en) * 2021-09-23 2021-12-31 中国科学院上海微系统与信息技术研究所 Mobile edge network intelligent resource allocation method capable of dividing tasks
CN114423061A (en) * 2022-01-20 2022-04-29 重庆邮电大学 Wireless route optimization method based on attention mechanism and deep reinforcement learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAI Xingyu; LIAO Fei; CHEN Jie: "Research on Edge Computing Security Protection System", Communications Technology (通信技术), no. 01, 10 January 2020 (2020-01-10) *
LU Haifeng; GU Chunhua; LUO Fei; DING Weichao; YANG Ting; ZHENG Shuai: "Research on Task Offloading for Mobile Edge Computing Based on Deep Reinforcement Learning", Journal of Computer Research and Development (计算机研究与发展), no. 07, 7 July 2020 (2020-07-07) *



Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant