CN113779302A - Semi-distributed cooperative storage method based on value decomposition network and multi-agent reinforcement learning - Google Patents


Info

Publication number
CN113779302A
Authority
CN
China
Prior art keywords
network
wireless
wireless service
storage
service node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111058748.3A
Other languages
Chinese (zh)
Other versions
CN113779302B (en)
Inventor
陈由甲
蔡粤楷
郑海峰
胡锦松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202111058748.3A
Publication of CN113779302A
Application granted
Publication of CN113779302B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/71 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/06 - Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 - Supervisory, monitoring or testing arrangements
    • H04W 24/06 - Testing, supervising or monitoring using simulated traffic
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 - Network traffic management; Network resource management
    • H04W 28/16 - Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning. A semi-distributed multi-agent reinforcement learning framework is designed according to a wireless intelligent storage network model, and a state space, an action space, and a reward function are designed to characterize the information of users and wireless service nodes in the wireless network. A dynamic storage algorithm built on the efficient decision-making capability of the Dueling DQN network supplies the storage replacement policy of each wireless service node. A value decomposition network embedded in the sink node of the wireless network computes global policy update parameters, which are delivered to each wireless service node to update each agent's local policy. The neural networks inside the agents are iteratively updated until the global loss function converges, yielding a globally optimal storage policy. Because the information of every agent is conveyed to the sink node, the agents cooperate with one another and the global optimum is reached quickly.

Description

Semi-distributed cooperative storage method based on value decomposition network and multi-agent reinforcement learning
Technical Field
The invention belongs to the field of wireless communication and computer technology, relates to deep reinforcement learning within machine learning, distributed systems, algorithm complexity optimization, wireless transmission and the like, and particularly relates to a semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning.
Background
With the exponential growth of mobile wireless communication and data demand and the continuous increase of device storage and computing power, real-time multimedia services are gradually becoming a major business in 5G communication networks. Daily life and work are migrating wholesale onto the mobile internet, pushing various network functions to the edge of the network, such as edge computing and edge storage. By storing popular content requested by users, edge storage aims to reduce the traffic load and duplicate transmissions in the backhaul network, thereby significantly reducing transmission delay. In addition, with the rise of online video services, improving the experience of video users in wireless networks has become a new challenge, which calls for a dedicated video service policy. To capture the dynamics of user-requested content and the wireless environment, a policy control framework is introduced into the field of wireless storage. Moreover, owing to the deployment of large-scale wireless service nodes, increasing attention is being paid to improving the overall service performance of the wireless network through cooperation among multiple wireless service nodes.
Disclosure of Invention
To fill the gaps and remedy the deficiencies of the prior art, the invention aims to provide a semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning. A semi-distributed multi-agent reinforcement learning framework, a state space, an action space, and a reward function are designed according to a wireless intelligent storage network model to characterize the information of users and wireless service nodes in the wireless network; a dynamic storage algorithm built on the efficient decision-making capability of the Dueling DQN network supplies the storage replacement policy of each wireless service node; a value decomposition network embedded in the sink node of the wireless network computes global policy update parameters, which are delivered to each wireless service node to update each agent's local policy; and the neural networks inside the agents are iteratively updated until the global loss function converges, yielding a globally optimal storage policy. Because the information of every agent is conveyed to the sink node, the agents cooperate with one another and the global optimum is reached quickly.
The key problem addressed by the invention is accurately predicting user demand. Considering the actual environmental complexity faced by users in networks, and in wireless networks in particular, a dimension decomposition mechanism is introduced, and a user service policy algorithm based on dimension decomposition is provided within each agent; the final policy converges through continuous updating and iteration. Simulation results show that, across various environmental parameter settings, the algorithm markedly reduces access delay and improves the user-experienced quality of service. In addition, the algorithm can handle a very large action space; the semi-distributed framework constructed with value decomposition accelerates the convergence of the whole system, the computational complexity is low, and most of the running time is saved compared with conventional multi-agent algorithms.
The invention specifically adopts the following technical scheme:
a semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning is characterized in that the implementation process comprises the following steps:
step S1: constructing a wireless network model of multi-device cooperative semi-distributed storage based on wireless network transmission, the model comprising a sink node and wireless service nodes; defining, on the basis of the value decomposition network and multi-agent deep reinforcement learning, the state space and action space of each agent, the joint state space and joint action space, and a reward function designed from the optimization objective, so as to maximize the wireless network service quality and reduce the access delay of stored content;
step S2: collecting and analyzing information about each wireless service node at the sink node, and coordinating the cooperation of the wireless service nodes through the constructed value decomposition network model: the action value function of each wireless service node serves as the input of the value decomposition network, whose outputs are the global action value function and the global policy update parameters of the whole system; the results are fed back to the whole semi-distributed system, including feeding the update parameters back to each wireless service node to update the policy of the individual node, thereby improving the cooperation performance and convergence speed of wireless edge storage.
Further, step S1 specifically includes the following steps:
step S11: defining a user set and the membership of users to wireless service nodes, user request variables, wireless service node storage variables, a file set, a quality variable, a video layer set and a wireless service node set, the unit delays of a local hit, a cooperative hit and a download from the server, and user request quality and service quality variables;
step S12: constructing the performance indicators of the storage model, comprising the video access delay and the user experience score, and constructing the final optimization objective, i.e. the reward function, based on this two-objective optimization problem; defining the user request variables, the user request quality and the wireless service node storage variables as the state space, and the wireless service node storage variables and the user service quality at the next moment as the action space;
step S13: utilizing a Dueling DQN network to perform state and action fitting, wherein the Dueling DQN network splits branches of a neural network into a state value branch and a dominant action branch, the state value branch is used for estimating the state of the current wireless network, and the dominant action branch is used for estimating each action; and evaluating the performance of each action by combining the state value and the advantage value.
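The aggregation of the two branches is rendered only as images in the original; the standard Dueling DQN combination that the text describes (state value plus centered advantage) is:

$$q(s, a; \theta) = V(s; \theta_V) + A(s, a; \theta_A) - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta_A)$$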
Further, step S11 is specifically: a user set is expressed as {1, …, i, …, I}, with U_j denoting the set of users belonging to wireless service node j; a user request variable λ_iv and a wireless service node storage variable δ_jvl are defined, along with a file set {1, …, v, …, V}, a quality variable K, a video layer set {1, …, l, …, L}, and a wireless service node set {1, …, j, …, J}; unit access delays d_0, d_jj', and d_j denote the unit delay of a local hit, a cooperative hit, and a download from the server, respectively;
and a user request quality k_iv and a service quality variable (symbol rendered as an image in the original) are defined.
In step S12, the performance indicators of the storage model are constructed, comprising the video access delay D and the user experience score M; their defining formulas are rendered as images in the original, where c_1 = 0.16 and c_2 = 0.66 are quality evaluation coefficients.
The final optimization objective, i.e. the reward function, is likewise constructed (formula rendered as an image in the original), where η is a weighting factor used to adjust the relative weights of the access delay and the user experience score.
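The explicit formulas for D, M and the reward are given only as images in the original. As an illustration only, and not the patent's exact expression, a reward consistent with the stated roles of D, M and η could trade the two objectives off as:

$$r = \eta M - (1 - \eta) D$$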
In step S13, the value estimator q(s, a; θ) of the Dueling DQN network is maintained in two copies, q_ej and q_gj, for the evaluation network and the target network.
Further, step S2 specifically includes the following steps:
step S21: introducing a value decomposition network at the sink node, wherein the sink node first collects the states and rewards of all agents to construct the joint state and joint action and computes the reward value of the whole system; an experience replay buffer is introduced to store samples, each containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); for each sample, the Dueling DQN network of each agent computes its own action value function q(s, a; θ), and the value decomposition network finally computes the global action value function of the whole system for the evaluation and target networks (formulas rendered as images in the original);
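The global action value computed by the value decomposition network appears only as images; in the standard VDN (Value-Decomposition Networks) formulation, which the description matches, the global value is the sum of the per-agent values, evaluated once with the evaluation networks and once with the target networks:

$$Q_{tot}(S, A) = \sum_{j=1}^{J} q_j(s_j, a_j; \theta_j)$$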
step S22: based on the global action value function of the whole system computed in step S21 and the joint reward function, a loss function is constructed to compute the global policy update parameters (the loss and gradient formulas are rendered as images in the original);
the trained global policy update parameters are propagated back to each wireless service node in the wireless service node group, so that each node can update its own neural network by gradient descent and thereby obtain a better policy.
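The loss and gradient are likewise rendered only as images; a standard one-step TD loss over the stored joint transitions, as used throughout the DQN family, together with the gradient step each node applies, would read:

$$\mathcal{L}(\theta) = \mathbb{E}\left[\left(r^{(t)} + \gamma \max_{A'} Q_{tot}(S^{(t+1)}, A'; \theta^{-}) - Q_{tot}(S^{(t)}, A^{(t)}; \theta)\right)^2\right], \qquad \theta \leftarrow \theta - \alpha \nabla_{\theta} \mathcal{L}(\theta)$$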
Further, a dimension decomposition mechanism is embedded into the Dueling DQN to reduce decision complexity and improve wireless service performance:
the actions output by the Dueling DQN network are decomposed along dimensions with actual physical meaning, namely into three dimensions: which type of video to store (δ_jv), which video layer to store (δ_jl), and what quality of service to provide to the user (variable rendered as an image in the original); the action in each dimension is represented by a separate neural network branch, and all actions are selected independently within their own dimension without affecting one another;
further, after the dimension decomposition mechanism is embedded, the calculation method of the action cost function is as follows: calculated by dimension in a Dueling DQN network, i.e.
Figure BDA0003254934830000043
Figure BDA0003254934830000044
At the same time, the calculation of the sink node global action cost function is also updated, i.e.
Figure BDA0003254934830000045
And
Figure BDA0003254934830000046
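The per-dimension value computation is shown only as images. One plausible aggregation, following the branching dueling architecture that this mechanism resembles, averages the per-dimension values within an agent before the sink node sums across agents; the patent's exact form may differ:

$$q_j(s, a) = \frac{1}{N} \sum_{d=1}^{N} q_{j,d}(s, a_d), \qquad Q_{tot}(S, A) = \sum_{j=1}^{J} q_j(s_j, a_j)$$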
Also provided is a wireless network model, comprising: wireless service nodes, a sink node, a source server, and a core network; each wireless service node can download files from the source server through a backhaul link, store them locally, and directly serve the users in its cell; user storage and inter-user cooperation are performed using the semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning as described above.
Furthermore, for each wireless service node, namely each agent, a user request set and a file storage set serve as the state space and as the input of the neural network; the output of the neural network is the set of contents to store and the file quality set for served users in the next time period. Each wireless service node can download files from the source server through a backhaul link, store them locally, and directly serve the users within its coverage; to reduce file download time, different wireless service nodes are allowed to cooperate across devices through the sink node, further improving the wireless network service quality and reducing the access delay of stored content.
Compared with the prior art, the method and the preferred scheme thereof can promote the cooperation among the multiple intelligent agents by utilizing the value decomposition network of the sink node, and solve the problem of decision complexity in a real wireless network environment by utilizing the framework of a dimension decomposition mechanism, thereby improving the capability of mobile wireless edge storage. The method can obtain good performance even under the conditions of limited storage resources, limited computing resources and complex user and wireless environment, so as to reduce the access delay of the stored content and improve the service quality of the user.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
fig. 1 is a schematic diagram of a semi-distributed wireless cooperative storage network model in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a Dueling DQN network in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a dimension decomposition mechanism.
FIG. 4 is a comparison chart of the results of different file parameters in the embodiment of the present invention.
FIG. 5 is a graph comparing performance of different algorithms in an embodiment of the invention.
FIG. 6 is a comparison of results for different dimensional numbers in an example of the invention.
Detailed Description
In order to make the features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail as follows:
it should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The semi-distributed cooperative storage algorithm based on the value decomposition network and multi-agent reinforcement learning provided by this embodiment is implemented according to the following steps:
step S1: a semi-distributed cooperative storage wireless network model is provided, state spaces, action spaces and reward functions based on time delay and user experience design of each agent are defined, and the purpose is to improve the service quality of local wireless service nodes to the maximum extent;
1) In this example, first define a user set {1, …, i, …, I}, with U_j denoting the set of users belonging to wireless service node j; a user request variable λ_iv and a wireless service node storage variable δ_jvl; a file set {1, …, v, …, V}; a quality variable K; a video layer set {1, …, l, …, L}; and a wireless service node set {1, …, j, …, J}. In addition, unit access delays d_0, d_jj', and d_j denote the unit delay of a local hit, a cooperative hit, and a download from the server, respectively, and, to construct the optimization problem, a user request quality k_iv and a service quality variable (symbol rendered as an image in the original) are defined.
2) The performance indicators of the storage model are constructed, comprising the video access delay D and the user experience score M, and the final optimization objective, i.e. the reward function, is constructed (these formulas are rendered as images in the original and are the same as those referenced in step S12 above).
The state space comprises the user request variables, the user request quality, and the wireless service node storage variables; the wireless service node storage variables and the user service quality at the next moment constitute the action space.
3) The Dueling DQN network is used to fit states and actions. The Dueling DQN network splits the neural network into a state value branch, which mainly estimates the state of the current wireless network, and an advantage action branch, which estimates each action (the two branch formulas are rendered as images in the original).
Finally, q(s, a; θ) combines the state value and the advantage value to evaluate the performance of each action accurately; the Dueling DQN network is further divided into a target network and an evaluation network, so q(s, a; θ) likewise exists in two copies, q_ej and q_gj.
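As a minimal sketch of the dual-branch network just described, in PyTorch; the layer sizes, names, and hidden width are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Shared trunk, then a state-value branch V(s) and an advantage branch A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Standard dueling aggregation: q = V + A - mean(A).
        return v + a - a.mean(dim=-1, keepdim=True)
```

Instantiated once as the evaluation network and once as the target network, this yields the two copies q_ej and q_gj referred to above.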
Step S2: information about each wireless service node is collected and analyzed in a sink node, cooperation of each wireless service node is coordinated through a value decomposition network model, namely an action value function of each intelligent agent is used as input of the value decomposition network and is output as a global action value function and a global strategy updating parameter of the whole system, and a result is fed back to the whole semi-distributed system, so that cooperation performance of wireless edge storage is improved;
by defining an action value function, continuously iterating network parameters, and finally obtaining an optimal storage strategy and a user service strategy by each wireless service node in each obstructed state; in order to achieve better performance of the neural network, the Dueling DQN network of this embodiment may additionally use a fighting mechanism and a dual-network mechanism. The dual-network mechanism adopts a neural network with completely consistent structure for delaying updating to improve the stability of the algorithm, so that the algorithm is easier to converge. The decision mechanism additionally adopts the values of the estimated state value and the dominant value to judge the quality of the output action of the neural network, so that the decision is more accurate.
1) To enable the wireless service nodes to cooperate better, a new module, the value decomposition network, is introduced at the sink node. The sink node first collects the states and rewards of all agents to construct the joint state and joint action and then computes the reward value of the whole system. In addition, an experience replay buffer is introduced to store samples, each containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); from each sample the Dueling DQN network of each agent computes its own action value function q(s, a; θ), and the value decomposition network finally computes the global action value function of the whole system for the evaluation and target networks (formulas rendered as images in the original).
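A minimal sketch of the sink node's value decomposition step in PyTorch, assuming each agent has already reported the Q-value of its chosen action from its evaluation network and the maximum Q-value from its target network; the tensor shapes and discount factor are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def vdn_global_loss(q_chosen: torch.Tensor,    # [batch, n_agents]: eval-net q_j(s_j, a_j)
                    q_next_max: torch.Tensor,  # [batch, n_agents]: target-net max_a q_j(s'_j, a)
                    team_reward: torch.Tensor, # [batch]: joint reward of the whole system
                    gamma: float = 0.99) -> torch.Tensor:
    """VDN mixing: sum per-agent values into a global value, then apply a one-step TD loss."""
    q_tot = q_chosen.sum(dim=1)                  # Q_tot(S, A)
    q_tot_next = q_next_max.sum(dim=1).detach()  # target Q_tot(S', A'), no gradient
    td_target = team_reward + gamma * q_tot_next
    return F.mse_loss(q_tot, td_target)
```

Because the sum is differentiable, the gradient of this single global loss flows back into every agent's network, which corresponds to the feedback of global policy update parameters described above.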
2) Based on the global action value function of the whole system computed in the previous step, the joint reward function is used to construct a loss function and compute the global policy update parameters (the loss and gradient formulas are rendered as images in the original).
The trained global policy update parameters are then passed back to the individual wireless service nodes in the group so that each can update its own neural network, yielding an updated policy that performs better in both cooperation and prediction.
Step S3: in view of the complexity of the practical environment for deep reinforcement learning, a dimension decomposition mechanism is additionally embedded into the Dueling DQN to reduce decision complexity and improve wireless service performance.
1) The actions output by the Dueling DQN network are decomposed by dimension. In the scenario analyzed here, the action is decomposed into three dimensions: which type of video to store (δ_jv), which video layer to store (δ_jl), and what quality of service to provide to the user (variable rendered as an image in the original).
The action in each dimension is represented by a separate neural network branch, so all actions are selected independently within their own dimension without affecting one another.
2) After the dimension decomposition mechanism is embedded, the computation of the action value function changes correspondingly: it is computed per dimension within the Dueling DQN network, and the computation of the sink node's global action value function is updated in the same way (the per-dimension formulas are rendered as images in the original). A sketch of such a dimension-decomposed network follows.
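A sketch of the dimension-decomposed output in PyTorch: one advantage branch per action dimension (video type, video layer, service quality), sharing one trunk and one state-value estimate; the branch sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BranchingDuelingDQN(nn.Module):
    """Dueling DQN whose advantage head is split into one branch per action dimension."""
    def __init__(self, state_dim: int, branch_sizes=(10, 4, 4), hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)  # shared state value V(s)
        self.branches = nn.ModuleList([nn.Linear(hidden, n) for n in branch_sizes])

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        v = self.value(h)
        qs = []
        for branch in self.branches:  # one independent branch per action dimension
            adv = branch(h)
            qs.append(v + adv - adv.mean(dim=-1, keepdim=True))
        return qs  # argmax is taken independently in every dimension

# The decision space shrinks from prod(branch_sizes) joint actions to
# sum(branch_sizes) branch outputs, which is the claimed complexity reduction.
```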
in order to further understand the semi-distributed cooperative storage algorithm based on the value decomposition network and the multi-agent reinforcement learning, which is proposed by the present invention, the following detailed description is made with reference to specific embodiments. The embodiment is implemented on the premise of the technical scheme of the invention.
Fig. 1 shows the semi-distributed wireless intelligent storage network model.
The model mainly comprises wireless service nodes, a sink node, a source server, and a core network, and introduces a user storage model under the wireless service nodes and a cooperation model among users; each wireless service node can download files from the source server through a backhaul link, store them locally, and directly serve the users in its cell.
Fig. 2 is a schematic diagram of a Dueling DQN network and a value decomposition network.
The network framework is divided into a target network and an evaluation network, and the neural network branches of each are subdivided into a state value network and an advantage value network; this dual-network, dual-branch architecture evaluates the quality of actions more effectively. The network takes the state space as input and the action space as output, and continuously optimizes its parameters by receiving the global policy update parameters.
Fig. 3 is a schematic diagram of the dimension decomposition mechanism.
The composite action space is decomposed into independent actions across multiple dimensions; each dimension selects its action in an independent neural network branch, thereby avoiding the high complexity of composite actions.
FIG. 4 is a graph comparing the results of different file parameters in the examples.
The experimental results show that the algorithm not only copes with the computational complexity of a large number of files, but also finds a good global policy under different levels of high complexity, so that the reward value converges to a high value.
This analysis shows that the semi-distributed cooperative storage algorithm based on a value decomposition network and multi-agent reinforcement learning achieves better storage performance than existing methods, substantially improves content storage for users, and offers reference value and practical economic benefit.
Fig. 5 compares the performance of different algorithms in the embodiment of the present invention.
Compared with conventional multi-agent algorithms, the introduced semi-distributed architecture and dimension decomposition method improve performance by 23.4% and 30.5%, respectively, under the 5-file and 10-file conditions, while also converging faster, so the algorithm can quickly locate a globally optimal policy in different scenarios.
Fig. 6 compares the results for different dimension numbers in the embodiment of the present invention.
Analysis of the experimental results shows that the algorithm extends naturally along the dimension decomposition mechanism: the more finely the dimensions are subdivided into branches (provided the action decomposition criterion is met), the better the returns the algorithm delivers.
The method provided by this embodiment can be stored in coded form on a computer-readable storage medium and implemented as a computer program that receives the basic parameter information required for the calculation through computer hardware and outputs the calculation result.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
The present invention is not limited to the above preferred embodiments; any other semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning derived under the teaching of the present invention falls within the scope of protection of the present invention.

Claims (8)

1. A semi-distributed cooperative storage method based on a value decomposition network and multi-agent reinforcement learning is characterized in that the implementation process comprises the following steps:
step S1: constructing a wireless network model of multi-device cooperative semi-distributed storage based on wireless network transmission, the model comprising a sink node and wireless service nodes; defining, on the basis of the value decomposition network and multi-agent deep reinforcement learning, the state space and action space of each agent, the joint state space and joint action space, and a reward function designed from the optimization objective, so as to maximize the wireless network service quality and reduce the access delay of stored content;
step S2: collecting and analyzing information about each wireless service node at the sink node, and coordinating the cooperation of the wireless service nodes through the constructed value decomposition network model: the action value function of each wireless service node serves as the input of the value decomposition network, whose outputs are the global action value function and the global policy update parameters of the whole system; the results are fed back to the whole semi-distributed system, including feeding the update parameters back to each wireless service node to update the policy of the individual node, thereby improving the cooperation performance and convergence speed of wireless edge storage.
2. The semi-distributed collaborative storage method based on value decomposition network and multi-agent reinforcement learning of claim 1, wherein: step S1 specifically includes the following steps:
step S11: defining a user set and the membership of users to wireless service nodes, user request variables, wireless service node storage variables, a file set, a quality variable, a video layer set and a wireless service node set, the unit delays of a local hit, a cooperative hit and a download from the server, and user request quality and service quality variables;
step S12: constructing the performance indicators of the storage model, comprising the video access delay and the user experience score, and constructing the final optimization objective, i.e. the reward function, based on this two-objective optimization problem; defining the user request variables, the user request quality and the wireless service node storage variables as the state space, and the wireless service node storage variables and the user service quality at the next moment as the action space;
step S13: fitting states and actions with a Dueling DQN network, which splits the branches of the neural network into a state value branch, used to estimate the state of the current wireless network, and an advantage action branch, used to estimate each action; and evaluating the performance of each action by combining the state value and the advantage value.
3. The semi-distributed collaborative storage method based on value decomposition network and multi-agent reinforcement learning of claim 2, wherein:
step S11 specifically includes: a user set is expressed as {1, …, i, …, I}, with U_j denoting the set of users belonging to wireless service node j; a user request variable λ_iv and a wireless service node storage variable δ_jvl are defined, along with a file set {1, …, v, …, V}, a quality variable K, a video layer set {1, …, l, …, L}, and a wireless service node set {1, …, j, …, J}; unit access delays d_0, d_jj', and d_j denote the unit delay of a local hit, a cooperative hit, and a download from the server, respectively; and a user request quality k_iv and a service quality variable (symbol rendered as an image in the original) are defined.
in step S12: the performance indicators of the storage model are constructed, comprising the video access delay D and the user experience score M; their defining formulas are rendered as images in the original, where c_1 = 0.16 and c_2 = 0.66 are quality evaluation coefficients;
the final optimization objective, i.e. the reward function, is likewise constructed (formula rendered as an image in the original), where η is a weighting factor used to adjust the relative weights of the access delay and the user experience score;
in step S13, the value estimator q(s, a; θ) of the Dueling DQN network is maintained in two copies, q_ej and q_gj, for the evaluation network and the target network.
4. The semi-distributed collaborative storage method based on value decomposition network and multi-agent reinforcement learning of claim 3, wherein: step S2 specifically includes the following steps:
step S21: introducing a value decomposition network at the sink node, wherein the sink node first collects the states and rewards of all agents to construct the joint state and joint action and computes the reward value of the whole system; an experience replay buffer is introduced to store samples, each containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); for each sample, the Dueling DQN network of each agent computes its own action value function q(s, a; θ), and the value decomposition network finally computes the global action value function of the whole system for the evaluation and target networks (formulas rendered as images in the original);
step S22: based on the global action value function of the whole system computed in step S21 and the joint reward function, a loss function is constructed to compute the global policy update parameters (the loss and gradient formulas are rendered as images in the original);
the trained global policy update parameters are propagated back to each wireless service node in the wireless service node group, so that each node can update its own neural network by gradient descent and thereby obtain a better policy.
5. The semi-distributed collaborative storage method based on value decomposition network and multi-agent reinforcement learning of claim 2, wherein: a dimension decomposition mechanism is embedded into the Dueling DQN to reduce decision complexity and improve wireless service performance:
the actions output by the Dueling DQN network are decomposed along dimensions with actual physical meaning, namely into three dimensions: which type of video to store (δ_jv), which video layer to store (δ_jl), and what quality of service to provide to the user (variable rendered as an image in the original);
the action in each dimension is represented by a separate neural network branch, and all actions are selected independently within their own dimension without affecting one another.
6. The semi-distributed collaborative storage method based on value decomposition network and multi-agent reinforcement learning of claim 5, wherein:
after the dimension decomposition mechanism is embedded, the action value function is computed per dimension within the Dueling DQN network, and the computation of the sink node's global action value function is updated accordingly (the per-dimension formulas are rendered as images in the original).
7. A wireless network model, comprising: wireless service nodes, a sink node, a source server and a core network; each wireless service node can download files from the source server through a backhaul link, store them locally and directly serve the users in its cell; content storage and inter-user collaboration are performed using the semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning as claimed in any one of claims 1-6.
8. The wireless network model of claim 7, wherein: for each wireless service node, namely each agent, a user request set and a file storage set serve as the state space and as the input of the neural network, and the output of the neural network is the set of contents to store and the file quality set for served users in the next time period; each wireless service node can download files from the source server through a backhaul link, store them locally and directly serve the users within its coverage; in order to reduce file downloading time, different wireless service nodes are allowed to perform multi-device cooperation through the sink node, further improving the wireless network service quality and reducing the access delay of stored content.
CN202111058748.3A 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents Active CN113779302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111058748.3A CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111058748.3A CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Publications (2)

Publication Number Publication Date
CN113779302A true CN113779302A (en) 2021-12-10
CN113779302B CN113779302B (en) 2023-09-22

Family

ID=78842194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111058748.3A Active CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Country Status (1)

Country Link
CN (1) CN113779302B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114867061A (en) * 2022-07-05 2022-08-05 深圳市搜了网络科技股份有限公司 Cloud monitoring method based on wireless communication network
CN115065728A (en) * 2022-06-13 2022-09-16 福州大学 Multi-strategy reinforcement learning-based multi-target content storage method
CN115086374A (en) * 2022-06-14 2022-09-20 河南职业技术学院 Scene complexity self-adaptive multi-agent layered cooperation method
CN116996919A (en) * 2023-09-26 2023-11-03 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190014488A1 (en) * 2017-07-06 2019-01-10 Futurewei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN111079305A (en) * 2019-12-27 2020-04-28 南京航空航天大学 Different-strategy multi-agent reinforcement learning cooperation method based on lambda-reward
CN112364984A (en) * 2020-11-13 2021-02-12 南京航空航天大学 Cooperative multi-agent reinforcement learning method
CN112396187A (en) * 2020-11-19 2021-02-23 天津大学 Multi-agent reinforcement learning method based on dynamic collaborative map
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190014488A1 (en) * 2017-07-06 2019-01-10 Futurewei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN111079305A (en) * 2019-12-27 2020-04-28 南京航空航天大学 Different-strategy multi-agent reinforcement learning cooperation method based on lambda-reward
CN112364984A (en) * 2020-11-13 2021-02-12 南京航空航天大学 Cooperative multi-agent reinforcement learning method
CN112396187A (en) * 2020-11-19 2021-02-23 天津大学 Multi-agent reinforcement learning method based on dynamic collaborative map
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李孜恒; 孟超: "Wireless network resource allocation algorithm based on deep reinforcement learning", Communications Technology, no. 08
谢添; 高士顺; 赵海涛; 林沂; 熊俊: "Reinforcement-learning-based anti-jamming resource scheduling algorithm for directional wireless communication networks", Chinese Journal of Radio Science, no. 04

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065728A (en) * 2022-06-13 2022-09-16 福州大学 Multi-strategy reinforcement learning-based multi-target content storage method
CN115065728B (en) * 2022-06-13 2023-12-08 福州大学 Multi-strategy reinforcement learning-based multi-target content storage method
CN115086374A (en) * 2022-06-14 2022-09-20 河南职业技术学院 Scene complexity self-adaptive multi-agent layered cooperation method
CN114867061A (en) * 2022-07-05 2022-08-05 深圳市搜了网络科技股份有限公司 Cloud monitoring method based on wireless communication network
CN114867061B (en) * 2022-07-05 2022-12-13 深圳市搜了网络科技股份有限公司 Cloud monitoring method based on wireless communication network
CN116996919A (en) * 2023-09-26 2023-11-03 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning
CN116996919B (en) * 2023-09-26 2023-12-05 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning

Also Published As

Publication number Publication date
CN113779302B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN113779302A (en) Semi-distributed cooperative storage method based on value decomposition network and multi-agent reinforcement learning
Jiang et al. Distributed resource scheduling for large-scale MEC systems: A multiagent ensemble deep reinforcement learning with imitation acceleration
CN111611062B (en) Cloud-edge collaborative hierarchical computing method and cloud-edge collaborative hierarchical computing system
CN113435472A (en) Vehicle-mounted computing power network user demand prediction method, system, device and medium
Wu et al. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT
Lin et al. AI-driven collaborative resource allocation for task execution in 6G-enabled massive IoT
CN114710439B (en) Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN111198550A (en) Cloud intelligent production optimization scheduling on-line decision method and system based on case reasoning
CN115065728B (en) Multi-strategy reinforcement learning-based multi-target content storage method
Jeon et al. A distributed nwdaf architecture for federated learning in 5g
CN116185523A (en) Task unloading and deployment method
CN113887748B (en) Online federal learning task allocation method and device, and federal learning method and system
Zhao et al. A digital twin-assisted intelligent partial offloading approach for vehicular edge computing
Cui et al. Learning‐based deep neural network inference task offloading in multi‐device and multi‐server collaborative edge computing
CN114648223A (en) Smart city energy consumption data mining system and method based on Internet of things
He et al. Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework
CN102077526A (en) Method, apparatus and computer program product for distributed information management
Zhao et al. MEDIA: An Incremental DNN Based Computation Offloading for Collaborative Cloud-Edge Computing
CN110366210A (en) A kind of calculating discharging method for the application of stateful data flow
Zhai et al. Collaborative computation offloading for cost minimization in hybrid computing systems
Cui et al. Resource-Efficient DNN Training and Inference for Heterogeneous Edge Intelligence in 6G
Sun et al. Optimizing task-specific timeliness with edge-assisted scheduling for status update
Zhang et al. On-Device Intelligence for 5G RAN: Knowledge Transfer and Federated Learning Enabled UE-Centric Traffic Steering
Niu et al. A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant