CN113779302B - Semi-distributed collaborative storage method based on value decomposition network and multiple agents - Google Patents

Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Info

Publication number
CN113779302B
CN113779302B (application number CN202111058748.3A)
Authority
CN
China
Prior art keywords
network
wireless service
service node
wireless
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111058748.3A
Other languages
Chinese (zh)
Other versions
CN113779302A (en)
Inventor
陈由甲
蔡粤楷
郑海峰
胡锦松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202111058748.3A priority Critical patent/CN113779302B/en
Publication of CN113779302A publication Critical patent/CN113779302A/en
Application granted granted Critical
Publication of CN113779302B publication Critical patent/CN113779302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/06 Testing, supervising or monitoring using simulated traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 Network traffic management; Network resource management
    • H04W 28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning. A semi-distributed multi-agent reinforcement learning framework, together with a state space, an action space and a reward function, is designed according to a wireless intelligent storage network model to characterize user and wireless service node information in the wireless network. Combined with the efficient decision-making capability of the Dueling DQN network, a dynamic storage algorithm is further provided for the storage replacement policy of each wireless service node. A value decomposition network embedded in the sink node of the wireless network computes global policy update parameters, which are transmitted to each wireless service node to update the local policy of each agent. The neural network in each agent is iteratively updated until the global loss function converges, yielding the globally optimal storage policy. The information of each agent is transmitted to the sink node to promote cooperation among the agents and to reach the global optimum quickly.

Description

Semi-distributed collaborative storage method based on value decomposition network and multiple agents
Technical Field
The application belongs to the field of wireless communication and computer technology, relates to deep reinforcement learning, distributed systems, algorithm complexity optimization, wireless transmission and the like, and particularly relates to a semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning.
Background
With the exponential growth of mobile wireless communication and data demands and the continuous improvement of device storage and computing capabilities, real-time multimedia services have gradually become the main business in 5G communication networks, and human life and work are migrating comprehensively to the mobile internet, pushing various network functions to the network edge, such as edge computing and edge storage. By storing the popular content requested by users, edge storage aims to reduce the traffic load and duplicate transmissions in the backhaul network, thereby significantly reducing transmission delay. In addition, with the rise of online video services, how to improve the experience of video users in wireless networks has become a new challenge. To capture the dynamic nature of the content requested by users and of the wireless environment, policy control frameworks have been introduced into the wireless storage domain. Deep reinforcement learning combines deep neural networks with Q-learning and shows excellent performance on complex control problems. Moreover, with the deployment of large-scale wireless service nodes, how to improve the overall service performance of the wireless network through cooperation among multiple wireless service nodes has attracted increasing attention.
Disclosure of Invention
In order to make up for the gaps and deficiencies of the prior art, the application aims to provide a semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning. A semi-distributed multi-agent reinforcement learning framework, together with a state space, an action space and a reward function, is designed according to a wireless intelligent storage network model to characterize user and wireless service node information in the wireless network. Combined with the efficient decision-making capability of the Dueling DQN network, a dynamic storage algorithm is further provided for the storage replacement policy of each wireless service node. A value decomposition network embedded in the sink node of the wireless network computes global policy update parameters, which are transmitted to each wireless service node to update the local policy of each agent. The neural network in each agent is iteratively updated until the global loss function converges, yielding the globally optimal storage policy. The information of each agent is transmitted to the sink node to promote cooperation among the agents and to reach the global optimum quickly.
The key of the application is to achieve accurate prediction of user demand. Considering the actual environmental complexity faced by users in a network, particularly a wireless network, a dimension decomposition mechanism is introduced, and a user service strategy algorithm based on dimension decomposition is finally provided in each agent; the final strategy converges through continuous iterative updating. Simulation results show that the algorithm achieves a remarkable improvement in reducing access delay and improving user service experience under various environment parameter settings. In addition, the algorithm can handle a very large action space; the semi-distributed framework constructed with value decomposition accelerates the convergence of the whole system, has low computational complexity, and saves most of the running time compared with traditional multi-agent algorithms.
The application adopts the following technical scheme:
the semi-distributed collaborative storage method based on the value decomposition network and the multi-agent reinforcement learning is characterized by comprising the following steps of:
step S1: the method comprises the steps of constructing a wireless network model of multi-equipment cooperation semi-distributed cooperation storage based on wireless network transmission, comprising aggregation nodes and wireless service nodes, defining an agent state space and an action space based on value decomposition network and multi-agent deep reinforcement learning, combining the state space and the action space, and a reward function based on optimization target design so as to improve the wireless network service quality to the maximum extent and reduce the access delay of storage contents;
step S2: and collecting and analyzing information about each wireless service node in the sink node, coordinating the cooperation of each wireless service node by constructing a value decomposition network model, namely, outputting a global action cost function and a global strategy updating parameter of the whole system by taking the action cost function of each wireless service node as the input of the value decomposition network, and feeding back the result to the whole semi-distributed system, wherein the feedback comprises feeding back the result to each wireless service node to update the strategy of the single wireless service node so as to improve the cooperation performance and convergence speed of wireless edge storage.
Further, the step S1 specifically includes the following steps:
Step S11: defining the user set, the affiliation of users to wireless service nodes, the user request variables, the wireless service node storage variables, the file set, the quality variable, the video layer set and the wireless service node set, as well as the unit delays of a local hit, a collaborative hit and a download from the server, and the user request quality and quality-of-service variables;
Step S12: constructing the performance indicators of the storage model, including the video access delay and the user experience score, and constructing the final optimization objective, namely the reward function, based on these two objective optimization problems; defining the user request variables, the user request quality and the wireless service node storage variables as the state space, and the wireless service node storage variables and the user service quality at the next moment as the action space;
Step S13: fitting states and actions with a Dueling DQN network, which splits the branches of the neural network into a state-value branch for estimating the current wireless network state and an advantage-action branch for estimating each action; the quality of each action is evaluated by combining the state value and the advantage value.
Further, step S11 specifically includes: the user set is denoted {1, ..., i, ..., I}, and U_j denotes the set of users belonging to wireless service node j; λ_{iv} is the user request variable and δ_{jvl} the wireless service node storage variable; the file set is {1, ..., v, ..., V}, the quality variable is K, the video layer set is {1, ..., l, ..., L}, and the wireless service node set is {1, ..., j, ..., J}; the unit access delays d_0, d_{jj'} and d_j denote the unit delays of a local hit, a collaborative hit and a download from the server, respectively; the user request quality k_{iv} and the quality-of-service variable are also defined.
In step S12, the performance indicators of the storage model are constructed, including the video access delay D and the user experience score M, where c_1 = 0.16 and c_2 = 0.66 are quality evaluation coefficients. Based on these two objective optimization problems, the final optimization objective, namely the reward function, is constructed, where η is a weight coefficient used to adjust the relative weight of the access delay and the user experience score.
In step S13, since the Dueling DQN network is divided into a target network and an evaluation network, the action-value estimate q(s, a; θ) is likewise split into q_{ej} and q_{gj}.
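The dueling architecture used in step S13 can be sketched as follows. This is a minimal illustrative sketch in PyTorch rather than the patent's implementation: the hidden width, layer count and the names q_e and q_g (mirroring q_{ej} and q_{gj} for the evaluation and target copies) are assumptions.

```python
import copy
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Per-node Dueling DQN: a shared trunk feeds a state-value branch V(s)
    and an advantage branch A(s, a), recombined into Q(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value branch
        self.advantage = nn.Linear(hidden, n_actions)  # advantage-action branch

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)                              # (batch, 1)
        a = self.advantage(h)                          # (batch, n_actions)
        # Standard dueling combination: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# Evaluation network q_e and its delayed, structurally identical target copy q_g.
q_e = DuelingDQN(state_dim=32, n_actions=16)
q_g = copy.deepcopy(q_e)
```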
Further, the step S2 specifically includes the following steps:
Step S21: introducing a value decomposition network into the sink node; the sink node first collects the states and rewards of all agents to construct the joint state and joint action, and computes the reward value of the whole system from them; an experience replay buffer is also introduced to store samples containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); each sample is used to compute the respective action-value function q(s, a; θ), and the value decomposition network is finally used to compute the global action-value function of the whole system;
Step S22: constructing a loss function according to the global action cost function of the whole system calculated in the step S21 and the aggregate rewarding functionTo calculate the global policy update parameters, the global strategy updating parameters obtained by training are reversely transferred back to each wireless service node in the wireless service node group, so that the wireless service node is convenient to use a gradient updating method aiming at the self neural network>And updating to obtain a better strategy.
Further, a dimension decomposition mechanism is embedded into the Dueling DQN to reduce decision complexity and improve wireless service performance:
the actions output by the Dueling DQN network are decomposed according to the dimensions of their actual physical meaning, namely into three dimensions: which type of video to store, δ_{jv}; which video layer to store, δ_{jl}; and with what quality to serve the user; the actions in each dimension are represented by a separate neural network branch, and each dimension selects its action independently without affecting the others;
further, after the dimension decomposition mechanism is embedded, the action-value function is computed per dimension in the Dueling DQN network, and the computation of the global action-value function at the sink node is updated accordingly.
Also provided is a wireless network model, comprising: wireless service nodes, a sink node, a source server and a core network; each wireless service node can download files from the source server through a backhaul link, store them locally and directly serve the users in its cell; content storage and inter-user collaboration are performed using the semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning described above.
Further, for each wireless service node, i.e. each agent, the user request set and the file storage set serve as the state space and as the input of the neural network; the output of the neural network is the set of contents to store in the next time period and the set of file qualities with which to serve users; each wireless service node can download files from the source server through the backhaul link and store them locally to directly serve the users within its coverage; to reduce file download time, different wireless service nodes are allowed to perform multi-device cooperation through the sink node, further improving the service quality of the wireless network and reducing the access delay of stored content.
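To illustrate how a single wireless service node acts on this local observation, a hedged sketch of one decision step follows. It reuses the BranchingDuelingDQN sketch above and assumes the user request set and local storage set are encoded as flat tensors; the epsilon-greedy exploration rule is an assumption, since the patent text does not specify the exploration strategy.

```python
import random
import torch

def local_decision(agent_net, user_requests, storage_state, epsilon: float = 0.1):
    """One local step at a wireless service node: build the state from the current
    user request set and the local file storage set, then pick the next
    storage/serving action per dimension with an epsilon-greedy rule."""
    state = torch.cat([user_requests, storage_state]).unsqueeze(0)
    q_per_dim = agent_net(state)                # one Q head per action dimension
    action = []
    for q in q_per_dim:
        if random.random() < epsilon:           # explore
            action.append(random.randrange(q.shape[1]))
        else:                                   # exploit
            action.append(int(q.argmax(dim=1)))
    return action  # e.g. [video to store, layer to store, quality to serve]
```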
Compared with the prior art, the method and system use the value decomposition network at the sink node to promote cooperation among multiple agents, and at the same time use the dimension decomposition mechanism to handle the decision complexity of a real wireless network environment, thereby improving the capability of mobile wireless edge storage. Good performance is achieved even when storage resources are limited, computing resources are limited, and the user and wireless environment are complex, reducing the access delay of stored content and improving the quality of service for users.
Drawings
The application is described in further detail below with reference to the attached drawings and detailed description:
fig. 1 is a schematic diagram of a semi-distributed wireless collaborative storage network model in an embodiment of the present application.
FIG. 2 is a schematic diagram of a Dueling DQN network in an embodiment of the application.
FIG. 3 is a schematic diagram of a dimension decomposition mechanism.
FIG. 4 is a graph comparing results under different file parameters in the examples of the present application.
FIG. 5 is a graph comparing the performance of different algorithms in an embodiment of the application.
FIG. 6 is a graph comparing results of different dimensions in an example of the application.
Detailed Description
In order to make the features and advantages of the present patent more comprehensible, embodiments accompanied with figures are described in detail below:
it should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The semi-distributed collaborative storage algorithm based on the value decomposition network and multi-agent reinforcement learning provided by this embodiment is implemented according to the following steps:
Step S1: providing a wireless network model of semi-distributed collaborative storage, defining the state space and action space of each agent and a reward function designed based on delay and user experience, aiming to maximize the service quality of the local wireless service node;
1) In this embodiment, {1, ..., i, ..., I} denotes the user set, U_j the set of users belonging to wireless service node j, λ_{iv} the user request variable, δ_{jvl} the wireless service node storage variable, {1, ..., v, ..., V} the file set, K the quality variable, {1, ..., l, ..., L} the video layer set and {1, ..., j, ..., J} the wireless service node set. In addition, the unit access delays d_0, d_{jj'} and d_j denote the unit delays of a local hit, a collaborative hit and a download from the server, respectively, and, for constructing the optimization problem, the user request quality k_{iv} and the quality-of-service variable are defined.
2) Build the performance indicators of the storage model, including the video access delay D and the user experience score M, and based on these two objective optimization problems construct the final optimization objective, namely the reward function. In addition, the user request variables, the user request quality and the wireless service node storage variables are defined as the state space, and the wireless service node storage variables and the user service quality at the next moment as the action space.
3) States and actions are fitted with a Dueling DQN network, which splits the branches of the neural network into state-value branches, used mainly to estimate the current wireless network state, and advantage-action branches, used to estimate each action; the quality of each action can then be accurately estimated by q(s, a; θ), which combines the state value and the advantage value. Because the Dueling DQN network is further divided into a target network and an evaluation network, q(s, a; θ) is likewise split into q_{ej} and q_{gj}.
Step S2: collecting and analyzing information about each wireless service node at the sink node, and coordinating the cooperation of the wireless service nodes by constructing a value decomposition network model; that is, taking the action-value function of each agent as the input of the value decomposition network, outputting the global action-value function of the whole system and the global policy update parameters, feeding the result back to the whole semi-distributed system, and improving the cooperation performance of wireless edge storage;
by defining an action cost function, carrying out continuous iteration of network parameters, and finally, each wireless service node can obtain an optimal storage strategy and a user service strategy in each non-communication state; in order to achieve better performance of the neural network, the Dueling DQN network of the present embodiment may additionally use a duel-bucket mechanism and a dual-network mechanism. The dual-network mechanism adopts a neural network with completely consistent structure for delay updating to improve the stability of the algorithm, so that the algorithm is easier to converge. The decision mechanism additionally adopts the scores of the estimated state value and the dominant value to judge the merits of the output action of the neural network, so that the decision is more accurate.
1) For better cooperation between the wireless service nodes, a new module called the value decomposition network is introduced into the sink node; it collects the states and rewards of all agents to construct the joint state and joint action and computes the reward value of the whole system. In addition, an experience replay buffer is introduced to store samples containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); each sample is used to compute the respective action-value function q(s, a; θ), and the value decomposition network is finally used to compute the global action-value function of the whole system.
2) From the global action-value function of the whole system computed above and the joint reward function, a loss function is constructed to compute the global policy update parameters; the trained global policy update parameters can be propagated back to each wireless service node in the group, so that each node can update its own neural network by gradient descent and obtain a better strategy, which performs better in terms of collaboration and predictability.
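A minimal sketch of this sink-node update is shown below, under the standard value-decomposition assumption that the global action value is the sum of the per-agent action values. The batch layout, the discount factor GAMMA and the function name vdn_update are illustrative assumptions rather than the patent's exact formulation; for simplicity the sketch uses the flat action head of the earlier DuelingDQN sketch, while with dimension decomposition the same sum would be taken per dimension.

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # assumed discount factor

def vdn_update(agents_eval, agents_target, optimizers, batch):
    """One semi-distributed update at the sink node.

    batch["agents"] holds one (s, a, s_next) tuple per wireless service node,
    and batch["reward"] the joint reward, all drawn from the shared replay of
    (S(t), A(t), r(t), S(t+1)) samples."""
    # Chosen local action values q_j(s_j, a_j; theta_j).
    q_chosen = [
        net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        for net, (s, a, _) in zip(agents_eval, batch["agents"])
    ]
    # Value decomposition: the global action value is the sum of the local ones.
    q_tot = torch.stack(q_chosen, dim=0).sum(dim=0)

    with torch.no_grad():
        q_next = [
            tgt(s_next).max(dim=1).values
            for tgt, (_, _, s_next) in zip(agents_target, batch["agents"])
        ]
        target = batch["reward"] + GAMMA * torch.stack(q_next, dim=0).sum(dim=0)

    loss = F.mse_loss(q_tot, target)   # global loss at the sink node
    for opt in optimizers:
        opt.zero_grad()
    loss.backward()                    # gradients flow back to every local network
    for opt in optimizers:
        opt.step()
    return loss.item()
```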
Step S3: considering the complexity of the actual deep reinforcement learning environment, a dimension decomposition mechanism is additionally embedded into the Dueling DQN to reduce decision complexity and improve wireless service performance.
1) The actions output by the Dueling DQN network are decomposed by dimension; in this scenario the actions are decomposed into three dimensions: which type of video to store, δ_{jv}; which video layer to store, δ_{jl}; and with what quality to serve the user. The actions in each dimension are represented by a separate neural network branch, so all actions are selected independently in their own dimension and do not affect each other.
2) After the dimension decomposition mechanism is embedded, the computation of the action-value function changes correspondingly: it is computed per dimension in the Dueling DQN network, and the computation of the global action-value function at the sink node is updated accordingly.
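Putting the pieces together, a rough sketch of the semi-distributed training loop follows. Here env and replay are placeholder interfaces that the patent does not define, the loop reuses the local_decision, vdn_update and sync_target sketches above, and it glosses over the difference between the flat and the dimension-decomposed action heads; it is meant only to show the flow of information between the wireless service nodes and the sink node.

```python
def train(agents_eval, agents_target, optimizers, env, replay,
          episodes: int = 500, batch_size: int = 64):
    """Semi-distributed loop: nodes act locally, the sink node aggregates their
    transitions, computes the global VDN loss and pushes updates back."""
    step = 0
    for _ in range(episodes):
        states = env.reset()        # list of per-node (user_requests, storage_state)
        done = False
        while not done:
            actions = [local_decision(net, req, sto)
                       for net, (req, sto) in zip(agents_eval, states)]
            next_states, reward, done = env.step(actions)  # joint system reward
            replay.add(states, actions, reward, next_states)
            states = next_states
            if len(replay) >= batch_size:
                vdn_update(agents_eval, agents_target, optimizers,
                           replay.sample(batch_size))       # sink-node update
            for q_e, q_g in zip(agents_eval, agents_target):
                sync_target(q_e, q_g, step)                  # delayed target copy
            step += 1
```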
in order to further understand the semi-distributed collaborative storage algorithm based on the value decomposition network and multi-agent reinforcement learning proposed by the present application, the following detailed description is provided with reference to specific embodiments. The embodiment is implemented on the premise of the technical scheme of the application.
As shown in fig. 1, a semi-distributed wireless smart storage network model is provided.
The model mainly comprises wireless service nodes, a sink node, a source server and a core network; it introduces a user storage model under the wireless service nodes and an inter-node cooperation model; each wireless service node can download files from the source server through a backhaul link and store them locally to directly serve the users in its cell.
As shown in fig. 2, a diagram of a Dueling DQN network and a value decomposition network is shown.
The network framework is divided into a target network and an evaluation network; the neural network branch of each is further divided into a state-value network and an advantage-value network, and this double-network, double-branch architecture evaluates the quality of actions more reliably. The network takes the state space as input and the action space as output, and continuously optimizes its parameters by receiving the global policy update parameters.
As shown in fig. 3, the decomposition mechanism is schematically represented.
The composite action space is decomposed into independent actions in multiple dimensions; each dimension selects its action in an independent neural network branch, which avoids the high complexity of the composite action.
As shown in FIG. 4, a comparison of results under different file parameters is shown in the examples.
According to the experimental results, the algorithm can cope with the heavy computation required for large numbers of files, and a global strategy can still be found under different high-complexity settings, so the reward value converges to a high value.
The analysis shows that the semi-distributed collaborative storage algorithm based on the value decomposition network and multi-agent reinforcement learning provided by the application achieves better storage performance than existing methods, clearly improves content storage for users, and has reference value and practical economic benefit.
FIG. 5 is a graph showing the comparison of algorithm performance in an embodiment of the present application.
Compared with traditional multi-agent algorithms, the introduced semi-distributed architecture and dimension decomposition method improve performance by 23.4% and 30.5% under the 5-file and 10-file settings, respectively, and converge faster, so the algorithm can quickly locate the globally optimal strategy in different scenarios.
FIG. 6 is a graph showing the comparison of dimension number results in an example of the present application.
According to the experimental results, the algorithm can be extended on the dimension decomposition mechanism: the more subdivision branches there are (provided the action decomposition criterion is met), the better the performance obtained.
The above method provided in this embodiment may be stored in a computer-readable storage medium in coded form and implemented as a computer program; the basic parameter information required for the calculation is input through computer hardware, and the calculation result is output.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the application without departing from the spirit and scope of the application, which is intended to be covered by the claims.
The patent is not limited to the best mode; anyone may derive various other semi-distributed collaborative storage methods based on a value decomposition network and multi-agent reinforcement learning under the teaching of this patent, and all equivalent changes and modifications made according to the scope of the claims of this patent are covered by this patent.

Claims (3)

1. The semi-distributed collaborative storage method based on the value decomposition network and multi-agent reinforcement learning is characterized by comprising the following steps:
Step S1: constructing a wireless network model of multi-device cooperative, semi-distributed collaborative storage based on wireless network transmission, comprising a sink node and wireless service nodes; defining the agent state space and action space based on the value decomposition network and multi-agent deep reinforcement learning, and, combined with the state space and action space, a reward function designed from the optimization objective, so as to maximize the wireless network service quality and reduce the access delay of stored content;
Step S2: collecting and analyzing information about each wireless service node at the sink node, and coordinating the cooperation of the wireless service nodes by constructing a value decomposition network model; that is, taking the action-value function of each wireless service node as the input of the value decomposition network, outputting the global action-value function of the whole system and the global policy update parameters, and feeding the result back to the whole semi-distributed system, wherein the update parameters are fed back to each wireless service node to update the policy of that individual node, so as to improve the cooperation performance and convergence speed of wireless edge storage;
the step S1 specifically comprises the following steps:
Step S11: defining the user set, the affiliation of users to wireless service nodes, the user request variables, the wireless service node storage variables, the file set, the quality variable, the video layer set and the wireless service node set, as well as the unit delays of a local hit, a collaborative hit and a download from the server, and the user request quality and quality-of-service variables;
Step S12: constructing the performance indicators of the storage model, including the video access delay and the user experience score, and constructing the final optimization objective, namely the reward function, based on these two objective optimization problems; defining the user request variables, the user request quality and the wireless service node storage variables as the state space, and the wireless service node storage variables and the user service quality at the next moment as the action space;
Step S13: fitting states and actions with a Dueling DQN network, which splits the branches of the neural network into a state-value branch for estimating the current wireless network state and an advantage-action branch for estimating each action; evaluating the quality of each action by combining the state value and the advantage value;
the step S11 specifically includes: a user set represented by 1, U (U) j Representing the set of users belonging to a wireless service node j, the user request variable lambda iv And wireless service node storage variable delta jvl File set { v. V, the quality variable K is a function of the quality variable, the video layer set 1. L, wireless service node set {1..j..j }; using unit access delay d 0 ,d jj' ,d j Respectively representing local hits, collaborative hits and unit delays downloaded from a server; defining user request quality k iv And quality of service variables
In step S12, the performance indicators of the storage model are constructed, including the video access delay D and the user experience score M, where c_1 = 0.16 and c_2 = 0.66 are quality evaluation coefficients; based on these two objective optimization problems, the final optimization objective, namely the reward function, is constructed, where η is a weight coefficient used to adjust the relative weight of the access delay and the user experience score;
in step S13, since the Dueling DQN network is divided into a target network and an evaluation network, the action-value estimate q(s, a; θ) is likewise split into q_{ej} and q_{gj};
The step S2 specifically comprises the following steps:
Step S21: introducing a value decomposition network into the sink node; the sink node first collects the states and rewards of all agents to construct the joint state and joint action, and computes the reward value of the whole system from them; an experience replay buffer is also introduced to store samples containing the four elements (S^(t), A^(t), r^(t), S^(t+1)); each sample is used to compute the respective action-value function q(s, a; θ), and the value decomposition network is finally used to compute the global action-value function of the whole system;
Step S22: according to the global action cost function of the whole system calculated in the step S21, the aggregate rewarding function constructs a loss function to calculate a global strategy updating parameter, the global strategy updating parameters obtained by training are reversely transmitted back to each wireless service node in the wireless service node group, so that the gradient is used for the neural network of the wireless service node groupNew method->Updating to obtain a better strategy;
embedding the dimension decomposition mechanism into the Dueling DQN to reduce decision complexity and improve wireless service performance:
the actions output by the Dueling DQN network are decomposed according to the dimensions of their actual physical meaning, namely into three dimensions: which type of video to store, δ_{jv}; which video layer to store, δ_{jl}; and with what quality to serve the user; the actions in each dimension are represented by a separate neural network branch, and each dimension selects its action independently without affecting the others;
after the dimension decomposition mechanism is embedded, the action-value function is computed per dimension in the Dueling DQN network, and the computation of the global action-value function at the sink node is updated accordingly.
2. A wireless network model, comprising: wireless service nodes, a sink node, a source server and a core network; each wireless service node can download files from the source server through a backhaul link, store them locally and directly serve the users in its cell; content storage and inter-user collaboration are performed using the semi-distributed collaborative storage method based on a value decomposition network and multi-agent reinforcement learning of claim 1.
3. The wireless network model of claim 2, wherein: for each wireless service node, i.e. each agent, the user request set and the file storage set serve as the state space and as the input of the neural network; the output of the neural network is the set of contents to store in the next time period and the set of file qualities with which to serve users; each wireless service node can download files from the source server through the backhaul link and store them locally to directly serve the users within its coverage; to reduce file download time, different wireless service nodes are allowed to perform multi-device cooperation through the sink node, further improving the service quality of the wireless network and reducing the access delay of stored content.
CN202111058748.3A 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents Active CN113779302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111058748.3A CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111058748.3A CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Publications (2)

Publication Number Publication Date
CN113779302A CN113779302A (en) 2021-12-10
CN113779302B true CN113779302B (en) 2023-09-22

Family

ID=78842194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111058748.3A Active CN113779302B (en) 2021-09-09 2021-09-09 Semi-distributed collaborative storage method based on value decomposition network and multiple agents

Country Status (1)

Country Link
CN (1) CN113779302B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065728B (en) * 2022-06-13 2023-12-08 福州大学 Multi-strategy reinforcement learning-based multi-target content storage method
CN115086374A (en) * 2022-06-14 2022-09-20 河南职业技术学院 Scene complexity self-adaptive multi-agent layered cooperation method
CN114867061B (en) * 2022-07-05 2022-12-13 深圳市搜了网络科技股份有限公司 Cloud monitoring method based on wireless communication network
CN116996919B (en) * 2023-09-26 2023-12-05 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079305A (en) * 2019-12-27 2020-04-28 南京航空航天大学 Different-strategy multi-agent reinforcement learning cooperation method based on lambda-reward
CN112364984A (en) * 2020-11-13 2021-02-12 南京航空航天大学 Cooperative multi-agent reinforcement learning method
CN112396187A (en) * 2020-11-19 2021-02-23 天津大学 Multi-agent reinforcement learning method based on dynamic collaborative map
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10375585B2 (en) * 2017-07-06 2019-08-06 Futurwei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079305A (en) * 2019-12-27 2020-04-28 南京航空航天大学 Different-strategy multi-agent reinforcement learning cooperation method based on lambda-reward
CN112364984A (en) * 2020-11-13 2021-02-12 南京航空航天大学 Cooperative multi-agent reinforcement learning method
CN112396187A (en) * 2020-11-19 2021-02-23 天津大学 Multi-agent reinforcement learning method based on dynamic collaborative map
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Anti-jamming resource scheduling algorithm for directional wireless communication networks based on reinforcement learning; 谢添; 高士顺; 赵海涛; 林沂; 熊俊; Chinese Journal of Radio Science (04); full text *
Wireless network resource allocation algorithm based on deep reinforcement learning; 李孜恒; 孟超; Communications Technology (08); full text *

Also Published As

Publication number Publication date
CN113779302A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN113779302B (en) Semi-distributed collaborative storage method based on value decomposition network and multiple agents
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN114710439B (en) Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN115065728B (en) Multi-strategy reinforcement learning-based multi-target content storage method
Jeon et al. A distributed NWDAF architecture for federated learning in 5G
Yan et al. Distributed edge caching with content recommendation in fog-rans via deep reinforcement learning
CN116185523A (en) Task unloading and deployment method
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Hao et al. Optimal IoT service offloading with uncertainty in SDN-based mobile edge computing
CN102077526B (en) Method, apparatus and computer program product for distributed information management
Zhang et al. On-device intelligence for 5g ran: Knowledge transfer and federated learning enabled ue-centric traffic steering
CN116663644A (en) Multi-compression version Yun Bianduan DNN collaborative reasoning acceleration method
Cui et al. Resource-Efficient DNN Training and Inference for Heterogeneous Edge Intelligence in 6G
Zhao et al. MEDIA: An incremental DNN based computation offloading for collaborative cloud-edge computing
CN110366210A (en) A kind of calculating discharging method for the application of stateful data flow
Zhai et al. Collaborative computation offloading for cost minimization in hybrid computing systems
Niu et al. A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network
CN117640413B (en) Micro-service and database joint deployment method based on reinforcement learning in fog calculation
CN114006817B (en) VGDT construction method and device oriented to SDN and readable storage medium
CN116634388B (en) Electric power fusion network-oriented big data edge caching and resource scheduling method and system
Jia An edge computing-based evaluation and optimisation of online higher vocational education mechanism
CN115190135B (en) Distributed storage system and copy selection method thereof
Du et al. 5G Message Cooperative Content Caching Scheme for Blockchain-Enabled Mobile Edge Networks Using Reinforcement Learning
Wang et al. Research on computing offloading methods based on edge computing and reinforcement learning in the industrial Internet of Things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant