CN113099491A - Wireless positioning network resource optimization method - Google Patents

Wireless positioning network resource optimization method

Info

Publication number
CN113099491A
CN113099491A (application CN202110271000.5A)
Authority
CN
China
Prior art keywords
target node
node
resource allocation
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110271000.5A
Other languages
Chinese (zh)
Other versions
CN113099491B (en)
Inventor
张霆廷
杨程
刘凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Peng Cheng Laboratory
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology, Peng Cheng Laboratory filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110271000.5A priority Critical patent/CN113099491B/en
Publication of CN113099491A publication Critical patent/CN113099491A/en
Application granted granted Critical
Publication of CN113099491B publication Critical patent/CN113099491B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/16: Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W64/00: Locating users or terminals or network equipment for network management purposes, e.g. mobility management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a wireless positioning network resource optimization method, which comprises the following steps: taking the mean square error as the performance index of positioning accuracy to obtain the positioning accuracy error value M(p̂_k) of target node k; taking minimization of M(p̂_k) as the objective function and considering the case of N_a = 1 target node, formulating the minimization problem P1; implementing a resource allocation algorithm based on reinforcement learning (RL); as the number of target nodes N_a increases, refining the optimal resource allocation strategy and constructing resource optimization scheme P2 with the goal of minimizing system power consumption; for the position estimate of target node k, further refining the optimal resource allocation strategy and constructing resource optimization scheme P3 with the goal of minimizing the maximum positioning accuracy error value M(p_k) over all possible target nodes in the uncertainty region; and achieving robust link selection based on reinforcement learning (RL). The resource optimization method can obtain high-precision positioning results.

Description

Wireless positioning network resource optimization method
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a wireless positioning network resource optimization method.
Background
A positioning network is designed to maximize its accuracy. Besides the traditional approach of deploying additional positioning reference nodes, the amount of information carried by the signals transmitted between positioning nodes also affects the accuracy of a wireless positioning network. Because building a practical positioning network is often subject to budget limits and to the need to keep the system model simple, the key to improving accuracy under limited resources is to optimize the allocation of resources such as power and bandwidth.
Based on the clock-synchronized Time of Arrival (TOA) ranging technique, lower bounds on the positioning accuracy of non-cooperative and cooperative positioning networks, namely the Direct Position Error Bound (DRLB) and the Cramér-Rao Lower Bound (CRLB), have been derived, and much current research on wireless positioning network resource allocation uses these lower bounds as the performance index. In the field of positioning network resource allocation, many studies consider power alone, while some studies recognize that bandwidth is also a factor affecting the positioning accuracy of a wireless positioning network; using a single-resource optimization schedule, they verify that bandwidth-only optimization achieves better positioning accuracy than power-only optimization, which is consistent with the form of the CRLB. On this basis, researchers have studied Joint Power and Bandwidth Allocation (JPBA), and the results show that JPBA is clearly superior to strategies that optimize only power or only bandwidth.
Current research on wireless positioning network resource allocation optimizes resources with respect to the CRLB, that is, it starts from a closed-form theoretical lower bound. However, the CRLB of a positioning network is often difficult to attain; especially at low Signal-to-Noise Ratio (SNR), resource allocation that directly uses the CRLB as the performance index incurs large errors. Research that instead uses the Mean Square Error (MSE) of a positioning algorithm, which has more practical significance, as the performance index is lacking. Because the MSE has no closed form, a non-parametric optimization solution must also be considered.
Reinforcement learning can solve such non-closed-form problems. It generates (near-)optimal control behavior through immediate reward feedback from interaction with the environment; rather than greedily optimizing the current reward, it accounts for long-term objectives, which is crucial for time-varying dynamic systems such as wireless positioning networks. Reinforcement learning has been widely applied to resource allocation in communication systems and provides an effective reference for resource allocation targeting the MSE of a wireless positioning network. However, since the number of states and the dimension of the action space grow exponentially with the number of target nodes, the optimization method may suffer from a dimension explosion that makes exhaustive traversal infeasible.
Disclosure of Invention
To address these problems, the invention provides a wireless positioning network resource optimization method that adopts the measured MSE of a positioning algorithm as the optimization object and reinforcement learning as the main solution, establishing a resource allocation optimization framework that avoids the error caused by an unattainable theoretical lower bound and therefore has more practical significance. A distributed optimization framework of linear complexity, combined with a suboptimal regression method, guarantees positioning accuracy while avoiding time-consuming retraining whenever a new target node appears. Considering the presence of actual ranging errors, a robust link selection algorithm is provided that obtains high-precision positioning results even when some ranging links are blocked or suffer clock offsets, which is significant for extending the life cycle of positioning nodes.
The technical scheme of the invention is as follows:
A wireless positioning network resource optimization method takes the mean square error as the performance index of positioning accuracy, builds a resource allocation optimization framework based on reinforcement learning, uses a distributed solution to guarantee positioning accuracy while avoiding time-consuming retraining whenever a new target node appears, and finally achieves robust link selection using measured data. The method comprises the following steps:
(1) In a wireless positioning network, let N_b and N_a denote the sets of anchor nodes and target nodes, respectively. The anchor nodes range with the target nodes by frequency division multiplexing in a clock-asynchronous mode to obtain the position estimate p̂_k of target node k. The accuracy of the position estimate p̂_k of target node k is measured by the mean square error, yielding the target node positioning accuracy error value M(p̂_k);
(2) Taking minimization of the target node positioning accuracy error value M(p̂_k) as the objective function and considering the case of N_a = 1 target node, formulate the minimization problem P1 of the objective function M(p̂_k), with the constraints: each anchor node has an upper limit β_0 on transmission bandwidth and P_0 on transmission power; the total transmission power of all anchor nodes cannot exceed a threshold; and the frequency bands of the signals transmitted by the anchor nodes cannot overlap;
(3) Implement a resource allocation algorithm based on reinforcement learning (RL): set the reward according to minimization of the objective function M(p̂_k) to guide the anchor nodes in selecting resources of different levels and obtain the optimal resource allocation strategy; then, from the input true position p_k of the target node, obtain a suboptimal resource allocation action for the anchor nodes using a nearest-neighbor algorithm or a BP neural network;
(4) As the number of target nodes in N_a increases, refine the optimal resource allocation strategy obtained in step (3) and construct resource optimization scheme P2 with the goal of minimizing system power consumption, adding to the P1 constraints: the positioning accuracy error value M(p̂_k) of each target node cannot exceed a positioning accuracy threshold;
(5) Since the position estimate p̂_k of target node k contains error, further refine the optimal resource allocation strategy obtained in step (4) and construct resource optimization scheme P3 with the goal of minimizing the maximum positioning accuracy error value M(p_k) over all possible target nodes in the uncertainty region, adding to the P1 constraints: all possible target node positions lie in the uncertainty region;
(6) Obtain the ranging matrix R from actual ranging, achieve robust link selection based on reinforcement learning (RL) according to the anchor node topology and the ranging matrix R, and finally obtain the target node position estimate p̂_k and the corresponding robust link selection scheme.
Further, the resource allocation algorithm in step (3) is implemented as follows:
1) Initialization: sample N_s target node positions to form a training library, with the training set denoted P_s = {p_1, ..., p_{N_s}}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s, s ∈ {1, ..., N_s}; set the channel coefficients ξ_sj, j ∈ {1, ..., N_b}; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
define the anchor node actions as the discrete power and bandwidth levels, and define five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) Training process: for each training pass over each node in the training library:
initialize the state-action table matrix Q to the zero matrix and set the current state to S_0;
let s_t and a_t be the state and action at the current time; while the training count has not reached the upper training limit:
Q_old = Q(s_t, a_t);
select a group of resource allocation actions for all anchor nodes from the action set and, according to the resource allocation scheme at time t+1, output the positioning accuracy error value M(p̂_s) at time t+1;
obtain the reward value R_{t+1} received after the resource allocation action is executed at time t+1 and update the state-action table matrix Q;
solve for the policy with an ε-greedy algorithm, where ε is the probability of exploration, with value range [0, 1]:
π: with probability Pr = 1 - ε select the action that maximizes the state-action table matrix Q; with probability ε select an action at random;
where Pr denotes probability and π denotes the resulting policy;
update the state-action table matrix Q and set the state to the current state;
3) Repeat training step 2); when the set upper limit on the number of convergence iterations is reached, end the loop and output the state-action table matrix Q and the optimal resource allocation strategy;
4) Input the true position p_k of the target node and obtain the suboptimal resource allocation action by a nearest-neighbor algorithm or a BP neural network, e.g. a_k = a*(argmin_s ||p_k - p_s||), the optimal action learned for the training sample nearest to p_k.
further, the step (6) of implementing robust link selection specifically includes:
1) sampling
Figure BDA0002974404450000042
Target nodes needing positioning, and a training set represented as
Figure BDA0002974404450000043
Setting anchor node position to pAnchorThe s target node position is set to ps
Figure BDA0002974404450000044
Based on anchor node NbAnd a target node
Figure BDA0002974404450000045
Actual distance measurement of Nb×NsObtaining rangeA matrix R;
2) randomly selecting a ranging link in a ranging matrix R to obtain psSet of position estimates of
Figure BDA0002974404450000046
Culling collections
Figure BDA0002974404450000047
Obtaining a minimum uncertainty region eta containing the new set1×η2
For an uncertainty region η1×η2At each vertex of the list, initializing the state-action table matrix Q to zero, setting the current state to S0
Training process in resource allocation algorithm is carried out on each vertex, and a ranging link is selected according to whether the anchor node is allocated to the resource or not;
recording the link selection scheme of each vertex;
the link selection scheme traversal for each vertex is applied to the other vertices, and a robust link selection scheme is obtained based on the optimization objectives and constraint conditions of the resource optimization scheme P3.
3) The step 2) is circulated until the training is finished NsA target node;
4) output NsTarget node position estimation
Figure BDA0002974404450000048
And a corresponding robust link selection scheme.
The wireless positioning network resource optimization method provided by the invention has the following beneficial effects:
1. The invention takes the MSE as the performance index of positioning accuracy and builds a resource allocation optimization framework based on a reinforcement learning algorithm. Compared with a resource optimization framework that takes the CRLB as the positioning network performance index, the method effectively improves positioning accuracy in the same scenario.
2. A suboptimal regression method is provided that guarantees positioning accuracy while avoiding time-consuming retraining whenever a new target node appears. For the exponentially growing action space, a resource allocation framework of linear complexity is provided.
3. The invention accounts for actual positioning errors and uses measured data to realize the robust link selection algorithm in practice. The algorithm obtains high-precision positioning results when some ranging links are blocked or suffer clock offsets, which is of great significance for extending the life cycle of positioning nodes.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention;
FIG. 2 is a performance comparison diagram of a single target node scenario under a reinforcement learning framework in an embodiment of the present invention;
FIG. 3 is a graph comparing the performance of a distributed optimization model in an embodiment of the invention;
fig. 4 is a graph comparing the performance of robust link selection in an embodiment of the invention.
Detailed Description
To describe the technical scheme of the invention in further detail, this embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific steps are given.
FIG. 1 shows the overall flow chart of the method of the invention. In a two-dimensional positioning network there are N_b anchor nodes with known locations and N_a target nodes with limited prior knowledge; N_b and N_a also denote the sets of anchor nodes and target nodes, respectively. The anchor nodes range with the target nodes by Frequency Division Multiplexing (FDM) in a clock-asynchronous mode. The distance between target node k and anchor node j is estimated as

d̂_kj = c·τ̂_kj = d_kj + ω_kj    (1)

where c is the free-space speed of light, τ̂_kj is the time estimate, d_kj is the true distance between the two points, and ω_kj is Gaussian ranging noise, i.e. ω_kj ~ N(0, σ²_kj), with the ranging variance given by formula (2):

σ²_kj = c² / (8π²·β²_kj·SNR_kj)    (2)

where P_kj is the transmission power between the nodes, β_kj is the transmission bandwidth between the nodes, SNR_kj = ξ_kj·P_kj / (N_0·β_kj) is the signal-to-noise ratio of the signal between the nodes, ξ_kj is the channel coefficient, and N_0 is the noise power spectral density. Comparing the estimated range vector r̂_k with the true range vector r_k, the position estimate of target node k can be obtained by equation (3):

p̂_k = argmin_p ||r̂_k - r_k(p)||²    (3)

where p̂_k denotes the position estimate of node k. The positioning accuracy of target node k is usually measured by the MSE, which is bounded below by the CRLB, as shown in equation (4):

M(p̂_k) = E{||p̂_k - p_k||²} ≥ tr{J_e(p_k)⁻¹}    (4)

where p_k is the true position of node k, J_e(p_k) is the Equivalent Fisher Information Matrix (EFIM), M(p̂_k) is the MSE of p̂_k and represents the positioning accuracy of target node k, tr{J_e(p_k)⁻¹} is the Cramér-Rao lower bound of p̂_k, J_e(p_k)⁻¹ denotes the inverse of J_e(p_k), and tr denotes the trace of a matrix.
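The pipeline of equations (1)-(4) can be sketched numerically. The following minimal simulation (illustrative anchor layout, bandwidth, and SNR values, not the embodiment's parameters) draws Gaussian-noised ranges with the standard deviation of formula (2) and recovers the least-squares position estimate of equation (3) by grid search; the squared estimation error is a one-sample stand-in for the MSE M(p̂_k):

```python
import math
import random

C = 3e8  # free-space speed of light, m/s


def ranging_std(beta, snr):
    # sigma_kj = c / sqrt(8 * pi^2 * beta_kj^2 * SNR_kj), from formula (2)
    return C / math.sqrt(8 * math.pi ** 2 * beta ** 2 * snr)


def estimate(anchors, r_hat, step=0.1, size=20.0):
    # grid-search least-squares position estimate, equation (3)
    n = int(round(size / step))
    best, best_cost = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * step, j * step
            cost = sum((math.hypot(x - ax, y - ay) - r) ** 2
                       for (ax, ay), r in zip(anchors, r_hat))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


random.seed(0)
anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
target = (7.3, 12.1)                       # true position p_k (illustrative)
sigma = ranging_std(beta=1e7, snr=1000.0)  # about 0.11 m for these values
r_hat = [math.hypot(target[0] - ax, target[1] - ay) + random.gauss(0, sigma)
         for ax, ay in anchors]            # noisy ranges, equation (1)
p_hat = estimate(anchors, r_hat)
mse = (p_hat[0] - target[0]) ** 2 + (p_hat[1] - target[1]) ** 2
print(p_hat, mse)
```

Averaging the squared error over many noise draws would approximate M(p̂_k) itself, which is exactly the quantity the resource allocation below tries to minimize.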
Minimization of M(p̂) is taken as the objective function to optimize the power and bandwidth allocation among all anchor nodes. First consider the special case N_a = 1, so the node index k is omitted. The original problem can be expressed as

P1:  min_{P,β}  M(p̂)    (5)
     s.t.  0 ≤ β_j ≤ β_0, ∀j    (6)
           0 ≤ P_j ≤ P_0, ∀j    (7)
           Σ_j P_j ≤ P_total    (8)
           the transmitted frequency bands of the anchor nodes do not overlap    (9)

The objective function in (5) minimizes the positioning error M(p̂) of the target node; constraints (6) and (7) state that, owing to the hardware design, each anchor node has upper limits β_0 on transmission bandwidth and P_0 on power; (8) gives the total transmit power constraint, with P_total the power threshold; and (9) ensures that the frequency bands of the transmitted signals are not allowed to overlap.
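Under stated assumptions (a hypothetical error model standing in for the measured MSE, and a total-bandwidth cap standing in for the non-overlap constraint (9)), the structure of P1 can be illustrated by a brute-force search over discrete allocations:

```python
import itertools

# Brute-force sketch of problem P1 for N_a = 1: enumerate discrete
# (power, bandwidth) allocations for each anchor under the per-anchor
# caps (6)-(7) and the total caps (8)-(9), keeping the allocation with
# the smallest error value. All numbers here are illustrative.

P0, beta0 = 1.0, 1.0          # per-anchor caps, constraints (6)-(7)
P_total, B_total = 2.0, 2.0   # totals, constraints (8)-(9)
levels = [0.25, 0.5, 0.75, 1.0]
n_anchors = 3
dists = [5.0, 10.0, 15.0]     # anchor-target distances (illustrative)


def err(alloc):
    # stand-in for M(p_hat): sum of per-link ranging variances, each
    # shaped like d^2 / (beta^2 * P) in the spirit of formula (2)
    return sum(d * d / (b * b * p) for (p, b), d in zip(alloc, dists))


best, best_err = None, float("inf")
for powers in itertools.product(levels, repeat=n_anchors):
    if sum(powers) > P_total:           # constraint (8)
        continue
    for bws in itertools.product(levels, repeat=n_anchors):
        if sum(bws) > B_total:          # stand-in for constraint (9)
            continue
        alloc = list(zip(powers, bws))
        e = err(alloc)
        if e < best_err:
            best, best_err = alloc, e
```

Exhaustive search like this scales exponentially with the number of anchors and discretization levels, which is precisely why the patent turns to reinforcement learning next.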
The reinforcement learning framework comprises three elements: state, action, and reward. States are first partitioned using a Fuzzy C-Means (FCM) strategy, and the reward is then set according to minimization of the objective function M(p̂) to guide the anchor nodes in selecting resources of different levels. The specific algorithm flow is as follows (Algorithm 1):
1) Initialization: sample N_s target node positions to form a training library, with the training set denoted P_s = {p_1, ..., p_{N_s}}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s, s ∈ {1, ..., N_s}; set the channel coefficients ξ_sj, j ∈ {1, ..., N_b}; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
define the anchor node actions as the discrete power and bandwidth levels, and define five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) Training process: for each training pass over each node in the training library:
initialize the state-action table matrix Q to the zero matrix and set the current state to S_0;
let s_t and a_t be the state and action at the current time; while the training count has not reached the upper training limit:
Q_old = Q(s_t, a_t);
select a group of resource allocation actions for all anchor nodes from the action set and, according to the resource allocation scheme at time t+1, output the positioning accuracy error value M(p̂_s) at time t+1;
obtain the reward value R_{t+1} received after the resource allocation action is executed at time t+1 and update the state-action table matrix Q;
solve for the policy with an ε-greedy algorithm, where ε is the probability of exploration, with value range [0, 1]:
π: with probability Pr = 1 - ε select the action that maximizes the state-action table matrix Q; with probability ε select an action at random;
where Pr denotes probability and π denotes the resulting policy;
update the state-action table matrix Q and set the state to the current state;
3) Repeat training step 2); when the set upper limit on the number of convergence iterations is reached, end the loop and output the state-action table matrix Q and the optimal resource allocation strategy;
4) Input the true position p_k of the target node and obtain the suboptimal resource allocation action by a nearest-neighbor algorithm or a BP neural network, e.g. a_k = a*(argmin_s ||p_k - p_s||), the optimal action learned for the training sample nearest to p_k.
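A minimal tabular Q-learning sketch of Algorithm 1 for a single anchor-target link is given below. All numbers are illustrative, and the stand-in error function replaces the measured MSE of an actual positioning run; the action set is a grid of discrete (power, bandwidth) levels with steps ΔP and Δβ, the state is the accuracy level S_0-S_4 obtained by bucketing the latest error value, and the policy is ε-greedy as in step 2):

```python
import math
import random

random.seed(1)
dP, dBeta = 0.25, 0.25                       # discrete step sizes
actions = [(p * dP, b * dBeta)               # (power, bandwidth) levels
           for p in range(1, 5) for b in range(1, 5)]


def error(power, beta):
    # stand-in error value: decreases with both power and bandwidth,
    # loosely shaped like the ranging variance of formula (2)
    return 0.1 / (beta ** 2 * power) + random.gauss(0, 0.01)


def bucket(err):
    # five accuracy states S0 (poor) .. S4 (very good)
    for s, limit in enumerate([2.0, 1.0, 0.5, 0.2]):
        if err > limit:
            return s
    return 4


Q = [[0.0] * len(actions) for _ in range(5)]  # state-action table matrix
alpha, gamma, eps = 0.1, 0.9, 0.2
state = 0
for t in range(2000):
    if random.random() < eps:                 # epsilon-greedy exploration
        a = random.randrange(len(actions))
    else:
        a = max(range(len(actions)), key=lambda i: Q[state][i])
    e = error(*actions[a])
    nxt = bucket(e)
    reward = math.exp(-e)                     # smaller error, larger reward
    Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
    state = nxt

best = actions[max(range(len(actions)), key=lambda i: Q[4][i])]
```

With enough iterations the greedy action in the top accuracy state drifts toward the high power and bandwidth levels, mirroring how the reward guides the anchors toward resource levels that shrink M(p̂).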
as the number of target nodes increases, the dimensions of the state and action space will also increase and exhibit exponentially increasing complexity, making the reinforcement learning algorithm more challenging. Therefore, a distributed alternative to linear complexity is necessary. A distributed solution is formulated in P2, that is, a resource optimization framework is constructed with an objective function of minimizing the power consumption of the positioning system under the constraint condition that the system itself meets the positioning accuracy requirement of each node.
P2:  min_{P,β}  Σ_{k∈N_a} Σ_{j∈N_b} P_kj    (10)
     s.t.  M(p̂_k) ≤ M_th^(k), ∀k ∈ N_a    (11)
           constraints (6)-(9)    (12)-(15)

In formula (11), M_th^(k) represents the positioning accuracy threshold requirement of each of the N_a target nodes.
In practice the position estimates contain errors, and allocating resources directly to a deviated position may cause a large positioning error, so it is necessary to establish the resource allocation scheme shown in P3:

P3:  min_{P,β}  max_p  M(p)    (16)
     s.t.  p ∈ (η_1 × η_2)    (17)
           (6)-(9)

The P3 model follows the robust optimization principle of minimizing the maximum M(p_k), i.e. it assumes that the target node lies somewhere in the region surrounding the position estimate given by the positioning algorithm, where η_1 and η_2 are the side lengths of the uncertainty region. Compared with other resource allocation schemes, this scheme minimizes the maximum MSE value over all possible nodes within the uncertainty region.
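The minimax logic of P3 can be sketched as follows; the error model and the two candidate allocations are hypothetical, and only the vertices of the uncertainty region are checked, in the spirit of the vertex-based robust selection of step (6):

```python
# Sketch of the P3 minimax idea: among candidate resource allocations,
# pick the one whose worst-case error over the vertices of the
# uncertainty region eta_1 x eta_2 is smallest.


def error(alloc, p, anchors):
    # stand-in positioning error for allocation `alloc` at position p:
    # each anchor contributes variance ~ d^2 / (beta^2 * power)
    total = 0.0
    for (power, beta), (ax, ay) in zip(alloc, anchors):
        d2 = (p[0] - ax) ** 2 + (p[1] - ay) ** 2
        total += d2 / (beta ** 2 * power)
    return total


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# two candidate (power, bandwidth) allocations with the same power budget
candidates = [
    [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)],
    [(2.0, 1.0), (0.5, 1.0), (0.5, 1.0)],
]
est = (4.0, 4.0)                  # position estimate p_hat
eta1, eta2 = 2.0, 2.0             # uncertainty region side lengths
vertices = [(est[0] + dx, est[1] + dy)
            for dx in (-eta1 / 2, eta1 / 2)
            for dy in (-eta2 / 2, eta2 / 2)]

# robust choice: minimize the maximum error over the region's vertices
robust = min(candidates,
             key=lambda a: max(error(a, v, anchors) for v in vertices))
```

Replacing the two candidates with the full discrete action grid and the stand-in error with the measured MSE recovers the spirit of scheme P3.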
Combined with the actual positioning scenario, the ranging module provides N_b × N_a wireless ranging links, collected in the ranging matrix shown in equation (18):

R = [d̂_kj] ∈ R^{N_b × N_a}    (18)

The anchor node topology and the matrix R are taken as input, and the estimated target node positions and the link selection scheme are output: if the current anchor node is allocated resources, it is turned on for ranging; otherwise it enters a dormant state. The robust ranging link selection algorithm is as follows:
1) Sample N_s target nodes to be positioned, with the training set denoted P_s = {p_1, ..., p_{N_s}}; set the anchor node positions to p_Anchor and the s-th target node position to p_s, s ∈ {1, ..., N_s}; from the N_b × N_s actual ranging measurements between the N_b anchor nodes and the N_s target nodes, obtain the ranging matrix R;
2) Randomly select ranging links in the ranging matrix R to obtain a set of position estimates of p_s; remove outliers from the set, and obtain the minimum uncertainty region η_1 × η_2 containing the new set;
for each vertex of the uncertainty region η_1 × η_2, initialize the state-action table matrix Q to zero and set the current state to S_0;
carry out the training process of the resource allocation algorithm (Algorithm 1) at each vertex, selecting a ranging link according to whether the anchor node is allocated resources;
record the link selection scheme of each vertex;
apply each vertex's link selection scheme to the other vertices in turn, and obtain the robust link selection scheme according to the optimization objective and constraints of resource optimization scheme P3;
3) Repeat step 2) until all N_s target nodes have been trained;
4) Output the N_s target node position estimates p̂_s and the corresponding robust link selection schemes.
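The link on/off idea of the robust selection algorithm can be illustrated on a toy instance (hypothetical geometry; a blocked link is modeled as a large positive ranging bias): subsets of links are tried, each position is estimated with the least-squares grid search of equation (3), and the subset with the smallest residual is kept:

```python
import itertools
import math

anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
target = (6.0, 9.0)  # true position (illustrative)


def ranges(anchors, p, bias):
    # measured ranges; a nonzero bias models a blocked / NLOS link
    return [math.hypot(p[0] - ax, p[1] - ay) + b
            for (ax, ay), b in zip(anchors, bias)]


def estimate(anchors, r_hat, step=0.25, size=20.0):
    # coarse grid-search least-squares position estimate, equation (3)
    n = int(round(size / step))
    best, best_cost = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * step, j * step
            cost = sum((math.hypot(x - ax, y - ay) - r) ** 2
                       for (ax, ay), r in zip(anchors, r_hat))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


bias = [0.0, 0.0, 0.0, 8.0]          # last link blocked
R = ranges(anchors, target, bias)     # one column of the ranging matrix

# choose the 3-link subset with the smallest least-squares residual
best_sub, best_res, best_p = None, float("inf"), None
for sub in itertools.combinations(range(4), 3):
    a = [anchors[i] for i in sub]
    r = [R[i] for i in sub]
    p = estimate(a, r)
    res = sum((math.hypot(p[0] - ax, p[1] - ay) - ri) ** 2
              for (ax, ay), ri in zip(a, r))
    if res < best_res:
        best_sub, best_res, best_p = sub, res, p
```

In this sketch the subset excluding the blocked anchor wins, and the position estimate recovers the true target; the patent's algorithm makes the analogous choice through the trained Q table and the P3 objective rather than by exhaustive subset search.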
The results of the embodiment of the invention are shown in figs. 2, 3 and 4: FIG. 2 compares the performance of the proposed reinforcement learning algorithm in a single-target-node scenario; FIG. 3 compares the performance of the proposed distributed optimization model; FIG. 4 compares the performance of the proposed robust link selection algorithm. The comparison graphs show that the resource optimization method provided by the invention effectively improves positioning accuracy.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
The foregoing is a detailed description of the invention in connection with specific preferred embodiments, and the invention is not limited to these details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as falling within the protection scope of the invention.

Claims (3)

1. A method for optimizing wireless positioning network resources is characterized by comprising the following steps:
(1) for use in wireless location networks
Figure FDA0002974404440000011
Respectively representing a set of an anchor node and a target node, wherein the anchor node measures distance with the target node by a frequency division multiplexing method based on a clock asynchronous mode to obtain position estimation of the target node k
Figure FDA0002974404440000012
Target node k position estimation using mean square error measurement
Figure FDA0002974404440000013
The accuracy of the target node is obtained, and the error value of the positioning precision of the target node is obtained
Figure FDA0002974404440000014
(2) Error value of target node positioning precision
Figure FDA0002974404440000015
Minimization as an objective function, considering the target node Na1-objective function
Figure FDA0002974404440000016
The minimization problem P1, the constraint is: each anchor node has a transmission bandwidth beta0And a transmission power P0The upper limit of (d); the total transmission power of all anchor nodes cannot exceed a threshold; the frequency bands of the signals transmitted by each anchor node cannot be overlapped;
(3) implementing a resource allocation algorithm based on Reinforcement Learning (RL), comprising: according to the objective function
Figure FDA0002974404440000017
Setting reward to guide the anchor node to select resources of different grades by minimization to obtain an optimal resource allocation strategy, and then according to the input real position p of the target nodekObtaining suboptimal resource allocation actions of the anchor nodes by combining a proximity algorithm or a BP neural network mode;
(4) for target node
Figure FDA0002974404440000018
Increasing the quantity, optimizing the optimal resource allocation strategy obtained in the step (3), and constructing a resource optimization scheme P2 with the aim of minimizing the power consumption of the system, wherein the constraint conditions are added on the basis of the P1 constraint conditions: error value of positioning accuracy of each target node
Figure FDA0002974404440000019
Cannot be larger than a positioning precision threshold;
(5) position estimation for target node k
Figure FDA00029744044400000110
There is a problem of an error in that,further optimizing the optimal resource allocation strategy obtained in the step (4) to obtain the maximum positioning precision error value of all possible target nodes in the uncertain region
Figure FDA00029744044400000111
The minimization is to construct a resource optimization scheme P3 for the target, and the constraint conditions are added on the basis of the P1 constraint conditions: all possible target node positions are in the uncertain region;
(6) obtaining a ranging matrix R from actual distance measurements, performing robust link selection based on reinforcement learning (RL) according to the anchor-node topology and the ranging matrix R, and finally obtaining the target node position estimate p̂_k and a corresponding robust link selection scheme.
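The P1 constraints above (a per-anchor power cap, a per-anchor bandwidth cap, a network-wide total-power threshold, and non-overlapping transmit bands) can be sketched as a simple feasibility check on a candidate allocation. This is an illustrative sketch only: the function name, the (start, width) band representation, and all numeric limits are assumptions, not part of the claimed method.

```python
def feasible(power, band, P0=1.0, beta0=2.0, P_total=3.0):
    """Check the P1-style constraints for a candidate allocation.

    power[j] -- transmit power of anchor j (hypothetical units)
    band[j]  -- (start, width) of anchor j's transmit band
    """
    if any(p > P0 for p in power):          # per-anchor power cap P0
        return False
    if any(w > beta0 for _, w in band):     # per-anchor bandwidth cap beta0
        return False
    if sum(power) > P_total:                # network-wide power threshold
        return False
    spans = sorted(band)                    # sort bands by start frequency
    for (s1, w1), (s2, _) in zip(spans, spans[1:]):
        if s1 + w1 > s2:                    # adjacent bands overlap
            return False
    return True
```

For example, `feasible([0.5, 0.5], [(0.0, 1.0), (1.5, 1.0)])` passes all four checks, while raising one anchor's power above P0 fails the first.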
2. The method of claim 1, wherein the resource allocation algorithm in step (3) is implemented by the following steps:
1) initialization: sampling target node positions to form a training library, with the training set expressed as {p_s}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s; setting the channel coefficients ξ_sj; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
defining the anchor node actions (selections from the discretized power and bandwidth levels), and defining five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) training process: for each training round of each node in the training library:
initializing the state-action table matrix Q to the zero matrix and setting the current state to S_0;
letting s_t and a_t be the state and action at the current time, while the number of training iterations has not reached the upper limit:
Q_old = Q(s_t, a_t);
selecting a set of resource allocation actions for all anchor nodes from the action set; according to the resource allocation scheme at time t+1, outputting the positioning-accuracy error at time t+1 and obtaining the reward R_{t+1} received after the resource allocation action is executed at time t+1; updating the state-action table matrix Q;
solving the strategy with an ε-greedy algorithm, where ε is the "exploration" probability with value range [0, 1]: the policy π selects, with probability 1 − ε, the action that maximizes the state-action table matrix Q, i.e. a_t = argmax_a Q(s_t, a), and with probability ε selects an action at random (Pr denotes probability);
updating the state-action table matrix Q, and setting the current state to the new state;
3) repeating training step 2) in a loop; when the set upper limit of convergence iterations is reached, ending the loop and outputting the state-action table matrix Q and the optimal resource allocation strategy;
4) inputting the real position p_k of the target node, and obtaining the suboptimal resource allocation action by means of a nearest-neighbor algorithm or a BP neural network.
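The training procedure of claim 2 can be sketched as tabular Q-learning with an ε-greedy policy. Everything below is an illustrative stand-in: the state/action counts, the environment in step(), and the constants are assumptions; a real implementation would derive the reward from the positioning-accuracy error of the chosen resource allocation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5          # training-accuracy states S_0..S_4
N_ACTIONS = 9         # hypothetical: 3 power levels x 3 bandwidth levels
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration prob.

Q = np.zeros((N_STATES, N_ACTIONS))  # state-action table matrix Q

def epsilon_greedy(state):
    """With probability 1-EPS select argmax Q; with probability EPS explore."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def step(state, action):
    """Hypothetical environment: a real system would evaluate the
    positioning-accuracy error of the selected allocation here."""
    err = rng.random() / (1 + action)              # stand-in accuracy error
    next_state = min(N_STATES - 1, int((1.0 - err) * N_STATES))
    return next_state, -err                        # smaller error -> larger reward

state = 0
for t in range(2000):                              # until the training upper limit
    action = epsilon_greedy(state)
    next_state, reward = step(state, action)
    q_old = Q[state, action]                       # Q_old = Q(s_t, a_t)
    Q[state, action] = q_old + ALPHA * (reward + GAMMA * Q[next_state].max() - q_old)
    state = next_state                             # advance to the new state
```

The update line is the standard one-step Q-learning rule; the claim's "return value R_{t+1}" corresponds to `reward`, and the converged table Q encodes the optimal resource allocation strategy.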
3. The wireless positioning network resource optimization method as claimed in claim 2, wherein implementing the robust link selection in step (6) specifically comprises the steps of:
1) sampling N_s target nodes to be positioned, with the training set represented as {p_s}, s = 1, …, N_s; setting the anchor node positions to p_Anchor and the s-th target node position to p_s; obtaining the N_b × N_s ranging matrix R from the actual distance measurements between the N_b anchor nodes and the N_s target nodes;
2) randomly selecting ranging links in the ranging matrix R to obtain a set of position estimates of p_s; removing the outliers from this set, and obtaining the minimum uncertainty region η_1 × η_2 containing the new set;
for each vertex of the uncertainty region η_1 × η_2, initializing the state-action table matrix Q to the zero matrix and setting the current state to S_0;
performing the training process of the resource allocation algorithm on each vertex, and selecting the ranging links according to whether each anchor node is allocated resources;
recording the link selection scheme of each vertex;
applying the link selection scheme of each vertex in turn to the other vertices, and obtaining a robust link selection scheme based on the optimization objective and constraint conditions of resource optimization scheme P3;
3) repeating step 2) until all N_s target nodes have been trained;
4) outputting the N_s target node position estimates and the corresponding robust link selection schemes.
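The min-max idea behind scheme P3 and the vertex-based robust selection of claim 3 can be illustrated with a brute-force sketch: evaluate each candidate set of ranging links at every vertex of the uncertainty region and keep the set whose worst-case error is smallest. The anchor/vertex coordinates and the stand-in error metric below are assumptions; the claimed method instead uses RL training at each vertex and the P3 objective and constraints.

```python
import itertools
import math

# hypothetical anchor positions and the four vertices of an
# uncertainty region eta_1 x eta_2 (all coordinates are assumptions)
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
vertices = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]

def positioning_error(links, p):
    """Stand-in error metric: more links / closer anchors -> smaller error."""
    gain = sum(1.0 / (1e-6 + math.dist(anchors[i], p)) for i in links)
    return 1.0 / gain

best_links, best_worst = None, float("inf")
# enumerate link subsets with at least 3 anchors (2-D trilateration minimum)
for r in range(3, len(anchors) + 1):
    for links in itertools.combinations(range(len(anchors)), r):
        worst = max(positioning_error(links, v) for v in vertices)  # max over region
        if worst < best_worst:                                      # min-max choice
            best_links, best_worst = links, worst
```

Because the stand-in error decreases whenever a link is added, this sketch selects the full anchor set; with power/bandwidth costs attached (as in P3), smaller link sets become competitive.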
CN202110271000.5A 2021-03-12 2021-03-12 Wireless positioning network resource optimization method Active CN113099491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271000.5A CN113099491B (en) 2021-03-12 2021-03-12 Wireless positioning network resource optimization method


Publications (2)

Publication Number Publication Date
CN113099491A true CN113099491A (en) 2021-07-09
CN113099491B CN113099491B (en) 2022-05-10

Family

ID=76667076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271000.5A Active CN113099491B (en) 2021-03-12 2021-03-12 Wireless positioning network resource optimization method

Country Status (1)

Country Link
CN (1) CN113099491B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113597008A (en) * 2021-07-29 2021-11-02 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113853019A (en) * 2021-08-23 2021-12-28 Tianyi Digital Life Technology Co., Ltd. Wireless positioning network resource optimization scheduling method
CN114245291A (en) * 2021-11-19 2022-03-25 China University of Mining and Technology Distance measurement positioning method for virtualizing reference node into unknown node
CN114363906A (en) * 2021-12-29 2022-04-15 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924935A (en) * 2018-07-06 2018-11-30 Northwestern Polytechnical University Power allocation method in the NOMA power domain based on a reinforcement learning algorithm
US20190034830A1 (en) * 2017-07-26 2019-01-31 Yandex Europe Ag Methods and systems for evaluating training objects by a machine learning algorithm
WO2020146820A1 (en) * 2019-01-11 2020-07-16 Apple Inc. Resource allocation, reference signal design, and beam management for new radio (nr) positioning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN, JINRUI et al.: "Power allocation scheme for wireless positioning networks based on Stackelberg game", Journal of Terahertz Science and Electronic Information Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113597008A (en) * 2021-07-29 2021-11-02 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113597008B (en) * 2021-07-29 2024-04-12 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113853019A (en) * 2021-08-23 2021-12-28 Tianyi Digital Life Technology Co., Ltd. Wireless positioning network resource optimization scheduling method
CN114245291A (en) * 2021-11-19 2022-03-25 China University of Mining and Technology Distance measurement positioning method for virtualizing reference node into unknown node
CN114363906A (en) * 2021-12-29 2022-04-15 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium
CN114363906B (en) * 2021-12-29 2024-04-16 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN113099491B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN113099491B (en) Wireless positioning network resource optimization method
Qiong et al. Towards V2I age-aware fairness access: A DQN based intelligent vehicular node training and test method
Lu et al. Optimization of task offloading strategy for mobile edge computing based on multi-agent deep reinforcement learning
CN112383922A (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN110167176B (en) Wireless network resource allocation method based on distributed machine learning
CN113900380B (en) Robust output formation tracking control method and system for heterogeneous cluster system
Li et al. Deep reinforcement learning optimal transmission policy for communication systems with energy harvesting and adaptive MQAM
CN112203307B (en) Multi-channel wireless network scheduling method supporting information age optimization
CN105188124A (en) Robustness gaming power control method under imperfect CSI for multi-user OFDMA relay system
CN110278570B (en) Wireless communication system based on artificial intelligence
Pozza et al. CARD: Context-aware resource discovery for mobile Internet of Things scenarios
CN114051252B (en) Multi-user intelligent transmitting power control method in radio access network
Hakami et al. A resource allocation scheme for D2D communications with unknown channel state information
Li et al. Graph-based algorithm unfolding for energy-aware power allocation in wireless networks
CN114885340A (en) Ultra-dense wireless network power distribution method based on deep transfer learning
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113259944B (en) RIS auxiliary frequency spectrum sharing method based on deep reinforcement learning
Li et al. UAV trajectory optimization for spectrum cartography: a PPO approach
CN117255356A (en) Efficient self-cooperation method based on federal learning in wireless access network
Yang et al. Parallel stochastic decomposition algorithms for multi-agent systems
CN112738849A (en) Load balancing regulation and control method applied to multi-hop environment backscatter wireless network
CN116647817A (en) Marine ecological environment monitoring wireless sensor network node positioning method
CN115987886A (en) Underwater acoustic network Q learning routing method based on meta-learning parameter optimization
Chen et al. Mse minimized scheduling for multiple-source remote estimation with aoi constraints in iwsn
CN112631130B (en) ILC system input signal optimal estimation method facing time delay and noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant