CN113099491A - Wireless positioning network resource optimization method - Google Patents

Wireless positioning network resource optimization method

Info

Publication number
CN113099491A
CN113099491A (application CN202110271000.5A)
Authority
CN
China
Prior art keywords
target node
node
resource allocation
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110271000.5A
Other languages
Chinese (zh)
Other versions
CN113099491B (en)
Inventor
张霆廷
杨程
刘凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Peng Cheng Laboratory
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology, Peng Cheng Laboratory filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110271000.5A priority Critical patent/CN113099491B/en
Publication of CN113099491A publication Critical patent/CN113099491A/en
Application granted granted Critical
Publication of CN113099491B publication Critical patent/CN113099491B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/16: Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W64/00: Locating users or terminals or network equipment for network management purposes, e.g. mobility management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a wireless positioning network resource optimization method, which comprises the following steps: taking the mean square error as the performance index of positioning accuracy to obtain the positioning accuracy error value M(p̂_k) of target node k; taking minimization of M(p̂_k) as the objective function and considering the case of N_a = 1 target node, formulating the minimization problem P1; implementing a resource allocation algorithm based on reinforcement learning (RL); as the number of target nodes N_a increases, refining the optimal resource allocation strategy and constructing resource optimization scheme P2 with the goal of minimizing system power consumption; for the position estimate of target node k, further refining the optimal resource allocation strategy and constructing resource optimization scheme P3 with the goal of minimizing the maximum positioning accuracy error value M(p_k) over all possible target nodes in the uncertainty region; and achieving robust link selection based on reinforcement learning (RL). The resource optimization method can obtain high-precision positioning results.

Description

Wireless positioning network resource optimization method
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a wireless positioning network resource optimization method.
Background
A positioning network is designed to maximize its accuracy. Besides the traditional approach of deploying additional positioning reference nodes, the amount of information carried by the signals transmitted between positioning nodes also affects the accuracy of a wireless positioning network. Because building a practical positioning network is often subject to budget limits and to the need to keep the system model simple, the key to improving accuracy under limited resources is to optimize the allocation of resources such as power and bandwidth.
Based on the clock-synchronized Time of Arrival (TOA) ranging technique, lower bounds on the positioning accuracy of non-cooperative and cooperative positioning networks, namely the Direct Position Error Bound (DRLB) and the Cramér-Rao Lower Bound (CRLB), have been derived, and much current research on wireless positioning network resource allocation uses these lower bounds as the performance index. In the field of positioning network resource allocation, many studies consider power alone, while some studies recognize that bandwidth is also a factor affecting the positioning accuracy of a wireless positioning network; using a single-resource optimization schedule, they verify that bandwidth-only optimization achieves better positioning accuracy than power-only optimization, which is consistent with the form of the CRLB. On this basis, researchers have studied Joint Power and Bandwidth Allocation (JPBA), and the results show that JPBA is clearly superior to strategies that optimize only power or only bandwidth.
Current research on wireless positioning network resource allocation optimizes resources with respect to the CRLB, that is, it starts from a closed-form theoretical lower bound. However, the CRLB of a positioning network is often difficult to attain; especially at low Signal-to-Noise Ratio (SNR), resource allocation that directly uses the CRLB as the performance index incurs large errors. Research that instead uses the Mean Square Error (MSE) of a positioning algorithm, which has more practical significance, as the performance index is lacking. Because the MSE has no closed form, a non-parametric optimization solution must also be considered.
Reinforcement learning can solve such non-closed-form problems. It generates (near-)optimal control behavior through immediate reward feedback from interaction with the environment; rather than greedily optimizing the current reward, it accounts for long-term objectives, which is crucial for time-varying dynamic systems such as wireless positioning networks. Reinforcement learning has been widely applied to resource allocation in communication systems and provides an effective reference for resource allocation targeting the MSE of a wireless positioning network. However, since the number of states and the dimension of the action space grow exponentially with the number of target nodes, the optimization method may suffer from a dimension explosion that makes exhaustive traversal infeasible.
Disclosure of Invention
To address these problems, the invention provides a wireless positioning network resource optimization method that adopts the measured MSE of a positioning algorithm as the optimization object and reinforcement learning as the main solution, establishing a resource allocation optimization framework that avoids the error caused by an unattainable theoretical lower bound and therefore has more practical significance. A distributed optimization framework of linear complexity, combined with a suboptimal regression method, guarantees positioning accuracy while avoiding time-consuming retraining whenever a new target node appears. Considering the presence of actual ranging errors, a robust link selection algorithm is provided that obtains high-precision positioning results even when some ranging links are blocked or suffer clock offsets, which is significant for extending the life cycle of positioning nodes.
The technical scheme of the invention is as follows:
A wireless positioning network resource optimization method takes the mean square error as the performance index of positioning accuracy, builds a resource allocation optimization framework based on reinforcement learning, uses a distributed solution to guarantee positioning accuracy while avoiding time-consuming retraining whenever a new target node appears, and finally achieves robust link selection using measured data. The method comprises the following steps:
(1) In a wireless positioning network, let N_b and N_a denote the sets of anchor nodes and target nodes, respectively. The anchor nodes range with the target nodes by frequency division multiplexing in a clock-asynchronous mode to obtain the position estimate p̂_k of target node k. The accuracy of the position estimate p̂_k of target node k is measured by the mean square error, yielding the target node positioning accuracy error value M(p̂_k);
(2) Taking minimization of the target node positioning accuracy error value M(p̂_k) as the objective function and considering the case of N_a = 1 target node, formulate the minimization problem P1 of the objective function M(p̂_k), with the constraints: each anchor node has an upper limit β_0 on transmission bandwidth and P_0 on transmission power; the total transmission power of all anchor nodes cannot exceed a threshold; and the frequency bands of the signals transmitted by the anchor nodes cannot overlap;
(3) Implement a resource allocation algorithm based on reinforcement learning (RL): set the reward according to minimization of the objective function M(p̂_k) to guide the anchor nodes in selecting resources of different levels and obtain the optimal resource allocation strategy; then, from the input true position p_k of the target node, obtain a suboptimal resource allocation action for the anchor nodes using a nearest-neighbor algorithm or a BP neural network;
(4) As the number of target nodes in N_a increases, refine the optimal resource allocation strategy obtained in step (3) and construct resource optimization scheme P2 with the goal of minimizing system power consumption, adding to the P1 constraints: the positioning accuracy error value M(p̂_k) of each target node cannot exceed a positioning accuracy threshold;
(5) Since the position estimate p̂_k of target node k contains error, further refine the optimal resource allocation strategy obtained in step (4) and construct resource optimization scheme P3 with the goal of minimizing the maximum positioning accuracy error value M(p_k) over all possible target nodes in the uncertainty region, adding to the P1 constraints: all possible target node positions lie in the uncertainty region;
(6) Obtain the ranging matrix R from actual ranging, achieve robust link selection based on reinforcement learning (RL) according to the anchor node topology and the ranging matrix R, and finally obtain the target node position estimate p̂_k and the corresponding robust link selection scheme.
Further, the resource allocation algorithm in step (3) is implemented as follows:
1) Initialization: sample N_s target node positions to form a training library, with the training set denoted P_s = {p_1, ..., p_{N_s}}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s, s ∈ {1, ..., N_s}; set the channel coefficients ξ_sj, j ∈ {1, ..., N_b}; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
define the anchor node actions as the discrete power and bandwidth levels, and define five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) Training process: for each training pass over each node in the training library:
initialize the state-action table matrix Q to the zero matrix and set the current state to S_0;
let s_t and a_t be the state and action at the current time; while the training count has not reached the upper training limit:
Q_old = Q(s_t, a_t);
select a group of resource allocation actions for all anchor nodes from the action set and, according to the resource allocation scheme at time t+1, output the positioning accuracy error value M(p̂_s) at time t+1;
obtain the reward value R_{t+1} received after the resource allocation action is executed at time t+1 and update the state-action table matrix Q;
solve for the policy with an ε-greedy algorithm, where ε is the probability of exploration, with value range [0, 1]:
π: with probability Pr = 1 - ε select the action that maximizes the state-action table matrix Q; with probability ε select an action at random;
where Pr denotes probability and π denotes the resulting policy;
update the state-action table matrix Q and set the state to the current state;
3) Repeat training step 2); when the set upper limit on the number of convergence iterations is reached, end the loop and output the state-action table matrix Q and the optimal resource allocation strategy;
4) Input the true position p_k of the target node and obtain the suboptimal resource allocation action by a nearest-neighbor algorithm or a BP neural network, e.g. a_k = a*(argmin_s ||p_k - p_s||), the optimal action learned for the training sample nearest to p_k.
further, the step (6) of implementing robust link selection specifically includes:
1) sampling
Figure BDA0002974404450000042
Target nodes needing positioning, and a training set represented as
Figure BDA0002974404450000043
Setting anchor node position to pAnchorThe s target node position is set to ps
Figure BDA0002974404450000044
Based on anchor node NbAnd a target node
Figure BDA0002974404450000045
Actual distance measurement of Nb×NsObtaining rangeA matrix R;
2) randomly selecting a ranging link in a ranging matrix R to obtain psSet of position estimates of
Figure BDA0002974404450000046
Culling collections
Figure BDA0002974404450000047
Obtaining a minimum uncertainty region eta containing the new set1×η2
For an uncertainty region η1×η2At each vertex of the list, initializing the state-action table matrix Q to zero, setting the current state to S0
Training process in resource allocation algorithm is carried out on each vertex, and a ranging link is selected according to whether the anchor node is allocated to the resource or not;
recording the link selection scheme of each vertex;
the link selection scheme traversal for each vertex is applied to the other vertices, and a robust link selection scheme is obtained based on the optimization objectives and constraint conditions of the resource optimization scheme P3.
3) The step 2) is circulated until the training is finished NsA target node;
4) output NsTarget node position estimation
Figure BDA0002974404450000048
And a corresponding robust link selection scheme.
The wireless positioning network resource optimization method provided by the invention has the following beneficial effects:
1. The invention takes the MSE as the performance index of positioning accuracy and builds a resource allocation optimization framework based on a reinforcement learning algorithm. Compared with a resource optimization framework that takes the CRLB as the positioning network performance index, the method effectively improves positioning accuracy in the same scenario.
2. A suboptimal regression method is provided that guarantees positioning accuracy while avoiding time-consuming retraining whenever a new target node appears. For the exponentially growing action space, a resource allocation framework of linear complexity is provided.
3. The invention accounts for actual positioning errors and uses measured data to realize the robust link selection algorithm in practice. The algorithm obtains high-precision positioning results when some ranging links are blocked or suffer clock offsets, which is of great significance for extending the life cycle of positioning nodes.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention;
FIG. 2 is a performance comparison diagram of a single target node scenario under a reinforcement learning framework in an embodiment of the present invention;
FIG. 3 is a graph comparing the performance of a distributed optimization model in an embodiment of the invention;
fig. 4 is a graph comparing the performance of robust link selection in an embodiment of the invention.
Detailed Description
To describe the technical scheme of the invention in further detail, this embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific steps are given.
FIG. 1 shows the overall flow chart of the method of the invention. In a two-dimensional positioning network there are N_b anchor nodes with known locations and N_a target nodes with limited prior knowledge; N_b and N_a also denote the sets of anchor nodes and target nodes, respectively. The anchor nodes range with the target nodes by Frequency Division Multiplexing (FDM) in a clock-asynchronous mode. The distance between target node k and anchor node j is estimated as

d̂_kj = c·τ̂_kj = d_kj + ω_kj    (1)

where c is the free-space speed of light, τ̂_kj is the time estimate, d_kj is the true distance between the two points, and ω_kj is Gaussian ranging noise, i.e. ω_kj ~ N(0, σ²_kj), with the ranging variance given by formula (2):

σ²_kj = c² / (8π²·β²_kj·SNR_kj)    (2)

where P_kj is the transmission power between the nodes, β_kj is the transmission bandwidth between the nodes, SNR_kj = ξ_kj·P_kj / (N_0·β_kj) is the signal-to-noise ratio of the signal between the nodes, ξ_kj is the channel coefficient, and N_0 is the noise power spectral density. Comparing the estimated range vector r̂_k with the true range vector r_k, the position estimate of target node k can be obtained by equation (3):

p̂_k = argmin_p ||r̂_k - r_k(p)||²    (3)

where p̂_k denotes the position estimate of node k. The positioning accuracy of target node k is usually measured by the MSE, which is bounded below by the CRLB, as shown in equation (4):

M(p̂_k) = E{||p̂_k - p_k||²} ≥ tr{J_e(p_k)⁻¹}    (4)

where p_k is the true position of node k, J_e(p_k) is the Equivalent Fisher Information Matrix (EFIM), M(p̂_k) is the MSE of p̂_k and represents the positioning accuracy of target node k, tr{J_e(p_k)⁻¹} is the Cramér-Rao lower bound of p̂_k, J_e(p_k)⁻¹ denotes the inverse of J_e(p_k), and tr denotes the trace of a matrix.
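The pipeline of equations (1)-(4) can be sketched numerically. The following minimal simulation (illustrative anchor layout, bandwidth, and SNR values, not the embodiment's parameters) draws Gaussian-noised ranges with the standard deviation of formula (2) and recovers the least-squares position estimate of equation (3) by grid search; the squared estimation error is a one-sample stand-in for the MSE M(p̂_k):

```python
import math
import random

C = 3e8  # free-space speed of light, m/s


def ranging_std(beta, snr):
    # sigma_kj = c / sqrt(8 * pi^2 * beta_kj^2 * SNR_kj), from formula (2)
    return C / math.sqrt(8 * math.pi ** 2 * beta ** 2 * snr)


def estimate(anchors, r_hat, step=0.1, size=20.0):
    # grid-search least-squares position estimate, equation (3)
    n = int(round(size / step))
    best, best_cost = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * step, j * step
            cost = sum((math.hypot(x - ax, y - ay) - r) ** 2
                       for (ax, ay), r in zip(anchors, r_hat))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


random.seed(0)
anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
target = (7.3, 12.1)                       # true position p_k (illustrative)
sigma = ranging_std(beta=1e7, snr=1000.0)  # about 0.11 m for these values
r_hat = [math.hypot(target[0] - ax, target[1] - ay) + random.gauss(0, sigma)
         for ax, ay in anchors]            # noisy ranges, equation (1)
p_hat = estimate(anchors, r_hat)
mse = (p_hat[0] - target[0]) ** 2 + (p_hat[1] - target[1]) ** 2
print(p_hat, mse)
```

Averaging the squared error over many noise draws would approximate M(p̂_k) itself, which is exactly the quantity the resource allocation below tries to minimize.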
Minimization of M(p̂) is taken as the objective function to optimize the power and bandwidth allocation among all anchor nodes. First consider the special case N_a = 1, so the node index k is omitted. The original problem can be expressed as

P1:  min_{P,β}  M(p̂)    (5)
     s.t.  0 ≤ β_j ≤ β_0, ∀j    (6)
           0 ≤ P_j ≤ P_0, ∀j    (7)
           Σ_j P_j ≤ P_total    (8)
           the transmitted frequency bands of the anchor nodes do not overlap    (9)

The objective function in (5) minimizes the positioning error M(p̂) of the target node; constraints (6) and (7) state that, owing to the hardware design, each anchor node has upper limits β_0 on transmission bandwidth and P_0 on power; (8) gives the total transmit power constraint, with P_total the power threshold; and (9) ensures that the frequency bands of the transmitted signals are not allowed to overlap.
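Under stated assumptions (a hypothetical error model standing in for the measured MSE, and a total-bandwidth cap standing in for the non-overlap constraint (9)), the structure of P1 can be illustrated by a brute-force search over discrete allocations:

```python
import itertools

# Brute-force sketch of problem P1 for N_a = 1: enumerate discrete
# (power, bandwidth) allocations for each anchor under the per-anchor
# caps (6)-(7) and the total caps (8)-(9), keeping the allocation with
# the smallest error value. All numbers here are illustrative.

P0, beta0 = 1.0, 1.0          # per-anchor caps, constraints (6)-(7)
P_total, B_total = 2.0, 2.0   # totals, constraints (8)-(9)
levels = [0.25, 0.5, 0.75, 1.0]
n_anchors = 3
dists = [5.0, 10.0, 15.0]     # anchor-target distances (illustrative)


def err(alloc):
    # stand-in for M(p_hat): sum of per-link ranging variances, each
    # shaped like d^2 / (beta^2 * P) in the spirit of formula (2)
    return sum(d * d / (b * b * p) for (p, b), d in zip(alloc, dists))


best, best_err = None, float("inf")
for powers in itertools.product(levels, repeat=n_anchors):
    if sum(powers) > P_total:           # constraint (8)
        continue
    for bws in itertools.product(levels, repeat=n_anchors):
        if sum(bws) > B_total:          # stand-in for constraint (9)
            continue
        alloc = list(zip(powers, bws))
        e = err(alloc)
        if e < best_err:
            best, best_err = alloc, e
```

Exhaustive search like this scales exponentially with the number of anchors and discretization levels, which is precisely why the patent turns to reinforcement learning next.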
The reinforcement learning framework comprises three elements: state, action, and reward. States are first partitioned using a Fuzzy C-Means (FCM) strategy, and the reward is then set according to minimization of the objective function M(p̂) to guide the anchor nodes in selecting resources of different levels. The specific algorithm flow is as follows (Algorithm 1):
1) Initialization: sample N_s target node positions to form a training library, with the training set denoted P_s = {p_1, ..., p_{N_s}}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s, s ∈ {1, ..., N_s}; set the channel coefficients ξ_sj, j ∈ {1, ..., N_b}; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
define the anchor node actions as the discrete power and bandwidth levels, and define five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) Training process: for each training pass over each node in the training library:
initialize the state-action table matrix Q to the zero matrix and set the current state to S_0;
let s_t and a_t be the state and action at the current time; while the training count has not reached the upper training limit:
Q_old = Q(s_t, a_t);
select a group of resource allocation actions for all anchor nodes from the action set and, according to the resource allocation scheme at time t+1, output the positioning accuracy error value M(p̂_s) at time t+1;
obtain the reward value R_{t+1} received after the resource allocation action is executed at time t+1 and update the state-action table matrix Q;
solve for the policy with an ε-greedy algorithm, where ε is the probability of exploration, with value range [0, 1]:
π: with probability Pr = 1 - ε select the action that maximizes the state-action table matrix Q; with probability ε select an action at random;
where Pr denotes probability and π denotes the resulting policy;
update the state-action table matrix Q and set the state to the current state;
3) Repeat training step 2); when the set upper limit on the number of convergence iterations is reached, end the loop and output the state-action table matrix Q and the optimal resource allocation strategy;
4) Input the true position p_k of the target node and obtain the suboptimal resource allocation action by a nearest-neighbor algorithm or a BP neural network, e.g. a_k = a*(argmin_s ||p_k - p_s||), the optimal action learned for the training sample nearest to p_k.
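A minimal tabular Q-learning sketch of Algorithm 1 for a single anchor-target link is given below. All numbers are illustrative, and the stand-in error function replaces the measured MSE of an actual positioning run; the action set is a grid of discrete (power, bandwidth) levels with steps ΔP and Δβ, the state is the accuracy level S_0-S_4 obtained by bucketing the latest error value, and the policy is ε-greedy as in step 2):

```python
import math
import random

random.seed(1)
dP, dBeta = 0.25, 0.25                       # discrete step sizes
actions = [(p * dP, b * dBeta)               # (power, bandwidth) levels
           for p in range(1, 5) for b in range(1, 5)]


def error(power, beta):
    # stand-in error value: decreases with both power and bandwidth,
    # loosely shaped like the ranging variance of formula (2)
    return 0.1 / (beta ** 2 * power) + random.gauss(0, 0.01)


def bucket(err):
    # five accuracy states S0 (poor) .. S4 (very good)
    for s, limit in enumerate([2.0, 1.0, 0.5, 0.2]):
        if err > limit:
            return s
    return 4


Q = [[0.0] * len(actions) for _ in range(5)]  # state-action table matrix
alpha, gamma, eps = 0.1, 0.9, 0.2
state = 0
for t in range(2000):
    if random.random() < eps:                 # epsilon-greedy exploration
        a = random.randrange(len(actions))
    else:
        a = max(range(len(actions)), key=lambda i: Q[state][i])
    e = error(*actions[a])
    nxt = bucket(e)
    reward = math.exp(-e)                     # smaller error, larger reward
    Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
    state = nxt

best = actions[max(range(len(actions)), key=lambda i: Q[4][i])]
```

With enough iterations the greedy action in the top accuracy state drifts toward the high power and bandwidth levels, mirroring how the reward guides the anchors toward resource levels that shrink M(p̂).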
as the number of target nodes increases, the dimensions of the state and action space will also increase and exhibit exponentially increasing complexity, making the reinforcement learning algorithm more challenging. Therefore, a distributed alternative to linear complexity is necessary. A distributed solution is formulated in P2, that is, a resource optimization framework is constructed with an objective function of minimizing the power consumption of the positioning system under the constraint condition that the system itself meets the positioning accuracy requirement of each node.
P2:  min_{P,β}  Σ_{k∈N_a} Σ_{j∈N_b} P_kj    (10)
     s.t.  M(p̂_k) ≤ M_th^(k), ∀k ∈ N_a    (11)
           constraints (6)-(9)    (12)-(15)

In formula (11), M_th^(k) represents the positioning accuracy threshold requirement of each of the N_a target nodes.
In practice the position estimates contain errors, and allocating resources directly to a deviated position may cause a large positioning error, so it is necessary to establish the resource allocation scheme shown in P3:

P3:  min_{P,β}  max_p  M(p)    (16)
     s.t.  p ∈ (η_1 × η_2)    (17)
           (6)-(9)

The P3 model follows the robust optimization principle of minimizing the maximum M(p_k), i.e. it assumes that the target node lies somewhere in the region surrounding the position estimate given by the positioning algorithm, where η_1 and η_2 are the side lengths of the uncertainty region. Compared with other resource allocation schemes, this scheme minimizes the maximum MSE value over all possible nodes within the uncertainty region.
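The minimax logic of P3 can be sketched as follows; the error model and the two candidate allocations are hypothetical, and only the vertices of the uncertainty region are checked, in the spirit of the vertex-based robust selection of step (6):

```python
# Sketch of the P3 minimax idea: among candidate resource allocations,
# pick the one whose worst-case error over the vertices of the
# uncertainty region eta_1 x eta_2 is smallest.


def error(alloc, p, anchors):
    # stand-in positioning error for allocation `alloc` at position p:
    # each anchor contributes variance ~ d^2 / (beta^2 * power)
    total = 0.0
    for (power, beta), (ax, ay) in zip(alloc, anchors):
        d2 = (p[0] - ax) ** 2 + (p[1] - ay) ** 2
        total += d2 / (beta ** 2 * power)
    return total


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# two candidate (power, bandwidth) allocations with the same power budget
candidates = [
    [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)],
    [(2.0, 1.0), (0.5, 1.0), (0.5, 1.0)],
]
est = (4.0, 4.0)                  # position estimate p_hat
eta1, eta2 = 2.0, 2.0             # uncertainty region side lengths
vertices = [(est[0] + dx, est[1] + dy)
            for dx in (-eta1 / 2, eta1 / 2)
            for dy in (-eta2 / 2, eta2 / 2)]

# robust choice: minimize the maximum error over the region's vertices
robust = min(candidates,
             key=lambda a: max(error(a, v, anchors) for v in vertices))
```

Replacing the two candidates with the full discrete action grid and the stand-in error with the measured MSE recovers the spirit of scheme P3.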
Combined with the actual positioning scenario, the ranging module provides N_b × N_a wireless ranging links, collected in the ranging matrix shown in equation (18):

R = [d̂_kj] ∈ R^{N_b × N_a}    (18)

The anchor node topology and the matrix R are taken as input, and the estimated target node positions and the link selection scheme are output: if the current anchor node is allocated resources, it is turned on for ranging; otherwise it enters a dormant state. The robust ranging link selection algorithm is as follows:
1) Sample N_s target nodes to be positioned, with the training set denoted P_s = {p_1, ..., p_{N_s}}; set the anchor node positions to p_Anchor and the s-th target node position to p_s, s ∈ {1, ..., N_s}; from the N_b × N_s actual ranging measurements between the N_b anchor nodes and the N_s target nodes, obtain the ranging matrix R;
2) Randomly select ranging links in the ranging matrix R to obtain a set of position estimates of p_s; remove outliers from the set, and obtain the minimum uncertainty region η_1 × η_2 containing the new set;
for each vertex of the uncertainty region η_1 × η_2, initialize the state-action table matrix Q to zero and set the current state to S_0;
carry out the training process of the resource allocation algorithm (Algorithm 1) at each vertex, selecting a ranging link according to whether the anchor node is allocated resources;
record the link selection scheme of each vertex;
apply each vertex's link selection scheme to the other vertices in turn, and obtain the robust link selection scheme according to the optimization objective and constraints of resource optimization scheme P3;
3) Repeat step 2) until all N_s target nodes have been trained;
4) Output the N_s target node position estimates p̂_s and the corresponding robust link selection schemes.
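The link on/off idea of the robust selection algorithm can be illustrated on a toy instance (hypothetical geometry; a blocked link is modeled as a large positive ranging bias): subsets of links are tried, each position is estimated with the least-squares grid search of equation (3), and the subset with the smallest residual is kept:

```python
import itertools
import math

anchors = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
target = (6.0, 9.0)  # true position (illustrative)


def ranges(anchors, p, bias):
    # measured ranges; a nonzero bias models a blocked / NLOS link
    return [math.hypot(p[0] - ax, p[1] - ay) + b
            for (ax, ay), b in zip(anchors, bias)]


def estimate(anchors, r_hat, step=0.25, size=20.0):
    # coarse grid-search least-squares position estimate, equation (3)
    n = int(round(size / step))
    best, best_cost = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * step, j * step
            cost = sum((math.hypot(x - ax, y - ay) - r) ** 2
                       for (ax, ay), r in zip(anchors, r_hat))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


bias = [0.0, 0.0, 0.0, 8.0]          # last link blocked
R = ranges(anchors, target, bias)     # one column of the ranging matrix

# choose the 3-link subset with the smallest least-squares residual
best_sub, best_res, best_p = None, float("inf"), None
for sub in itertools.combinations(range(4), 3):
    a = [anchors[i] for i in sub]
    r = [R[i] for i in sub]
    p = estimate(a, r)
    res = sum((math.hypot(p[0] - ax, p[1] - ay) - ri) ** 2
              for (ax, ay), ri in zip(a, r))
    if res < best_res:
        best_sub, best_res, best_p = sub, res, p
```

In this sketch the subset excluding the blocked anchor wins, and the position estimate recovers the true target; the patent's algorithm makes the analogous choice through the trained Q table and the P3 objective rather than by exhaustive subset search.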
The results of the embodiment of the invention are shown in figs. 2, 3 and 4: FIG. 2 compares the performance of the proposed reinforcement learning algorithm in a single-target-node scenario; FIG. 3 compares the performance of the proposed distributed optimization model; FIG. 4 compares the performance of the proposed robust link selection algorithm. The comparison graphs show that the resource optimization method provided by the invention effectively improves positioning accuracy.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
The foregoing is a detailed description of the invention in connection with specific preferred embodiments, and the invention is not limited to these details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as falling within the protection scope of the invention.

Claims (3)

1. A method for optimizing wireless positioning network resources is characterized by comprising the following steps:
(1) for use in wireless location networks
Figure FDA0002974404440000011
Respectively representing a set of an anchor node and a target node, wherein the anchor node measures distance with the target node by a frequency division multiplexing method based on a clock asynchronous mode to obtain position estimation of the target node k
Figure FDA0002974404440000012
Target node k position estimation using mean square error measurement
Figure FDA0002974404440000013
The accuracy of the target node is obtained, and the error value of the positioning precision of the target node is obtained
Figure FDA0002974404440000014
(2) Error value of target node positioning precision
Figure FDA0002974404440000015
Minimization as an objective function, considering the target node Na1-objective function
Figure FDA0002974404440000016
The minimization problem P1, the constraint is: each anchor node has a transmission bandwidth beta0And a transmission power P0The upper limit of (d); the total transmission power of all anchor nodes cannot exceed a threshold; the frequency bands of the signals transmitted by each anchor node cannot be overlapped;
(3) implementing a resource allocation algorithm based on Reinforcement Learning (RL), comprising: according to the objective function
Figure FDA0002974404440000017
Setting reward to guide the anchor node to select resources of different grades by minimization to obtain an optimal resource allocation strategy, and then according to the input real position p of the target nodekObtaining suboptimal resource allocation actions of the anchor nodes by combining a proximity algorithm or a BP neural network mode;
(4) for target node
Figure FDA0002974404440000018
Increasing the quantity, optimizing the optimal resource allocation strategy obtained in the step (3), and constructing a resource optimization scheme P2 with the aim of minimizing the power consumption of the system, wherein the constraint conditions are added on the basis of the P1 constraint conditions: error value of positioning accuracy of each target node
Figure FDA0002974404440000019
Cannot be larger than a positioning precision threshold;
(5) position estimation for target node k
Figure FDA00029744044400000110
There is a problem of an error in that,further optimizing the optimal resource allocation strategy obtained in the step (4) to obtain the maximum positioning precision error value of all possible target nodes in the uncertain region
Figure FDA00029744044400000111
The minimization is to construct a resource optimization scheme P3 for the target, and the constraint conditions are added on the basis of the P1 constraint conditions: all possible target node positions are in the uncertain region;
(6) obtaining a ranging matrix R from actual distance measurements, performing robust link selection based on reinforcement learning (RL) according to the anchor-node topology and the ranging matrix R, and finally obtaining the target node position estimate p̂_k and a corresponding robust link selection scheme.
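The P1 constraints above (a per-anchor power cap, a per-anchor bandwidth cap, a network-wide total-power threshold, and non-overlapping transmit bands) can be sketched as a simple feasibility check on a candidate allocation. This is an illustrative sketch only: the function name, the (start, width) band representation, and all numeric limits are assumptions, not part of the claimed method.

```python
def feasible(power, band, P0=1.0, beta0=2.0, P_total=3.0):
    """Check the P1-style constraints for a candidate allocation.

    power[j] -- transmit power of anchor j (hypothetical units)
    band[j]  -- (start, width) of anchor j's transmit band
    """
    if any(p > P0 for p in power):          # per-anchor power cap P0
        return False
    if any(w > beta0 for _, w in band):     # per-anchor bandwidth cap beta0
        return False
    if sum(power) > P_total:                # network-wide power threshold
        return False
    spans = sorted(band)                    # sort bands by start frequency
    for (s1, w1), (s2, _) in zip(spans, spans[1:]):
        if s1 + w1 > s2:                    # adjacent bands overlap
            return False
    return True
```

For example, `feasible([0.5, 0.5], [(0.0, 1.0), (1.5, 1.0)])` passes all four checks, while raising one anchor's power above P0 fails the first.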
2. The method of claim 1, wherein the resource allocation algorithm in step (3) is implemented by the following steps:
1) initialization: sampling target node positions to form a training library, with the training set expressed as {p_s}; the anchor node positions are p_Anchor and the s-th target node sample position is denoted p_s; setting the channel coefficients ξ_sj; the discrete step sizes of power and bandwidth are ΔP and Δβ, respectively;
defining the anchor node actions (selections from the discretized power and bandwidth levels), and defining five training-accuracy states (S_0-S_4), ordered from poor to very good;
2) training process: for each training round of each node in the training library:
initializing the state-action table matrix Q to the zero matrix and setting the current state to S_0;
letting s_t and a_t be the state and action at the current time, while the number of training iterations has not reached the upper limit:
Q_old = Q(s_t, a_t);
selecting a set of resource allocation actions for all anchor nodes from the action set; according to the resource allocation scheme at time t+1, outputting the positioning-accuracy error at time t+1 and obtaining the reward R_{t+1} received after the resource allocation action is executed at time t+1; updating the state-action table matrix Q;
solving the strategy with an ε-greedy algorithm, where ε is the "exploration" probability with value range [0, 1]: the policy π selects, with probability 1 − ε, the action that maximizes the state-action table matrix Q, i.e. a_t = argmax_a Q(s_t, a), and with probability ε selects an action at random (Pr denotes probability);
updating the state-action table matrix Q, and setting the current state to the new state;
3) repeating training step 2) in a loop; when the set upper limit of convergence iterations is reached, ending the loop and outputting the state-action table matrix Q and the optimal resource allocation strategy;
4) inputting the real position p_k of the target node, and obtaining the suboptimal resource allocation action by means of a nearest-neighbor algorithm or a BP neural network.
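The training procedure of claim 2 can be sketched as tabular Q-learning with an ε-greedy policy. Everything below is an illustrative stand-in: the state/action counts, the environment in step(), and the constants are assumptions; a real implementation would derive the reward from the positioning-accuracy error of the chosen resource allocation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5          # training-accuracy states S_0..S_4
N_ACTIONS = 9         # hypothetical: 3 power levels x 3 bandwidth levels
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration prob.

Q = np.zeros((N_STATES, N_ACTIONS))  # state-action table matrix Q

def epsilon_greedy(state):
    """With probability 1-EPS select argmax Q; with probability EPS explore."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def step(state, action):
    """Hypothetical environment: a real system would evaluate the
    positioning-accuracy error of the selected allocation here."""
    err = rng.random() / (1 + action)              # stand-in accuracy error
    next_state = min(N_STATES - 1, int((1.0 - err) * N_STATES))
    return next_state, -err                        # smaller error -> larger reward

state = 0
for t in range(2000):                              # until the training upper limit
    action = epsilon_greedy(state)
    next_state, reward = step(state, action)
    q_old = Q[state, action]                       # Q_old = Q(s_t, a_t)
    Q[state, action] = q_old + ALPHA * (reward + GAMMA * Q[next_state].max() - q_old)
    state = next_state                             # advance to the new state
```

The update line is the standard one-step Q-learning rule; the claim's "return value R_{t+1}" corresponds to `reward`, and the converged table Q encodes the optimal resource allocation strategy.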
3. The wireless positioning network resource optimization method as claimed in claim 2, wherein implementing the robust link selection in step (6) specifically comprises the steps of:
1) sampling N_s target nodes to be positioned, with the training set represented as {p_s}, s = 1, …, N_s; setting the anchor node positions to p_Anchor and the s-th target node position to p_s; obtaining the N_b × N_s ranging matrix R from the actual distance measurements between the N_b anchor nodes and the N_s target nodes;
2) randomly selecting ranging links in the ranging matrix R to obtain a set of position estimates of p_s; removing the outliers from this set, and obtaining the minimum uncertainty region η_1 × η_2 containing the new set;
for each vertex of the uncertainty region η_1 × η_2, initializing the state-action table matrix Q to the zero matrix and setting the current state to S_0;
performing the training process of the resource allocation algorithm on each vertex, and selecting the ranging links according to whether each anchor node is allocated resources;
recording the link selection scheme of each vertex;
applying the link selection scheme of each vertex in turn to the other vertices, and obtaining a robust link selection scheme based on the optimization objective and constraint conditions of resource optimization scheme P3;
3) repeating step 2) until all N_s target nodes have been trained;
4) outputting the N_s target node position estimates and the corresponding robust link selection schemes.
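The min-max idea behind scheme P3 and the vertex-based robust selection of claim 3 can be illustrated with a brute-force sketch: evaluate each candidate set of ranging links at every vertex of the uncertainty region and keep the set whose worst-case error is smallest. The anchor/vertex coordinates and the stand-in error metric below are assumptions; the claimed method instead uses RL training at each vertex and the P3 objective and constraints.

```python
import itertools
import math

# hypothetical anchor positions and the four vertices of an
# uncertainty region eta_1 x eta_2 (all coordinates are assumptions)
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
vertices = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]

def positioning_error(links, p):
    """Stand-in error metric: more links / closer anchors -> smaller error."""
    gain = sum(1.0 / (1e-6 + math.dist(anchors[i], p)) for i in links)
    return 1.0 / gain

best_links, best_worst = None, float("inf")
# enumerate link subsets with at least 3 anchors (2-D trilateration minimum)
for r in range(3, len(anchors) + 1):
    for links in itertools.combinations(range(len(anchors)), r):
        worst = max(positioning_error(links, v) for v in vertices)  # max over region
        if worst < best_worst:                                      # min-max choice
            best_links, best_worst = links, worst
```

Because the stand-in error decreases whenever a link is added, this sketch selects the full anchor set; with power/bandwidth costs attached (as in P3), smaller link sets become competitive.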
CN202110271000.5A 2021-03-12 2021-03-12 Wireless positioning network resource optimization method Active CN113099491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271000.5A CN113099491B (en) 2021-03-12 2021-03-12 Wireless positioning network resource optimization method


Publications (2)

Publication Number Publication Date
CN113099491A true CN113099491A (en) 2021-07-09
CN113099491B CN113099491B (en) 2022-05-10

Family

ID=76667076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271000.5A Active CN113099491B (en) 2021-03-12 2021-03-12 Wireless positioning network resource optimization method

Country Status (1)

Country Link
CN (1) CN113099491B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113597008A (en) * 2021-07-29 2021-11-02 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113853019A (en) * 2021-08-23 2021-12-28 Tianyi Digital Life Technology Co., Ltd. Wireless positioning network resource optimization scheduling method
CN114245291A (en) * 2021-11-19 2022-03-25 China University of Mining and Technology Distance measurement positioning method for virtualizing reference node into unknown node
CN114363906A (en) * 2021-12-29 2022-04-15 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924935A (en) * 2018-07-06 2018-11-30 Northwestern Polytechnical University Power allocation method in the NOMA power domain based on a reinforcement learning algorithm
US20190034830A1 (en) * 2017-07-26 2019-01-31 Yandex Europe Ag Methods and systems for evaluating training objects by a machine learning algorithm
WO2020146820A1 (en) * 2019-01-11 2020-07-16 Apple Inc. Resource allocation, reference signal design, and beam management for new radio (nr) positioning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN, JINRUI et al.: "Power allocation scheme for wireless positioning networks based on Stackelberg game", Journal of Terahertz Science and Electronic Information Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113597008A (en) * 2021-07-29 2021-11-02 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113597008B (en) * 2021-07-29 2024-04-12 Shanghai University Resource optimization method of wireless positioning network based on DDPG
CN113853019A (en) * 2021-08-23 2021-12-28 Tianyi Digital Life Technology Co., Ltd. Wireless positioning network resource optimization scheduling method
CN114245291A (en) * 2021-11-19 2022-03-25 China University of Mining and Technology Distance measurement positioning method for virtualizing reference node into unknown node
CN114363906A (en) * 2021-12-29 2022-04-15 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium
CN114363906B (en) * 2021-12-29 2024-04-16 Peng Cheng Laboratory Mass node access resource allocation method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN113099491B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN113099491B (en) Wireless positioning network resource optimization method
Qiong et al. Towards V2I age-aware fairness access: A DQN based intelligent vehicular node training and test method
Lu et al. Optimization of task offloading strategy for mobile edge computing based on multi-agent deep reinforcement learning
CN112383922A (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN110167176B (en) Wireless network resource allocation method based on distributed machine learning
CN113900380B (en) Robust output formation tracking control method and system for heterogeneous cluster system
Li et al. Deep reinforcement learning optimal transmission policy for communication systems with energy harvesting and adaptive MQAM
CN112203307B (en) Multi-channel wireless network scheduling method supporting information age optimization
CN105188124A (en) Robustness gaming power control method under imperfect CSI for multi-user OFDMA relay system
CN110278570B (en) Wireless communication system based on artificial intelligence
Pozza et al. CARD: Context-aware resource discovery for mobile Internet of Things scenarios
CN114051252B (en) Multi-user intelligent transmitting power control method in radio access network
Hakami et al. A resource allocation scheme for D2D communications with unknown channel state information
Li et al. Graph-based algorithm unfolding for energy-aware power allocation in wireless networks
CN114885340A (en) Ultra-dense wireless network power distribution method based on deep transfer learning
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113259944B (en) RIS auxiliary frequency spectrum sharing method based on deep reinforcement learning
Li et al. UAV trajectory optimization for spectrum cartography: a PPO approach
CN117255356A (en) Efficient self-cooperation method based on federal learning in wireless access network
Yang et al. Parallel stochastic decomposition algorithms for multi-agent systems
CN112738849A (en) Load balancing regulation and control method applied to multi-hop environment backscatter wireless network
CN116647817A (en) Marine ecological environment monitoring wireless sensor network node positioning method
CN115987886A (en) Underwater acoustic network Q learning routing method based on meta-learning parameter optimization
Chen et al. Mse minimized scheduling for multiple-source remote estimation with aoi constraints in iwsn
CN112631130B (en) ILC system input signal optimal estimation method facing time delay and noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant