CN116449836A

CN116449836A - Reconfigurable intelligent surface-assisted multi-robot system track planning method

Info

Publication number: CN116449836A
Application number: CN202310365852.XA
Authority: CN
Inventors: 刘元玮; 高新宇; 董杰
Original assignee: Beijing Tiantan Intelligent Technology Co ltd
Current assignee: Beijing Tiantan Intelligent Technology Co ltd
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-07-18
Anticipated expiration: 2043-04-07
Also published as: CN116449836B

Abstract

The invention discloses a trajectory planning method for a reconfigurable intelligent surface (RIS)-assisted multi-robot system. First, a communication model of the multi-robot system is established. Next, an integrated machine learning scheme is proposed that combines a long short-term memory (LSTM)-autoregressive integrated moving average (ARIMA) model with a dueling double deep Q-network (D3QN) algorithm: the former predicts the initial and final positions of the robots, and the latter plans their trajectories while avoiding overestimation of action values. Finally, the scheme is used to optimize and obtain the optimal trajectory of the multi-robot system. The invention thus provides a novel trajectory planning method for RIS-assisted multi-robot systems and has good application value.

Description

Reconfigurable intelligent surface-assisted multi-robot system track planning method

Technical Field

The invention relates to the field of wireless communication, and in particular to a trajectory planning method for multiple communicating robots in an indoor environment.

Background

Robots are widely considered to be far less capable when working independently; their real strength lies in the cooperation of multiple robots. As a result, multi-robot systems in shared environments have attracted considerable attention in emerging applications such as cargo transportation, automatic patrol and emergency rescue. In these scenarios, robots need to coordinate with each other to achieve a well-defined goal, e.g., moving from one given location to another. However, as application environments become more complex, cooperative task processing in a multi-robot system consumes significant local computing resources. Because of the collaboration requirements and the high computational complexity of trajectory planning in multi-robot systems, wireless communication using advanced multiple access techniques is of great importance for such systems.

In certain communication areas, such as blind spots, spectrum shortage remains a problem. Reconfigurable intelligent surfaces (RISs), which are passive devices that reflect incident signals toward users, are potential candidates for improving spectral efficiency. In particular, an RIS can create a virtual line-of-sight link between a base station and a robot when the robot is located in a communication blind zone. Given the advantages of RISs and of non-orthogonal multiple access techniques, RIS assistance is considered a potential solution for efficiently addressing trajectory design in multi-robot systems. RIS-aided multi-robot trajectory planning has therefore been considered a candidate for next-generation wirelessly connected robot navigation.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a reconfigurable intelligent surface-assisted multi-robot system track planning method.

To achieve the above purpose, the invention adopts the following technical solution:

a reconfigurable intelligent surface-assisted multi-robot system track planning method comprises the following steps:

step one: establishing a communication model of a reconfigurable intelligent surface-assisted multi-robot system; the communication model specifically comprises a single antenna base station, L mobile robots and a reconfigurable intelligent surface with K reflecting elements;

the channels from the single-antenna base station to the reconfigurable intelligent surface, from the reconfigurable intelligent surface to the i-th mobile robot, and from the single-antenna base station to the i-th mobile robot are defined as $h^H \in \mathbb{C}^{1 \times K}$, $g_i \in \mathbb{C}^{K \times 1}$ and $I_i \in \mathbb{C}^{1 \times 1}$, respectively; in addition, the reflection-coefficient matrix of the reconfigurable intelligent surface at time $t \in [0, T]$ is expressed as:

where $\beta_k$ and $\theta_k$ denote the amplitude and phase of the k-th reflecting element, respectively; the signal received by the i-th mobile robot at position $q_i(t)$ at time $t$ is:

where $n \sim \mathcal{CN}(0, \delta^2)$ denotes additive white Gaussian noise with zero mean and variance $\delta^2$, and $S_i(t)$ and $S_j(t)$ are the transmitted symbols of the i-th and j-th mobile robots, respectively; in addition, the decoding orders satisfy $O(j) > O(i)$, indicating that the i-th mobile robot is decoded before the j-th mobile robot; the achievable rate of the i-th mobile robot and the rate for decoding the j-th mobile robot are expressed as:

where $p_i(t)$ and $p_j(t)$ denote the transmit power of the i-th and j-th mobile robots, respectively; the trajectory optimization problem is then expressed as maximizing the total communication rate of all users over the period from 0 to T;
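The displayed formulas of step one do not survive in this text. For orientation only, the block below sketches a form of the reflection-coefficient matrix that is common in the RIS literature and consistent with the definitions of $\beta_k$ and $\theta_k$ above, together with one way of writing the sum-rate objective. The symbols $\Theta(t)$ and $R_i(t)$ (the achievable rate of the i-th robot) are assumed notation, not taken from the source, and the patent's own expressions may differ.

```latex
% Assumed notation: \Theta(t) is the K x K reflection-coefficient matrix,
% R_i(t) is the achievable rate of the i-th mobile robot at time t.
\Theta(t) = \operatorname{diag}\!\left(\beta_1 e^{j\theta_1(t)}, \ldots, \beta_K e^{j\theta_K(t)}\right),
\qquad t \in [0, T]

\max_{\{q_i(t)\}} \; \sum_{i=1}^{L} \int_{0}^{T} R_i(t)\, \mathrm{d}t
```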

step two: using an integrated machine learning scheme that combines a long short-term memory (LSTM)-autoregressive integrated moving average (ARIMA) model with a dueling double deep Q-network algorithm, predicting the initial and final positions of the robots and planning their trajectories while avoiding overestimation of action values;

step three: on the basis of step two, optimizing to obtain the optimal trajectory of the multi-robot system; once the neural network has converged through training, the optimal robot trajectories can be output.

Further, the specific process of the second step is as follows:

the sets of possible initial and final positions of all robots are predicted separately with the long short-term memory network and with the autoregressive integrated moving average model, giving two prediction sets as follows:

where $\alpha$, $\beta$ and $\gamma$ denote the parameters of the long short-term memory network and of the autoregressive integrated moving average model; $S_{\max}$ and $S_{\min}$ denote the maximum and minimum values in the training samples, and the two remaining terms denote the predicted value of the autoregressive model and the predicted value of the moving-average model; weights are then assigned by the CRITIC weighting method, and the two prediction sets are fused as follows:

where $w_1$ and $w_2$ denote the assigned weights; optimization is then carried out through dueling double deep Q-network learning; in the dueling double deep Q-network, the single-antenna base station is defined as the agent, and decentralization processing is applied at the same time to avoid failure to converge, which yields the following loss function:

where $\mu^C$ and the two accompanying parameter sets denote the parameters of the convolutional layer, of the first dense layer and of the second dense layer in the dueling double deep Q-network, respectively; $\Gamma(e')$, $Q_e$, $Q_f$ and $f^{\max}(e', \mu)$ denote the state feature vector, the state Q estimation function, the action Q estimation function and the action corresponding to the maximum Q value, respectively; furthermore, $e$, $e'$ and $\mu$ denote the current action, the next action and the current network parameters, respectively; based on this loss function, the agent learns until convergence, after which the optimal network parameters can be output.
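The CRITIC-based fusion described above can be illustrated with a minimal Python sketch. It assumes that each predictor (LSTM, ARIMA) is treated as one CRITIC criterion, that predictions are min-max normalized before weighting, and that the fusion is a weighted linear combination; the source names the CRITIC method and the weights $w_1$ and $w_2$ but does not spell out these details. All function and variable names are hypothetical.

```python
import math


def min_max(values):
    """Scale values to [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def critic_weights(columns):
    """CRITIC weights: contrast (std. dev.) times conflict (1 - correlation)."""
    normalized = [min_max(col) for col in columns]
    info = []
    for j, col_j in enumerate(normalized):
        mean_j = sum(col_j) / len(col_j)
        sigma_j = math.sqrt(sum((v - mean_j) ** 2 for v in col_j) / len(col_j))
        conflict = sum(
            1.0 - pearson(col_j, col_k)
            for k, col_k in enumerate(normalized)
            if k != j
        )
        info.append(sigma_j * conflict)
    total = sum(info)
    if total <= 1e-12:
        # Identical or constant predictors: fall back to equal weights.
        return [1.0 / len(columns)] * len(columns)
    return [c / total for c in info]


def fuse(pred_a, pred_b, w1, w2):
    """Weighted linear fusion of two prediction series (assumed form)."""
    return [w1 * a + w2 * b for a, b in zip(pred_a, pred_b)]
```

For two predictors the conflict term is the same for both, so the weights reduce to the ratio of the normalized standard deviations; with more predictors the correlation structure also matters.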

The present invention also provides a computer readable storage medium having stored therein a computer program which when executed by a processor implements the above method.

The invention also provides a reconfigurable intelligent surface-assisted multi-robot system, which comprises a processor and a memory, wherein the memory is used for storing a computer program; the processor is configured to execute the computer program to implement the above method.

The beneficial effects of the invention are as follows. The invention provides an integrated machine learning solution to the trajectory planning problem of a reconfigurable intelligent surface-assisted multi-robot system. First, a communication model of the multi-robot system is established. Next, an integrated machine learning scheme is proposed that combines a long short-term memory (LSTM)-autoregressive integrated moving average (ARIMA) model with a dueling double deep Q-network algorithm: the former predicts the initial and final positions of the robots, and the latter plans their trajectories while avoiding overestimation of action values. Finally, the scheme is used to optimize and obtain the optimal trajectory of the multi-robot system. The invention thus provides a solution for trajectory planning in reconfigurable intelligent surface-assisted multi-robot systems and has good application value.

Drawings

FIG. 1 illustrates the overall concept of the method according to an embodiment of the invention.

Detailed Description

The invention is further described below with reference to the accompanying drawing. This embodiment is implemented on the premise of the above technical solution and gives a detailed implementation and specific operating procedure; however, the protection scope of the invention is not limited to this embodiment.

This embodiment provides a trajectory planning method for a reconfigurable intelligent surface-assisted multi-robot system, with which the navigation problem of the multi-robot system can be solved. As shown in FIG. 1, the method first establishes a communication model of the multi-robot system. It then proposes an integrated machine learning scheme that combines a long short-term memory (LSTM)-autoregressive integrated moving average (ARIMA) model with a dueling double deep Q-network algorithm; the former predicts the initial and final positions of the robots, and the latter plans their trajectories while avoiding overestimation of action values. Finally, the scheme is used to optimize and obtain the optimal trajectory of the multi-robot system.

The reconfigurable intelligent surface-assisted multi-robot system track planning method specifically comprises the following steps:

step one: establishing a communication model of a reconfigurable intelligent surface-assisted multi-robot system; the communication model comprises a single antenna base station, L mobile robots and a reconfigurable intelligent surface with K reflecting elements;

the channels from the single-antenna base station to the reconfigurable intelligent surface, from the reconfigurable intelligent surface to the i-th mobile robot, and from the single-antenna base station to the i-th mobile robot are defined as $h^H \in \mathbb{C}^{1 \times K}$, $g_i \in \mathbb{C}^{K \times 1}$ and $I_i \in \mathbb{C}^{1 \times 1}$, respectively. In addition, the reflection-coefficient matrix of the reconfigurable intelligent surface at time $t \in [0, T]$ can be expressed as:

where $\beta_k$ and $\theta_k$ denote the amplitude and phase of the k-th reflecting element, respectively. The signal received by the i-th mobile robot at position $q_i(t)$ at time $t$ is:

where $n \sim \mathcal{CN}(0, \delta^2)$ denotes additive white Gaussian noise with zero mean and variance $\delta^2$, and $S_i(t)$ and $S_j(t)$ are the transmitted symbols of the i-th and j-th mobile robots, respectively. In addition, the decoding orders satisfy $O(j) > O(i)$, indicating that the i-th mobile robot is decoded before the j-th mobile robot. The achievable rate of the i-th mobile robot and the rate for decoding the j-th mobile robot may be expressed as:

where $p_i(t)$ and $p_j(t)$ denote the transmit power of the i-th and j-th mobile robots, respectively. The trajectory optimization problem is then expressed as maximizing the total communication rate of all users over the period from 0 to T.
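The communication model of step one can be made concrete with a small numerical sketch. The code below assumes a cascaded channel of the form "base station to surface to robot, plus the direct link" and a power-domain non-orthogonal multiple access rate in which signals with a larger decoding order value act as interference; the source does not reproduce its rate expressions, so this rate model and all names (`effective_channel`, `noma_rates`, and so on) are assumptions for illustration only.

```python
import cmath
import math


def reflection_coefficients(betas, thetas):
    """Diagonal entries beta_k * exp(j * theta_k) of the reflection matrix."""
    return [b * cmath.exp(1j * th) for b, th in zip(betas, thetas)]


def effective_channel(h_H, diag_theta, g_i, direct_i):
    """Reflected path h^H * Theta * g_i plus the direct link I_i."""
    reflected = sum(h * c * g for h, c, g in zip(h_H, diag_theta, g_i))
    return reflected + direct_i


def noma_rates(channels, powers, order, noise_var):
    """Per-robot rate in bit/s/Hz under the assumed NOMA model.

    order[i] plays the role of O(i): when robot i's signal is decoded,
    signals with a larger order value are treated as interference.
    """
    rates = []
    for i, c_i in enumerate(channels):
        gain = abs(c_i) ** 2
        interference = gain * sum(
            p_j for j, p_j in enumerate(powers) if order[j] > order[i]
        )
        sinr = gain * powers[i] / (interference + noise_var)
        rates.append(math.log2(1.0 + sinr))
    return rates
```

Choosing each phase $\theta_k$ so that the reflected terms add coherently increases the channel gain, which is why the surface helps a robot located in a blind zone where the direct link is weak.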

Step two: and (3) performing track planning on initial and final position predictions of the robot respectively under the condition of avoiding overestimation of action value by utilizing integrated machine learning combined with a long-term memory-autoregressive integrated moving average model and a duel-depth Q network algorithm.

As a variant of the recurrent neural network, the long short-term memory (LSTM) network excels at handling non-stationary and nonlinear sequence data efficiently. However, LSTM does not completely solve the vanishing-gradient problem for long sequences. The autoregressive integrated moving average (ARIMA) model, which does not suffer from this problem, provides an effective solution for linear sequence data. However, as a time-series prediction model it essentially captures linear relationships and cannot capture nonlinear ones. In this embodiment, the sets of possible initial and final positions of all robots are predicted separately with the LSTM network and with the ARIMA model, giving two prediction sets as follows:

where $\alpha$, $\beta$ and $\gamma$ denote the parameters of the long short-term memory network and of the autoregressive integrated moving average model. $S_{\max}$ and $S_{\min}$ denote the maximum and minimum values in the training samples, and the two remaining terms denote the predicted value of the autoregressive model and the predicted value of the moving-average model. Weights are then assigned by the CRITIC weighting method, and the two prediction sets are fused as follows:

where $w_1$ and $w_2$ denote the assigned weights. Optimization is then carried out through dueling double deep Q-network learning. In the dueling double deep Q-network, the single-antenna base station is defined as the agent, and decentralization processing is applied at the same time to avoid failure to converge, which yields the following loss function:

where $\mu^C$ and the two accompanying parameter sets denote the parameters of the convolutional layer, of the first dense layer and of the second dense layer in the dueling double deep Q-network, respectively. $\Gamma(e')$, $Q_e$, $Q_f$ and $f^{\max}(e', \mu)$ denote the state feature vector, the state Q estimation function, the action Q estimation function and the action corresponding to the maximum Q value, respectively. In addition, $e$, $e'$ and $\mu$ denote the current action, the next action and the current network parameters, respectively. Based on this loss function, the agent learns until convergence, after which the optimal network parameters can be output.
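The two ingredients named above, a dueling value/advantage split and a double-Q target that avoids overestimating action values, can be sketched in a few lines of Python. This follows the generic dueling and double deep Q-network constructions rather than the patent's exact loss, which is not reproduced in this text. Reading the "decentralization processing" as the usual mean subtraction of advantages is an assumption, and all names are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value and per-action advantages.

    Subtracting the mean advantage keeps the value/advantage split
    identifiable (the assumed reading of the centering step).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_q_target(reward, discount, online_next_q, target_next_q, done=False):
    """Double-Q target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_next_q)), key=online_next_q.__getitem__)
    return reward + discount * target_next_q[best_action]


def squared_td_loss(q_taken, target):
    """Squared temporal-difference error for one transition."""
    return (target - q_taken) ** 2
```

In the second function, taking the maximum of `target_next_q` directly would instead use the same network to select and evaluate the action, which is the source of the overestimation that the double-Q construction is designed to reduce.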

Step three: and on the basis of the second step, optimizing to obtain the optimal track of the multi-robot system. When the neural network reaches convergence through training, the optimal robot track can be output.

Various modifications and variations of the present invention will be apparent to those skilled in the art in light of the foregoing teachings and are intended to be included within the scope of the following claims.

Claims

1. The reconfigurable intelligent surface-assisted multi-robot system track planning method is characterized by comprising the following steps of:

2. The method according to claim 1, wherein the specific process of the second step is:

where $w_1$ and $w_2$ denote the assigned weights; optimization is then carried out through dueling double deep Q-network learning; in the dueling double deep Q-network, the single-antenna base station is defined as the agent, and decentralization processing is applied at the same time to avoid failure to converge, which yields the following loss function:

where $\mu^C$ and the two accompanying parameter sets denote the parameters of the convolutional layer, of the first dense layer and of the second dense layer in the dueling double deep Q-network, respectively; $\Gamma(e')$, $Q_e$, $Q_f$ and $f^{\max}(e', \mu)$ denote the state feature vector, the state Q estimation function, the action Q estimation function and the action corresponding to the maximum Q value, respectively; furthermore, $e$, $e'$ and $\mu$ denote the current action, the next action and the current network parameters, respectively; based on this loss function, the agent learns until convergence, after which the optimal network parameters can be output.

3. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-2.

4. A reconfigurable intelligent surface-assisted multi-robot system comprising a processor and a memory for storing a computer program; the processor being adapted to implement the method of any of claims 1-2 when the computer program is executed.