CN117633402B - Unmanned carrier intelligent robust navigation method and system in unknown complex environment - Google Patents


Info

Publication number
CN117633402B
CN117633402B
Authority
CN
China
Prior art keywords
state
moment
navigation
intelligent
optimal
Prior art date
Legal status
Active
Application number
CN202311524311.3A
Other languages
Chinese (zh)
Other versions
CN117633402A (en)
Inventor
朱祥维
何少烽
马岳鑫
翁淇臻
翟春磊
刘格陆
沈丹
Current Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Original Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University, Sun Yat Sen University Shenzhen Campus filed Critical Sun Yat Sen University
Priority to CN202311524311.3A
Publication of CN117633402A
Application granted
Publication of CN117633402B


Abstract

The invention relates to the technical field of unmanned navigation, in particular to an unmanned carrier intelligent robust navigation method and system under an unknown complex environment, comprising the steps of acquiring state space information under the unknown complex environment; carrying out path planning on the state space information by utilizing a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module which are sequentially connected. According to the invention, the lightweight intelligent filtering module and the deep reinforcement learning SAC algorithm are combined, and the Kalman gain can be adaptively learned, so that the navigation task can be rapidly and reliably completed, the method has stronger interpretability, and the high-efficiency and real-time requirements of navigation are met.

Description

Unmanned carrier intelligent robust navigation method and system in unknown complex environment
Technical Field
The invention relates to the technical field of unmanned navigation, in particular to an unmanned carrier intelligent robust navigation method and system under an unknown complex environment.
Background
Unmanned carriers (unmanned aerial vehicles, robots, unmanned vehicles, autonomous underwater vehicles and the like) play an important role in the civil and military fields, and their ability to perceive and make autonomous control decisions is critical when executing tasks. Currently common autonomous navigation technologies include Simultaneous Localization and Mapping (SLAM) algorithms, integrated navigation based on the fusion of an inertial navigation system (Inertial Navigation System, INS) with various sensors, and models developed from these two technologies, such as: visual SLAM, laser SLAM, and SLAM fusing the Global Navigation Satellite System (GNSS), the Inertial Measurement Unit (IMU) and vision (GNSS-Visual-Inertial, GVI-SLAM), etc.
However, the SLAM algorithm needs to spend a large amount of time building a map of the environment before navigation, after which navigation is realized with a path planning algorithm; when the unmanned carrier changes its working scene, the SLAM algorithm must repeat the mapping work, and new navigation can only start after the new mapping is completed. In an unknown complex navigation environment, scene variability requires that the algorithm used by the unmanned carrier has strong universality, i.e. it can complete the navigation task reliably and quickly without a map, so the SLAM algorithm is difficult to apply to unknown complex navigation environments. Moreover, in an unknown complex environment, the measurement information of the sensors carries noise, which is very likely to be non-Gaussian or nonlinearly correlated. The Kalman filter (Kalman Filter, KF) is the optimal solution for a system represented by a completely known linear Gaussian state space model; for nonlinear problems in practice, the extended Kalman filter (EKF), the unscented Kalman filter (UKF) and the particle filter (PF) have been proposed, but these methods rely on accurately known noise statistics and system models, which are difficult to obtain in an unknown complex environment, so reliable filtering results are difficult to achieve under such conditions.
In recent years, deep learning with artificial neural networks has promoted the rapid development of artificial intelligence. To break through the bottleneck of deep learning, academia has proposed brain-inspired intelligence; in the navigation field, many scholars have likewise proposed brain-inspired SLAM in a bionic sense, and many models, such as RatSLAM, NeuroSLAM and DolphinSLAM, have been proposed in combination with the navigation function cells discovered in brain science. How to apply deep learning to unmanned carrier robust navigation scenes is therefore an important problem to be studied for unknown complex navigation environments.
Disclosure of Invention
The invention provides an unmanned carrier intelligent robust navigation method and system in an unknown complex environment, which solve the technical problem that the existing autonomous navigation technology cannot reliably and efficiently meet the requirements of non-Gaussian or nonlinear noise and scene variability in the unknown complex navigation environment.
In order to solve the technical problems, the invention provides an unmanned carrier intelligent robust navigation method and system under an unknown complex environment.
In a first aspect, the invention provides an unmanned carrier intelligent robust navigation method in an unknown complex environment, which comprises the following steps:
acquiring state space information in an unknown complex environment, wherein the state space information comprises state vector information and measurement vector information;
Carrying out path planning on the state space information by utilizing a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module which are sequentially connected.
In a further embodiment, the step of processing the state space information by using a pre-constructed intelligent navigation network model to obtain optimal navigation information includes:
filtering the state space information by adopting a lightweight intelligent filtering module to obtain an optimal state posterior estimated value at the current moment;
And carrying out path planning on the optimal state posterior estimation value at the current moment according to an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm to obtain optimal navigation information.
In a further embodiment, the lightweight recurrent neural network module comprises an input layer, a dual-gated recurrent unit network unit, a first feedforward neural network unit and an output layer which are connected in sequence; the first feedforward neural network unit is also connected with a second feedforward neural network unit; and the dual-gated recurrent unit network unit comprises a state gated recurrent unit network unit and a measurement gated recurrent unit network unit which are connected in parallel.
In a further embodiment, the step of filtering the state space information by using a lightweight intelligent filtering module to obtain an optimal state posterior estimate at the current moment includes:
establishing a state space model in an unknown complex environment, wherein the state space model comprises a state equation and a measurement equation;
Predicting a system state space according to the state space model to obtain a priori statistical moment at the current moment, wherein the priori statistical moment comprises a state priori estimated moment and a measurement priori estimated moment, the state priori estimated moment comprises a state priori first-order moment and a state priori second-order moment, and the measurement priori estimated moment comprises a measurement priori first-order moment and a measurement priori second-order moment;
calculating a measurement innovation from the measurement vector information at the current moment and the measurement prior first moment;
calculating a state estimation innovation from the optimal state posterior estimate and the state prior first moment at the previous moment;
learning the measurement innovation and the state estimation innovation with the lightweight recurrent neural network module to obtain an optimal Kalman gain;
and inputting the optimal Kalman gain, the state priori first moment and the measurement information at the current moment into the Kalman filtering updating module for updating to obtain an optimal state posterior estimated value at the current moment.
In a further embodiment, the step of learning the measurement innovation and the state estimation innovation by using the lightweight recurrent neural network module to obtain an optimal Kalman gain includes:
calculating a measurement feature difference from the measurement vector information at the current and previous moments;
calculating a state feature difference from the optimal state posterior estimates at the two consecutive moments before the current moment;
inputting the state estimation innovation and the state feature difference into the state gated recurrent unit network unit for learning to obtain a state prior second moment at the current moment;
inputting the measurement innovation and the measurement feature difference into the measurement gated recurrent unit network unit for learning to obtain a measurement prior second moment at the current moment;
learning the state prior second moment and the measurement prior second moment with the first feedforward neural network unit to obtain the optimal Kalman gain;
and inputting the state prior second moment, the measurement prior second moment and the optimal Kalman gain into the second feedforward neural network unit for learning to obtain a state posterior second moment at the current moment, and feeding back the state posterior second moment at the current moment to the state gated recurrent unit network unit.
In a further embodiment, the optimal state posterior estimate calculation formula is:
\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \Delta y_t
wherein
K_t = \Sigma_{t|t-1} H_t^{\mathrm{T}} S_{t|t-1}^{-1}, \quad \Delta y_t = y_t - \hat{y}_{t|t-1}
in which \hat{x}_{t|t} represents the optimal state posterior estimate at the current time t; \hat{x}_{t|t-1} represents the state prior first moment at the current time t; K_t represents the optimal Kalman gain at the current time t; \Delta y_t represents the measurement innovation; H_t^{\mathrm{T}} represents the transpose of the measurement Jacobian matrix; \Sigma_{t|t-1} represents the state prior second moment at the current time t; S_{t|t-1} represents the measurement prior second moment at the current time; y_t represents the measurement vector information; and \hat{y}_{t|t-1} represents the measurement prior first moment at the current time t.
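As a numerical illustration of this posterior update (the scalar values below are made up, not taken from the patent), the gain blends the prior estimate with the measurement innovation:

```python
# Scalar illustration of the posterior update: the estimate moves from the
# prior toward the measurement by a fraction K of the innovation.
x_prior = 4.0          # state prior first moment
y = 5.0                # measurement vector information (scalar here)
y_prior = 4.4          # measurement prior first moment
K = 0.75               # Kalman gain (a chosen illustrative scalar)

innovation = y - y_prior           # measurement innovation, 0.6
x_post = x_prior + K * innovation  # posterior estimate, 4.45
```

A gain near 1 trusts the measurement; a gain near 0 trusts the model prediction.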
In a further embodiment, the step of obtaining the optimal navigation information by performing path planning on the optimal state posterior estimate at the current moment according to the autonomous path planning network module adopting the deep reinforcement learning SAC algorithm includes:
Performing transformation processing on the optimal state posterior estimation value at the current moment to obtain an agent state space;
defining an intelligent body action space and a reward function, and initializing an experience pool;
Outputting an optimal action value under the current state according to the strategy model, and transferring to the next state to obtain an experience tuple; the experience tuple comprises a current state, an action, a reward, a next state and a navigation ending mark;
and adding the experience tuples into the experience pool, and updating network parameters when the number of experiences reaches a preset threshold value to generate optimal navigation information.
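The experience-collection steps above can be sketched as follows; the policy, environment transition and reward here are hypothetical one-dimensional stand-ins for the SAC actor network and the navigation environment:

```python
import random
from collections import deque

class ReplayPool:
    """Experience pool holding (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)
    def add(self, experience):
        self.pool.append(experience)
    def sample(self, k):
        return random.sample(self.pool, k)
    def __len__(self):
        return len(self.pool)

def policy(state):            # placeholder for the SAC policy model
    return -0.1 * state       # "optimal action" stand-in

def step_env(state, action):  # placeholder environment transition
    next_state = state + action
    reward = -abs(next_state)         # closer to the goal (0) is better
    done = abs(next_state) < 0.05     # navigation-ended flag
    return next_state, reward, done

pool, state, threshold = ReplayPool(), 1.0, 8
for _ in range(20):
    action = policy(state)
    next_state, reward, done = step_env(state, action)
    pool.add((state, action, reward, next_state, done))
    if len(pool) >= threshold:
        batch = pool.sample(4)  # here the network parameters would be updated
    state = 1.0 if done else next_state
```

The update trigger at a preset threshold mirrors the step above where learning begins only once enough experience has accumulated.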
In a second aspect, the present invention provides an unmanned carrier intelligent robust navigation system in an unknown complex environment, the system comprising:
The data acquisition module is used for acquiring state space information in an unknown complex environment, wherein the state space information comprises state vector information and measurement vector information;
the intelligent navigation module is used for planning a path of the state space information by utilizing a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module which are sequentially connected.
In a third aspect, the present invention also provides a computer device, including a processor and a memory, where the processor is connected to the memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the computer device performs the steps of the above method.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored therein a computer program which when executed by a processor performs the steps of the above method.
The invention provides an unmanned carrier intelligent robust navigation method and system in an unknown complex environment. The method comprises obtaining state space information in the unknown complex environment and performing path planning on the state space information by using a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting the deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight recurrent neural network module and a Kalman filtering update module which are sequentially connected. Compared with the prior art, the lightweight intelligent filtering module adopted by the method embeds a lightweight recurrent neural network module in the Kalman filtering algorithm, which can handle the noise problem in an unknown complex environment, realizes adaptive learning of the optimal Kalman gain, has relatively strong interpretability, and meets the efficiency and real-time requirements of filtering. Meanwhile, the invention realizes autonomous path planning with a deep reinforcement learning algorithm, avoids rebuilding the map after scene switching, and improves the efficiency and universality of navigation decisions.
Drawings
Fig. 1 is a schematic flow diagram of an intelligent robust navigation method of an unmanned carrier in an unknown complex environment according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an unmanned intelligent navigation system provided by an embodiment of the present invention;
FIG. 3 is a block diagram of a KalmanNet filtering implementation provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the RNN structure in KalmanNet according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a SAC algorithm architecture according to an embodiment of the present invention;
FIG. 6 is a block diagram of an unmanned carrier intelligent robust navigation system in an unknown complex environment provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are for reference and description only and are not to be construed as limiting the scope of the invention, as many variations thereof are possible without departing from its spirit and scope.
Referring to fig. 1, an embodiment of the present invention provides an intelligent robust navigation method for an unmanned carrier in an unknown complex environment, as shown in fig. 1, the method includes the following steps:
s1, acquiring state space information in an unknown complex environment, wherein the state space information comprises state vector information and measurement vector information.
S2, carrying out path planning on the state space information by utilizing a pre-constructed intelligent navigation network model to obtain optimal navigation information.
In this embodiment, the intelligent navigation network model includes a lightweight intelligent filtering module and an autonomous path planning network module that adopts a deep reinforcement learning SAC algorithm, which are sequentially connected, as shown in fig. 2, and the step of processing the state space information by using the pre-built intelligent navigation network model to obtain optimal navigation information includes:
filtering the state space information by adopting a lightweight intelligent filtering module to obtain an optimal state posterior estimated value at the current moment;
And carrying out path planning on the optimal state posterior estimation value at the current moment according to an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm to obtain optimal navigation information.
Robust navigation (Robust Navigation) refers to technology capable of autonomous navigation in complex and uncertain environments, based on autonomous mobile devices such as robots and unmanned vehicles. The development of robust navigation technology still faces some challenges; for example, in complex environments, intelligent navigation devices such as robots need to be able to adapt and adjust strategies quickly to cope with various uncertain factors. It should be noted that although traditional Kalman filtering and its variants are still widely applied in the navigation field, these traditional algorithms are all model-based methods, which often encounter model mismatch in unknown complex environments, leading to unreliable filtering results and difficulty in completing navigation tasks. In recent years, deep learning has attracted great attention in the navigation field from both academia and industry due to its strong representation capability. Research shows that common continuous nonlinear functions can be approximated by a feedforward neural network, while the recurrent neural network (Recurrent Neural Network, RNN), a neural network whose nodes are directionally connected into a loop, is suitable for processing time-series data; in fact, a fully connected recurrent network can serve as an approximator of any nonlinear dynamical system. KalmanNet is therefore used as a real-time state estimator, a model integrating deep learning and Kalman filtering.
In an unknown complex environment, if the state transfer function and the observation function are known but the noise is non-Gaussian and nonlinear, the covariance matrices of the process noise and the measurement noise cannot be obtained, so the Kalman gain cannot be calculated. This embodiment therefore uses a lightweight intelligent filtering module, KalmanNet, formed by combining Kalman filtering with a lightweight RNN neural network structure, to handle the model mismatch and nonlinearity problems: the Kalman gain is learned from the acquired environmental observation information and integrated into the EKF framework, finally yielding a more accurate and reliable navigation state estimate. That is, in this embodiment, the step of filtering the state space information with the lightweight intelligent filtering module to obtain the optimal state posterior estimate at the current moment specifically includes:
establishing a state space model in an unknown complex environment, wherein the state space model comprises a state equation and a measurement equation;
Predicting a system state space according to the state space model to obtain a priori statistical moment at the current moment, wherein the priori statistical moment comprises a state priori estimated moment and a measurement priori estimated moment;
calculating a measurement innovation from the measurement vector information at the current moment and the measurement prior estimated moment;
calculating a state estimation innovation from the optimal state posterior estimate at the previous moment and the state prior estimated moment;
learning the measurement innovation and the state estimation innovation with the lightweight recurrent neural network module to obtain an optimal Kalman gain;
and inputting the optimal Kalman gain, the state priori first moment and the measurement information at the current moment into the Kalman filtering updating module for updating to obtain an optimal state posterior estimated value at the current moment.
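The steps above can be sketched as one filtering iteration; `f`, `h` and the gain stub `gain_net` below are hypothetical placeholders (in the patent the gain comes from the trained lightweight recurrent neural network module, and `f`, `h` are the vehicle's state and measurement models):

```python
import numpy as np

def f(x):             # assumed state transition function (1-D example)
    return 0.9 * x

def h(x):             # assumed measurement function
    return x

def gain_net(dx_innov, dy_innov):
    # placeholder for the learned Kalman gain (here a fixed blend factor)
    return np.array([[0.5]])

def kalmannet_step(x_post_prev, x_prior_prev, y):
    """One KalmanNet-style step: predict, form innovations, gain, update."""
    x_prior = f(x_post_prev)              # state prior first moment
    y_prior = h(x_prior)                  # measurement prior first moment
    innov_y = y - y_prior                 # measurement innovation
    innov_x = x_post_prev - x_prior_prev  # state estimation innovation
    K = gain_net(innov_x, innov_y)        # learned optimal Kalman gain
    x_post = x_prior + K @ innov_y        # Kalman filtering update
    return x_post, x_prior

x_post, x_prior = kalmannet_step(np.array([1.0]), np.array([0.9]),
                                 np.array([1.2]))
```

The structure is that of an extended Kalman filter with the analytic gain computation replaced by a learned function.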
Specifically, in the Kalman filtering process, this embodiment combines the state equation and the measurement equation of a conventional discrete-time, nonlinear, Gaussian system to construct the state space model used by the lightweight intelligent filtering module KalmanNet for filtering; the state space model includes the state equation and the measurement equation. For the state equation, the state vector information at time t is defined as:
x_t = [p_t^x, p_t^y, \theta_t, d_{1,t}, \ldots, d_{N,t}]^{\mathrm{T}} \in \mathbb{R}^{3+N}
wherein x_t represents the state vector information of the unmanned intelligent navigation system at time t; (p_t^x, p_t^y) represents its two-dimensional coordinate at time t; \theta_t represents its orientation angle (the angle between the direction of motion and the x axis); N represents the angular resolution of the lidar; d_{i,t} represents the distance in the i-th direction; and \mathbb{R}^{3+N} represents a one-dimensional vector space of length (3+N).
Assuming that the unmanned intelligent navigation system (agent) moves at a speed v_t and a yaw angle d\theta_t in the unknown complex environment, there are:
p_t^x = p_{t-1}^x + v_{t-1}\cos\theta_{t-1}, \quad p_t^y = p_{t-1}^y + v_{t-1}\sin\theta_{t-1}, \quad \theta_t = \theta_{t-1} + d\theta_{t-1}
From this, the state equation can be derived:
x_t = f(x_{t-1}) + w_{t-1}
wherein w_{t-1} represents the state transition process noise; f(\cdot) represents the nonlinear state transfer function; and w_{t-1}^x, w_{t-1}^y, w_{t-1}^\theta represent the process noise in the x, y, \theta directions, respectively, at time (t-1).
In this embodiment, unknown-environment observation information is collected by a sensor module comprising a positioning module (a GNSS positioning module and the like for outdoor positioning; UWB, WIFI positioning modules and the like for indoor positioning) and a ranging module (a sensor such as a lidar), and the measurement vector information is generated from the sensor data of the positioning and ranging modules, namely:
y_t = [x_t, y_t, \theta_t, r_{1,t}, \ldots, r_{N,t}]^{\mathrm{T}}
From this, the measurement equation can be derived:
y_t = h(x_t) + v_t
wherein y_t represents the measurement vector information of the unmanned intelligent navigation system at time t; (x_t, y_t) represents the two-dimensional coordinate measured by its sensors at time t; \theta_t represents its sensor-measured orientation angle; r_{i,t} represents the measured distance in the i-th direction; v_t represents the sensor measurement noise; and h(\cdot) represents the nonlinear observation function.
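A minimal numerical sketch of the state and measurement equations for a planar vehicle with N lidar beams; the direct observation model and the beam distances carried along in the state are simplifying assumptions, since the true distances depend on the unknown environment geometry:

```python
import numpy as np

N = 4  # angular resolution of the lidar (assumed)

def f(x, v, dtheta):
    """State transition: pose advances with speed v and yaw increment dtheta."""
    px, py, th = x[0], x[1], x[2]
    pose = np.array([px + v * np.cos(th), py + v * np.sin(th), th + dtheta])
    return np.concatenate([pose, x[3:]])  # beam distances carried unchanged

def h(x):
    """Measurement: sensors observe the pose and the N beam distances."""
    return x.copy()                       # direct observation model (assumed)

x0 = np.concatenate([[0.0, 0.0, 0.0], np.full(N, 5.0)])  # (3+N)-dim state
x1 = f(x0, v=1.0, dtheta=np.pi / 2)      # move 1 unit along x, then turn 90 deg
y1 = h(x1)                               # sensor noise v_t would be added here
```

Process and measurement noise are omitted here; they are exactly the quantities whose statistics the patent treats as unknown.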
After the state equation and the measurement equation are established, this embodiment uses the lightweight intelligent filtering module to carry out filtering prediction and updating, with the following specific steps:
This embodiment predicts the prior statistical moments at the current time t from the posterior estimate at the previous time (t-1); the prior statistical moments at time t comprise the state prior estimated moments and the measurement prior estimated moments, the state prior estimated moments comprise the state prior first moment and the state prior second moment, and the measurement prior estimated moments comprise the measurement prior first moment and the measurement prior second moment. The state prior first moment and the state prior second moment are respectively:
\hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1}, v_{t-1}, d\theta_{t-1})
\Sigma_{t|t-1} = F_t \Sigma_{t-1|t-1} F_t^{\mathrm{T}} + Q_t
wherein \hat{x}_{t|t-1} represents the state prior first moment at the current time t; \hat{x}_{t-1|t-1} represents the state posterior first moment at the previous time (t-1); v_{t-1} represents the motion speed of the unmanned intelligent navigation system in the unknown complex environment; d\theta_{t-1} represents its yaw angle in the unknown complex environment; \Sigma_{t|t-1} represents the state prior second moment at the current time t; F_t represents the state Jacobian matrix; \Sigma_{t-1|t-1} represents the state posterior second moment at the previous time (t-1); F_t^{\mathrm{T}} represents the transpose of the state Jacobian matrix; and Q_t represents the process noise covariance matrix at the current time t.
In this embodiment, the measurement prior first moment and the measurement prior second moment are respectively:
\hat{y}_{t|t-1} = h(\hat{x}_{t|t-1})
S_{t|t-1} = H_t \Sigma_{t|t-1} H_t^{\mathrm{T}} + R_t
wherein \hat{y}_{t|t-1} represents the measurement prior first moment at the current time t; \hat{x}_{t|t-1} represents the state prior first moment at the current time t; S_{t|t-1} represents the measurement prior second moment at the current time t; H_t represents the measurement Jacobian matrix; H_t^{\mathrm{T}} represents the transpose of the measurement Jacobian matrix; and R_t represents the measurement noise covariance matrix at the current time t.
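Under a linearized model, the four prior moments can be computed as below; the constant-velocity Jacobian F and position-only measurement matrix H are illustrative choices, not taken from the patent:

```python
import numpy as np

def predict(x_post, P_post, F, Q, H, R, f, h):
    """Prediction step: the four prior moments from the previous posterior."""
    x_prior = f(x_post)             # state prior first moment
    P_prior = F @ P_post @ F.T + Q  # state prior second moment
    y_prior = h(x_prior)            # measurement prior first moment
    S_prior = H @ P_prior @ H.T + R # measurement prior second moment
    return x_prior, P_prior, y_prior, S_prior

F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity Jacobian (assumed)
H = np.array([[1.0, 0.0]])              # position-only measurement (assumed)
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x_prior, P_prior, y_prior, S_prior = predict(
    np.array([0.0, 1.0]), np.eye(2), F, Q, H, R,
    lambda x: F @ x, lambda x: H @ x)
```

Q and R are set to fixed values here purely for illustration; the patent's point is that in an unknown complex environment these covariances are unavailable, which is why the gain is learned instead.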
Because the process noise covariance matrix and the measurement noise covariance matrix are unknown, as shown in fig. 3, this embodiment embeds a lightweight recurrent neural network (RNN) module in the Kalman filtering algorithm to learn from the sensor data and the unknown environment, thereby obtaining the optimal Kalman gain; the optimal Kalman gain and the prior statistical moments are then used to calculate the posterior state moments. In this embodiment, the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight recurrent neural network module and a Kalman filtering update module which are sequentially connected, wherein the lightweight recurrent neural network module comprises an input layer, a dual-gated recurrent unit network unit, a first feedforward neural network unit and an output layer which are sequentially connected; the first feedforward neural network unit is also connected with a second feedforward neural network unit; and the dual-gated recurrent unit network unit comprises a state gated recurrent unit network unit and a measurement gated recurrent unit network unit which are connected in parallel.
In this embodiment, the input features of the RNN structure of the lightweight recurrent neural network module comprise a measurement feature difference, a measurement innovation, a state feature difference and a state estimation innovation, wherein the measurement feature difference is calculated from the measurement vector information at the current and previous times, the measurement innovation is calculated from the measurement vector information at the current time and the measurement prior first moment, the state feature difference is calculated from the optimal state posterior estimates at the two consecutive times before the current time, and the state estimation innovation is calculated from the optimal state posterior estimate and the state prior first moment at the previous time. That is, the calculation formulas of the four input features are respectively:
\Delta \tilde{y}_t = y_t - y_{t-1}
\Delta y_t = y_t - \hat{y}_{t|t-1}
\Delta \tilde{x}_t = \hat{x}_{t-1|t-1} - \hat{x}_{t-2|t-2}
\Delta \hat{x}_t = \hat{x}_{t-1|t-1} - \hat{x}_{t-1|t-2}
wherein \Delta \tilde{y}_t represents the measurement feature difference; y_t represents the measurement vector information at the current time t; y_{t-1} represents the measurement vector information at the previous time (t-1); \Delta y_t represents the measurement innovation; \hat{y}_{t|t-1} represents the measurement prior first moment at the current time t; \Delta \tilde{x}_t represents the state feature difference; \hat{x}_{t-1|t-1} and \hat{x}_{t-2|t-2} represent the optimal state posterior estimates at the two consecutive times before the current time; \Delta \hat{x}_t represents the state estimation innovation; and \hat{x}_{t-1|t-2} represents the state prior first moment at the previous time (t-1). It should be noted that the state estimation innovation formula is derived from the state update formula: in a practical application, the difference between consecutive state estimates and the difference between the posterior and prior state estimates are only available as features at time (t-1), so the state-related input features of the RNN are \Delta \tilde{x}_t and \Delta \hat{x}_t.
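With small illustrative vectors (not taken from the patent), the four input features are computed as:

```python
import numpy as np

# Hypothetical filter history for a 2-D state/measurement.
y_t, y_prev = np.array([2.0, 1.0]), np.array([1.5, 0.8])
y_prior = np.array([1.8, 0.9])      # measurement prior first moment at t
x_post_prev = np.array([1.0, 0.5])  # posterior estimate at t-1
x_post_prev2 = np.array([0.8, 0.4]) # posterior estimate at t-2
x_prior_prev = np.array([0.9, 0.45])# state prior first moment at t-1

meas_diff = y_t - y_prev            # measurement feature difference
meas_innov = y_t - y_prior          # measurement innovation
state_diff = x_post_prev - x_post_prev2   # state feature difference
state_innov = x_post_prev - x_prior_prev  # state estimation innovation
```

All four are simple differences of quantities the filter already maintains, which is what keeps the module lightweight.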
In this embodiment, the measured feature difference value, the measured information, the state feature difference value and the state estimation information are input into the lightweight cyclic neural network module for learning, so as to obtain an optimal kalman gain, and the specific steps include:
Inputting the state estimation innovation and the state characteristic difference value into a state gating circulation unit network unit for learning to obtain a state prior second moment at the current moment;
inputting the measurement innovation and the measurement characteristic difference value into the measurement gating circulation unit network unit for learning to obtain a measurement prior second moment at the current moment;
The first feedforward neural network unit is utilized to learn the state prior second moment and the measurement prior second moment, so that an optimal Kalman gain is obtained;
and inputting the state prior second moment, the measured prior second moment and the optimal Kalman gain into a second feedforward neural network unit for learning to obtain a state posterior second moment at the current moment, and feeding back the state posterior second moment at the current moment to the state gating circulation unit network unit.
Specifically, as in conventional Kalman filtering, the filter requires preset initial values: an initial state value x_0, a process noise covariance matrix Q_0 and a measurement noise covariance matrix R_0, where x_0 is given outside the RNN structure, and Q_0 and R_0 are supplied to the RNN as inputs at given time points. As shown in FIG. 4, this embodiment inputs the state estimation innovation Δx̂_t and the state feature difference Δx̃_t into the state gating circulation unit network unit GRU1 for learning; the GRU is a gated recurrent unit network, a variant structure of the recurrent neural network (Recurrent Neural Network), which retains and updates information from the input data through an update gate and a reset gate, thereby obtaining the state prior second moment Σ_{t|t-1} at the current time. The measurement innovation Δy_t and the measurement feature difference Δỹ_t are input into the measurement gating circulation unit network unit GRU2 for learning; GRU2 likewise captures and updates information from the input data with an update gate and a reset gate, thereby obtaining the measurement prior second moment S_{t|t-1} at the current time. The state prior second moment and the measurement prior second moment are then input into the first feedforward neural network unit FCN1 (Feedforward Neural Network) for learning to obtain the optimal Kalman gain. Meanwhile, in this embodiment, the state prior second moment Σ_{t|t-1}, the measurement prior second moment S_{t|t-1} and the optimal Kalman gain are input into the second feedforward neural network unit FCN2 for learning to obtain the state posterior second moment at the current time, which is fed back to the state gating circulation unit network unit. The calculation formula of the optimal Kalman gain is:

K_t = Σ_{t|t-1} · H_t^T · S_{t|t-1}^{-1}
The calculation formula of the state posterior second moment at the current moment is as follows:
Σ_{t|t} = Σ_{t|t-1} − K_t · S_{t|t-1} · K_t^T

wherein K_t represents the optimal Kalman gain; H_t^T represents the transpose of the measurement Jacobian matrix; Σ_{t|t} represents the state posterior second moment at the current time t; Σ_{t|t-1} represents the state prior second moment at the current time t; S_{t|t-1} represents the measurement prior second moment at the current time t; the superscript -1 denotes the matrix inverse; and the superscript T denotes the transpose.
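As a quick numerical illustration (a minimal numpy sketch, not part of the embodiment), the two formulas above can be evaluated directly once Σ_{t|t-1}, H_t and S_{t|t-1} are known; the learned FCN units approximate exactly these targets:

```python
import numpy as np

def gain_and_posterior_cov(sigma_prior, H, S_prior):
    """Evaluate K_t = Σ_{t|t-1} · H^T · S_{t|t-1}^{-1} and
    Σ_{t|t} = Σ_{t|t-1} - K_t · S_{t|t-1} · K_t^T."""
    K = sigma_prior @ H.T @ np.linalg.inv(S_prior)
    sigma_post = sigma_prior - K @ S_prior @ K.T
    return K, sigma_post
```

With H the measurement Jacobian and S_prior = H·Σ_prior·H^T + R, the posterior covariance always has a smaller trace than the prior, reflecting the information gained from the measurement.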
In general, compared with the LSTM, another variant of the RNN, the GRU adopted in this embodiment directly uses a gate to control the balance between input and forgetting, and has the advantages of simplicity and lower complexity. In this embodiment, after the optimal Kalman gain is learned by the lightweight recurrent neural network module, the optimal Kalman gain at the current time t, the state prior first moment and the measurement innovation are input into the Kalman filtering update module for updating, so as to obtain the optimal state posterior estimate at the current time, where the specific calculation formula is:
In the method, in the process of the invention, The optimal state posterior estimation value of the current time t is represented; k t represents the optimal kalman gain;
From the Kalman gain formula K_t = Σ_{t|t-1} · H_t^T · S_{t|t-1}^{-1} it can be seen that the Kalman gain depends directly on the statistics of the internal states and of the observed data, so the features input to the RNN need to carry comprehensive statistical information on the state evolution, the observation process and the estimation uncertainty; that is, this embodiment takes the measurement feature difference, the measurement innovation, the state feature difference and the state estimation innovation as the input features of the RNN structure of the lightweight recurrent neural network module.
In the training stage of the RNN structure, this embodiment needs to design a reasonable loss function to find the optimal solution; the squared-error loss function is one such choice, the squared-error loss being:

ℓ = ||x̂_{t|t} − x_t||²

where x_t is the true state at time t.
Taking the partial derivative with respect to the optimal Kalman gain gives the gain partial-derivative formula:

∂ℓ/∂K_t = 2 · (K_t · Δy_t − Δx_t) · Δy_t^T

where Δx_t = x_t − x̂_{t|t-1}.
From this, the gain partial-derivative formula establishes a back-propagation relationship from the output to the optimal Kalman gain, indicating that the optimal Kalman gain can be learned end to end by training KalmanNet with the squared-error loss function, from which the optimal state posterior estimate is re-expressed as:
x̂_{t|t} = x̂_{t|t-1} + K_t(Θ) · Δy_t

where K_t(Θ) represents the Kalman gain learned by the neural network, Θ is the set of learnable parameters of the neural network, and λ is an empirical (regularization) value.
Let D = {(X^(l), Y^(l))}_{l=1}^{L} be a dataset containing L trajectories of state-observation pairs, the l-th trajectory containing the data samples {(x_t^(l), y_t^(l))}_{t=1}^{T_l}.
Define the squared-error loss function:

ℓ_l(Θ) = (1/T_l) · Σ_{t=1}^{T_l} ||x̂_t(y_t^(l); Θ) − x_t^(l)||² + λ · ||Θ||²
Taking the partial derivative with respect to Θ and applying the chain rule gives:

∂ℓ_l/∂Θ = (1/T_l) · Σ_{t=1}^{T_l} (∂ℓ_t/∂K_t(Θ)) · (∂K_t(Θ)/∂Θ) + 2λ · Θ

wherein ∂ℓ_t/∂K_t(Θ) = 2 · (K_t(Θ) · Δy_t − Δx_t) · Δy_t^T.
When Θ attains the optimal parameters Θ*, we have ∂ℓ/∂Θ|_{Θ=Θ*} = 0; at this point the RNN network outputs the optimal Kalman gain K_t(Θ*), updates the state, and obtains the optimal estimate x̂_{t|t}.
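The gain partial-derivative formula can be checked numerically; the following self-contained sketch (illustrative, with random test data) compares the analytic gradient 2·(K·Δy − Δx)·Δy^T against central finite differences:

```python
import numpy as np

# Squared-error loss as a function of the Kalman gain:
#   x̂_t = x̂_{t|t-1} + K·Δy_t,  ℓ(K) = ||x̂_t - x_t||² = ||K·Δy_t - Δx_t||²
# Analytic gradient (the gain partial-derivative formula):
#   ∂ℓ/∂K = 2·(K·Δy_t - Δx_t)·Δy_t^T, with Δx_t = x_t - x̂_{t|t-1}

rng = np.random.default_rng(0)
n, m = 3, 2
K = rng.standard_normal((n, m))
dy = rng.standard_normal(m)
dx = rng.standard_normal(n)   # plays the role of x_t - x̂_{t|t-1}

loss = lambda K_: np.sum((K_ @ dy - dx) ** 2)
grad_analytic = 2.0 * np.outer(K @ dy - dx, dy)

# Central finite-difference check of each entry of the gradient
eps = 1e-6
grad_fd = np.zeros_like(K)
for i in range(n):
    for j in range(m):
        Kp = K.copy(); Kp[i, j] += eps
        Km = K.copy(); Km[i, j] -= eps
        grad_fd[i, j] = (loss(Kp) - loss(Km)) / (2 * eps)

assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

Because the loss is quadratic in K, the finite-difference estimate agrees with the analytic gradient to numerical precision, confirming the formula.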
At present, deep reinforcement learning is a learning paradigm that combines the environmental-cognition strength of deep learning with the decision-making strength of reinforcement learning. By applying end-to-end learning, it enables an agent to take optimal countermeasures to input signals from the environment and thus to handle more complex cognition-and-decision problems. Deep reinforcement learning can improve the learning efficiency and accuracy of the agent, extend its adaptability in complex environments, and make it possible to solve a wider range of practical problems. In this embodiment, the step of performing path planning on the optimal state posterior estimate at the current time by the autonomous path planning network module adopting the deep reinforcement learning SAC algorithm, to obtain optimal navigation information, comprises:
Performing transformation processing on the optimal state posterior estimation value at the current moment to obtain an agent state space, and defining an agent action space and a reward function;
Initializing an experience pool, outputting actions in the current state according to a strategy, and transferring to the next state to obtain an experience tuple; the experience tuple comprises a current state, an action, a reward, a next state and a navigation ending mark;
The experience tuples are added to the experience pool, and when the accumulated experience reaches a certain storage threshold, these experience data are used to update the parameters of the neural networks so as to maximize the cumulative reward. The SAC algorithm is optimized with a combination of an actor network and critic networks, where the actor network is a policy network used to generate actions and a critic network is a value network used to evaluate the current state; during training, the experience in the experience pool and the parameters of the current networks can be saved periodically for subsequent training and path planning.
Specifically, the collision-probability-based SAC (Soft Actor-Critic) algorithm is a deep reinforcement learning algorithm built on the Actor-Critic framework. Compared with the deterministic-policy DDPG, the SAC algorithm uses a stochastic policy, and its most distinctive feature is the application of entropy regularization, which gives it certain advantages. This embodiment defines the action entropy:
H(π(at|st))=-logπ(at|st)
where s_t ∈ S represents the state of the agent at time t; a_t ∈ A represents the action taken by the agent at time t; π is the mapping from the state space to the action space, i.e., the policy; and π(a|s) represents the probability of performing action a in state s.
The SAC algorithm target policy is:

π* = argmax_π E_π [ Σ_t γ^t (r_{t+1} + α · H(π(a_t|s_t))) ]

where r_{t+1} represents the reward for performing action a_t in state s_t; γ ∈ (0, 1) represents the discount factor, which weighs the importance of the immediate against the long-term rewards; and α represents the entropy regularization coefficient: the higher α is, the greater the exploration randomness.
The state value function represents the expected cumulative return when, at state s and thereafter, all actions are taken according to policy π, expressed as:

V^π(s) = E_π [ Σ_{t=0}^{∞} γ^t (r_{t+1} + α · H(π(a_t|s_t))) | s_0 = s ]

The state-action value function represents the expected cumulative return when action a is taken at state s to transition to the next state, and thereafter all actions are taken according to policy π, expressed as:

Q^π(s, a) = E_π [ r_1 + Σ_{t=1}^{∞} γ^t (r_{t+1} + α · H(π(a_t|s_t))) | s_0 = s, a_0 = a ]

The relation between the state value function and the state-action value function is:

V^π(s) = E_{a∼π} [ Q^π(s, a) + α · H(π(a|s)) ] = E_{a∼π} [ Q^π(s, a) − α · log π(a|s) ]
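For a discrete action set the relation between V and Q reduces to a weighted sum over actions; the following small numpy sketch (illustrative only, names assumed) computes the soft state value from a Q-row and a policy row:

```python
import numpy as np

def soft_state_value(q_row, pi_row, alpha):
    """V(s) = E_{a~π}[Q(s,a) - α·log π(a|s)] for a discrete action set.

    q_row  -- Q(s, a) for each action a (1-D array)
    pi_row -- π(a|s) for each action a (1-D array, sums to 1, entries > 0)
    alpha  -- entropy regularization coefficient
    """
    return float(np.sum(pi_row * (q_row - alpha * np.log(pi_row))))

# Sanity check: uniform policy over 4 actions with constant Q = 1
# gives V = 1 + α·log 4 (entropy of the uniform distribution is log 4).
q = np.full(4, 1.0)
pi = np.full(4, 0.25)
v = soft_state_value(q, pi, alpha=0.5)
assert abs(v - (1.0 + 0.5 * np.log(4))) < 1e-12
```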
As shown in FIG. 5, the SAC algorithm comprises an Actor network, two Critic networks (Critic 1 and Critic 2, with parameters φ_1 and φ_2 respectively) and two Critic target networks (Critic target 1 and Critic target 2, with parameters φ_tgt,1 and φ_tgt,2 respectively). In this embodiment, the basic flow of the SAC algorithm is as follows:
Initially, the agent inputs the state s_t to the Actor; the Actor samples from its output probability distribution and outputs an action, the agent obtains the reward r_t, and the state transitions to s_{t+1}. The generated experience (s_t, a_t, r_t, s_{t+1}) is put into the experience pool; once the pool holds a certain amount of experience, a batch is sampled from it to learn the Q function and the policy π.
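The experience pool in this flow is commonly realized as a fixed-capacity ring buffer; a minimal sketch (illustrative, names assumed) using only Python's standard library:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience pool: stores (s, a, r, s_next, m) tuples and
    samples uniform mini-batches once enough experience has accumulated.
    The oldest tuples are evicted automatically when capacity is reached."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, m):
        self.buf.append((s, a, r, s_next, m))

    def __len__(self):
        return len(self.buf)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)
```

Once the buffer length exceeds the pre-training threshold (for example the 10000 experiences mentioned later in this embodiment), mini-batches sampled from it drive the Q-function and policy updates.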
Knowing the target point (x_d, y_d), in the autonomous path planning network module the state of the agent is specifically defined as:

s = {d_1, d_2, ..., d_N, ρ, ψ, v, P_c}

ρ = √((x_d − x_a)² + (y_d − y_a)²)

ψ = arctan(y_d − y_a, x_d − x_a) − θ_a

P_c = [p_1, p_2, ..., p_i, ..., p_N]

wherein d_i, i = 1, 2, ..., N, is the ranging result of the ranging module in the direction θ_i; ρ is the relative distance between the agent and the target point; ψ is the relative orientation of the agent with respect to the target point, with (x_a, y_a) and θ_a the position and heading of the agent; v is the movement speed of the agent; and p_i is the collision probability in the direction θ_i, which is described in terms of a softmax function.
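The state vector can be assembled as follows. Note that the exact softmax argument for the collision probabilities p_i is not given in the text; the sketch below assumes a softmax over the negative ranging distances, so that nearer obstacles receive higher collision probability:

```python
import numpy as np

def agent_state(d, agent_xy, target_xy, heading, v):
    """Assemble s = {d_1..d_N, ρ, ψ, v, P_c}.

    The softmax argument (negative range) is an assumption made here:
    a smaller measured distance d_i yields a larger collision probability p_i.
    """
    xa, ya = agent_xy
    xd, yd = target_xy
    rho = np.hypot(xd - xa, yd - ya)               # relative distance ρ
    psi = np.arctan2(yd - ya, xd - xa) - heading   # relative orientation ψ
    e = np.exp(-np.asarray(d, dtype=float))
    p = e / e.sum()                                # softmax collision probabilities
    return np.concatenate([d, [rho, psi, v], p])
```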
The actions of the agent are defined as:
a∈{θ12,...,θN}
In this embodiment, the action space is all the angular directions measured by the ranging module, and the reward function of the agent is designed to be related to the relative distance and the relative orientation of the target point, specifically defined as:
Where β is an adjustable coefficient that adjusts the order of magnitude of the numerator to about the order of magnitude of the denominator.
After the state space, the action space and the reward function have been defined in this embodiment, the learning processes of the Q function and the policy π are now described in detail; the Q-function learning process is as follows:
The loss functions of Critic 1 and Critic 2 are:

L(φ_j) = E_{(s,a,r,s',m)∼D} [ (Q_{φ_j}(s, a) − y(r, s', m))² ],  j = 1, 2

wherein m is the termination flag: when the agent arrives within the neighborhood of radius r_0 around the target point (x_d, y_d), navigation ends and m = 1, otherwise m = 0; y(r, s', m) is the Q-function target value, whose calculation formula is:

y(r, s', m) = r + γ · (1 − m) · ( min_{j=1,2} Q_{φ_tgt,j}(s', a') − α · log π(a'|s') ),  a' ∼ π(·|s')
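The Q-function target value y(r, s', m) can be computed as in the following sketch; this is the standard SAC form with clipped double-Q and the termination flag m, assumed here rather than quoted from the original:

```python
def sac_target(r, m, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """y(r, s', m) = r + γ·(1-m)·( min(Q1, Q2)(s', a') - α·log π(a'|s') ),
    with a' ~ π(·|s').

    r         -- reward for the stored transition
    m         -- termination flag (1 ends bootstrapping at the goal)
    q1_next   -- Q_tgt,1(s', a') from the first target critic
    q2_next   -- Q_tgt,2(s', a') from the second target critic
    logp_next -- log π(a'|s') for the freshly sampled next action
    """
    return r + gamma * (1.0 - m) * (min(q1_next, q2_next) - alpha * logp_next)
```

Taking the minimum of the two target critics is what prevents the Q-value over-estimation mentioned below.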
The strategy pi learning process is as follows:
The optimization loss function of the policy network Actor is:

L(θ_π) = E_{s∼D, a∼π} [ α · log π(a|s) − min_{j=1,2} Q_{φ_j}(s, a) ]
It should be noted that the optimization objective in both the Q-function target-value formula and the Actor optimization loss function takes the minimum of the Q estimates of the two functionally identical Critic networks, which follows the SAC idea of preventing over-estimation of the Q value.
Deep learning algorithms require a large amount of sample data for training to achieve good generalization performance, and in this embodiment both KalmanNet and the SAC algorithm involve deep learning. KalmanNet cannot be applied directly in a real environment, since real data are required for parameter learning and tuning of the network. In deep reinforcement learning training, the agent needs extensive interaction with the environment and learns by trial and error; training directly in the real environment would risk damaging the system through collisions during exploration and would also take too long. This embodiment therefore first trains in a simulation environment, then evaluates the model in the simulation environment, and finally deploys the learned model on the real system for testing, i.e., offline learning. For the KalmanNet filtering processing, high-precision sensor data can be adopted as the state ground truth in the offline learning stage, and the obtained network parameters can then be deployed on low-cost sensors and applied to the unmanned navigation system. For the SAC algorithm, the system can be deployed only after the experience pool reaches a certain size (for example 10000) and the network has been pre-trained; in practical application, the parameters of the lightweight intelligent filtering module are no longer updated, while the autonomous path planning network module continues to accumulate experience, updating the experience pool and the network parameters, and the network parameters are still updated once the experience pool satisfies the required condition.
The deep reinforcement learning adopted in this embodiment introduces the concept of collision probability, which provides an important reference for the action selection of the initial agent and helps reduce the number of collisions, so that the agent learns as much as possible through positive feedback rather than through making mistakes. This strategy benefits the optimization of the parameters of the policy network and of the evaluation networks, and the navigation method provided by this embodiment significantly enhances the robust navigation capability of the unmanned agent in unknown complex environments, enabling it to navigate efficiently and safely under various conditions.
The embodiment of the invention provides an unmanned carrier intelligent robust navigation method in an unknown complex environment, comprising: acquiring state space information in the unknown complex environment; and performing path planning on the state space information by using a pre-constructed intelligent navigation network model to obtain optimal navigation information. The intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting the deep reinforcement learning SAC algorithm, connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module, connected in sequence. Compared with the prior art, the navigation method provided by this embodiment can, by incorporating deep learning, effectively learn and adapt to nonlinear factors and dynamic changes of the environment, reducing the risk of model mismatch and improving the reliability of navigation. Because deep learning has strong representation and generalization capability, embedding the lightweight cyclic neural network module in the Kalman filtering algorithm allows noise and interference in the environment to be handled effectively, enhancing the robustness of the navigation system; even in the presence of interference and noise, the method can obtain accurate and reliable filtering results, thereby improving the reliability and interpretability of the navigation system.
It should be noted that the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In one embodiment, as shown in fig. 6, an embodiment of the present invention provides an unmanned carrier intelligent robust navigation system in an unknown complex environment, the system comprising:
the data acquisition module 101 is configured to acquire state space information in an unknown complex environment, where the state space information includes state vector information and measurement vector information;
The intelligent navigation module 102 is configured to perform path planning on the state space information by using a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module which are sequentially connected.
For specific limitation of the unmanned carrier intelligent robust navigation system under an unknown complex environment, reference may be made to the above limitation of the unmanned carrier intelligent robust navigation method under an unknown complex environment, and details are not repeated here. Those of ordinary skill in the art will appreciate that the various modules and steps described in connection with the disclosed embodiments of the application may be implemented in hardware, software, or a combination of both. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the invention provides an unmanned carrier intelligent robust navigation system under an unknown complex environment, which acquires state space information under the unknown complex environment through a data acquisition module; and inputting the state space information into a pre-constructed intelligent navigation network model through an intelligent navigation module to carry out path planning, so as to obtain the optimal navigation information. According to the embodiment, the Kalman filtering algorithm embedded with the lightweight cyclic neural network module and the deep reinforcement learning are combined to be applied to unmanned intelligent robust navigation in an unknown complex environment, kalman gain can be adaptively learned, the self-adaption capacity, robustness and interpretability of an unmanned intelligent body are improved, the reconstruction map work after scene switching is avoided, and the efficiency and the universality of navigation decision are improved.
FIG. 7 is a diagram of a computer device including a memory, a processor, and a transceiver connected by a bus, according to an embodiment of the present invention; the memory is used to store a set of computer program instructions and data and the stored data may be transferred to the processor, which may execute the program instructions stored by the memory to perform the steps of the above-described method.
Wherein the memory may comprise volatile memory or nonvolatile memory, or may comprise both volatile and nonvolatile memory; the processor may be a central processing unit, a microprocessor, an application specific integrated circuit, a programmable logic device, or a combination thereof. By way of example and not limitation, the programmable logic device described above may be a complex programmable logic device, a field programmable gate array, general purpose array logic, or any combination thereof.
In addition, the memory may be a physically separate unit or may be integrated with the processor.
It will be appreciated by those of ordinary skill in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not limit the computer device to which the present inventive arrangements may be applied; a particular computer device may include more or fewer components than those shown, or combine some of the components, or have a different arrangement of components.
In one embodiment, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
According to the unmanned carrier intelligent robust navigation method and system under the unknown complex environment, the RNN network structure is embedded in the Kalman filtering algorithm, so that the measured data can be effectively filtered, the noise problem under the complex unknown environment is solved, the Kalman gain can be adaptively learned, the model has high interpretability, meanwhile, the filtering process is more efficient due to the light weight of the RNN network structure, and the real-time requirement is met; in addition, the embodiment adopts a deep reinforcement learning method to realize intelligent robust navigation of the unmanned carrier, and can avoid heavy graph reconstruction work after scene switching, thereby improving the efficiency and universality of navigation decision and being capable of better coping with various complex situations.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD), etc.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed, may comprise the steps of embodiments of the methods described above.
The foregoing examples represent only a few preferred embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the application. It should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and substitutions should also be considered to be within the scope of the present application. Therefore, the protection scope of the patent of the application is subject to the protection scope of the claims.

Claims (9)

1. An unmanned carrier intelligent robust navigation method in an unknown complex environment, characterized by comprising the following steps:
acquiring state space information in an unknown complex environment, wherein the state space information comprises state vector information and measurement vector information;
Carrying out path planning on the state space information by utilizing a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm which are connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight cyclic neural network module and a Kalman filtering updating module which are sequentially connected;
the step of processing the state space information by utilizing the pre-constructed intelligent navigation network model to obtain optimal navigation information comprises the following steps:
filtering the state space information by adopting a lightweight intelligent filtering module to obtain an optimal state posterior estimated value at the current moment;
And carrying out path planning on the optimal state posterior estimation value at the current moment according to an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm to obtain optimal navigation information.
2. The unmanned carrier intelligent robust navigation method in an unknown complex environment of claim 1, wherein: the lightweight cyclic neural network module comprises an input layer, a double-gate cyclic unit network unit, a first feedforward neural network unit and an output layer which are sequentially connected; the first feedforward neural network unit is also connected with a second feedforward neural network unit; the dual-gating circulation unit network unit comprises a state-gating circulation unit network unit and a measurement-gating circulation unit network unit which are connected in parallel.
3. The method for intelligent robust navigation of unmanned carrier in unknown complex environment according to claim 2, wherein the step of filtering the state space information by using a lightweight intelligent filtering module to obtain the optimal state posterior estimated value at the current moment comprises:
establishing a state space model in an unknown complex environment, wherein the state space model comprises a state equation and a measurement equation;
Predicting a system state space according to the state space model to obtain a priori statistical moment at the current moment, wherein the priori statistical moment comprises a state priori estimated moment and a measurement priori estimated moment, the state priori estimated moment comprises a state priori first-order moment and a state priori second-order moment, and the measurement priori estimated moment comprises a measurement priori first-order moment and a measurement priori second-order moment;
calculating a measurement innovation according to the measurement vector information at the current moment and the measurement prior first moment;

calculating a state estimation innovation according to the optimal state posterior estimation value and the state prior first moment at the previous moment;

learning the measurement innovation and the state estimation innovation by using the lightweight cyclic neural network module, so as to obtain an optimal Kalman gain;

and inputting the optimal Kalman gain at the current moment, the state prior first moment and the measurement innovation into the Kalman filtering updating module for updating, to obtain an optimal state posterior estimated value at the current moment.
4. The unmanned carrier intelligent robust navigation method under an unknown complex environment of claim 3, wherein the step of learning the measurement innovation and the state estimation innovation by using the lightweight recurrent neural network module to obtain an optimal Kalman gain comprises:
calculating to obtain a measurement characteristic difference value according to measurement vector information of the current moment and the previous moment;
calculating to obtain a state characteristic difference value according to the optimal state posterior estimation values of two continuous moments before the current moment;
Inputting the state estimation innovation and the state characteristic difference value into a state gating circulation unit network unit for learning to obtain a state prior second moment at the current moment;
inputting the measurement innovation and the measurement characteristic difference value into the measurement gating circulation unit network unit for learning to obtain a measurement prior second moment at the current moment;
The first feedforward neural network unit is utilized to learn the state prior second moment and the measurement prior second moment, so that an optimal Kalman gain is obtained;
and inputting the state prior second moment, the measured prior second moment and the optimal Kalman gain into a second feedforward neural network unit for learning to obtain a state posterior second moment at the current moment, and feeding back the state posterior second moment at the current moment to the state gating circulation unit network unit.
5. The intelligent robust navigation method of unmanned carrier in unknown complex environment as claimed in claim 3, wherein the calculation formula of the optimal state posterior estimation value is:
x̂_{t|t} = x̂_{t|t-1} + K_t · Δy_t

wherein K_t = Σ_{t|t-1} · H_t^T · S_{t|t-1}^{-1} and Δy_t = y_t − ŷ_{t|t-1};

in the formulas, x̂_{t|t} represents the optimal state posterior estimation value at the current time t; x̂_{t|t-1} represents the state prior first moment at the current time t; K_t represents the optimal Kalman gain at the current time t; Δy_t represents the measurement innovation; H_t^T represents the transpose of the measurement Jacobian matrix; Σ_{t|t-1} represents the state prior second moment at the current time t; S_{t|t-1} represents the measurement prior second moment at the current time; y_t represents the measurement vector information; and ŷ_{t|t-1} represents the measurement prior first moment at the current time t.
6. The intelligent robust navigation method of unmanned carrier in unknown complex environment according to claim 1, wherein the step of obtaining optimal navigation information by performing path planning on the optimal state posterior estimate at the current time according to an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm comprises:
Performing transformation processing on the optimal state posterior estimation value at the current moment to obtain an agent state space;
defining an intelligent body action space and a reward function, and initializing an experience pool;
Outputting an optimal action value under the current state according to the strategy model, and transferring to the next state to obtain an experience tuple; the experience tuple comprises a current state, an action, a reward, a next state and a navigation ending mark;
and adding the experience tuples into the experience pool, and updating network parameters when the number of experiences reaches a preset threshold value to generate optimal navigation information.
7. An unmanned carrier intelligent robust navigation system in an unknown complex environment, the system comprising:
The data acquisition module is used for acquiring state space information in an unknown complex environment, wherein the state space information comprises state vector information and measurement vector information;
The intelligent navigation module is used for performing path planning on the state space information by using a pre-constructed intelligent navigation network model to obtain optimal navigation information; the intelligent navigation network model comprises a lightweight intelligent filtering module and an autonomous path planning network module adopting the deep reinforcement learning SAC algorithm, connected in sequence; the lightweight intelligent filtering module comprises a Kalman filtering prediction module, a lightweight recurrent neural network module and a Kalman filtering update module, connected in sequence;
the method for processing the state space information by utilizing the pre-constructed intelligent navigation network model to obtain optimal navigation information specifically comprises the following steps:
filtering the state space information by adopting a lightweight intelligent filtering module to obtain an optimal state posterior estimated value at the current moment;
And carrying out path planning on the optimal state posterior estimation value at the current moment according to an autonomous path planning network module adopting a deep reinforcement learning SAC algorithm to obtain optimal navigation information.
8. A computer device, characterized by: comprising a processor and a memory, the processor being connected to the memory, the memory being for storing a computer program, the processor being for executing the computer program stored in the memory to cause the computer device to perform the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized by: the computer-readable storage medium has stored therein a computer program which, when executed, implements the method of any one of claims 1 to 6.
CN202311524311.3A 2023-11-15 2023-11-15 Unmanned carrier intelligent robust navigation method and system in unknown complex environment Active CN117633402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524311.3A CN117633402B (en) 2023-11-15 2023-11-15 Unmanned carrier intelligent robust navigation method and system in unknown complex environment

Publications (2)

Publication Number Publication Date
CN117633402A CN117633402A (en) 2024-03-01
CN117633402B true CN117633402B (en) 2024-06-25

Family

ID=90017375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311524311.3A Active CN117633402B (en) 2023-11-15 2023-11-15 Unmanned carrier intelligent robust navigation method and system in unknown complex environment

Country Status (1)

Country Link
CN (1) CN117633402B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232169B (en) * 2019-05-09 2022-01-04 北京航空航天大学 Track denoising method based on bidirectional long-time and short-time memory model and Kalman filtering
CN112683261B (en) * 2020-11-19 2022-10-14 电子科技大学 Unmanned aerial vehicle robustness navigation method based on speed prediction
CN115143954B (en) * 2022-09-05 2022-11-29 中国电子科技集团公司第二十八研究所 Unmanned vehicle navigation method based on multi-source information fusion
CN116543016A (en) * 2023-05-04 2023-08-04 浙江大学 Target tracking state estimator based on deep reinforcement learning

Non-Patent Citations (2)

Title
Dynamic compensation filtering algorithm for an INS/SFGPS-PPP tightly integrated system; Zhong Lina; Liu Jianye; Li Rongbing; Gao Guangen; Chinese Journal of Scientific Instrument; 2016-06-15 (No. 06); 1283-1289 *
KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics; Guy Revach et al.; IEEE Transactions on Signal Processing; 2022; 1532-1547 *

Also Published As

Publication number Publication date
CN117633402A (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN110597058B (en) Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
Lin et al. A gated recurrent unit-based particle filter for unmanned underwater vehicle state estimation
Mu et al. End-to-end navigation for autonomous underwater vehicle with hybrid recurrent neural networks
Xu et al. A multi-sensor information fusion method based on factor graph for integrated navigation system
Or et al. A hybrid model and learning-based adaptive navigation filter
Saravanan et al. IoT enabled indoor autonomous mobile robot using CNN and Q-learning
Zhao et al. Review of slam techniques for autonomous underwater vehicles
CN111025229B (en) Underwater robot pure orientation target estimation method
CN114047766B (en) Mobile robot data acquisition system and method for long-term application of indoor and outdoor scenes
Malagon-Soldara et al. Mobile robot localization: A review of probabilistic map-based techniques
CN109764876B (en) Multi-mode fusion positioning method of unmanned platform
Tan et al. A local path planning method based on Q-learning
CN117633402B (en) Unmanned carrier intelligent robust navigation method and system in unknown complex environment
Wang et al. Formation control of multiple nonholonomic mobile robots with limited information of a desired trajectory
Quan et al. Neural Network-Based Indoor Autonomously-Navigated AGV Motion Trajectory Data Fusion
Li et al. Performance analysis of deep learning supported Kalman filter
Li et al. Cooperative positioning algorithm of swarm UAVs based on posterior linearization belief propagation
Liu et al. A FastSLAM based on the smooth variable structure filter for UAVs
Guo et al. Model-based deep learning for low-cost IMU dead reckoning of wheeled mobile robot
Safin et al. Modern Methods of Map Construction Using Optical Sensors Fusion
Lv et al. An improved FastSLAM 2.0 algorithm based on FC&ASD-PSO
CN113483769A (en) Particle filter based vehicle self-positioning method, system, device and medium
Yu et al. A Direct approach of path planning using environmental contours
Jiang et al. Robust linear-complexity approach to full SLAM problems: Stochastic variational Bayes inference
Xu et al. Multi-Sensor Fusion Framework Based on GPS State Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant