CN113134187A - Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning - Google Patents

Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning Download PDF

Info

Publication number
CN113134187A
CN113134187A CN202110419574.2A CN202110419574A CN113134187A CN 113134187 A CN113134187 A CN 113134187A CN 202110419574 A CN202110419574 A CN 202110419574A CN 113134187 A CN113134187 A CN 113134187A
Authority
CN
China
Prior art keywords
robot
fire
inspection
function
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110419574.2A
Other languages
Chinese (zh)
Other versions
CN113134187B (en
Inventor
陈刚
刘智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110419574.2A priority Critical patent/CN113134187B/en
Publication of CN113134187A publication Critical patent/CN113134187A/en
Application granted granted Critical
Publication of CN113134187B publication Critical patent/CN113134187B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C27/00Fire-fighting land vehicles
    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C37/00Control of fire-fighting equipment
    • AHUMAN NECESSITIES
    • A62LIFE-SAVING; FIRE-FIGHTING
    • A62CFIRE-FIGHTING
    • A62C37/00Control of fire-fighting equipment
    • A62C37/50Testing or indicating devices for determining the state of readiness of the equipment

Landscapes

  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning, and belongs to the field of robots. The system comprises a hardware layer, an interaction layer, a sensing layer and a control layer; and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.

Description

Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning
Technical Field
The invention belongs to the field of robots, and relates to a multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning.
Background
The main structure of the current common fire-fighting inspection robot is as follows: wheel type driving is adopted in the aspect of driving; installing flame detectors and temperature sensors around the robot to facilitate the detection of fire; a camera is arranged in front of the robot so as to transmit the inspection picture to the control room through a wireless module; a fire-fighting nozzle with a fixed but rotatable chassis is also arranged above the robot and is used for externally connecting a water pipe or a small water pump to extinguish a fire point; in the aspect of robot control, along with the development of multi-machine cooperation thought and theory, in order to complete the inspection of a large area, meanwhile, in order to improve the inspection efficiency and reduce the inspection difficulty, a plurality of fire inspection robots are usually matched with each other to complete the operation, a centralized control mode is adopted on the cooperative control of the plurality of fire inspection robots, the inspection task allocation and the work scheduling of all robots are completed through a main control program, the specific inspection implementation mode is that a map constructed by using a laser radar and a planned inspection route are respectively led into the interior of each robot after area division, each robot automatically inspects a key area marked on the map according to the acquired planned route after being started, in addition, when some specific fire extinguishing or inspection operation needs to be finished remotely, the fire fighter can remotely operate the fire fighting or inspection operation through the remote controller.
However, the above system has many disadvantages, that is, the wheel-type driving makes the robot have poor passing performance in dealing with stairs and rugged road, and the flexibility of steering and rotating is not high enough; the accuracy and timeliness of flame detection by using the flame detector and the temperature sensor cannot be well guaranteed, and the range of flame detection is small; secondly, after the flame is detected, only the alarm function can be realized, and the position of a fire point and the image of the fire situation obtained by the camera are transmitted to a fire control room, a few fire inspection robots can also be matched with a fire nozzle carried by the robots to extinguish the fire point under the remote control of fire fighters, but the flexibility and the initiative in the fire situation responding aspect are lacked on the whole; finally, in the aspect of multi-robot cooperative control, each individual robot does not have the ability of selecting actions and coordinating with each other in a centralized control mode, so that the inspection efficiency, robustness and expandability of the whole system are poor, and the optimal time and energy of each robot in the inspection process cannot be guaranteed, so that the integral cruising ability and the disturbance resistance to the outside are reduced, and the autonomous control aspect and the intelligent degree are improved.
Disclosure of Invention
In view of this, the present invention provides a cooperative robot system for multiple fire inspection patrolling based on integral reinforcement learning.
In order to achieve the purpose, the invention provides the following technical scheme:
the multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning comprises a hardware layer, an interaction layer, a sensing layer and a control layer;
and the hardware layer adopts a DSP as a controller, and sends data acquired by the odometer and the gyroscope into the DSP for processing, and calculates the position of the robot in the routing inspection map in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire-fighting inspection robot adopts crawler-type drive; when the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.
1. Track drive system
The track is in two sections, each section being driven by a separate servo motor. The front-section crawler belt is used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and the height of the robot is adjusted by adjusting the front-section crawler belt, so that a larger operation radius is provided for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.
2. Mechanical arm servo control
A four-axis mechanical arm is arranged above the robot, a rotatable claw-shaped clamping device is arranged at the front section of the mechanical arm, and a fire extinguishing device is arranged on the clamping device; after the fire extinguishing device is additionally arranged, the fire extinguishing device is matched with the mechanical arm to realize accurate extinguishing of a fire point; the four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.
Firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed
And the coordinate transformation of the target point under the world coordinate system to the coordinate relative to the coordinate system of the mechanical arm is finished through a calibration mode of 'eyes are out of hands'. For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:
for the ith time: tbci=Tbe*Tegi*Tgc (1-1)
Time i + 1: tbci+1=Tbe*Tegi+1*Tgc (1-2)
Finishing to obtain: (Teg)i)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)
Then A ═ Tegi)-1*(Teg)i+1Is the motion relationship of the object relative to the robot arm tip coordinate system Te.
② utilize moveit! Completing the planning of the motion trail of the mechanical arm
Using Moveit! And combining the independent functional components for controlling the mechanical arm, and then providing the combined functional components for a user to use in an action and service communication mode in the ROS. In moveit! In the method, a URDF model is created according to the real size and the number of axes of the mechanical arm, and after the URDF model is input, the model is used for calculating the weight of the mechanical arm according to the weight of the mechanical arm! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in controller of the mechanical arm, wherein the control controller comprises a defined follow _ join _ project node and names of all axes, finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly, completing flame identification and detection through a fast convolution neural network, obtaining three-dimensional coordinates of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, then obtaining the position of the tail end of a mechanical arm to be reached through TF coordinate change, and then completing the solution of a track through an algorithm integrated in the interior. The solved track information is composed of a large number of discrete points, and the track information includes angular velocity and angular acceleration of each axis to reach the points. When the solved points are enough, a very smooth motion track is fitted, and after the information of the discrete points is published and subscribed through topics, the mechanical arm moves to the target point smoothly according to the planned points.
The sensing layer is used for a laser radar for establishing a picture, an infrared sensor for avoiding obstacles, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer and a gyroscope.
Infrared sensor obstacle avoidance
The infrared sensor is used for detecting obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor detects Euclidean distances between the robot and the obstacles, and the specific coordinates of the obstacle are calculated by the distances and odometer and gyroscope data obtained from the DSP. And after the coordinates are obtained, immediately designing an obstacle avoidance path by a control algorithm, wherein the obstacle avoidance path is arc-shaped and is required to keep a minimum distance from the obstacle in the whole process, and after the obstacle avoidance is finished, immediately returning to the previously planned optimal routing inspection path.
② flame identification based on fast convolution neural network
The flame characteristics are extracted and detected by adopting a fast convolutional neural network Faster R-CNN, and the method comprises the following steps:
② 1: inputting a shot flame picture;
2.: sending the picture into a Convolutional Neural Network (CNN) for feature extraction;
② -3: after feature extraction, performing feature mapping, wherein the feature mapping jointly acts on a subsequent full-connection layer and a subsequent region to generate a network RPN;
② 3.1: the feature mapping is entered into RPN, firstly, a series of region candidate suggestion frames are passed through, and then the suggestion frames are fed into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, the intersection ratio IOU value of the suggestion frames is generated by calculation to distinguish positive and negative samples; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.
② 3.2: the feature map is entered into the ROI pooling layer for subsequent network computations.
② 4: and after the pooled feature maps pass through the full-connection layer, classifying the suggestion frames by utilizing softmax again, identifying whether the detection frames are objects or not, and performing boundary frame regression judgment on the suggestion frames again.
The specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames is 1282、2562、5122Aspect ratio of 1: 1. 1: 2. 2: 1, distinguishing positive and negative samples by using intersection ratio of the detection frames and IOU intersection ratio, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the negative sample is less than 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. aiming at different characteristics of flame in an image, a method of guiding anchoring is adopted to accelerate the detection speed of RPN, and the improved sparse anchoring strategy is as follows:
Figure BDA0003027378310000041
wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m isR(x,y)、mG(x,y)、mB(x, y) are the RGB channel values, T, of the image pixels, respectivelyRIs a threshold value set in advance.
In addition, the principle of correcting the detection frame by utilizing boundary regression judgment is that an original suggestion frame A is mapped G to obtain a regression suggestion frame F which is closer to the real condition. This mapping G is obtained by translation and scaling:
firstly, translation: fx=Aw·dx(A)+Ax (2-2)
Fy=Ah·dy(A)+Ay (2-3)
Rescaling: fw=Aw·exp(dw(A)) (2-4)
Fh=Ah·exp(dh(A)) (2-5)
Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, dx、dy、dw、dhRespectively, a transformation relationship, and when the difference between the original frame a and the real frame F is not large, the transformation is regarded as linear.
The output is the probability of being identified as a flame.
The interaction layer is as follows: in the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APP is developed in a matched mode, the inspection robot is controlled correspondingly at the remote terminal, and accordingly the inspection of an area needing to be inspected again by an operator is achieved. After the flame is detected, an alarm signal is sent to the control room immediately, and corresponding fire extinguishing measures can be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition can not be restrained, the automatic mode is switched to the remote control mode, a professional in a control room takes over the control of the inspection robot comprehensively, the operation of the crawler belt and the action of the mechanical arm are controlled manually to realize the accurate extinguishing of a fire point, and whether the operations of cutting off a power supply, closing a gas valve and transferring inflammable matters need to be carried out or not is judged according to the fire condition. Each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire situation is still large after measures are taken, a request for taking over a fire fighting network is sent to a control room, the control room agrees or does not respond within one minute, a local spraying pipe network in a building is opened, meanwhile, a comprehensive fire fighting alarm is sent, and all fire fighting channels and emergency lighting facilities are opened. And an emergency stop button is arranged at the top end of the robot. And after the fire is extinguished, marking the ignition point on the patrol inspection map as a key patrol inspection area.
The control layer is as follows:
the whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)i0,yi0) To reach respective destinations (x)iD,yiD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is seti(t)=[Lix(t),Liy(t)]TVelocity Vi(t)=[Vix(t),Viy(t)]TController input Ui(t)=[uix(t),uiy(t)]TControl input and unknown environmental disturbance Wi(t)=[Wix(t),Wiy(t)]TIn order to avoid saturation of the actuator, the input is constrained, and | U (t) | is required to be less than or equal to λ, wherein λ is a normal number. Set the distance r between two inspection robotsij(t)=||Li(t)-Lj(t) | |, a safety distance r is set for avoiding collision of two inspection robotssAnd r is required to be satisfied at any time in the inspection processij(t)≥rsAnd when N robots reach the inspection destination, ensuring rij(t)>>rsIn this case, i ≠ j.
Then consider the second order linear dynamics model of the ith fire inspection robot as:
Figure BDA0003027378310000051
wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,
Figure BDA0003027378310000052
is the state of the robot at the time t,
Figure BDA0003027378310000053
to input, yi(t) is the system unique output.
The global dynamics model is written as:
Figure BDA0003027378310000054
wherein
Figure BDA0003027378310000055
Is the product of Kronecker, X (t) ═ x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,INIs an N-order identity matrix, and is set to L (t) [ < L >1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]TThe positions at time t, the target point positions and the control inputs of the N robots are respectively.
In order to achieve optimal control of the N fire inspection robots under unknown disturbances with minimum time and energy in continuous time, continuous state and control input space and to avoid collisions in the whole process, the following cost functions are considered:
Figure BDA0003027378310000056
wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)TRU (t) Linear quadratic rewrite to non-quadratic Performance function φ (U (t)) for approximating minimum energy cost and capturing input constraints and for avoidingAn artificial potential field function is introduced when the two robots collide, and the cost function is approximately rewritten as:
V(X(t),U(t))=∫t ζtanh(L(t)-LD)T(L(t)-LD)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-4)
where ζ is a normal number, tanh is a hyperbolic tangent function, which is a monotonically increasing odd function and continuously differentiable, and the cost function is an IRL-solvable form. Rewriting ζ to ζ tanh (L (t) -L)D)T(L(t)-LD) When the current position L (t) of the robot is away from the target point LDTime ζ tanh (L (t) -L)D)T(L(t)-LD) Approximately ζ, ξ tanh (L (t) -L) upon arrival at the target pointD)T(L(t)-LD) The T integral of unknown time is converted to an infinite integral independent of the arrival time T to achieve an optimal solution to the value function.
Will U (t)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:
Figure BDA0003027378310000061
wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)1,r2...rm)>0。
Adding an artificial potential field function f to avoid collision of any pair of inspection robotsR(rij(t)) the two robots are caused to emit repulsive potential fields to avoid each other, and in order to make V (x (t), U (t)) after the potential field function is added bounded, a weight matrix Λ is designedR(t) for canceling the non-zero tail. Will reject function fR(rij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:
Figure BDA0003027378310000062
where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance rij(t) solving for s and σ in the rejection function, provided with:
fR(rs)=K0;fR(rs+Δ)=K1 (4-7)
wherein 0 < K1<K0Less than 1; Δ is a positive increment, and is substituted to obtain:
Figure BDA0003027378310000063
by a weight matrix ΛR(t)=[Λ12(t),Λ13(t),...,ΛN-1,N(t)]TSo that the value function after introducing the artificial potential field function is bounded and the weight matrix depends on the distance to the target point.
ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD||2) (4-9)
Lambda when robot principle targets pointsR(t) ═ β, Λ when the robot reaches the target pointRAnd (t) is 0, beta is a collision coefficient, and the size of beta is determined by the importance of avoiding collision in the inspection process.
Solving the optimal control input by using the cost function in (4-4), and carrying out derivation on t on two sides of the formula (4-4) and writing the Bellman equation as follows:
V(x(t)),U(t))=-ζtanh(L(t)-LD)T(L(t)-LD)-φ(U(t))-ΛR(t)TfR(rij(t)) (4-10)
let Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD) Defining the optimum function as:
V*(X(t),U(t))=min∫t Fζ(t)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-11)
defining the HJB equation according to equation (4-10) as:
Figure BDA0003027378310000071
wherein
Figure BDA0003027378310000072
Under stable conditions have
Figure BDA0003027378310000073
(4-12) deriving U on both sides of the formula:
Figure BDA0003027378310000074
obtaining the optimal control input u after term shifting*Comprises the following steps:
Figure BDA0003027378310000075
substituting (4-14) into (4-5) to obtain:
Figure BDA0003027378310000076
wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:
Figure BDA0003027378310000077
wherein
Figure BDA0003027378310000078
Substituting (4-16) into (4-12) to obtain:
Figure BDA0003027378310000079
and solving the HJB equation by utilizing a strategy iterative algorithm based on the integral reinforcement learning, wherein the integral reinforcement learning uses signals in (T, T + T) for learning, and a specific dynamic model of the system is not required to be known.
Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation is obtained:
Figure BDA00030273783100000710
in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time updating in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since
Figure BDA00030273783100000711
Wherein the first term is quadratic form easy to obtain, only the second term is approximated, and
Figure BDA00030273783100000712
using neural network pair V0(X) approximating:
Figure BDA0003027378310000081
wherein wcIs the weight, psi, of the critical neural networkc(X) is a basis function, εc(X) is the approximation error;
differentiating X on two sides of (4-20) to obtain:
Figure BDA0003027378310000082
substituting (4-20) into (4-18) results in a new bellman equation:
Figure BDA0003027378310000083
wherein epsilone(t)=εc(X(t+T))-εc(X (t)) is the error of the Bellman equation,. DELTA.. psic(X(t)=ψc(X(t+T)-ψc(X(t)。
To determine wcThe (4-20) is rewritten as:
Figure BDA0003027378310000084
wherein
Figure BDA0003027378310000085
Is a V0(ii) an approximation of (X),
Figure BDA0003027378310000086
for ideal approximation coefficients, then (4-22) is:
Figure BDA0003027378310000087
order to
Figure BDA0003027378310000088
For Bellman tracking errors and constructing an objective function by making εe(t) minimizing to adjust the weighting coefficients of the critic neural network:
Figure BDA0003027378310000089
the two sides of the formula (4-25) are aligned
Figure BDA00030273783100000810
Derivation, obtained by the chain rule:
Figure BDA00030273783100000811
wherein beta isc>0 is the learning rate, and 0 is the learning rate,
Figure BDA00030273783100000812
Figure BDA00030273783100000813
is delta psicAn approximation of (d).
Will EeSubstituting into (4-26) to obtain weight coefficient of criticc neural network
Figure BDA00030273783100000814
Should be subject to:
Figure BDA00030273783100000815
the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, and an actor neural network is introduced into an actuator to ensure convergence to an optimal solution and ensure the stability of the system:
Figure BDA00030273783100000816
Figure BDA00030273783100000817
is the optimal approximation coefficient of the actor neural network,
Figure BDA00030273783100000818
is determined by the following lyapunov function:
Figure BDA00030273783100000819
when w isaWhen the following formula is satisfied, the approximated strategy makes the system consistent and finally bounded, and the system is finally bounded by
Figure BDA00030273783100000820
To obtain U*(t)。
Figure BDA00030273783100000821
Wherein K1,K2In order to design a good normal number,
Figure BDA00030273783100000822
based on the expressions (4-19), (4-27), (4-28) and (4-30), the critic algorithm and the actor algorithm are respectively utilized to realize synchronous updating of the value function and the strategy function, and an online integral reinforcement learning algorithm based on strategy iteration is designed to solve the HJB equation so as to solve the optimal control input.
The algorithm is as follows: online IRL algorithm based on strategy iteration
Initialization: given a feasible actuator input
Figure BDA0003027378310000091
Step 1: policy evaluation, given initialization
Figure BDA0003027378310000092
Solving by
Figure BDA0003027378310000093
Figure BDA0003027378310000094
Step 2: improvement of the strategy is that
Figure BDA0003027378310000095
Substituting the following formula for updating
Figure BDA0003027378310000096
Figure BDA0003027378310000097
Step 3: order to
Figure BDA0003027378310000098
Go back to Step1 until
Figure BDA0003027378310000099
Converging to a minimum value.
The invention has the beneficial effects that:
1. the invention adopts a distributed control mode in the multi-fire-fighting inspection cooperative robot system, so that the autonomy, flexibility, reliability and response speed of each robot under the system are improved.
2. The four-axis mechanical arm is designed at the top of each fire inspection robot, the mechanical arm is matched with a specially-made fire extinguisher, so that an ignition point can be automatically and accurately put out after a fire is discovered, the mechanical arm can be remotely and manually controlled by a fireman to complete the operations of turning off a power switch, turning off a gas valve, removing combustible materials and the like, and the initiative and the operability after the fire is discovered are obviously improved.
3. The invention provides an improved fast convolutional neural network based on visual recognition to finish the recognition and detection of flame by matching with the picture acquired by a depth camera realsense D435i for more accurately recognizing flame and reducing the false alarm rate, and simultaneously introduces a method of guiding anchoring to improve the RPN detection speed in the fast convolutional neural network.
4. The approximation function designed in the controller algorithm can convert the finite integral of the minimum arrival time T unknown in the optimal path planning problem into an infinite integral form so as to be convenient for solving, and introduces a non-quadratic performance function for approximating the minimum energy cost and capturing input constraint.
5. The invention introduces an artificial potential field function to avoid collision among robots in the inspection process of the multi-fire inspection cooperative robot system, and designs a special weight coefficient matrix to offset the non-zero tail.
6. The invention uses the integral reinforcement learning algorithm in the multi-robot control algorithm to solve the problem that the matrix of the inspection robot system is unknown, and uses the critic and actor neural network algorithm to synchronously iterate and solve the Bellman equation on line in real time to obtain the optimal strategy, thereby obviously improving the inspection efficiency and the robustness of the multi-fire-fighting inspection robot system.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a bottom level diagram of hardware;
FIG. 2 is a schematic diagram of coordinate transformation;
FIG. 3 is a flow chart of motion trajectory generation;
FIG. 4 is a flow chart of obstacle avoidance for the fire inspection robot;
FIG. 5 is a fast convolutional neural network training process;
FIG. 6 is a fire inspection robot interaction structure;
FIG. 7 is an overall structure diagram of the fire inspection robot;
FIG. 8 is a schematic diagram of a multi-fire inspection cooperative robot system inspection;
FIG. 9 is a flowchart of the operation of the robotic arm to extinguish a fire;
fig. 10 is a flow chart of the work of the fire inspection robot.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Aiming at each single fire inspection tour robot, the fire inspection tour robot is provided with a depth camera realsense D435i for finding the fire quickly and accurately on the basis of matching with a flame detector and a temperature sensor, the depth camera can realize the identification of the fire at a longer distance by extracting the characteristics of a scene, and the identification accuracy and the rapidity are improved compared with the sensors. Meanwhile, the depth camera transmits the inspection image to the main control room and the mobile terminal in real time, so that the inspection image can be observed by a controller conveniently, and control instructions sent by the control room and the mobile terminal can be received at any time. The inspection robot sends an alarm signal to the main control room immediately after finding a fire, but the alarm signal is far from enough, so that in order to improve the processing capacity of the inspection robot after finding the fire, a four-shaft mechanical arm is also assembled above the robot, and the front end of the mechanical arm is provided with a clamping jaw which is arranged to be beneficial to adding subsequent equipment; after the fire is discovered, under the remote control of fire fighters, the work of cutting off the power supply, removing a gas valve and combustible materials and the like can be completed through the mechanical arm under the necessary condition. In addition, a specially-made fire extinguishing device (such as a specially-made small fire extinguisher) can be arranged at the position of the clamping jaw of the mechanical arm to match with the mechanical arm to realize accurate extinguishing of an ignition point, so that the fire is prevented from spreading to the greatest extent, and greater economic loss is caused.
On the cooperative control of the multiple fire-fighting inspection robots, the multiple robots are required to complete the optimal online path planning of the minimum arrival time T unknown under the conditions of collision avoidance, constraint on input of an actuator and unknown external disturbance in the inspection process, in addition, the inspection efficiency, robustness and expandability of the whole system are ensured, and the robots cannot collide in the whole inspection process.
In order to meet the requirements, the software and hardware design scheme of the invention is as follows:
the novel multi-fire-fighting inspection cooperative robot system designed by the invention adopts the idea of layered design and respectively comprises a hardware layer, an interaction layer, a sensing layer and a control layer, wherein the first part to the third part introduce the specific software and hardware structure of each robot under the whole multi-fire-fighting inspection cooperative robot system, and the fourth part introduces the specific control algorithm realization of the multi-fire-fighting inspection cooperative robot system.
Hardware layer design of first part fire-fighting inspection robot
The hardware layer is used as a controller by the DSP, data collected by the odometer and the gyroscope are sent into the DSP to be processed, and the position of the robot in the routing inspection map can be calculated in real time. Sending a speed instruction to the DSP through the upper computer, and coding the obtained speed information by the DSP to control the operation of the servo motor; the fire inspection robot adopts a crawler-type drive, and aims to improve the passing capacity (such as stairs) and the steering flexibility of a complex road section of the fire inspection robot. When the mechanical arm needs to move, a ros system in an upper computer passes through a robot on a moveit! The platform plans a motion track of a target point to which the mechanical arm moves, discretizes the planned motion track and sends the discretized motion track to the DSP, and the DSP obtains the angular velocity and the acceleration of each shaft and then controls a servo motor of the mechanical arm to move so as to reach the target point.
The bottom design plan of the hardware layer is shown in fig. 1.
1. Track drive system
In order to adapt to various inspection environments and improve the flexibility and the trafficability characteristic in the inspection process, the inspection robot adopts crawler-type driving. The track structure is designed into two sections, and each section is driven by a separate servo motor. The front-section crawler belt is mainly used for lifting a chassis of the robot to smoothly pass through when encountering a higher obstacle, and can also be used for adjusting the height of the robot by adjusting the front-section crawler belt so as to provide a larger operating radius for the mechanical arm; the second half section of the crawler belt mainly plays a role in driving the robot, is coaxially driven by a servo motor, and can be used for decelerating and braking the crawler belt on one side when the robot turns. The rated voltage of the servo motor is 24V, the output power is 100W, and the speed information of x and y axes issued by the upper PC is converted into the rotating speed of the servo motor after being coded by the DSP so as to realize steering and driving.
2. Mechanical arm servo control
In order to improve the processing capacity of the inspection robot when finding a fire, a four-axis mechanical arm is arranged above the robot. A rotatable claw-shaped clamping device is installed at the front section of the mechanical arm, and a specially-made small fire extinguishing device (such as a fire extinguisher, a small water pump and the like) can be installed on the clamping device according to specific requirements. After the fire extinguishing device is additionally arranged, the fire extinguishing device can be matched with the mechanical arm to realize accurate extinguishing of a fire point; the fire extinguishing device is not additionally arranged, and whether a mechanical arm is manually controlled by a fire fighter to cut off a local power supply, close a gas valve, remove surrounding inflammable matters, close a fire door and the like is determined according to the fire degree when a fire is found, so that the fire spread is prevented to the maximum extent, and the economic loss is reduced. The four-axis mechanical arm is driven by four servo motors to move each axis, and the motion information of each axis is calculated by a moveit | in an upper computer ros system! And generating after path planning.
Firstly, the calibration of the mechanical arm under the condition that eyes are out of hand is completed
The coordinate transformation of the target point under the world coordinate system To the coordinate relative To the coordinate system of the mechanical arm is completed by a calibration form of Eye-To-Hand (Eye-Hand). For the calibration mode of 'eyes are out of hand', a transformation matrix Tgc from a manipulator base coordinate system Tg to a camera coordinate system Tc is constant, a transformation matrix Tbe from a calibration plate coordinate system Tb to a manipulator tail end coordinate system Te is constant, and the relation of coordinate transformation satisfies the following formula:
for the ith time: tbci+1=Tbe*Tegi+1*Tgc (1-1)
Time i + 1: tbci+1=Tbe*Tegi+1*Tgc (1-2)
Finishing to obtain: (Teg)i)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)
Then A ═ Tegi)-1*(Teg)i+1Is the motion relationship of the object relative to the robot arm tip coordinate system Te.
A schematic diagram of the coordinate transformation is shown in fig. 2.
② utilize moveit! Completing the planning of the motion trail of the mechanical arm
The ros (robot Operating system) is an Operating system specially used for realizing the control of the robot system, and can be developed in the Linux environment, and is especially suitable for a complex and multi-node control system such as a robot due to its simple operation mode, powerful function and strong expandability. In the control of the mechanical arm, a special integrated tool is arranged in an ROS system for completing the motion trail planning of the mechanical arm, which is moveit! . Moveit! May be considered an "integrator" by which the individual functional components controlling the robotic arm may be combined and then made available to the user through action and service communication in the ROS. In moveit! In the method, a model (URDF model) which is in accordance with the real size and the number of axes of the mechanical arm is created, and after the model is input, the moveit! The setup authority generates a corresponding configuration file according to the setting of the setup authority, and the configuration file comprises a collision matrix of the mechanical arm so as to avoid collision between the axes caused by the planned track, connection information of each joint, a defined initial position and the like. And then adding a control plug-in (controller) of the mechanical arm, wherein the controller mainly comprises a follow _ join _ project node and names for setting various axes, and finally writing a program to realize that the PC is connected with the mechanical arm in a socket communication mode, and observing a real-time motion track of the mechanical arm in rviz by subscribing to a join _ state topic. The method comprises the steps of firstly completing flame identification and detection through a fast convolution neural network, obtaining a three-dimensional coordinate of an ignition point relative to a robot through point cloud data of a depth camera after successful identification, obtaining a position where the tail end of a mechanical arm needs to reach through TF coordinate change, and then immediately completing track solving through an internally integrated algorithm (usually adopting cubic spline interpolation). The solved track information is composed of a large number of discrete points whose information includes the angular velocity and angular acceleration of each axis to which the point is to be reached. When the solved points are enough, a smooth motion track can be fitted, and the mechanical arm can smoothly move to a target point according to the planned points after the information of the points is published and subscribed through topics. Moveit! A flow chart for generating a motion trajectory is shown in fig. 3.
Second part fire-fighting inspection robot sensing layer design
The sensing layer design of the fire-fighting inspection robot mainly comprises a laser radar for drawing construction, an infrared sensor for obstacle avoidance, a flame detector for detecting flame, a temperature sensor, a realsense D435i depth camera, an odometer, a gyroscope and the like.
Infrared sensor obstacle avoidance
The infrared sensor is used for detecting the obstacles encountered by the inspection robot in the inspection process in real time, when the obstacle exists at the front side, the infrared sensor can detect the Euclidean distance between the robot and the obstacle, and the specific coordinates of the obstacle can be calculated by the distance and the odometer and gyroscope data obtained from the DSP. After the coordinates are obtained, an obstacle avoidance path can be designed immediately through a control algorithm, the obstacle avoidance path is usually arc-shaped, a minimum distance between the obstacle avoidance path and an obstacle is required to be kept in the whole process, and after obstacle avoidance is finished, the previously planned optimal routing inspection path needs to be returned immediately. The obstacle avoidance flow chart is shown in fig. 4.
② flame identification based on fast convolution neural network
In the inspection process, the detection of the flame is particularly critical, and along with the rapid development of computer technology, the flame is detected more rapidly and accurately by using vision than a fixed flame detector. However, since there are many objects similar to the flame in color in the inspection scene and the shape and texture of the flame are various, it is a difficult task to detect the position of the flame in the image. The invention adopts the fast convolutional neural network (Faster R-CNN) to extract and detect the flame characteristics, not only can accurately identify the flame, but also can accurately calculate the position where the flame is generated, and can reduce the false alarm rate of flame detection to the greatest extent.
The training steps of the fast convolutional neural network are as follows:
secondly-1, inputting the shot flame picture;
secondly, sending the picture into a Convolutional Neural Network (CNN) for feature extraction;
secondly-3, after feature extraction, feature maps (featuremaps) are used, and the feature maps jointly act on a subsequent full connection layer and an RPN (region generation network);
secondly-3.1 feature mapping into RPN, firstly, through a series of region candidate suggestion boxes, namely anchors (anchors), feeding the suggestion boxes into two 1 × 1 convolution layers respectively, wherein the first convolution layer is used for region classification, namely, positive and negative samples are distinguished by calculating IOU (intersection ratio) values for generating the suggestion boxes; and the other is subjected to non-maximization inhibition to generate a more accurate target detection frame due to the boundary box regression judgment.
And the-3.2 characteristic mapping enters the ROI pooling layer for subsequent network calculation.
And 4, after the pooled feature maps pass through a full connection layer, classifying the suggestion boxes by utilizing softmax again, namely identifying whether the detection boxes are objects or not, and performing boundary box regression judgment on the suggestion boxes again in order to further improve the accuracy of the target detection boxes.
A schematic diagram of the training process is shown in fig. 5.
The above steps using the RPN to generate the detection box (anchors) is the biggest advantage of FasterR-CNN compared with the traditional detection algorithm. The specific method for generating the detection frame by the RPN is to slide on the input feature mapping through a sliding frame to generate 9 suggestion frames on each pixel point, and the size of the suggestion frames can be 1282、2562、5122Aspect ratio of 1: 1. 1: 2. 2: 1, and distinguishing positive and negative samples by using an intersection ratio (IOU) of the detection boxes, wherein the IOU value of the positive sample is more than 0.7, the IOU value of the positive sample is less than the negative sample of 0.3, and the proportion of the positive sample to the negative sample is set as 1: 1. however, the number of the suggested frames drawn by such a method is still large, so that the method of the invention can adopt a guided anchoring method to accelerate the detection speed of the RPN according to different characteristics of flames in the image, and the improved sparse anchoring strategy is as follows:
Figure BDA0003027378310000141
wherein x and y are coordinates of pixel points, F (x and y) represents the generated flame color mask, if the coordinates are 1, the pixel points generate a suggestion box, if the coordinates are 0, the pixel points do not generate, and m isR(x,y)、mG(x,y)、mB(x, y) are the RGB channel values, T, of the image pixels, respectivelyRIs a threshold value set in advance.
In addition, the principle of correcting the detection frame by using boundary regression judgment (bounding box regression) is to map G the original suggestion frame A to obtain a regression suggestion frame F closer to the real situation. This mapping G can be obtained by translation and scaling:
firstly, translation: fx=Aw.dx(A)+Ax (2-2)
Fy=Ah.dy(A)+Ay (2-3)
Rescaling: fw=Aw.exp(dw(A)) (2-4)
Fh=Ah.exp*dh(A)) (2-5)
Wherein x, y, w, h respectively represent the center coordinates of the proposed box, width, height, dx、dy、dw、dhFor the transformation relations we are looking for, respectively, when the original frame a and the real frame F are not very different, the transformation can be generally considered to be linear.
The output is the probability of being identified as a flame.
Interaction layer design of third part fire-fighting inspection robot
In the inspection process, pictures captured by the camera in real time need to be sent to the control room and the mobile terminal through a wireless network, corresponding APPs are developed in a matched mode, inspection pictures and alarm signals can be received at terminals such as a PC (personal computer), a web, a mobile phone and a pad at any time and any place, and the inspection robot can be controlled correspondingly at a remote terminal, so that the inspection of an area needing to be inspected again by an operator is realized. After the flame is detected, an alarm signal should be sent to the control room immediately and corresponding fire extinguishing measures should be automatically taken immediately. After the fire extinguishing measures are implemented, if the fire condition is still not restrained, the automatic mode can be immediately switched to the remote control mode, a professional in a control room comprehensively takes over the control of the inspection robot, the operation of the crawler and the action of the mechanical arm are manually controlled to accurately extinguish a fire point, and whether the operations of cutting off a power supply, closing a gas valve, transferring inflammable matters and the like are needed or not is judged according to the fire condition. In addition, each inspection machine can be connected with the whole fire fighting system in a grid mode, if the fire condition is still large after measures are taken, a request for taking over a fire fighting network can be sent to the control room, the control room agrees or the fire fighting control room does not respond within one minute, a local spraying pipe network in the building can be opened, meanwhile, a comprehensive fire fighting alarm is sent, all fire fighting channels and emergency lighting facilities are opened, and therefore property loss and casualties are reduced to the maximum degree, and precious time is won for rescue. Meanwhile, in order to avoid sudden failures of the inspection robot in the inspection process, an emergency stop button is installed at the top end of the robot, and surrounding personnel are prevented from being injured. After the fire is extinguished, the ignition point is marked as a key inspection area on the inspection map so as to facilitate later inspection. The schematic diagram of the interactive structure of the fire inspection robot is shown in fig. 6.
Fourth part multi-fire-fighting inspection cooperative robot system control algorithm
Because the common fire-fighting inspection task is cooperatively completed by a plurality of robots, and the optimal path planning under the minimum arrival time in the inspection process is required to be realized in the whole multi-robot control process, the inspection range can be comprehensively covered, and the endurance time of the multi-robot inspection system can be ensured; and the interference usually existing to the inspection environment in the inspection process is unknown. In addition, in order to avoid saturation of the actuator, the input of the actuator is generally required to be constrained; meanwhile, for the sake of safety, the robots cannot collide with each other in the whole inspection process. Aiming at the control requirements of the multi-fire-fighting inspection cooperative robot system, a minimum arrival time T, unknown disturbance to the outside, unknown model of a system part and constraint on input are required to be designed, the robots are required to avoid collision, and in addition, accurate external information is difficult to be adopted under the actual condition, so that offline solution is changed into online solution, and the optimal controller based on integral reinforcement learning and an AC neural network algorithm is designed.
The whole fire-fighting inspection area is provided with N robots for cooperative inspection, and the N robots are respectively positioned at initial positions (x)i0,yi0) To reach respective destinations (x)iD,yiD) And i belongs to {1, 2.. eta., N }, and the position L of the ith fire-fighting inspection robot at the moment t is seti(t)=[Lix(t),Liy(t)]TVelocity Vi(t)=[Vix(t),Viy(t)]TController input Ui(t)=[uix(t),uiy(t)]TControl input and unknown environmental disturbance Wi(t)=[Wix(t),Wiy(t)]TMeanwhile, in order to avoid saturation of the actuator, input is constrained, and | U (t) | is required to be less than or equal to lambda, wherein lambda is a normal number. Set the distance r between two inspection robotsij(t)=||Li(t)-Lj(t) | |, a safety distance r is set for avoiding collision of two inspection robotssAnd r is required to be satisfied at any time in the inspection processij(t)≥rsAnd we assume that r is guaranteed after N robots reach the inspection destinationij(t)>>rsIn this case, i ≠ j.
Then consider the second order linear dynamics model of the ith fire inspection robot as:
Figure BDA0003027378310000161
wherein the system matrix is A, the input matrix is B, the output matrix is C, the interference matrix is D,
Figure BDA0003027378310000162
is the state of the robot at the time t,
Figure BDA0003027378310000163
to input, yi(t) is the system unique output.
The global dynamics model is written as:
Figure BDA0003027378310000164
wherein
Figure BDA0003027378310000165
Is the product of Kronecker, X (t) ═ x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,INIs an N-order identity matrix, and is set to L (t) [ < L >1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]TThe positions at time t, the target point positions and the control inputs of the N robots are respectively.
In order for the N fire inspection robots to achieve optimal control of minimum time and energy in continuous time, continuous state and control input space under unknown disturbances and to avoid collisions throughout the process, the following cost functions are considered:
Figure BDA0003027378310000166
wherein ζ>And 0 is used for representing the specific gravity of time in the routing inspection process, and R is a positive definite matrix. In order to solve the path planning problem that the minimum arrival time T of the robot is unknown, a hyperbolic tangent function is introduced to rewrite a cost function into an infinite integral form so as to solve the problem, in addition, in order to avoid the saturation of an actuator, the input is also required to be constrained, so that the common U (T)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t)) is used to approximate the minimum energy cost and capture the input constraints, and to introduce an artificial potential field function in order to avoid collisions between two robots, the cost function is approximately rewritten as:
V(X(t),U(t))=∫t ζtanh(L(t)-LD)T(L(t)-LD)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-4)
where ζ is a normal number and tanh is a hyperbolic tangent function, which is an odd function that monotonically increases and is continuously differentiable, so the cost function after overwriting remains in an IRL-resolvable form. Rewriting ζ to ζ tanh (L (t) -L)D)T(L(t)-LD) When the current position L (t) of the robot is away from the target point LDTime ζ tanh (L (t) -L)D)T(L(t)-LD) Approximately ζ, ξ tanh (L (t) -L) upon arrival at the target pointD)T(L(t)-LD) This translates the T integral at unknown time to an infinite integral independent of the time of arrival T, to achieve an optimal solution to the value function.
And because the robot system usually has constraints on input, the common U (t)TRu (t) linear quadratic rewrite to non-quadratic performance function phi (u (t) is used to approximate the minimum energy cost and capture the input constraints:
Figure BDA0003027378310000171
wherein input constraint is | U (t) | is less than or equal to λ, λ and σ are both normal numbers, and R ═ diag (R)1,r2...rm)>0。
In order to avoid collision of any pair of inspection robots, an artificial potential field function f is addedR(rij(t)) makes the two robots send out repulsive potential field to avoid each other, and a special weight matrix Lambda is designed for making V (x (t), U (t)) after adding the potential field function to be boundedR(t) for canceling the non-zero tail. Will reject function fR(rij(t)) defines the form of a gaussian function, and this gaussian function is always greater than 0:
Figure BDA0003027378310000172
where the larger s the greater the steepness of the rejection function and the larger σ the larger the rejection range. To capture the repulsive distance rij(t), solving for s and σ in the rejection function, assuming:
fR(rs)=K0;fR(rs+Δ)=K1 (4-7)
wherein 0 < K1<K0Less than 1; Δ is a positive increment. Substituting the above formula into one can obtain:
Figure BDA0003027378310000173
by a weight matrix ΛR(t)=[Λ12(t),Λ13(t),...,ΛN-1,N(t)]TSo that the value function after introducing the artificial potential field function is bounded and the weight matrix depends on the distance to the target point.
ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD||2) (4-9)
It can be seen that Λ is the robot principle target pointR(t) ═ β, Λ when the robot reaches the target pointRSince (t) is 0, β is a collision coefficient, and the magnitude of β is determined by the importance of avoiding collision in the polling process.
Following the solution of the optimal control input using the cost function in (4-4), it is clear that V can be minimized, and t is derived on both sides of the (4-4) equation, so the Bellman equation can be written as:
V(x(t),U(t))=-ζtanh(L(t)-LD)T(L(t)-LD)-φ(U(t))-ΛR(t)TfR(rij(t)) (4-10)
let Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD) Defining the optimum function as:
V*(X)t),U(t))=min∫t Fζ(t)+φ(U(t))+ΛR(t)TfR(rij(t))dt (4-11)
defining the HJB equation according to equation (4-10) as:
Figure BDA0003027378310000181
wherein
Figure BDA0003027378310000182
Under stable conditions have
Figure BDA0003027378310000183
(4-12) deriving U on both sides simultaneously:
Figure BDA0003027378310000184
the optimal control input u can be obtained after term shifting*Comprises the following steps:
Figure BDA0003027378310000185
substituting (4-14) into (4-5) to obtain:
Figure BDA0003027378310000186
wherein l is a column vector of all one, and substituting (4-14) into (4-15) to obtain:
Figure BDA0003027378310000187
wherein
Figure BDA0003027378310000188
Substituting (4-16) into (4-12) to obtain:
Figure BDA0003027378310000189
however, in practical situations, the HJB equation is difficult to solve directly, and because the system model part is unknown, the HJB equation is obtained
Figure BDA00030273783100001810
The method can not be directly solved, so that the HJB equation can be solved by utilizing a strategy iterative algorithm based on the integral reinforcement learning, the integral reinforcement learning uses signals in (T, T + T) for learning, and the specific system does not need to be knownThe kinetic model of (2).
Firstly, the value function is rewritten into the form of integral difference value, and the following Bellman equation can be obtained:
Figure BDA00030273783100001811
in order to solve (4-18) online in real time, an operator-critic neural network algorithm is introduced to realize real-time update in the strategy iteration process. The value function V (X) is first approximated by a critic neural network, since
Figure BDA00030273783100001812
The first term is quadratic form which is easy to obtain, so that only the second term needs to be approximated and set
Figure BDA00030273783100001813
Using neural network pair V0(X) approximating:
Figure BDA00030273783100001814
wherein wcIs the weight, psi, of the critical neural networkc(X) is a basis function, εc(X) is the approximation error;
differentiating X on two sides of (4-20) to obtain:
Figure BDA0003027378310000191
substituting (4-20) into (4-18) can obtain a new bellman equation:
Figure BDA0003027378310000192
wherein epsilone(t)=εc(X(t+T))-εc(X (t)) is the error of the Bellman equation,. DELTA.. psic(X(t)=ψc(X(t+T)-ψc(X(t)。
But due to the coefficient w of the critic neural networkcIs unknown and therefore (4-18) cannot be solved directly, in order to determine wcTherefore we directly rewrite (4-20) to:
Figure BDA0003027378310000193
wherein
Figure BDA0003027378310000194
Is a V0(ii) an approximation of (X),
Figure BDA0003027378310000195
for ideal approximation coefficients, then (4-22) is:
Figure BDA0003027378310000196
order to
Figure BDA0003027378310000197
For Bellman tracking errors and constructing an objective function by making εe(t) minimizing to adjust the weighting coefficients of the critic neural network:
Figure BDA0003027378310000198
the two sides of the formula (4-25) are aligned
Figure BDA0003027378310000199
Derivation, which can be obtained from the chain rule:
Figure BDA00030273783100001910
wherein beta isc>0 is the learning rate, and 0 is the learning rate,
Figure BDA00030273783100001911
Figure BDA00030273783100001912
is delta psicAn approximation of (d).
Will EeSubstituting into (4-26) to obtain the weight coefficient of critic neural network
Figure BDA00030273783100001913
Should be subject to:
Figure BDA00030273783100001914
the obtained ideal weight coefficient is substituted into (4-14) to obtain an optimal control strategy, however, the optimal strategy obtained through the value function of critic approximation cannot ensure the stability of a closed-loop system, so an actor neural network is introduced into the actuator to ensure convergence to an optimal solution and ensure the stability of the system:
Figure BDA00030273783100001915
Figure BDA00030273783100001916
is the optimal approximation coefficient of the actor neural network,
Figure BDA00030273783100001917
is determined by the following lyapunov function:
Figure BDA00030273783100001918
it can be shown that when waWhen the following equation is satisfied, the approximated strategy may be such that the system is consistent and ultimately bounded,at this time can pass
Figure BDA00030273783100001919
To obtain U*(t)。
Figure BDA00030273783100001920
Wherein K1、K2In order to design a good normal number,
Figure BDA00030273783100001921
based on the expressions (4-19), (4-27), (4-28) and (4-30), the critic algorithm and the actor algorithm are respectively utilized to realize synchronous updating of the value function and the strategy function, and an online integral reinforcement learning algorithm based on strategy iteration is designed to solve the HJB equation so as to solve the optimal control input.
The algorithm is as follows: online IRL algorithm based on strategy iteration
Initialization: given a feasible actuator input
Figure BDA0003027378310000201
Step 1: policy evaluation, given initialization
Figure BDA0003027378310000202
Solving by
Figure BDA0003027378310000203
Figure BDA0003027378310000204
Step 2: improvement of the strategy is that
Figure BDA0003027378310000205
Substituting the following formula for updating
Figure BDA0003027378310000206
Figure BDA0003027378310000207
Step 3: order to
Figure BDA0003027378310000208
Go back to Step1 until
Figure BDA0003027378310000209
Converging to a minimum value.
The overall structure of the fire-fighting inspection robot is shown in fig. 7.
The inspection schematic diagram of the multi-fire inspection cooperative robot system is shown in fig. 8.
The whole square frame is an area to be inspected, the dotted line is an area dividing line, the light color pentagram represents a key inspection area, the dark color pentagram represents a fire finding point, and the bidirectional arrow represents that information interaction exists between the robots.
The workflow diagram for operating the robot to extinguish a fire is shown in fig. 9.
The workflow diagram of the fire inspection robot is shown in fig. 10.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (5)

1.基于积分强化学习的多消防巡检协作机器人系统,其特征在于:包括依次连接的硬件层、交互层、感知层和控制层;1. A multi-fire inspection collaborative robot system based on integral reinforcement learning, characterized in that: it comprises a hardware layer, an interaction layer, a perception layer and a control layer that are connected in sequence; 所述硬件层采用DSP作为控制器,将里程计和陀螺仪采集到的数据送入DSP内部进行处理,实时计算出机器人在巡检地图中的位置;通过上位机向DSP发送速度指令,DSP将获取到速度信息编码后以控制伺服电机的运转;消防巡检机器人采用的是履带式驱动;当机械臂需要动作时,由上位机中的ros系统通过在moveit!平台对机械臂将要移动到的目标点进行运动轨迹规划,将规划好的运动轨迹离散化后发送到DSP中,DSP获得的各个轴的角速度、加速度后控制机械臂的伺服电机运动以到达目标点;The hardware layer uses DSP as the controller, sends the data collected by the odometer and gyroscope into the DSP for processing, and calculates the position of the robot in the inspection map in real time; sends the speed command to the DSP through the host computer, and the DSP will After obtaining the speed information code, it controls the operation of the servo motor; the fire inspection robot adopts a crawler drive; when the mechanical arm needs to move, the ros system in the upper computer passes through the moveit! The platform plans the motion trajectory of the target point that the robotic arm will move to, and then discretizes the planned motion trajectory and sends it to the DSP. The angular velocity and acceleration of each axis obtained by the DSP control the motion of the servo motor of the robotic arm to reach the target point. ; 所述感知层用于建图的激光雷达、避障的红外线传感器、检测火焰的火焰探测器、温度传感器和realsenseD435i深度摄像头、里程计和陀螺仪。The perception layer is used for lidar for mapping, infrared sensor for obstacle avoidance, flame detector for flame detection, temperature sensor and realsenseD435i depth camera, odometer and gyroscope. 2.根据权利要求1所述的基于积分强化学习的多消防巡检协作机器人系统,其特征在于:所述硬件层包括履带驱动系统和机械臂伺服控制;2. The multi-fire-fighting inspection collaborative robot system based on integral reinforcement learning according to claim 1, wherein the hardware layer comprises a crawler drive system and a robotic arm servo control; 1、履带驱动系统1. Crawler drive system 履带为两段,每段由单独的伺服电机驱动;前段履带用于在障碍物时将机器人的底盘抬起以便顺利通过,通过调整前段履带来调整机器人的高度,为机械臂提供更大的操作半径;后半段履带主要起机器人的驱动作用,由一个伺服电机同轴驱动,转向时将一侧的履带进行减速制动;伺服电机的额定电压为24V,输出功率为100W,上层PC发布的x,y轴的速度信息通过DSP编码后转化伺服电机的转速,以实现转向和驱动;The crawler is divided into two sections, each driven by a separate servo motor; the front crawler is used to lift the chassis of the robot in case of obstacles for smooth passage, and the height of the robot can be adjusted by adjusting the front crawler to provide greater operation for the robotic arm Radius; the second half of the crawler mainly drives the robot and is coaxially driven by a servo motor, which decelerates and brakes the crawler on one side when turning; the rated voltage of the servo motor is 24V and the output power is 100W, which is issued by the upper PC. The speed information of the x and y axes is encoded by DSP and converted into the speed of the servo motor to realize steering and driving; 2、机械臂伺服控制2. Robotic arm servo control 机器人上方设置四轴的机械臂,机械臂前段设置能够转动的爪状夹持装置,夹持装置上设置灭火装置;加装灭火装置后配合机械臂实现对着火点实现精准扑灭;四轴的机械臂由四个伺服电机驱动每个轴的运动,每个轴的运动信息由上位机ros系统中的moveit!进行路径规划后产生;A four-axis robotic arm is set above the robot, a claw-shaped gripping device that can be rotated is set at the front of the robotic arm, and a fire extinguishing device is set on the gripping device. The movement of each axis is driven by four servo motors, and the movement information of each axis is obtained by the moveit in the host computer ros system! Generated after path planning; ①完成“眼在手外”下对机械臂的标定①Complete the calibration of the robotic arm under "eyes outside the hands" 通过“眼在手外”的标定形式完成将目标点在世界坐标系下的坐标到相对于机械臂坐标系的坐标变换;对与“眼在手外”的标定方式,机械手基座坐标系Tg到相机坐标系Tc的变换矩阵Tgc是恒定的,标定板坐标系Tb到机械臂末端坐标系Te的变换矩阵Tbe是恒定的,坐标变换的关系满足下式:The coordinate transformation from the coordinates of the target point in the world coordinate system to the coordinate system relative to the robot arm is completed through the "eye outside the hand" calibration method; for the "eye outside the hand" calibration method, the robot base coordinate system Tg The transformation matrix Tgc to the camera coordinate system Tc is constant, the transformation matrix Tbe from the calibration board coordinate system Tb to the robot arm end coordinate system Te is constant, and the coordinate transformation relationship satisfies the following formula: 对第i个时刻:Tbci=Tbe*Tegi*Tgc (1-1)For the ith moment: Tbc i =Tbe*Teg i *Tgc (1-1) 第i+1个时刻:Tbci+1=Tbe*Tegi+1*Tgc (1-2)The i+1th moment: Tbc i+1 =Tbe*Teg i+1 *Tgc (1-2) 整理得:(Tegi)-1*(Teg)i+1=Tgc*(Tbc)i -1*(Tbc)i+1*Tgc-1 (1-3)Arranged: (Teg i ) -1 *(Teg) i+1 =Tgc*(Tbc) i -1 *(Tbc) i+1 *Tgc -1 (1-3) 则A=(Tegi)-1*(Teg)i+1就是物体相对于机械臂末端坐标系Te下的运动关系;Then A=(Teg i ) -1 *(Teg) i+1 is the motion relationship of the object relative to the coordinate system Te of the end of the manipulator; ②利用moveit!完成对机械臂的运动轨迹规划②Use moveit! Complete the motion trajectory planning of the robotic arm 利用Moveit!将控制机械臂的各个独立功能部件组合起来,然后通过ROS中的action和service通信方式供用户使用;在moveit!中,创建一个符合机械臂真实尺寸和轴数的模型URDF模型,输入模型之后,利用moveit!的setup assistant按照自己的设定生成相应的配置文件,内容包括机械臂的碰撞矩阵以避免规划出的轨迹使得各轴之间发生碰撞,各个关节的连接信息以及定义的初始位置等;然后再添加机械臂的控制插件controller,controller包括定义follow_joint_trajectory节点和设置各个轴的名字,最后再编写程序实现PC与机械臂通过socket通信方式连接,通过订阅joint_state话题在rviz中观察到机械臂的实时运动轨迹;先由快速卷积神经网络完成对火焰的识别检测,识别成功后通过深度摄像头的点云数据得到着火点相对于机器人的三维坐标,再通过TF坐标变化就能得知机械臂末端需要到达的位置,之后由内部集成好的算法完成对轨迹的求解;求解出来的轨迹信息是由大量离散的点构成的,轨迹信息包括要达到该点每个轴的角速度、角加速度;当求解出的点足够多时,拟合出一条十分光滑的运动轨迹,将这些离散的点的信息通过话题发布和订阅之后使得机械臂按照规划的点平滑地运动至目标点。Take advantage of Moveit! Combine the independent functional components that control the manipulator, and then use the action and service communication methods in ROS for users to use; in moveit! , create a model URDF model that conforms to the actual size and number of axes of the robotic arm. After entering the model, use moveit! The setup assistant generates a corresponding configuration file according to its own settings, including the collision matrix of the robotic arm to avoid collision between the axes due to the planned trajectory, the connection information of each joint and the defined initial position, etc.; then add The control plug-in controller of the robotic arm, the controller includes defining the follow_joint_trajectory node and setting the name of each axis, and finally writing a program to realize the connection between the PC and the robotic arm through socket communication, and observing the real-time motion trajectory of the robotic arm in rviz by subscribing to the joint_state topic; First, the flame recognition and detection is completed by the fast convolutional neural network. After the recognition is successful, the point cloud data of the depth camera is used to obtain the three-dimensional coordinates of the ignition point relative to the robot, and then the position that the end of the robot arm needs to reach can be known through the change of the TF coordinates. Afterwards, the internally integrated algorithm completes the solution of the trajectory; the solved trajectory information is composed of a large number of discrete points, and the trajectory information includes the angular velocity and angular acceleration of each axis to reach the point; when there are enough points solved , a very smooth motion trajectory is fitted, and the information of these discrete points is published and subscribed through the topic to make the robotic arm move smoothly to the target point according to the planned point. 3.根据权利要求1所述的基于积分强化学习的多消防巡检协作机器人系统,其特征在于:所述感知层包括红外传感器避障和基于快速卷积神经网络的火焰识别;3. The multi-fire-fighting inspection collaborative robot system based on integral reinforcement learning according to claim 1, wherein the perception layer comprises infrared sensors for obstacle avoidance and flame recognition based on fast convolutional neural networks; ①红外传感器避障①Infrared sensor to avoid obstacles 利用红外传感器实时检测巡检机器人在巡检过程中遇到的障碍物,当前方有障碍物时,红外传感器检测出机器人与障碍物之间的欧几里得距离,将这些距离与DSP中获得的里程计和陀螺仪数据推算出障碍物的具体坐标;获取坐标后,立即由控制算法设计出避障路径,该避障路径是弧形的,并且在整个过程中要求保持与障碍物有一个最小距离,避障结束后,要立即回到先前规划好的最优巡检路径;The infrared sensor is used to detect the obstacles encountered by the inspection robot during the inspection process in real time. When there is an obstacle ahead, the infrared sensor detects the Euclidean distance between the robot and the obstacle, and compares these distances with the DSP obtained. The specific coordinates of the obstacle are calculated from the odometer and gyroscope data; after the coordinates are obtained, the obstacle avoidance path is designed by the control algorithm immediately. Minimum distance, after obstacle avoidance, return to the previously planned optimal inspection path immediately; ②基于快速卷积神经网络的火焰识别②Flame recognition based on fast convolutional neural network 采用快速卷积神经网络Faster R-CNN对火焰特征进行提取检测,步骤如下:The flame features are extracted and detected by the fast convolutional neural network Faster R-CNN. The steps are as follows: ②-1:输入拍摄到的火焰图片;②-1: Input the captured flame picture; ②-2.:将图片送入卷积神经网络CNN中进行特征提取;②-2.: Send the image to the convolutional neural network CNN for feature extraction; ②-3:特征提取后特征映射,特征映射将共同作用于后续的全连接层和区域生成网络RPN:②-3: Feature mapping after feature extraction, which will act together on the subsequent fully connected layer and region generation network RPN: ②-3.1:特征映射进入RPN,首先经过一系列的区域候选建议框,将这些建议框再分别馈入到两个1×1的卷积层,其中第一个卷积层用于进行区域分类,即通过计算生成建议框的交并比IOU值来区分正负样本;另一个由于边界框回归判定,通过非最大化抑制后以生成更精确的目标检测框;②-3.1: The feature map enters the RPN. First, it passes through a series of regional candidate proposal boxes, and then feeds these proposal boxes into two 1×1 convolutional layers. The first convolutional layer is used for regional classification. , that is, the positive and negative samples are distinguished by calculating the intersection ratio IOU value of the generated proposal frame; the other is due to the regression judgment of the bounding box, and a more accurate target detection frame is generated after non-maximization suppression; ②-3.2:特征映射进入ROI池化层,用于后续网络的计算;②-3.2: The feature map enters the ROI pooling layer for subsequent network calculations; ②-4:将池化后的特征映射经过全连接层后,再次利用softmax对建议框进行分类,识别检测框框中的是否为物体,对建议框再次进行边界框回归判定;②-4: After passing the pooled feature map through the fully connected layer, use softmax again to classify the proposed frame, identify whether the detection frame is an object, and perform bounding box regression judgment on the proposed frame again; RPN生成检测框的具体方法是通过一个滑动框对输入特征映射上滑动,在每个像素点上生成9个建议框,这些建议框的大小为1282、2562、5122,长宽比为1∶1、1∶2、2∶1,并利用这些检测框的交并比IOU交集比来区别正负样本,正样本的IOU值大于0.7,IOU值小于0.3的负样本,正负样本的比例设置为1∶1;针对图像中火焰的不同特征,采用引导锚定的方法来加快RPN的检测速度,改进的稀疏锚定策略为:The specific method of RPN generating detection frame is to slide the input feature map through a sliding frame, and generate 9 suggestion frames on each pixel point. The sizes of these suggestion frames are 128 2 , 256 2 , 512 2 , and the aspect ratio is 1:1, 1:2, 2:1, and use the intersection ratio of these detection frames to distinguish positive and negative samples, the IOU value of positive samples is greater than 0.7, and the IOU value of negative samples is less than 0.3. The ratio is set to 1:1; according to the different characteristics of the flame in the image, the guided anchoring method is used to speed up the detection speed of the RPN. The improved sparse anchoring strategy is:
Figure FDA0003027378300000031
Figure FDA0003027378300000031
其中x,y为像素点坐标,F(x,y)表示生成的火焰颜色掩码,为1则该像素点生成建议框,0则不生成,mR(x,y)、mG(x,y)、mB(x,y)分别为图像像素点的RGB通道值,TR为事先设定的阈值;Where x, y are the coordinates of the pixel point, F(x, y) represents the generated flame color mask, if it is 1, the pixel point will generate a suggestion frame, 0 will not generate it, m R (x, y), m G (x , y), m B (x, y) are the RGB channel values of the image pixels, respectively, and T R is the preset threshold; 另外利用边界回归判定去修正检测框的原理为将原始的建议框A经过映射G得到一个更接近真实情况的回归建议框F;这种映射G通过平移和缩放得到:In addition, the principle of using boundary regression judgment to correct the detection frame is to pass the original suggestion frame A through the mapping G to obtain a regression suggestion frame F that is closer to the real situation; this mapping G is obtained by translation and scaling: 先平移:Fx=Aw.dx(A)+Ax (2-2)Translate first: F x =A w .d x (A)+A x (2-2) Fy=Ah.dy(A)+Ay (2-3)F y =A h .dy (A)+A y ( 2-3) 再缩放:Fw=Aw.exp(dw(A)) (2-4)Rescaling: F w =A w .exp(d w (A)) (2-4) Fh=Ah.exp(dh(A)) (2-5)F h =A h .exp(d h (A)) (2-5) 其中x,y,w,h分别表示建议框的中心坐标,宽、高,dx、dy、dw、dh分别为变换关系,当原始框A和真实框F差距不大时,将这种变换视为线性的;where x, y, w, h represent the center coordinates of the proposed box, width and height, respectively, d x , dy , d w , and dh h are the transformation relationships. When the difference between the original box A and the real box F is not large, the This transformation is considered linear; 输出是识别为火焰的概率。The output is the probability of being identified as a flame.
4.根据权利要求1所述的基于积分强化学习的多消防巡检协作机器人系统,其特征在于:所述交互层为:在巡检过程中需要实时将摄像头所捕捉到的画面通过无线网络发送到控制室和移动终端,并配套开发有相应的APP,在远程终端对巡检机器人进行相应的控制,以实现操作人员对想要再次巡检的区域的巡查;在检测到火焰后,立即向控制室发出警报信号并且能立即自动的做出相应的灭火措施;在实施灭火措施之后,若火情仍然得不到抑制,即将自动模式切换到远程操控模式,由控制室内的专业人员全面接管巡检机器人的控制,手动控制履带运转和机械臂动作以实现对着火点的精准扑灭,并根据火情判断是否需要做出切断电源、关闭燃气阀门、转移易燃物操作;将每个巡检机器都能人与整个消防系统进行并网,若采取措施后火情仍然较大,向控制室发出接管消防网络的请求,在得到控制室同意下或消防控制室一分钟内未做出应答,将建筑内局部的喷淋管网打开,同时发出全面消防警报,打开所有消防通道与应急照明设施;在机器人顶端安装急停按键;在火情扑灭后,将着火点在巡检地图上标注为重点巡检区域。4. The multi-fire-fighting inspection collaborative robot system based on integral reinforcement learning according to claim 1, wherein the interaction layer is: in the inspection process, the picture captured by the camera needs to be sent through a wireless network in real time Go to the control room and mobile terminal, and develop the corresponding APP, and control the inspection robot correspondingly at the remote terminal, so as to realize the inspection of the area that the operator wants to inspect again; The control room sends out an alarm signal and can immediately and automatically take corresponding fire-fighting measures; after the fire-fighting measures are implemented, if the fire still cannot be suppressed, the automatic mode will be switched to the remote control mode, and the professionals in the control room will take over the patrol. The control of the inspection robot, manually control the operation of the crawler and the movement of the mechanical arm to achieve accurate fire extinguishing, and determine whether it is necessary to cut off the power supply, close the gas valve, and transfer flammable materials according to the fire situation; The capable person is connected to the entire fire protection system. If the fire is still large after taking measures, a request is sent to the control room to take over the fire protection network. The local sprinkler pipe network is opened, and a comprehensive fire alarm is issued at the same time, and all fire passages and emergency lighting facilities are turned on; an emergency stop button is installed on the top of the robot; after the fire is extinguished, the fire point is marked on the inspection map as a key inspection. area. 5.根据权利要求1所述的基于积分强化学习的多消防巡检协作机器人系统,其特征在于:所述控制层为:5. The multi-fire patrol inspection collaborative robot system based on integral reinforcement learning according to claim 1, characterized in that: the control layer is: 设整个消防巡检区域下共有N个机器人协同巡检,N个机器人从各自的初始位置(xi0,yi0)要到达各自的目的地(xiD,yiD),i∈{1,2,...,N},设第i个消防巡检机器人在t时刻的位置Lj(t)=[Lix(T),Liy(t)]T,速度Vi(t)=[Vix(t),Viy((t)]T,控制器输入Ui(t)=[uix(t),uiy(t)]T,控制输入和未知的环境扰动Wi(t)=[Wix(t),Wiy(t)]T,为避免执行器饱和,对输入进行约束,要求|U(t)|≤λ,其中λ为正常数。设两个巡检机器人之间的距离rij(t)=||Li(t)-Lj(t)||,为避免两个巡检机器人发生碰撞需要设置一个安全距离rs,要求在巡检过程中的任意时刻都要满足rij(t)≥rs,设当N个机器人到达巡检目的地后保证rij(t)>>rs,此时i≠j;Suppose there are N robots in the entire fire inspection area to conduct cooperative inspections, and N robots from their respective initial positions (x i0 , y i0 ) to reach their respective destinations (x iD , y iD ), i∈{1, 2 , . _ _ _ V ix (t), V iy ((t)] T , controller input U i (t)=[u ix (t), u iy (t)] T , control input and unknown environmental disturbance Wi (t )=[W ix (t), W iy (t)] T , in order to avoid actuator saturation, the input is constrained, requiring |U(t)|≤λ, where λ is a positive number. Let two inspection robots The distance r ij (t)=||L i (t)-L j (t)||, in order to avoid the collision of two inspection robots, a safety distance rs needs to be set. At any time, r ij (t)≥rs must be satisfied, and it is assumed that r ij ( t ) >> rs when N robots arrive at the inspection destination, and i≠j at this time; 则考虑第i个消防巡检机器人的二阶线性动力学模型为:Then consider the second-order linear dynamic model of the i-th fire inspection robot as:
Figure FDA0003027378300000041
Figure FDA0003027378300000041
其中系统矩阵为A,输入矩阵为B,输出矩阵为C,干扰矩阵为D,
Figure FDA0003027378300000042
为机器人在t时刻的状态,
Figure FDA0003027378300000043
为输入,yi(t)为系统唯一输出。
where the system matrix is A, the input matrix is B, the output matrix is C, and the interference matrix is D,
Figure FDA0003027378300000042
is the state of the robot at time t,
Figure FDA0003027378300000043
is the input, and y i (t) is the only output of the system.
将全局动力学模型写为:Write the global dynamics model as:
Figure FDA0003027378300000044
Figure FDA0003027378300000044
其中
Figure FDA0003027378300000045
为Kronecker乘积,X(t)=[x1(t),x2(t),...,xn(t)]T,Y(t)=[y1(t),y2(t),...,yn(t)]T,IN为N阶单位矩阵,且设L(t)=[L1t,L2t,...,LNt]T,LD=[L1D,L2D,...,LND]T,U0=[U1,U2,...,UN]T分别为N个机器人的在t时刻的位置、目标点位置和控制输入;
in
Figure FDA0003027378300000045
is the Kronecker product, X(t) = [x 1 (t), x 2 (t), ..., x n (t)] T , Y(t) = [y 1 (t), y 2 (t) ) , . _ _ _ _ _ 1D , L 2D , . _ _ _ ;
为使N个消防巡检机器人在未知的扰动下实现在连续时间、连续状态和控制输入空间中的最小时间和能量的最优控制,并且在整个过程中要避免碰撞,考虑以下成本函数:In order to make N fire inspection robots realize optimal control of minimum time and energy in continuous time, continuous state and control input space under unknown disturbance, and avoid collisions in the whole process, the following cost function is considered:
Figure FDA0003027378300000046
Figure FDA0003027378300000046
其中ζ>0,用于表示巡检过程中时间的比重,R为正定矩阵;为求解机器人最小到达时间T未知的路径规划问题,引入双曲正切函数将成本函数改写成无穷积分的形式以便求解,另外为避免执行器饱和,还想要对输入进行约束,将U(t)TRU(t)线性二次型改写成非二次型性能函数φ(U(t)用于逼近最小能量成本并且捕获输入约束,且为避免两个机器人之间发生碰撞引入了人工势场函数,将成本函数近似改写为:Among them, ζ>0 is used to represent the proportion of time in the inspection process, and R is a positive definite matrix; in order to solve the path planning problem where the minimum arrival time T of the robot is unknown, the hyperbolic tangent function is introduced to rewrite the cost function into the form of infinite integral to solve the problem. , in addition, in order to avoid actuator saturation, and also want to constrain the input, the linear quadratic U(t) T RU(t) is rewritten into a non-quadratic performance function φ(U(t) for approximating the minimum energy cost And the input constraints are captured, and an artificial potential field function is introduced to avoid collision between the two robots, and the cost function is approximately rewritten as:
Figure FDA0003027378300000051
Figure FDA0003027378300000051
其中ζ为正常数,tanh为双曲正切函数,该函数为单调递增的奇函数且连续可微,成本函数是IRL可解的形式。将ζ改写为ζtanh(L(t)-LD)T(L(t)-LD),当机器人当前位置L(t)距离目标点LD时ζtanh(L(t)-LD)T(L(t)-LD)近似为ζ,到达目标点时ζtanh(L(t)-LD)T(L(t)-LD)=0,将未知时间的T积分转化为与到达时间T无关的无穷积分,以实现对值函数的最优求解。Among them, ζ is a constant, tanh is a hyperbolic tangent function, which is a monotonically increasing odd function and is continuously differentiable, and the cost function is a form that can be solved by IRL. Rewrite ζ as ζtanh(L(t)-L D ) T (L(t)-L D ), when the robot’s current position L(t) is far from the target point LD ζtanh(L(t)-L D ) T (L(t)-L D ) is approximately ζ, when reaching the target point, ζ tanh(L(t)-L D ) T (L(t)-L D )=0, transform the T integral of the unknown time into and reach the target point Time T-independent infinite integration to achieve an optimal solution to the value function. 将U(t)TRU(t)线性二次型改写成非二次型性能函数φ(U(t)用于逼近最小能量成本并且捕获输入约束:Rewriting U(t) T RU(t) linear quadratic into a non-quadratic performance function φ(U(t) is used to approximate the minimum energy cost and capture the input constraints:
Figure FDA0003027378300000052
Figure FDA0003027378300000052
其中输入约束为|U(t)|≤λ,λ和σ均为正常数,R=diag(r1,r2...rm)>0。The input constraint is |U(t)|≤λ, λ and σ are both positive constants, and R=diag(r 1 , r 2 . . . r m )>0. 为避免任何一对巡检机器人发生碰撞,加入人工势场函数fR(rij(t))使得两个机器人之间发出排斥势场使得二者相互避开,为使得加入势场函数之后的V(x(t),U(t))有界,设计权重矩阵ΛR(t),用于抵消非零尾部;将排斥函数fR(rij(t))定义高斯函数的形式,且该高斯函数总是大于0:In order to avoid any pair of inspection robots from colliding, an artificial potential field function f R (r ij (t)) is added to make the two robots emit a repulsive potential field so that they avoid each other. V(x(t), U(t)) is bounded, the weight matrix Λ R (t) is designed to cancel the non-zero tail; the repulsion function f R (r ij (t)) is defined as the form of a Gaussian function, and This Gaussian function is always greater than 0:
Figure FDA0003027378300000053
Figure FDA0003027378300000053
其中s越大则排斥函数的陡度就越大,σ越大排斥范围也越大;为捕捉排斥距离rij(t),求解排斥函数中的s和σ,设有:The larger s is, the steeper the repulsion function is, and the larger the σ is, the larger the repulsion range is; in order to capture the repulsion distance r ij (t), to solve s and σ in the repulsion function, we have: fR(rs)=K0;fR(rs+A)=K1 (4-7)f R ( rs )=K 0 ; f R ( rs +A)=K 1 (4-7) 其中0<K1<K0<1;Δ为正增量,代入得:where 0<K 1 <K 0 <1; Δ is a positive increment, which can be substituted into:
Figure FDA0003027378300000054
Figure FDA0003027378300000054
通过权重矩阵ΛR(t)=[Λ12(t),Λ13(T),...,ΛN-1,N(t)]T来使得引入人工势场函数后的值函数是有界的,且权重矩阵取决于与目标点的距离;Through the weight matrix Λ R (t) = [Λ 12 (t), Λ 13 (T), ..., Λ N-1, N (t)] T , the value function after introducing the artificial potential field function is bounded, and the weight matrix depends on the distance from the target point; ΛR(t)=βtanh(||Li(t)-LiD||2+||Lj(t)-LjD|2) (4-9)Λ R (t)=βtanh(||L i (t)-L iD || 2 +||L j (t)-L jD | 2 ) (4-9) 当机器人原理目标点时ΛR(t)=β,当机器人到达目标点时ΛR(t)=0,β为碰撞系数,β的大小由巡检过程中避免碰撞的重要性决定;When the robot is at the target point in principle, Λ R (t)=β, when the robot reaches the target point, Λ R (t)=0, β is the collision coefficient, and the size of β is determined by the importance of avoiding collision during the inspection process; 下面利用(4-4)中的成本函数求解最优控制输入,(4-4)式两边对t求导,贝尔曼方程写为:Next, the cost function in (4-4) is used to solve the optimal control input, and both sides of equation (4-4) are derived for t, and the Bellman equation is written as: V(x(t),U(t))=-ζtanh(L(t)-LD)T(L(t)-LD)-φ(U(t))-ΛR(t)TfR(rij(t)) (4-10)V(x(t), U(t))=-ζtanh(L(t)-L D ) T (L(t)-L D )-φ(U(t))-Λ R (t) T f R (r ij (t)) (4-10) 令Fζ(t)=ζtanh(L(t)-LD)T(L(t)-LD),定义最优值函数为:Let F ζ (t)=ζtanh(L(t)-L D ) T (L(t)-L D ), define the optimal value function as:
Figure FDA0003027378300000061
Figure FDA0003027378300000061
根据(4-10)式定义HJB方程为:According to the formula (4-10), the HJB equation is defined as:
Figure FDA0003027378300000062
Figure FDA0003027378300000062
其中
Figure FDA0003027378300000063
in
Figure FDA0003027378300000063
在稳定性条件下有
Figure FDA0003027378300000064
(4-11)式两边同时对U求导得:
under stable conditions
Figure FDA0003027378300000064
(4-11), both sides of equation (4-11) are derived from U at the same time:
Figure FDA0003027378300000065
Figure FDA0003027378300000065
移项后得最优控制输入u*为:After shifting the term, the optimal control input u * is:
Figure FDA0003027378300000066
Figure FDA0003027378300000066
将(4-14)代入到(4-5)中得:Substitute (4-14) into (4-5) to get:
Figure FDA0003027378300000067
Figure FDA0003027378300000067
其中l为全为一的列向量,将(4-14)代入(4-15)中得:where l is an all-one column vector, substituting (4-14) into (4-15) to get:
Figure FDA0003027378300000068
Figure FDA0003027378300000068
其中
Figure FDA0003027378300000069
将(4-16)代入(4-12)中得:
in
Figure FDA0003027378300000069
Substitute (4-16) into (4-12) to get:
Figure FDA00030273783000000610
Figure FDA00030273783000000610
利用基于积分强化学习的策略迭代算法求解HJB方程,积分强化学习使用(t,t+T)内的信号用于学习,不需要知道系统具体的动力学模型。The HJB equation is solved by a policy iterative algorithm based on integral reinforcement learning. The integral reinforcement learning uses the signal in (t, t+T) for learning, and does not need to know the specific dynamic model of the system. 首先将值函数改写成积分差值的形式,得到如下的贝尔曼方程:First, the value function is rewritten into the form of integral difference, and the following Bellman equation is obtained:
Figure FDA00030273783000000611
Figure FDA00030273783000000611
为能够在线实时地求解(4-18),引入actor-critic神经网络算法来实现策略迭代过程中的实时更新。首先通过critic神经网络对值函数V(X)进行近似逼近,因为In order to solve (4-18) online and in real time, the actor-critic neural network algorithm is introduced to realize real-time update in the policy iteration process. First, the value function V(X) is approximated by the critic neural network, because
Figure FDA0003027378300000071
Figure FDA0003027378300000071
而其中第一项为易求得的二次型,只对第二项进行逼近,并设
Figure FDA0003027378300000072
用神经网络对V0(X)进行逼近得:
The first term is the quadratic form that is easy to obtain, and only the second term is approximated, and set
Figure FDA0003027378300000072
Approximate V 0 (X) with a neural network as:
Figure FDA0003027378300000073
Figure FDA0003027378300000073
其中wc为critic神经网络的权重,ψc(X)趵为基函数,εc(X)为逼近误差;where w c is the weight of the critic neural network, ψ c (X) is the basis function, and ε c (X) is the approximation error; 将(4-20)两边对X求微分得:Differentiate both sides of (4-20) with respect to X to get:
Figure FDA0003027378300000074
Figure FDA0003027378300000074
将(4-20)代入到(4-18)中得到新的贝尔曼方程:Substitute (4-20) into (4-18) to get the new Bellman equation:
Figure FDA0003027378300000075
Figure FDA0003027378300000075
其中εe(t)=εc(X(t+T))-εc(X(t))为贝尔曼方程误差,Δψc(X(t)=ψc(X(t+T)-ψc(X(t)。where ε e (t)=ε c (X(t+T))-ε c (X(t)) is the Bellman equation error, Δψ c (X(t)=ψ c (X(t+T)- ψ c (X(t). 为确定wc,将(4-20)改写成:To determine w c , rewrite (4-20) as:
Figure FDA0003027378300000076
Figure FDA0003027378300000076
其中
Figure FDA0003027378300000077
为V0(X)的近似值,
Figure FDA0003027378300000078
为理想的逼近系数,则(4-22)式为:
in
Figure FDA0003027378300000077
is an approximation of V 0 (X),
Figure FDA0003027378300000078
is the ideal approximation coefficient, then the formula (4-22) is:
Figure FDA0003027378300000079
Figure FDA0003027378300000079
Figure FDA00030273783000000710
为贝尔曼跟踪误差,并构造以下目标函数,通过使得εe(t)最小化来调整critic神经网络的权重系数:
make
Figure FDA00030273783000000710
is the Bellman tracking error and constructs the following objective function to adjust the weight coefficients of the critic neural network by minimizing ε e (t):
Figure FDA00030273783000000711
Figure FDA00030273783000000711
将(4-25)式两边对
Figure FDA00030273783000000712
求导,再由链式法则得:
Align both sides of (4-25)
Figure FDA00030273783000000712
Take the derivative, and then use the chain rule to get:
Figure FDA00030273783000000713
Figure FDA00030273783000000713
其中βc>0为学习率,
Figure FDA00030273783000000714
Figure FDA00030273783000000715
为Δψc的近似值。
where β c > 0 is the learning rate,
Figure FDA00030273783000000714
Figure FDA00030273783000000715
is an approximation of Δψc .
将Ee代入到(4-26)得critic神经网络的权重系数
Figure FDA00030273783000000716
的更新应服从:
Substitute E e into (4-26) to get the weight coefficient of the critic neural network
Figure FDA00030273783000000716
The update should obey:
Figure FDA00030273783000000717
Figure FDA00030273783000000717
将得到的理想权重系数代入到(4-14)中得最优控制策略,然而通过critic逼近的值函数所求得的最优策略却并不能保证闭环系统的稳定性,要为执行器引入actor神经网络来保证收敛到最优解的同时还能够保证系统的稳定性:Substitute the obtained ideal weight coefficient into (4-14) to obtain the optimal control strategy. However, the optimal strategy obtained by the value function of critic approximation cannot guarantee the stability of the closed-loop system. It is necessary to introduce an actor into the actuator. Neural network to ensure convergence to the optimal solution while also ensuring the stability of the system:
Figure FDA00030273783000000718
Figure FDA00030273783000000718
Figure FDA00030273783000000719
为actor神经网络的最优逼近系数,
Figure FDA00030273783000000720
的更新由以下李雅普诺夫函数来确定:
Figure FDA00030273783000000719
is the optimal approximation coefficient of the actor neural network,
Figure FDA00030273783000000720
The update of is determined by the following Lyapunov function:
Figure FDA0003027378300000081
Figure FDA0003027378300000081
当wa满足下式时,所逼近的策略使得系统一致最终有界,通过
Figure FDA0003027378300000082
得到U*(t)。
When wa satisfies the following formula, the approximated strategy makes the system uniform and eventually bounded, by
Figure FDA0003027378300000082
to get U * (t).
Figure FDA0003027378300000083
Figure FDA0003027378300000083
其中K1.K2为设计好的正常数,
Figure FDA0003027378300000084
where K 1 .K 2 is a designed constant,
Figure FDA0003027378300000084
基于(4-19)、(4-27)、(4-28)和(4-30)式,分别利用critic和actor算法实现对值函数和策略函数的同步更新,设计一种基于策略迭代的在线积分强化学习算法来求解HJB方程,以求解最优控制输入。Based on equations (4-19), (4-27), (4-28) and (4-30), the critic and actor algorithms are used to update the value function and the strategy function synchronously, and a strategy iteration-based algorithm is designed. Online integral reinforcement learning algorithm to solve the HJB equation to solve for optimal control inputs. 算法:基于策略迭代的在线IRL算法Algorithm: Online IRL Algorithm Based on Policy Iteration 初始化:给定一个可行的执行器输入
Figure FDA0003027378300000085
Initialization: Given a feasible actuator input
Figure FDA0003027378300000085
Step1:策略评估,给定初始
Figure FDA0003027378300000086
利用下式求解
Figure FDA0003027378300000087
Step1: Policy evaluation, given initial
Figure FDA0003027378300000086
Solve using the following formula
Figure FDA0003027378300000087
Figure FDA0003027378300000088
Figure FDA0003027378300000088
Step2:策略改进,将
Figure FDA0003027378300000089
代入下式更新
Figure FDA00030273783000000810
Step2: Strategy improvement, will
Figure FDA0003027378300000089
Substitute the following update
Figure FDA00030273783000000810
Figure FDA00030273783000000811
Figure FDA00030273783000000811
Step3:令
Figure FDA00030273783000000812
返回Step1,直至
Figure FDA00030273783000000813
收敛到最小值。
Step3: make
Figure FDA00030273783000000812
Return to Step1 until
Figure FDA00030273783000000813
converge to the minimum value.
CN202110419574.2A 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning Expired - Fee Related CN113134187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110419574.2A CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110419574.2A CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Publications (2)

Publication Number Publication Date
CN113134187A true CN113134187A (en) 2021-07-20
CN113134187B CN113134187B (en) 2022-04-29

Family

ID=76812679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110419574.2A Expired - Fee Related CN113134187B (en) 2021-04-19 2021-04-19 Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning

Country Status (1)

Country Link
CN (1) CN113134187B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442606A (en) * 2021-12-17 2022-05-06 重庆特斯联智慧科技股份有限公司 A kind of alarm and early warning robot and its control method
CN114639088A (en) * 2022-03-23 2022-06-17 姜妹英 Big data automatic navigation method
CN115167444A (en) * 2022-07-27 2022-10-11 成都群智微纳科技有限公司 ROS-based multi-agent autonomous inspection method and system
CN117444985A (en) * 2023-12-20 2024-01-26 安徽大学 Mechanical arm trolley control method and system
CN117944043A (en) * 2023-11-22 2024-04-30 广州深度医疗器械科技有限公司 Robot control method and robot thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109173125A (en) * 2018-09-29 2019-01-11 北京力升高科科技有限公司 A kind of collaboration working method and system of fire-fighting robot
CN109276833A (en) * 2018-08-01 2019-01-29 吉林大学珠海学院 A kind of robot patrol fire-fighting system and its control method based on ROS
CN109976161A (en) * 2019-04-23 2019-07-05 哈尔滨工业大学 A kind of finite time optimization tracking and controlling method of uncertain nonlinear system
CN110782011A (en) * 2019-10-21 2020-02-11 辽宁石油化工大学 A distributed optimization control method for networked multi-agent systems based on reinforcement learning
CN111408089A (en) * 2020-04-22 2020-07-14 北京新松融通机器人科技有限公司 Fire-fighting robot and fire-fighting robot fire extinguishing system
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109276833A (en) * 2018-08-01 2019-01-29 吉林大学珠海学院 A kind of robot patrol fire-fighting system and its control method based on ROS
CN109173125A (en) * 2018-09-29 2019-01-11 北京力升高科科技有限公司 A kind of collaboration working method and system of fire-fighting robot
CN109976161A (en) * 2019-04-23 2019-07-05 哈尔滨工业大学 A kind of finite time optimization tracking and controlling method of uncertain nonlinear system
CN110782011A (en) * 2019-10-21 2020-02-11 辽宁石油化工大学 A distributed optimization control method for networked multi-agent systems based on reinforcement learning
CN111408089A (en) * 2020-04-22 2020-07-14 北京新松融通机器人科技有限公司 Fire-fighting robot and fire-fighting robot fire extinguishing system
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
段锁林 等: "基于卷积神经网络的火焰识别", 《计算机工程与设计》 *
赵毓: "基于多智能体强化学习的空间机械臂轨迹规划", 《航空学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442606A (en) * 2021-12-17 2022-05-06 重庆特斯联智慧科技股份有限公司 A kind of alarm and early warning robot and its control method
CN114442606B (en) * 2021-12-17 2024-04-05 北京未末卓然科技有限公司 Alert condition early warning robot and control method thereof
CN114639088A (en) * 2022-03-23 2022-06-17 姜妹英 Big data automatic navigation method
CN115167444A (en) * 2022-07-27 2022-10-11 成都群智微纳科技有限公司 ROS-based multi-agent autonomous inspection method and system
CN117944043A (en) * 2023-11-22 2024-04-30 广州深度医疗器械科技有限公司 Robot control method and robot thereof
CN117444985A (en) * 2023-12-20 2024-01-26 安徽大学 Mechanical arm trolley control method and system
CN117444985B (en) * 2023-12-20 2024-03-12 安徽大学 A mechanical arm trolley control method and system

Also Published As

Publication number Publication date
CN113134187B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN113134187B (en) Multi-fire-fighting inspection cooperative robot system based on integral reinforcement learning
CN111300372B (en) Air-ground collaborative intelligent inspection robot and inspection method
CN107589752B (en) Method and system for realizing cooperative formation of unmanned aerial vehicle and ground robot
Wang et al. Development of a search and rescue robot system for the underground building environment
CN110561432A (en) safety cooperation method and device based on man-machine co-fusion
US20080009969A1 (en) Multi-Robot Control Interface
CN113325837A (en) Control system and method for multi-information fusion acquisition robot
WO2009148672A1 (en) System and method for seamless task-directed autonomy for robots
Zhang et al. Design of intelligent fire-fighting robot based on multi-sensor fusion and experimental study on fire scene patrol
CN110673603A (en) A fire field autonomous navigation reconnaissance robot
Kalinov et al. Warevr: Virtual reality interface for supervision of autonomous robotic system aimed at warehouse stocktaking
Kästner et al. Deep-reinforcement-learning-based semantic navigation of mobile robots in dynamic environments
Xiao et al. Autonomous visual assistance for robot operations using a tethered uav
CN114505840B (en) Intelligent service robot for independently operating box type elevator
CN113759901A (en) Mobile robot autonomous obstacle avoidance method based on deep reinforcement learning
CN110883772B (en) A method and system for using robots to deal with potential safety hazards in railway stations
CN113730860A (en) Autonomous fire extinguishing method of fire-fighting robot in unknown environment
CN118092437A (en) Multi-vehicle parallel intelligent collaborative search and rescue system based on digital twin and its construction method
Shim et al. Direction-driven navigation using cognitive map for mobile robots
Hager et al. Toward domain-independent navigation: Dynamic vision and control
Xu et al. Avoidance of manual labeling in robotic autonomous navigation through multi-sensory semi-supervised learning
CN118576940A (en) Intelligent fire extinguishing and detection robot
CN109917670B (en) A Simultaneous Localization and Mapping Method for Intelligent Robot Clusters
Afonso et al. Autonomous navigation of wheelchairs in indoor environments using deep reinforcement learning and computer vision
Bhuiyan et al. Towards real-time motion planning for industrial robots in collaborative environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220429

CF01 Termination of patent right due to non-payment of annual fee