CN116203972B - Unknown environment exploration path planning method, system, equipment and medium - Google Patents

Unknown environment exploration path planning method, system, equipment and medium

Info

Publication number: CN116203972B
Authority: CN (China)
Prior art keywords: grid map, current moment, grid, path, moment
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202310491011.3A
Other languages: Chinese (zh)
Other versions: CN116203972A (en)
Inventors: 张旭东, 樊杰, 邹渊, 刘颖群
Current assignee: Beijing Institute of Technology (BIT)
Original assignee: Beijing Institute of Technology (BIT)
Application filed by Beijing Institute of Technology (BIT); priority to CN202310491011.3A
Publication of application CN116203972A; application granted and published as CN116203972B


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unknown environment exploration path planning method, system, equipment and medium, relating to the technical field of unknown environment exploration. A plurality of edge points of the grid map at the current moment are determined using an edge detection algorithm, and part of the edge points are uniformly selected as potential target points. One potential target point is then selected as the target edge point based on a reinforcement learning algorithm, a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point is planned, and the unmanned vehicle is controlled to move along the collision-free path. This process continues until the exploration of the unknown environment is completed, thereby completing the path planning process for the unknown environment and generating a grid map of the unknown environment. By selecting the target edge point through the reinforcement learning algorithm, the method can solve the problems of long planned paths and premature termination of exploration, improves the exploration efficiency of the unknown environment, and can achieve full exploration of the unknown environment.

Description

Unknown environment exploration path planning method, system, equipment and medium
Technical Field
The present invention relates to the field of unknown environment exploration technologies, and in particular, to a method, a system, an apparatus, and a medium for planning an unknown environment exploration path.
Background
Unknown environment exploration has important applications in fields such as disaster-area rescue, household cleaning, and outer-space exploration. The path planning methods currently in common use for unknown environment exploration fall into two categories: path guidance methods based on edge points and path optimization methods based on reinforcement learning. Map construction of the unknown environment is completed through these path planning methods.
Although the edge-point-based path guidance method is easy to deploy and implement, its optimality is poor because the target edge point is usually selected based on engineering experience, so the path planned for exploring the map is usually long. Moreover, in some special cases the target edge point may fall into a dead end, causing the exploration of the unknown environment to terminate prematurely, making full exploration of the unknown environment difficult to achieve.
Although the reinforcement-learning-based path optimization method has good optimality and plans relatively short paths, it must search for the optimal action over the full action space (namely, the speed and heading of the ground unmanned vehicle), so model training takes a relatively long time. In addition, because the next space to be explored is close to the current position, a short-sighted effect easily forms; that is, the method has difficulty quickly reaching distant unknown space to explore.
Based on this, a new technique for unknown environment exploration path planning is needed.
Disclosure of Invention
The invention aims to provide an unknown environment exploration path planning method, system, equipment and medium that combine the edge-point-based path guidance method with the reinforcement-learning-based path optimization method and select the target edge point through a reinforcement learning algorithm, which can solve the problems of long planned paths and premature termination of exploration, improve the exploration efficiency of the unknown environment, and achieve full exploration of the unknown environment.
In order to achieve the above object, the present invention provides the following solutions:
an unknown environment exploration path planning method, the unknown environment exploration path planning method comprising:
acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of unmanned vehicles for exploring unknown environments; the laser radar data are scanning data obtained by scanning surrounding environments by a laser radar carried on the unmanned vehicle;
updating the grid map at the previous moment by utilizing the vehicle pose data and the laser radar data at the current moment to obtain the grid map at the current moment; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupation probability of each grid; the occupancy probability is used to represent the likelihood that the grid is occupied;
Performing edge point detection on the grid map at the current moment by using an edge detection algorithm to obtain a plurality of edge points; based on the positions of all the edge points in the grid map at the current moment, uniformly selecting part of the edge points as potential target points;
taking the grid map at the current moment as input, and selecting one potential target point as a target edge point based on a reinforcement learning algorithm;
planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path until reaching the target edge point;
judging whether the exploration of the unknown environment is completed or not; if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment.
In some embodiments, updating the grid map at the previous time by using the vehicle pose data and the laser radar data at the current time, and obtaining the grid map at the current time specifically includes:
determining the update occupation probability of each grid at the current moment according to the vehicle pose data and the laser radar data at the current moment;
And for each grid, calculating the occupancy probability of the grid at the current moment according to the updated occupancy probability of the grid at the current moment and the prior occupancy probability at the initial moment so as to update the grid map at the previous moment and obtain the grid map at the current moment.
In some embodiments, the calculation formula of the occupancy probability of the grid at the current moment includes:

p(m \mid z_{1:t}) = 1 - \frac{1}{1 + \exp(l_t)}

wherein p(m \mid z_{1:t}) is the occupancy probability of the grid m at the current time t; z_{1:t} is the lidar data at each time between the initial time and the current time t; l_t is the intermediate parameter at the current time t;

l_t = l_{t-1} + \log \frac{p(m \mid z_t)}{1 - p(m \mid z_t)} - \log \frac{p(m)}{1 - p(m)}

wherein l_{t-1} is the intermediate parameter at the previous time t-1; p(m \mid z_t) is the updated occupancy probability of the grid m at the current time t; z_t is the lidar data at the current time t; p(m) is the prior occupancy probability of the grid m at the initial moment.
In some embodiments, the edge detection algorithm is a Sobel edge detection algorithm.
In some embodiments, before selecting one of the potential target points as a target edge point based on a reinforcement learning algorithm with the grid map at the current time as an input, the unknown environment exploration path planning method further includes: training an initial reinforcement learning algorithm;
The cumulative return function used in the training process is as follows:

G_{t_0} = \sum_{t'=t_0}^{T} \gamma^{t'-t_0} r_{t'}

wherein G_{t_0} is the accumulated return at time T; t_0 is the initial time; γ is a forgetting factor; r_{t'} is the instant reward at the intermediate time t';

r = \alpha r_{EI} + r_l + r_f + r_w

wherein r is the instant reward; α is a weighting constant; r_{EI} is the entropy-increase reward; r_l is the step cost; r_f is the unexpected-termination cost; r_w is the full-exploration return.
In some embodiments, the method of calculating the entropy-increase reward includes: calculating the difference between the image entropy of the training grid map at the intermediate moment and the image entropy of the training grid map at the moment immediately before the intermediate moment to obtain the entropy-increase reward;
the step cost is a constant negative number;
the unexpected termination cost is a constant negative number;
the method for calculating the full-exploration return includes: calculating the difference between the training grid map at the intermediate moment and the real grid map at the intermediate moment; if the difference is smaller than a preset threshold, the full-exploration return is the reward value; if not, the full-exploration return is 0.
In some embodiments, the planning, according to the grid map at the current time and the target edge point, a collision-free path from the current position of the unmanned vehicle at the current time to the target edge point, and controlling the unmanned vehicle to move along the collision-free path specifically includes:
Using the grid map at the current moment and the target edge point as inputs, and planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point by using a PRM algorithm; the collision-free path is composed of a plurality of path points;
taking the collision-free path and the vehicle pose data at the current moment as inputs, and calculating the linear speed and the angular speed corresponding to each path point of the unmanned vehicle by using a pure tracking algorithm;
taking the laser radar data at the current moment as input, and correcting the angular speed corresponding to each path point of the unmanned vehicle by utilizing a VFH algorithm to obtain corrected angular speeds corresponding to each path point of the unmanned vehicle;
and controlling the unmanned vehicle to move along the collision-free path based on the linear speed and the corrected angular speed corresponding to each path point.
An unknown environmental exploration path planning system, the unknown environmental exploration path planning system comprising:
the data acquisition module is used for acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of unmanned vehicles for exploring unknown environments; the laser radar data are scanning data obtained by scanning surrounding environments by a laser radar carried on the unmanned vehicle;
The map updating module is used for updating the grid map at the previous moment by utilizing the vehicle pose data and the laser radar data at the current moment to obtain the grid map at the current moment; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupation probability of each grid; the occupancy probability is used to represent the likelihood that the grid is occupied;
the edge detection module is used for detecting edge points of the grid map at the current moment by utilizing an edge detection algorithm to obtain a plurality of edge points; based on the positions of all the edge points in the grid map at the current moment, uniformly selecting part of the edge points as potential target points;
the target point selection module is used for taking the grid map at the current moment as input, and selecting one potential target point as a target edge point based on a reinforcement learning algorithm;
the motion control module is used for planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path until reaching the target edge point;
The return module is used for judging whether the unknown environment is explored; if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment.
An unknown environment exploration path planning apparatus, comprising:
a processor; and
a memory in which computer-readable program instructions are stored,
wherein the above-described unknown environment exploration path planning method is performed when the computer readable program instructions are executed by the processor.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the above described unknown environment exploration path planning method.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides an unknown environment exploration path planning method, system, equipment and medium. First, the vehicle pose data and laser radar data at the current moment are acquired, and the grid map at the previous moment is updated with them to obtain the grid map at the current moment. Then, edge point detection is performed on the grid map at the current moment using an edge detection algorithm to obtain a plurality of edge points, part of the edge points are uniformly selected as potential target points, and one potential target point is selected as the target edge point based on a reinforcement learning algorithm. Finally, a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point is planned according to the grid map at the current moment and the target edge point, and the unmanned vehicle is controlled to move along the collision-free path until the target edge point is reached; this process continues until the exploration of the unknown environment is completed, thereby completing the path planning process for unknown environment exploration and generating the grid map of the unknown environment. The invention combines the edge-point-based path guidance method with the reinforcement-learning-based path optimization method and selects the target edge point through the reinforcement learning algorithm, which can solve the problems of long planned paths and premature termination of exploration, improve the exploration efficiency of the unknown environment, and achieve full exploration of the unknown environment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a path planning method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a hierarchical data flow processing architecture for unknown environment exploration path planning provided in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of a convolution filter of the Sobel operator according to embodiment 1 of the present invention; wherein fig. 3 (a) is a schematic diagram of a horizontal convolution filter; FIG. 3 (b) is a schematic diagram of a vertical convolution filter;
fig. 4 is a system block diagram of a path planning system according to embodiment 2 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide an unknown environment exploration path planning method, system, equipment and medium that combine the edge-point-based path guidance method with the reinforcement-learning-based path optimization method and select the target edge point through a reinforcement learning algorithm, which can solve the problems of long planned paths and premature termination of exploration, improve the exploration efficiency of the unknown environment, and achieve full exploration of the unknown environment.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1:
the embodiment is used for providing an unknown environment exploration path planning method, as shown in fig. 1, including:
s1: acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of unmanned vehicles for exploring unknown environments; the laser radar data are scanning data obtained by scanning surrounding environments by a laser radar carried on the unmanned vehicle;
in this embodiment, only the movement of the unmanned vehicle in the two-dimensional plane is considered, so the vehicle pose data in this embodiment include the position and the attitude of the unmanned vehicle at the current moment. The position may be represented by two-dimensional coordinates (i.e., the abscissa x and the ordinate y of the position of the unmanned vehicle), and the attitude may be represented by the heading angle of the unmanned vehicle. The vehicle pose data may be acquired by a sensor, and the unmanned vehicle may be an unmanned ground vehicle (Unmanned Ground Vehicle, UGV). The laser radar data are scanning data obtained by the laser radar mounted on the unmanned vehicle performing a 360-degree scan of the environment surrounding the position of the unmanned vehicle.
S2: updating the grid map at the previous moment by utilizing the vehicle pose data and the laser radar data at the current moment to obtain the grid map at the current moment; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupation probability of each grid; the occupancy probability is used to represent the likelihood that the grid is occupied;
s3: performing edge point detection on the grid map at the current moment by using an edge detection algorithm to obtain a plurality of edge points; based on the positions of all the edge points in the grid map at the current moment, uniformly selecting part of the edge points as potential target points;
s4: taking the grid map at the current moment as input, and selecting one potential target point as a target edge point based on a reinforcement learning algorithm;
s5: planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path until reaching the target edge point;
s6: judging whether the exploration of the unknown environment is completed or not; if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment.
After the target edge point is reached, the method returns to step S1 and executes S1-S3 in sequence to determine the number of edge points obtained by edge point detection. If, among these edge points, the number of edge points corresponding to non-obstacles is smaller than a preset number, the exploration is considered complete. The method used to judge whether an edge point corresponds to a non-obstacle is as follows: judge whether the occupancy probability of the grid where the edge point is located is 1; if not, the edge point corresponds to a non-obstacle.
This embodiment can adopt a hierarchical data stream processing architecture (i.e., a hierarchical architecture) oriented to unknown environment exploration path planning, and under this hierarchical architecture adopt a path planning method that integrates reinforcement learning and edge points, including an edge point generation method based on image edge detection and a target edge point screening method based on reinforcement learning. As shown in fig. 2, the hierarchical data stream processing architecture used in this embodiment includes a perception layer, a planning layer and a control layer. First, the laser radar data and vehicle pose data of the ground unmanned vehicle are input to the perception layer, and the perception layer iteratively updates the grid map based on the laser radar data and the vehicle pose data using the Bayesian estimation criterion (also called the Bayes rule) (corresponding to S1 and S2). Then, the updated grid map is input to the planning layer, where two data streams are processed: edge detection is performed on the grid map to extract edge points (corresponding to S3), and the grid map and the edge points are input to the reinforcement learning algorithm, so that the reinforcement learning agent selects the target edge point from the edge points (corresponding to S4). Finally, the target edge point is input to the control layer; the control layer takes the current position of the vehicle as the starting position and the target edge point as the goal position, plans a collision-free path from the starting position to the goal position through the probabilistic roadmap (Probabilistic Road Map, PRM) algorithm, and realizes path tracking control of the unmanned vehicle on the collision-free path based on the pure tracking algorithm (also known as the pure pursuit algorithm) and the vector field histogram (Vector Field Histogram, VFH) algorithm (corresponding to S5). The above process is repeated until the unknown environment is completely explored (corresponding to S6).
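For illustration only, the following Python sketch outlines one pass through this hierarchical architecture. The perception, planning and control routines are passed in as callables, and the helper names and the termination parameter min_frontiers are assumptions of the sketch rather than elements of the patent:

```python
# A minimal sketch of one S1-S6 iteration through the hierarchical architecture.
# The concrete perception / planning / control routines are injected as
# callables, since their implementations are described separately below.

def exploration_step(read_pose, read_scan, update_map, extract_frontiers,
                     select_target, plan_and_track, grid_map, min_frontiers=5):
    """Run one S1-S6 iteration; return (updated_map, done)."""
    pose = read_pose()                       # S1: vehicle pose (x, y, heading)
    scan = read_scan()                       # S1: 360-degree lidar ranges
    grid_map = update_map(grid_map, pose, scan)       # S2: Bayesian map update

    frontiers = extract_frontiers(grid_map)           # S3: edge point detection
    # S6: done when too few non-obstacle edge points remain
    open_frontiers = [p for p in frontiers if grid_map[p] < 1.0]
    if len(open_frontiers) < min_frontiers:
        return grid_map, True

    target = select_target(grid_map, open_frontiers)  # S4: RL target selection
    plan_and_track(grid_map, pose, target, scan)      # S5: PRM + path tracking
    return grid_map, False
```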
In this embodiment, a grid map (a two-dimensional map) is used to describe the unknown environment. The size range of the area where the unknown environment is located is determined in advance to draw an initial map, and the initial map is then divided into grids to obtain the grid map. Each grid in the grid map has an occupancy probability indicating the likelihood of being occupied; the higher the value of the occupancy probability, the higher the probability that the grid is occupied. At the initial moment, the occupancy probability of each grid in the grid map equals the prior occupancy probability. The grid map is then iteratively updated based on the laser radar data and vehicle pose data acquired at each moment using the Bayesian estimation criterion (namely, the posterior probability at time t-1, which may also be called the prior probability at time t, is combined with the laser radar data at time t to estimate the posterior probability at time t), so as to obtain the grid map at each moment.
In S2, updating the grid map at the previous time by using the vehicle pose data and the laser radar data at the current time, the obtaining the grid map at the current time may include:
(1) And determining the updated occupation probability of each grid at the current moment according to the vehicle pose data and the laser radar data at the current moment.
The laser radar generally takes the heading angle of the unmanned vehicle (which can be determined from the vehicle pose data at the current moment) as the starting angle (0 degrees) and performs a 360-degree scan of the environment surrounding the current position of the unmanned vehicle at the current moment to obtain the laser radar data, which include a scanning length for every scanning angle between 0 and 360 degrees. The laser radar has a maximum scanning length. If the scanning length at a certain scanning angle is smaller than the maximum scanning length, an obstacle exists in the direction of that scanning angle; the grid where the obstacle is located can be determined from the scanning length at that angle, the state of that grid is set to the occupied state, and the states of the grids lying along that scanning angle between the grid where the unmanned vehicle currently is and the grid where the obstacle is located are set to the idle state. If the scanning length at a certain scanning angle equals the maximum scanning length, no obstacle exists in the direction of that angle; the farthest grid can be determined from the maximum scanning length, and the states of the grids lying along that angle between the grid where the unmanned vehicle currently is and the farthest grid are set to the idle state. For the other grids of the grid map not scanned by the laser radar, the states are all set to the unknown state. The update occupancy probability at the current moment is then set to a first preset value for grids in the occupied state, a second preset value for grids in the idle state, and a third preset value for grids in the unknown state, so that the update occupancy probability of each grid at the current moment is determined from the vehicle pose data and the laser radar data at the current moment. The first preset value may be 0.8, the second preset value may be 0.2, and the third preset value may be 0.5. Since one grid may correspond to several scanning angles, if the state of a grid at the current moment is determined to be both occupied and idle based on different scanning angles, its state at the current moment is taken to be the occupied state.
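The inverse sensor model described above may be sketched as follows. The 0.8/0.2/0.5 preset values follow the text, while the beam-stepping scheme, the grid indexing and the function name are illustrative assumptions:

```python
import numpy as np

P_OCC, P_FREE, P_UNKNOWN = 0.8, 0.2, 0.5   # first/second/third preset values

def scan_update_probabilities(pose, angles, ranges, max_range,
                              grid_shape, resolution):
    """Per-grid update occupancy p(m|z_t) for one 360-degree scan.

    pose = (x, y, heading); `angles` are scan angles relative to the
    heading; `ranges` are the measured scanning lengths.
    """
    update = np.full(grid_shape, P_UNKNOWN)      # unscanned grids stay unknown
    x0, y0, heading = pose
    for ang, rng in zip(angles, ranges):
        theta = heading + ang
        # step along the beam, marking traversed grids as idle (free)
        for d in np.arange(0.0, rng, resolution):
            i = int((x0 + d * np.cos(theta)) / resolution)
            j = int((y0 + d * np.sin(theta)) / resolution)
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                if update[i, j] != P_OCC:        # occupied state takes precedence
                    update[i, j] = P_FREE
        if rng < max_range:                      # beam ended on an obstacle
            i = int((x0 + rng * np.cos(theta)) / resolution)
            j = int((y0 + rng * np.sin(theta)) / resolution)
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                update[i, j] = P_OCC
    return update
```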
(2) And for each grid, calculating the occupancy probability of the grid at the current moment according to the updated occupancy probability of the grid at the current moment and the prior occupancy probability at the initial moment so as to update the grid map at the previous moment and obtain the grid map at the current moment.
Suppose that at the current time t the unmanned vehicle, at vehicle pose x_t, reads the laser radar data z_t. The purpose of the perception layer is then to determine the occupancy probability p(m \mid z_t, x_t) of each grid m at the current time t. Based on the Bayesian estimation criterion, the grid map is continuously updated with the laser radar data, the grid map at time t being deduced from the grid map at time t-1 by the following recursion:

l_t = l_{t-1} + \log \frac{p(m \mid z_t)}{1 - p(m \mid z_t)} - \log \frac{p(m)}{1 - p(m)} ; (1)

In formula (1), l_t is the intermediate parameter at the current time t; l_{t-1} is the intermediate parameter at the previous time t-1; p(m \mid z_t) is the updated occupancy probability of the grid m at the current time t; z_t is the laser radar data at the current time t; p(m) is the prior occupancy probability of the grid m at the initial moment. This embodiment initializes the prior occupancy probabilities of all grids to 0.5.

This embodiment defines the intermediate parameter according to the following formula (2):

l_t = \log \frac{p(m \mid z_{1:t})}{1 - p(m \mid z_{1:t})} ; (2)

In formula (2), p(m \mid z_{1:t}) is the occupancy probability of the grid m at the current time t; z_{1:t} is the laser radar data at each time between the initial time and the current time t.

This embodiment derives from formula (2) the calculation formula of the occupancy probability of the grid at the current moment, as follows:

p(m \mid z_{1:t}) = 1 - \frac{1}{1 + \exp(l_t)} ; (3)

After the intermediate parameter l_t of the current time t is calculated by formula (1), the occupancy probability of the grid at the current moment is calculated by formula (3). The occupancy probability of each grid in the grid map at the previous moment is replaced by the occupancy probability of that grid at the current moment to obtain the grid map at the current moment.
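Formulas (1)-(3) translate directly into a log-odds map update; a minimal sketch, assuming numpy and an illustrative 200×200 map, is:

```python
import numpy as np

def logodds(p):
    """Log-odds of a probability (elementwise)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def bayes_map_update(l_prev, p_update, p_prior=0.5):
    """Recursion (1): l_t = l_{t-1} + logodds(p(m|z_t)) - logodds(p(m))."""
    return l_prev + logodds(p_update) - logodds(p_prior)

def occupancy_from_logodds(l_t):
    """Formula (3): p(m|z_{1:t}) = 1 - 1 / (1 + exp(l_t))."""
    return 1.0 - 1.0 / (1.0 + np.exp(l_t))

# usage: carry l as the map state; all grids start at the 0.5 prior
l = np.zeros((200, 200))                # logodds(0.5) == 0
p_update = np.full((200, 200), 0.5)     # update probabilities from one scan
l = bayes_map_update(l, p_update)
grid_map = occupancy_from_logodds(l)    # occupancy probabilities at time t
```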
In S3, when the edge detection algorithm is used to detect edge points of the grid map at the current moment to obtain a plurality of edge points, the edge detection algorithm may be the Sobel edge detection algorithm, which has the advantages of good robustness and a small amount of computation. Of course, other existing edge detection algorithms may also be used in this embodiment.
In the following, a Sobel edge detection algorithm is taken as an example, and a detection process of an edge point in this embodiment is discussed in detail:
taking a grid map at the current moment as image data, wherein one grid is equivalent to one pixel point, the occupied probability of the grid at the current moment is equivalent to the pixel value of the pixel point, and extracting edge points in the grid map at the current moment based on a Sobel edge detection algorithm. The core of the Sobel edge detection algorithm is the Sobel operator, which first performs weighted averaging on an image using a 3×3 convolution filter, and then detects edge points of the image by first-order differentiation. Assuming that f (x, y) represents an image, its gradient can be expressed as:
\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] = [G_x, G_y] ; (4)

In formula (4), \nabla f is the image gradient; G_x is the gradient in the horizontal x direction; G_y is the gradient in the vertical y direction.
The Sobel operator includes a horizontal-direction convolution filter and a vertical-direction convolution filter; as shown in fig. 3, fig. 3 (a) is the horizontal-direction convolution filter and fig. 3 (b) is the vertical-direction convolution filter. The specific steps of the Sobel edge detection algorithm are as follows: (1) Traverse the image to be processed corresponding to the grid map at the current moment from left to right and from top to bottom with the horizontal and vertical convolution filters respectively, and perform a discrete convolution operation on each traversed target pixel: specifically, the weighted sum of the horizontal convolution filter with the pixel values of all pixels in the 3×3 neighborhood centered on the target pixel (including the target pixel itself) gives the horizontal convolution result, and the weighted sum of the vertical convolution filter with the pixel values of all pixels in the same 3×3 neighborhood gives the vertical convolution result. (2) For each target pixel, replace its pixel value in the image to be processed with the maximum of its horizontal and vertical convolution results to obtain the processed image. (3) For each pixel of the processed image, if its pixel value is greater than or equal to a threshold T, the pixel is determined to be an edge point of the processed image, thereby extracting the plurality of edge points of the grid map at the current moment.
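A minimal numpy/scipy sketch of this extraction follows; taking the larger of the absolute filter responses and the "nearest" border handling are assumptions of the sketch:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel convolution filters (horizontal and vertical), as in fig. 3
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_edge_points(grid_map, threshold):
    """Return (row, col) indices of edge points of an occupancy grid map.

    Each grid is treated as a pixel whose value is its occupancy
    probability; a pixel is an edge point when the larger absolute
    convolution response reaches the threshold T.
    """
    gx = convolve(grid_map, KX, mode="nearest")    # horizontal convolution
    gy = convolve(grid_map, KY, mode="nearest")    # vertical convolution
    response = np.maximum(np.abs(gx), np.abs(gy))  # keep the larger response
    return np.argwhere(response >= threshold)
```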
After the plurality of edge points are extracted, this embodiment further uniformly selects part of the edge points as potential target points based on the positions of all the edge points in the grid map at the current moment; that is, the selected potential target points are uniformly distributed over the grid map at the current moment. Specifically, the selection may proceed as follows: each edge point is assigned a random number, and a specific number of edge points are randomly selected as potential target points in the program by generating random numbers. A potential target point should be an edge point corresponding to a non-obstacle. This embodiment may further sort all the potential target points by their distance to the current position of the unmanned vehicle at the current moment, in either descending or ascending order; the purpose of sorting is to determine quickly which potential target point the target edge point refers to when the reinforcement learning algorithm outputs the n-th potential target point as the target edge point.
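A sketch of this sampling and sorting step, assuming numpy, ascending distance order, and an assumed candidate count n_targets, is:

```python
import numpy as np

def sample_potential_targets(edge_points, grid_map, vehicle_pos,
                             n_targets=10, rng=None):
    """Randomly pick n non-obstacle edge points and sort them by distance.

    edge_points: (K, 2) array of (row, col) grids; vehicle_pos: (row, col)
    of the vehicle's current grid. Sorting makes "the n-th potential target
    point" output by the RL agent unambiguous.
    """
    rng = np.random.default_rng() if rng is None else rng
    # keep only edge points whose grid is not certainly occupied
    free = edge_points[grid_map[edge_points[:, 0], edge_points[:, 1]] < 1.0]
    if len(free) == 0:
        return free
    n = min(n_targets, len(free))
    picked = free[rng.choice(len(free), size=n, replace=False)]
    dists = np.linalg.norm(picked - np.asarray(vehicle_pos), axis=1)
    return picked[np.argsort(dists)]      # distance-sorted potential targets
```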
This embodiment optimally selects among the potential target points based on the reinforcement learning algorithm to determine the optimal target edge point, namely the grid that the unmanned vehicle needs to reach at the next moment. Specifically, the grid map at the current moment output by the perception layer is used as the input state s of the reinforcement learning algorithm, the potential target points output by the planning layer form the action space a of the reinforcement learning algorithm, and the optimal policy π* of the reinforcement learning algorithm is used to select the optimal action in the action space, i.e., to select an optimal potential target point from all the potential target points as the target edge point. The output of the reinforcement learning algorithm is that the n-th potential target point is the target edge point; since the potential target points are sorted by distance, the potential target points in the action space of the reinforcement learning algorithm are ordered, and it can then be determined quickly which potential target point the target edge point is.
Because the dimension of the input layer of the reinforcement learning network (a neural network) corresponding to the reinforcement learning algorithm is preset, while the shapes of the different unknown environments being explored may have different aspect ratios, this embodiment adjusts the image size of the grid map at the current moment through the resize function in MATLAB before the grid map is input to the reinforcement learning algorithm, so that it conforms to the input-layer dimension of the neural network and the subsequent neural network can process the grid map at the current moment.
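The patent performs the resizing with MATLAB's resize function; the following PyTorch sketch shows an equivalent pipeline, where the small Q-network architecture (two convolution layers, a 64×64 input) and the greedy selection rule are illustrative assumptions, and n_actions is taken equal to the number of sorted candidates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetSelector(nn.Module):
    """Tiny Q-network sketch: state = resized grid map, action = index n of
    a potential target point (the candidates are distance-sorted, so the
    index unambiguously names one of them)."""

    def __init__(self, n_actions=10, side=64):
        super().__init__()
        self.side = side
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (side // 4) ** 2, n_actions),
        )

    def forward(self, grid_map):
        # resize the (H, W) float map to the fixed input size of the network
        x = grid_map.unsqueeze(0).unsqueeze(0)
        x = F.interpolate(x, size=(self.side, self.side),
                          mode="bilinear", align_corners=False)
        return self.net(x)                 # Q-value for each candidate index

def select_target_edge_point(q_net, grid_map, sorted_targets):
    with torch.no_grad():
        q = q_net(grid_map)
    n = int(q.argmax(dim=1))               # greedy action = n-th candidate
    return sorted_targets[n]
```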
The reinforcement learning algorithm of this embodiment may be any reinforcement learning algorithm, such as a deep Q-learning algorithm or a trust-region policy optimization algorithm. Since the optimal policy π* of the reinforcement learning algorithm is needed when determining the target edge point, before taking the grid map at the current moment as input and selecting a potential target point as the target edge point based on the reinforcement learning algorithm, the unknown environment exploration path planning method of this embodiment further includes: training the initial reinforcement learning algorithm, that is, training the network corresponding to the reinforcement learning algorithm and determining the optimal policy π*.
In the training process, the grid map is used as the state, the potential target points are used as the action space, and maximizing the cumulative return function is used as the training target.
Wherein, the cumulative return function used is:

G_{t_0} = \sum_{t'=t_0}^{T} \gamma^{t'-t_0} r_{t'} ; (5)

In formula (5), G_{t_0} is the accumulated return at time T; t_0 is the initial time, and t' = t_0, t_0+1, ..., T; γ is the forgetting factor, a constant in [0,1]; r_{t'} is the instant reward obtained from the environment at the intermediate time t'.

r = \alpha r_{EI} + r_l + r_f + r_w ; (6)

In formula (6), r is the instant reward; α is a weighting constant; r_{EI} is the entropy-increase reward; r_l is the step cost; r_f is the unexpected-termination cost; r_w is the full-exploration return.
Specifically, the entropy-increase reward is the entropy increase of the updated grid map and is used to encourage the unmanned vehicle to explore the unknown environment. Its calculation method is: calculate the difference between the image entropy of the training grid map at the intermediate moment and the image entropy of the training grid map at the moment immediately before the intermediate moment. The specific calculation formula is:

r_{EI} = H(m_t) - H(m_{t-1}) ; (7)

In formula (7), H(m_t) is the image entropy of the training grid map m_t at the intermediate time t; H(m_{t-1}) is the image entropy of the training grid map at the time t-1 immediately preceding the intermediate time t.

H(m) = -\sum_{n=1}^{N} p_n \log p_n ; (8)

In formula (8), p_n is the probability that the occupancy probability of a grid in the training grid map m falls in a certain interval n, obtained by frequency statistics over all grids in the map. Specifically, the range 0-1 of the grid occupancy probability is divided into N non-overlapping intervals, n = 1, 2, ..., N. Then, for each interval, the number of grids falling in that interval is counted and divided by the total number of grids to obtain the probability corresponding to that interval. Summing over all intervals gives the image entropy.
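Formula (8) may be sketched as follows, assuming numpy; the number of intervals N (n_bins) is not fixed by the text and is an assumed parameter:

```python
import numpy as np

def image_entropy(grid_map, n_bins=10):
    """Formula (8): Shannon entropy of the occupancy-probability histogram.

    The [0, 1] occupancy range is split into n_bins non-overlapping
    intervals; p_n is the fraction of grids falling in interval n.
    """
    counts, _ = np.histogram(grid_map, bins=n_bins, range=(0.0, 1.0))
    p = counts / grid_map.size
    p = p[p > 0]                         # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)))
```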
The step cost aims to prompt the reinforcement learning algorithm to find a relatively short path, and the faster the unmanned vehicle explores an unknown environment, the lower the accumulated step cost, thereby encouraging the unmanned vehicle to improve the exploration efficiency, so the step cost is a constant negative number.
The unexpected-termination cost is set to a constant negative number to penalize the case where the environment is not fully explored; in this embodiment it may be set to -20.
If the difference between the real map and the explored map falls below a certain value, a full-exploration reward is provided to the reinforcement learning algorithm. The calculation method of the full-exploration return is therefore: calculate the difference between the training grid map at the intermediate moment and the real grid map at the intermediate moment; if the difference is smaller than a preset threshold, the full-exploration return is the reward value; if not, the full-exploration return is 0. The difference is calculated as follows: compute the mean error between the occupancy probabilities of all grids of the free-space region (i.e., the region whose occupancy probability is not 1) in the explored training grid map and those of the same free-space region in the real grid map, obtaining the probability average error, which is taken as the difference. If the probability average error is smaller than the preset threshold 0.05, the full-exploration return is the reward value, the exploration is considered complete, and the training episode ends. The preset threshold may also be another small value.
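Combining formulas (6)-(8) with the costs above, the instant reward may be sketched as follows (reusing image_entropy from the previous sketch). The constant values, the application of the weight α to the entropy term only, and the mean-absolute-error reading of the probability average error are assumptions:

```python
import numpy as np

STEP_COST = -1.0        # r_l: a constant negative number (value assumed)
TERM_COST = -20.0       # r_f: unexpected-termination cost from the text
EXPLORE_REWARD = 50.0   # r_w reward value (assumed; not given in the text)

def instant_reward(map_t, map_prev, real_map, alpha=1.0,
                   done_unexpectedly=False, threshold=0.05):
    """Formula (6): r = alpha * r_EI + r_l + r_f + r_w (weighting assumed)."""
    r_ei = image_entropy(map_t) - image_entropy(map_prev)   # formula (7)
    r_l = STEP_COST
    r_f = TERM_COST if done_unexpectedly else 0.0
    # full-exploration return: probability average error over free space
    free = real_map < 1.0
    mean_err = float(np.mean(np.abs(map_t[free] - real_map[free])))
    r_w = EXPLORE_REWARD if mean_err < threshold else 0.0
    return alpha * r_ei + r_l + r_f + r_w
```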
After the state, the action space and the cumulative return function are determined, any reinforcement learning algorithm can be adopted to obtain the optimal policy π*, a function mapping from the state s to the optimal action. According to the optimal policy π*, the optimal potential target point can be selected from the random potential target points as the target edge point (also called the path target tracking point). The sample data used in the training process include a plurality of real grid maps of explored unknown environments; one training pass is performed for each real grid map, and the training pass may include: working according to S1-S6, updating the grid map once every time vehicle pose data and laser radar data are obtained, calculating the accumulated return at the current moment according to the above formula, calculating the state-value function in reinforcement learning from the accumulated return, comparing the difference between the state function actually output by the current network and the state function derived from the return function, and updating the parameters of the reinforcement learning network by gradient descent according to this difference until the parameters converge. When training the model, the exploration is considered complete when the probability average error between the free-space region of the training grid map obtained by exploration and all grids of the free-space region in the real grid map is less than 0.05. This training process is a conventional training process for reinforcement learning algorithms and is not described in detail here.
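As one concrete instance of this conventional loop, a DQN-style temporal-difference update is sketched below, assuming the TargetSelector network sketched earlier; the patent does not fix the algorithm, and the discount factor and MSE loss are assumptions:

```python
import torch
import torch.nn.functional as F

def q_update_step(q_net, optimizer, state, action, reward, next_state,
                  done, gamma=0.99):
    """One temporal-difference parameter update (sketch only; any standard
    reinforcement learning algorithm could be substituted)."""
    q_pred = q_net(state)[0, action]            # value actually output
    with torch.no_grad():
        q_next = 0.0 if done else q_net(next_state).max().item()
        target = reward + gamma * q_next        # return-based target value
    loss = F.mse_loss(q_pred, torch.tensor(target))
    optimizer.zero_grad()
    loss.backward()                             # gradient descent update
    optimizer.step()
    return float(loss)
```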
The control layer takes the grid map at the current moment from the perception layer and the target edge point from the planning layer as inputs, and first generates a collision-free path connecting the current position and the target edge point using the PRM algorithm. The generated collision-free path is then input to the pure tracking algorithm to obtain the control linear velocity and angular velocity of the unmanned vehicle. To avoid obstacles and improve collision-free performance, the angular velocity output by the pure tracking algorithm is further processed by the VFH algorithm, which outputs the corrected angular velocity. Finally, the linear velocity output by the pure tracking algorithm and the corrected angular velocity output by the VFH algorithm are applied to the unmanned vehicle, so that the unmanned vehicle moves along the collision-free path to the target edge point, the pose information of the unmanned vehicle is updated, and the unknown environment continues to be explored. Then, in S5, planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path, may include: using the grid map at the current moment and the target edge point as inputs, planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point with the PRM algorithm, the collision-free path consisting of a plurality of path points; taking the collision-free path and the vehicle pose data at the current moment as inputs, calculating the linear velocity and angular velocity of the unmanned vehicle corresponding to each path point with the pure tracking algorithm; taking the laser radar data at the current moment as input, correcting the angular velocity of the unmanned vehicle corresponding to each path point with the VFH algorithm to obtain the corrected angular velocity corresponding to each path point; and controlling the unmanned vehicle to move along the collision-free path based on the linear velocity and the corrected angular velocity corresponding to each path point.
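The pure tracking (pure pursuit) step may be sketched as follows; the constant cruise speed v_linear is an assumed parameter, and the VFH correction is left to a separate routine as described above:

```python
import numpy as np

def pure_pursuit_command(pose, waypoint, v_linear=0.5):
    """Linear and angular velocity toward one path point (pure pursuit law).

    pose = (x, y, heading); waypoint = (x, y). v_linear is an assumed
    constant cruise speed. The returned angular velocity would then be
    corrected by the VFH algorithm using the current lidar scan before
    being applied to the unmanned vehicle.
    """
    x, y, heading = pose
    dx, dy = waypoint[0] - x, waypoint[1] - y
    ld = np.hypot(dx, dy)                    # look-ahead distance
    if ld < 1e-6:                            # already at the path point
        return 0.0, 0.0
    alpha = np.arctan2(dy, dx) - heading     # bearing error in vehicle frame
    alpha = np.arctan2(np.sin(alpha), np.cos(alpha))  # wrap to [-pi, pi]
    curvature = 2.0 * np.sin(alpha) / ld     # pure pursuit curvature law
    return v_linear, v_linear * curvature    # (linear, angular) command
```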
The above unknown environment exploration path planning method combines the edge-point-based path guidance method with the reinforcement-learning-based path optimization method and selects the target edge point through the reinforcement learning algorithm; it can thus overcome the long planned paths and premature termination of exploration that occur in the edge-point-based path guidance method, improve the exploration efficiency of the unknown environment, and achieve full exploration of the unknown environment.
Example 2:
the embodiment is used for providing an unknown environment exploration path planning system, as shown in fig. 4, which includes:
the data acquisition module M1 is used for acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of unmanned vehicles for exploring unknown environments; the laser radar data are scanning data obtained by scanning surrounding environments by a laser radar carried on the unmanned vehicle;
the map updating module M2 is configured to update the grid map at the previous time by using the vehicle pose data and the laser radar data at the current time, so as to obtain the grid map at the current time; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupation probability of each grid; the occupancy probability is used to represent the likelihood that the grid is occupied;
The edge detection module M3 is used for detecting edge points of the grid map at the current moment by utilizing an edge detection algorithm to obtain a plurality of edge points; based on the positions of all the edge points in the grid map at the current moment, uniformly selecting part of the edge points as potential target points;
the target point selection module M4 is configured to select one of the potential target points as a target edge point based on a reinforcement learning algorithm with the grid map at the current time as an input;
the motion control module M5 is used for planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path until reaching the target edge point;
a return module M6, configured to determine whether the exploration of the unknown environment is completed; if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment.
Example 3:
the embodiment is used for providing an unknown environment exploration path planning device, which comprises the following steps:
a processor; and
a memory in which computer-readable program instructions are stored,
Wherein the computer readable program instructions, when executed by the processor, perform the unknown environment exploration path planning method as described in embodiment 1.
Example 4:
the present embodiment is directed to a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the unknown environment exploration path planning method described in embodiment 1.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (9)

1. An unknown environment exploration path planning method, which is characterized by comprising the following steps:
acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of unmanned vehicles for exploring unknown environments; the laser radar data are scanning data obtained by scanning surrounding environments by a laser radar carried on the unmanned vehicle;
updating the grid map at the previous moment by utilizing the vehicle pose data and the laser radar data at the current moment to obtain the grid map at the current moment; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupation probability of each grid; the occupancy probability is used to represent the likelihood that the grid is occupied;
performing edge point detection on the grid map at the current moment by using an edge detection algorithm to obtain a plurality of edge points; based on the positions of all the edge points in the grid map at the current moment, part of the edge points are uniformly selected as potential target points, and specifically, the selection mode can be as follows: assigning a random number to each edge point, randomly selecting a specific number of edge points as potential target points in a program in a mode of generating the random number, wherein the potential target points are the edge points corresponding to non-obstacle;
taking the grid map at the current moment as input, and selecting one potential target point as a target edge point based on a reinforcement learning algorithm; before the grid map at the current moment is input into the reinforcement learning algorithm, the image size of the grid map at the current moment is adjusted through the resize function in MATLAB;
planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path until reaching the target edge point;
judging whether the exploration of the unknown environment is completed or not; if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment;
the step of planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and controlling the unmanned vehicle to move along the collision-free path specifically comprises the following steps:
using the grid map at the current moment and the target edge point as inputs, and planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point by using a PRM algorithm; the collision-free path is composed of a plurality of path points;
Taking the collision-free path and the vehicle pose data at the current moment as inputs, and calculating the linear speed and the angular speed corresponding to each path point of the unmanned vehicle by using a pure tracking algorithm;
taking the laser radar data at the current moment as input, and correcting the angular speed corresponding to each path point of the unmanned vehicle by utilizing a VFH algorithm to obtain corrected angular speeds corresponding to each path point of the unmanned vehicle;
and controlling the unmanned vehicle to move along the collision-free path based on the linear speed and the corrected angular speed corresponding to each path point.
2. The unknown environment exploration path planning method according to claim 1, wherein updating the grid map of the previous time by using the vehicle pose data and the laser radar data of the current time, and obtaining the grid map of the current time specifically comprises:
determining the update occupation probability of each grid at the current moment according to the vehicle pose data and the laser radar data at the current moment;
and for each grid, calculating the occupancy probability of the grid at the current moment according to the updated occupancy probability of the grid at the current moment and the prior occupancy probability at the initial moment so as to update the grid map at the previous moment and obtain the grid map at the current moment.
3. The unknown environmental exploration path planning method of claim 2, wherein a calculation formula of an occupancy probability of said grid at a current time comprises:
p(m \mid z_{1:t}) = 1 - \frac{1}{1 + \exp(l_t)}

wherein p(m \mid z_{1:t}) is the occupancy probability of the grid m at the current time t; z_{1:t} is the lidar data at each time between the initial time and the current time t; l_t is the intermediate parameter at the current time t;

l_t = l_{t-1} + \log \frac{p(m \mid z_t)}{1 - p(m \mid z_t)} - \log \frac{p(m)}{1 - p(m)}

wherein l_{t-1} is the intermediate parameter at the previous time t-1; p(m \mid z_t) is the updated occupancy probability of the grid m at the current time t; z_t is the lidar data at the current time t; p(m) is the prior occupancy probability of the grid m at the initial moment.
4. The unknown environmental exploration path planning method of claim 1, wherein said edge detection algorithm is a Sobel edge detection algorithm.
5. The unknown environment exploration path planning method of claim 1, wherein prior to selecting one of said potential target points as a target edge point based on a reinforcement learning algorithm with a grid map of said current time as an input, said unknown environment exploration path planning method further comprises: training an initial reinforcement learning algorithm;
the cumulative return function used in the training process is as follows:
G_{t_0} = \sum_{t'=t_0}^{T} \gamma^{t'-t_0} r_{t'}

wherein G_{t_0} is the accumulated return at time T; t_0 is the initial time; γ is a forgetting factor; r_{t'} is the instant reward at the intermediate time t';

r = \alpha r_{EI} + r_l + r_f + r_w

wherein r is the instant reward; α is a weighting constant; r_{EI} is the entropy-increase reward; r_l is the step cost; r_f is the unexpected-termination cost; r_w is the full-exploration return.
6. The unknown environmental exploration path planning method of claim 5, wherein the method of calculating the entropy-increase reward comprises: calculating the difference between the image entropy of the training grid map at the intermediate moment and the image entropy of the training grid map at the moment immediately before the intermediate moment to obtain the entropy-increase reward;
the step cost is a constant negative number;
the unexpected termination cost is a constant negative number;
the method for calculating the full exploration return comprises the following steps: calculating the difference between the training grid map at the middle moment and the real grid map at the middle moment; if the difference is smaller than a preset threshold, the sufficient exploration returns to be rewarding values; if not, the full exploration return is 0.
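Taken together, claims 5-6 define the instant reward r = α·r_EI + r_l + r_f + r_w, which can be sketched as below. All numeric constants (α, the step cost, the termination cost, the reward value, the difference threshold) and the histogram-based image entropy are illustrative assumptions, not values from the patent.

```python
import numpy as np

def image_entropy(grid, bins=16):
    """Shannon entropy of the grid map treated as a grayscale image."""
    hist, _ = np.histogram(grid, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def instant_reward(grid_t, grid_prev, true_grid, terminated_unexpectedly,
                   alpha=1.0, step_cost=-0.1, term_cost=-10.0,
                   done_reward=50.0, diff_thresh=0.05):
    """Instant reward r = alpha * r_EI + r_l + r_f + r_w (claims 5-6)."""
    # Entropy increase reward: entropy difference between consecutive maps.
    r_ei = image_entropy(grid_t) - image_entropy(grid_prev)
    r_l = step_cost                                      # constant negative
    r_f = term_cost if terminated_unexpectedly else 0.0  # constant negative
    # Full exploration return: compare against the real grid map.
    diff = np.abs(grid_t - true_grid).mean()
    r_w = done_reward if diff < diff_thresh else 0.0
    return alpha * r_ei + r_l + r_f + r_w
```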
7. An unknown environment exploration path planning system, the unknown environment exploration path planning system comprising:
the data acquisition module, used for acquiring vehicle pose data and laser radar data at the current moment; the vehicle pose data are pose data of the unmanned vehicle exploring the unknown environment; the laser radar data are scanning data obtained by scanning the surrounding environment with a laser radar mounted on the unmanned vehicle;
the map updating module, used for updating the grid map at the previous moment by using the vehicle pose data and the laser radar data at the current moment to obtain the grid map at the current moment; the grid map comprises a plurality of grids obtained by dividing the area where the unknown environment is located, and the occupancy probability of each grid; the occupancy probability is used to represent the likelihood that a grid is occupied;
the edge detection module, used for detecting edge points of the grid map at the current moment by using an edge detection algorithm to obtain a plurality of edge points, and for uniformly selecting part of the edge points as potential target points based on the positions of all the edge points in the grid map at the current moment; specifically, the selection may be performed as follows: each edge point is assigned a random number, and a specific number of edge points are randomly selected in the program as potential target points by generating random numbers, the potential target points being edge points that do not correspond to obstacles (an illustrative selection sketch is given after this claim);
the target point selection module, used for taking the grid map at the current moment as an input and selecting one potential target point as a target edge point based on a reinforcement learning algorithm, wherein the image size of the grid map at the current moment is adjusted by the resize function in MATLAB before the grid map at the current moment is input into the reinforcement learning algorithm;
the motion control module, used for planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point according to the grid map at the current moment and the target edge point, and for controlling the unmanned vehicle to move along the collision-free path until the target edge point is reached; planning the collision-free path and controlling the unmanned vehicle to move along the collision-free path specifically comprises:
taking the grid map at the current moment and the target edge point as inputs, planning a collision-free path from the current position of the unmanned vehicle at the current moment to the target edge point by using the PRM algorithm, the collision-free path being composed of a plurality of path points;
taking the collision-free path and the vehicle pose data at the current moment as inputs, calculating the linear velocity and the angular velocity of the unmanned vehicle at each path point by using the pure pursuit algorithm;
taking the laser radar data at the current moment as input, correcting the angular velocity of the unmanned vehicle at each path point by using the VFH algorithm to obtain a corrected angular velocity for each path point;
controlling the unmanned vehicle to move along the collision-free path based on the linear velocity and the corrected angular velocity at each path point;
the return module, used for judging whether the exploration of the unknown environment is complete, and if not, returning to the step of acquiring the vehicle pose data and the laser radar data at the current moment.
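The random selection of potential target points referenced in the edge detection module above can be sketched in a few lines; the sample size k and the seed handling are assumptions for illustration, not values from the patent.

```python
import random

def select_potential_targets(edge_points, k=10, seed=None):
    """Uniformly sample k edge points as potential target points.

    edge_points: list of (x, y) non-obstacle edge points.
    """
    rng = random.Random(seed)
    if len(edge_points) <= k:
        return list(edge_points)
    return rng.sample(edge_points, k)
```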
8. An unknown environment exploration path planning apparatus, comprising:
a processor; and
a memory in which computer-readable program instructions are stored,
wherein the computer readable program instructions, when executed by the processor, perform the unknown environment exploration path planning method of any of claims 1-6.
9. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the unknown environment exploration path planning method of any of claims 1-6.
CN202310491011.3A 2023-05-05 2023-05-05 Unknown environment exploration path planning method, system, equipment and medium Active CN116203972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310491011.3A CN116203972B (en) 2023-05-05 2023-05-05 Unknown environment exploration path planning method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310491011.3A CN116203972B (en) 2023-05-05 2023-05-05 Unknown environment exploration path planning method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116203972A (en) 2023-06-02
CN116203972B (en) 2023-08-18

Family

ID=86511535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310491011.3A Active CN116203972B (en) 2023-05-05 2023-05-05 Unknown environment exploration path planning method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116203972B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116429145B (en) * 2023-06-07 2023-08-25 福龙马城服机器人科技有限公司 Automatic docking navigation method and system for unmanned vehicle and garbage can under complex scene


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112033410A (en) * 2020-09-03 2020-12-04 中南大学 Mobile robot environment map construction method, system and storage medium
CN113433937A (en) * 2021-06-08 2021-09-24 杭州未名信科科技有限公司 Heuristic exploration-based layered navigation obstacle avoidance system and layered navigation obstacle avoidance method
CN114494329A (en) * 2022-04-15 2022-05-13 之江实验室 Guide point selection method for mobile robot to independently explore in non-planar environment
CN114859932A (en) * 2022-05-20 2022-08-05 郑州大学产业技术研究院有限公司 Exploration method and device based on reinforcement learning and intelligent equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An unmanned vehicle motion planning algorithm for unknown unstructured environments; Zhang Xudong et al.; Proceedings of the 2021 China Society of Automotive Engineers Annual Congress (1); full text *

Also Published As

Publication number Publication date
CN116203972A (en) 2023-06-02

Similar Documents

Publication Publication Date Title
WO2022188663A1 (en) Target detection method and apparatus
CN111626128A (en) Improved YOLOv 3-based pedestrian detection method in orchard environment
CN116203972B (en) Unknown environment exploration path planning method, system, equipment and medium
Botteghi et al. On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach
US11790661B2 (en) Image prediction system
US11703596B2 (en) Method and system for automatically processing point cloud based on reinforcement learning
Jin et al. Neu-nbv: Next best view planning using uncertainty estimation in image-based neural rendering
CN108537825B (en) Target tracking method based on transfer learning regression network
US20230121534A1 (en) Method and electronic device for 3d object detection using neural networks
CN113593035A (en) Motion control decision generation method and device, electronic equipment and storage medium
Bauer et al. Deep, spatially coherent inverse sensor models with uncertainty incorporation using the evidential framework
CN117389305A (en) Unmanned aerial vehicle inspection path planning method, system, equipment and medium
CN116385493A (en) Multi-moving-object detection and track prediction method in field environment
Forkel et al. Dynamic resolution terrain estimation for autonomous (dirt) road driving fusing lidar and vision
CN114764252A (en) Dynamic obstacle avoidance method and device and mobile robot
CN115268461A (en) Mobile robot autonomous navigation method with fusion algorithm
CN114820973A (en) Cost map generation method and device, computer equipment and storage medium
CN115047871A (en) Multi-unmanned vehicle collaborative search method, device, equipment and medium for dynamic target
CN114326821A (en) Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
Polk et al. A parallel architecture for curvature-based road scene classification
CN112396611B (en) Self-adaptive optimization method, device and storage medium for point-line visual odometer
Choi et al. Robust modeling and prediction in dynamic environments using recurrent flow networks
CN116148879B (en) Method for improving obstacle marking precision by robot
CN115527034B (en) Vehicle end point cloud dynamic and static segmentation method, device and medium
CN117706942B (en) Environment sensing and self-adaptive driving auxiliary electronic control method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant