CN110222697B - Planetary surface landform active perception method based on reinforcement learning


Info

Publication number
CN110222697B
CN110222697B (application CN201910343241.9A)
Authority
CN
China
Prior art keywords
feature
landform
camera
reinforcement learning
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910343241.9A
Other languages
Chinese (zh)
Other versions
CN110222697A (en)
Inventor
余萌
李爽
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN201910343241.9A priority Critical patent/CN110222697B/en
Publication of CN110222697A publication Critical patent/CN110222697A/en
Application granted granted Critical
Publication of CN110222697B publication Critical patent/CN110222697B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/29: Graphical models, e.g. Bayesian networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a planetary surface landform active perception method based on reinforcement learning, which comprises the following steps: first, on the basis of modern set theory, planetary landforms are described in real time using local image feature description operators together with a global image saliency method, generating a knowledge base for active perception; on this basis, a reward function based on a finite set of feature description algorithms is designed within a reinforcement learning framework, constructing a learning framework for active perception of target landforms. Considering the limited computing power of the on-board computer, the learning step length is defined within this framework as a finite step length; finally, training and learning are completed against the planetary landform description operator knowledge base, forming the overall active landform perception method. The invention enables autonomous perception of planetary landforms: the rover can autonomously identify landforms of interest, effectively improving the scientific exploration efficiency of planetary-surface missions.

Description

Planetary surface landform active perception method based on reinforcement learning
Technical Field
The invention belongs to the technical field of task planning and pattern recognition, and particularly relates to a planetary surface landform active perception method based on reinforcement learning.
Background
For reliability reasons, the computing and storage capability of a Mars rover's on-board computer is limited (the CPU main frequency is only 200 MHz), so the rover can store and upload only a small portion of the observed scientific material to the ground workstation during each Martian working day (sol). With the rapid development of aerospace technology, the size of rovers performing roving patrol tasks on the surfaces of remote celestial bodies has grown generation by generation: Curiosity, the fourth-generation United States Mars rover, is about 3 meters long and weighs up to 900 kg, 2-5 times the size of earlier Mars rovers. The larger body allows Curiosity to carry more scientific payloads, with a total of 17 sensors carried in the actual mission. For reliability and safety, when the rover encounters complex road conditions, the material collected in the field must be transmitted back to the ground for landform identification and environment understanding, and the follow-up exploration task is executed only after a subsequent instruction is returned from the ground. The long communication delay between the celestial body and the Earth therefore greatly restricts the flexibility of rover tasks and the ability to acquire scientific targets. In recent years, aerospace researchers have been discussing autonomous exploration schemes with higher exploration efficiency. Scientists at the United States National Aeronautics and Space Administration (NASA) have proposed equipping rovers with active sensing devices, for example obtaining rock hardness by touching surfaces with a dexterous hand and autonomously performing operational analysis to improve exploration efficiency; other scientists have proposed using artificial intelligence methods for autonomous landform analysis, such as autonomous extraction of regions of scientific interest, obstacle detection, and the like.
Compared with landform analysis that depends on manual remote control, an autonomous landform perception method has many advantages. First, it gives planetary-surface exploration tasks a higher degree of autonomy: the Mars rover can explore more scientific targets within its limited working time without waiting for command instructions from ground staff, greatly improving the efficiency of rover exploration tasks and yielding scientific returns of higher value. Through on-line autonomous landform perception, the rover can screen scientific material of higher scientific value (such as rocks, cloud layers, sand storms, and other dynamic environments) for ground staff to study. However, for reliability reasons the on-board computer of a planetary rover has limited computing and storage capability, and planetary surface features are generally monotone in color and poor in texture, so some identification methods that have achieved significant results in ground applications may not suit the special environment of planetary landform exploration. At present, there is no systematic scheme for autonomous perception of planetary landforms.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a planetary surface landform active perception method based on reinforcement learning, so as to solve the problem that the prior art contains no systematic scheme for active perception of planetary landforms.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the invention discloses a planetary surface landform active perception method based on reinforcement learning, which comprises the following steps:
step 1): extracting SURF local feature descriptors of the images from a series of planet landform image sets, and cataloguing feature descriptor sets corresponding to landforms one by one according to the landform categories, namely cataloguing SURF feature descriptors belonging to the same type of landforms in a set form;
step 2): checking the feature repetition degree of the SURF feature descriptor set, eliminating feature pairs with high similarity and features with undersized feature scales, reserving the rest SURF feature descriptors, and establishing a feature knowledge base;
step 3): describing landform perception in the form of the proportion of observed features present in the feature knowledge base, deriving the joint posterior probability distribution, and establishing a corresponding reward function in the reinforcement learning framework according to the posterior probability;
step 4): setting trigger conditions for active planetary landform perception, and analyzing the local saliency of on-board camera images in real time during the rover's roving patrol; when the local image saliency meets the trigger conditions, performing SURF local feature descriptor extraction and passing the extracted SURF local feature descriptors to the reinforcement learning training system as the observed quantity, the control quantities in the reinforcement learning system being the camera pan-tilt adjustment angle $\theta_c$ and the camera focal length $f_c$;
step 5): changing the policy iteration step in reinforcement learning to a finite-step-length mode, and training the on-board camera recognition action sequence by combining the reinforcement learning reward function established in step 3) with the feature knowledge base established in step 1) to complete the active landform recognition work;
step 6): storing the landform perception result, after which the rover continues its roving patrol task.
Preferably, the feature repetition degree check and the feature knowledge base construction in step 2) are specifically:
21) adopting SURF feature descriptors to extract local features from satellite images of the target patrol area;
22) screening the extracted 64-dimensional SURF feature descriptors for repetition and removing feature pairs with high similarity, where similarity is judged by the dot product of the normalized feature description vectors and feature pairs whose descriptor dot product exceeds 0.9 are removed;
23) culling feature descriptors whose feature scale is smaller than 3 pixels;
24) retaining the feature descriptor subsets that survive the two rounds of screening to build the landform knowledge base.
Preferably, the SURF local feature descriptor extraction performed in step 3) ranges over the local image area within the saliency detection region.
Preferably, the reward function in step 3) is designed as follows:

31) establishing the correlation between the feature observation and the landform feature sets, described with a Bayesian conditional posterior probability model:

$$P(\mathcal{F}_k \mid \mathcal{S}) = \frac{P(\mathcal{S}, \mathcal{F}_k)}{\sum_{j=1}^{K} P(\mathcal{S}, \mathcal{F}_j)}$$

where $\mathcal{F}_k$ is the feature description subset corresponding to the k-th landform in the feature knowledge base, $\mathcal{S}$ is the current observed quantity, $P(\mathcal{S}, \mathcal{F}_k)$ is the joint probability of $\mathcal{S}$ and $\mathcal{F}_k$, and $P(\mathcal{F}_k)$ is the prior probability of the correlation between the k-th landform and the observed quantity, entering through

$$P(\mathcal{S}, \mathcal{F}_k) = P(\mathcal{S} \mid \mathcal{F}_k)\, P(\mathcal{F}_k),$$

where the observed probabilities of the different landform types are uniformly initialized to $P(\mathcal{F}_k) = 1/K$, K being the total number of landforms in the feature knowledge base;

32) after the correlation posterior probability is obtained, the discrete Shannon information entropy is normalized to describe the completeness of the posterior probability distribution:

$$I(\mathcal{S}_k) = -\frac{1}{\log N_m(k)} \sum_{m=1}^{N_m(k)} P(\mathcal{F}_m \mid \mathcal{S}_k)\, \log P(\mathcal{F}_m \mid \mathcal{S}_k)$$

where $N_m(k)$ is the number of landforms in the feature knowledge base whose feature subsets intersect the SURF feature description subset extracted from the observed quantity; $\mathcal{S}_k \cap \mathcal{F}_m$ is the intersection of the observation feature set $\mathcal{S}_k$ at time k with the feature set $\mathcal{F}_m$; and $P(\mathcal{F}_m \mid \mathcal{S}_k)$ describes the degree of likelihood between the current observation feature set and a given landform in the landform feature knowledge base;

33) based on the posterior probability distribution description established in step 32), the reward function is established:

$$R_k(x_k, a_k) = \begin{cases} \Delta I, & \text{for an ordinary parameter adjustment} \\ C_R, & \text{when } x_k \text{ or } x_{k+1} \text{ reaches an extreme value} \\ C_{stop}, & \text{when control is terminated} \end{cases}$$

where $R_k(\cdot)$ is the reward function; $x_k$ is the camera state parameter at time k; $a_k = [\theta_c(k), f_c(k)]^T$ is the camera parameter control quantity; $\Delta I$ is the entropy increment of the posterior probability distribution after the camera parameter adjustment is executed, which can be regarded as a measure of how much the uncertainty of the posterior probability for recognizing a given landform type has been reduced; $C_R > \Delta I$ is a reward constant assigned to terminate the control quantity when the state quantity $x_k$ or $x_{k+1}$ reaches an extreme value (maximum/minimum focal length or pan-tilt rotation angle); and $C_{stop}$ is a constant smaller than $\Delta I$, assigned when the reward gained by stopping control exceeds that of any executed control, in which case the control step is terminated.
Preferably, the trigger conditions for active planetary landform perception set in step 4) are specifically:
41) performing saliency analysis on a single planetary image using the spectral residual method;
42) recording the pixel areas of the detected salient contours,

$$\mathbf{s} = \{s_1, s_2, \ldots, s_N\}$$

where $s_1 \sim s_N$ are the pixel areas of the N salient regions and $\mathbf{s}$ is the area set;
43) selecting from $\mathbf{s}$ the largest contour pixel area $S_{max1}$ and the second-largest contour pixel area $S_{max2}$; when $S_{max1}/S_{max2} > 1.5$, the current frame is considered to contain a landform worth observing, and active landform perception is triggered.
Preferably, in step 5) the policy iteration step of the reinforcement learning method is modified in a targeted way as follows:
51) defining the camera parameter control strategy with the corresponding reward function:

$$\mathcal{A}_f = \{f_c^+, f_c^-\}, \qquad \mathcal{A}_\theta = \{\theta_c^+, \theta_c^-\}$$

where $\mathcal{A}_f$ and $\mathcal{A}_\theta$ are the action spaces for camera focal-length zooming and camera pan-tilt rotation, respectively; $f_c^+$ denotes magnifying the focal length by a factor of 1.2 and $f_c^-$ reducing it by a factor of 0.9 (repeated actions multiply the factors); $\theta_c^+$ denotes rotating the camera pan-tilt 5 degrees to the right and $\theta_c^-$ rotating it 5 degrees to the left;
52) defining the evaluation function in the policy iteration as:

$$v_\pi(x) = E_\pi\!\left[\sum_{i=1}^{H} \gamma^{\,i}\, R(x_i, a_i)\right]$$

where $R(\cdot)$ corresponds to the reward function $R_k(\cdot)$ in step 33); $x$ is the camera state quantity and $v_\pi(x)$ the evaluation function; $E_\pi[\cdot]$ is the expected return obtained after executing camera control policy $\pi$; $H$ is the total length of the control sequence and $h$ the length of each step; $\gamma \in (0,1)$ is a time penalty term intended to attenuate future reward terms; $p(x_i \mid x_{i-1}, a)$ is the state transition probability function, set to $p(x_i \mid x_{i-1}, a) = 0.99$, i.e. a 1% camera parameter adjustment failure rate is assumed;
53) within the reinforcement learning framework, the policy evaluation and policy update iteration is repeated until convergence.
Preferably, the processing capability of the on-board computer is taken into account in the policy update step by setting a finite-step-length iteration policy, i.e. the following termination criteria are introduced into the iteration process:
the maximum number of iteration steps is 20, and the reinforcement learning task is terminated if effective convergence is not achieved within those 20 steps;
the image saliency is examined in real time during iteration, and the reinforcement learning task is terminated once the following conditions are met: the Euclidean distance from the centroid of the largest closed salient region in the current frame to the image center is less than 40 pixels (for a 1024 × 1024 resolution camera), and the ratio of the pixel area of the largest salient region to the area of the camera imaging plane is between 0.25 and 0.5;
the camera parameters have reached a limit value and no further parameter adjustment can be performed.
The invention has the beneficial effects that:
1. the invention constructs the training knowledge base from feature description subsets, ensuring query efficiency while saving storage space;
2. the invention establishes the reward function from the intersection of the observation feature set and the corresponding landform feature subsets in the feature knowledge base, which effectively avoids interference from non-interesting landforms during learning and improves the success rate that a perceived landform is an interesting landform catalogued in the feature knowledge base.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram illustrating a result of a camera action planning based on reinforcement learning;
FIG. 3 is a diagram illustrating the relationship between the number of training times and the number of steps per iteration;
fig. 4 is a schematic diagram of the final landform active re-observation result.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
Referring to fig. 1, the planetary surface landform active perception method based on reinforcement learning of the invention comprises the following steps:
step 1): extracting SURF local feature descriptors of the images from a series of planet landform image sets, and cataloging feature descriptor sets corresponding to landforms one by one according to the landform types, namely cataloging SURF feature descriptors belonging to the same type of landforms in a set form.
Step 2): checking the feature repetition degree of the SURF feature descriptor set, eliminating feature pairs with high similarity and features with undersized feature scales, reserving the rest SURF feature descriptors, and establishing a feature knowledge base;
the feature repetition degree check and the feature knowledge base establishment are as follows:
21 Adopting SURF feature descriptors to extract local features in the satellite images of the target patrol area;
22 Repeat screening is carried out on the extracted 64-dimensional SURF feature descriptors, and feature pairs with high similarity are removed, wherein the similarity judgment is realized by point multiplication of normalized feature description vectors, and the feature pairs with the product of point multiplication of the descriptors larger than 0.9 are removed;
23 Culling feature descriptors with a feature size of less than 3 pixels;
24 Retaining feature descriptor subsets after two rounds of screening to build a landform knowledge base.
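By way of illustration, the following minimal sketch shows how the two-round screening of steps 21)-24) could be realized for one landform class. It assumes the 64-dimensional SURF descriptors and their feature scales have already been extracted (e.g. with an OpenCV SURF detector, where available); the function name and array layout are illustrative assumptions, not an implementation prescribed by the patent.

```python
import numpy as np

def build_landform_knowledge_base(descriptors, scales,
                                  sim_thresh=0.9, min_scale=3.0):
    """Two-round screening of 64-D SURF descriptors (steps 21-24).

    descriptors : (N, 64) array of SURF descriptors for one landform class
    scales      : (N,) array of feature scales in pixels
    Returns the retained descriptor subset for the feature knowledge base.
    """
    # Round 1: similarity is the dot product of L2-normalized descriptors;
    # for every pair above 0.9, the later member of the pair is culled.
    d = descriptors / (np.linalg.norm(descriptors, axis=1, keepdims=True)
                       + 1e-12)
    sim = d @ d.T
    keep = np.ones(len(d), dtype=bool)
    for i in range(len(d)):
        if not keep[i]:
            continue
        dup = sim[i] > sim_thresh
        dup[: i + 1] = False          # keep i itself and earlier survivors
        keep[dup] = False

    # Round 2: cull features whose scale is under 3 pixels.
    keep &= scales >= min_scale
    return descriptors[keep]
```

One such retained subset would be catalogued per landform class, mirroring the per-class cataloguing of step 1).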
Step 3): describing landform perception in the form of the proportion of observed features present in the feature knowledge base, deriving the joint posterior probability distribution, and establishing a corresponding reward function in the reinforcement learning framework according to the posterior probability;
the SURF local feature descriptor extraction range is executed in the step 3) and is a local image area in the significance detection area.
The reward function is designed as follows (an illustrative implementation sketch follows this design):

31) establishing the correlation between the feature observation and the landform feature sets, described with a Bayesian conditional posterior probability model:

$$P(\mathcal{F}_k \mid \mathcal{S}) = \frac{P(\mathcal{S}, \mathcal{F}_k)}{\sum_{j=1}^{K} P(\mathcal{S}, \mathcal{F}_j)}$$

where $\mathcal{F}_k$ is the feature description subset corresponding to the k-th landform in the feature knowledge base, $\mathcal{S}$ is the current observed quantity, $P(\mathcal{S}, \mathcal{F}_k)$ is the joint probability of $\mathcal{S}$ and $\mathcal{F}_k$, and $P(\mathcal{F}_k)$ is the prior probability of the correlation between the k-th landform and the observed quantity, entering through

$$P(\mathcal{S}, \mathcal{F}_k) = P(\mathcal{S} \mid \mathcal{F}_k)\, P(\mathcal{F}_k),$$

where the observed probabilities of the different landform types are uniformly initialized to $P(\mathcal{F}_k) = 1/K$, K being the total number of landforms in the feature knowledge base;

32) after the correlation posterior probability is obtained, the discrete Shannon information entropy is normalized to describe the completeness of the posterior probability distribution:

$$I(\mathcal{S}_k) = -\frac{1}{\log N_m(k)} \sum_{m=1}^{N_m(k)} P(\mathcal{F}_m \mid \mathcal{S}_k)\, \log P(\mathcal{F}_m \mid \mathcal{S}_k)$$

where $N_m(k)$ is the number of landforms in the feature knowledge base whose feature subsets intersect the SURF feature description subset extracted from the observed quantity; $\mathcal{S}_k \cap \mathcal{F}_m$ is the intersection of the observation feature set $\mathcal{S}_k$ at time k with the feature set $\mathcal{F}_m$; and $P(\mathcal{F}_m \mid \mathcal{S}_k)$ describes the degree of likelihood between the current observation feature set and a given landform in the landform feature knowledge base;

33) based on the posterior probability distribution description established in step 32), the reward function is established:

$$R_k(x_k, a_k) = \begin{cases} \Delta I, & \text{for an ordinary parameter adjustment} \\ C_R, & \text{when } x_k \text{ or } x_{k+1} \text{ reaches an extreme value} \\ C_{stop}, & \text{when control is terminated} \end{cases}$$

where $R_k(\cdot)$ is the reward function; $x_k$ is the camera state parameter at time k; $a_k = [\theta_c(k), f_c(k)]^T$ is the camera parameter control quantity; $\Delta I$ is the entropy increment of the posterior probability distribution after the camera parameter adjustment is executed, which can be regarded as a measure of how much the uncertainty of the posterior probability for recognizing a given landform type has been reduced; $C_R > \Delta I$ is a reward constant assigned to terminate the control quantity when the state quantity $x_k$ or $x_{k+1}$ reaches an extreme value (maximum/minimum focal length or pan-tilt rotation angle); and $C_{stop}$ is a constant smaller than $\Delta I$, assigned when the reward gained by stopping control exceeds that of any executed control, in which case the control step is terminated.
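The sketch below gives one concrete reading of steps 31)-33): a posterior over the K landform classes is formed from intersections between the observation feature set and each catalogued subset under a uniform 1/K prior, the posterior's normalized Shannon entropy is computed, and the piecewise reward is returned. Representing the feature sets as Python sets of discrete feature identifiers (e.g. indices of matched knowledge-base descriptors), the intersection-based likelihood, and the constants C_R and C_stop are all illustrative assumptions the patent leaves open.

```python
import numpy as np

def posterior_over_landforms(obs_set, knowledge_base):
    """Posterior P(F_k | S) over landform classes from set intersections
    (step 31), with a uniform 1/K prior over the K catalogued classes."""
    K = len(knowledge_base)
    prior = np.full(K, 1.0 / K)
    # Assumed likelihood: the fraction of observed features that also
    # appear in class k's catalogued descriptor subset.
    lik = np.array([len(obs_set & f_k) / max(len(obs_set), 1)
                    for f_k in knowledge_base])
    joint = lik * prior                     # P(S, F_k) = P(S|F_k) P(F_k)
    return joint / joint.sum() if joint.sum() > 0 else prior

def normalized_entropy(post):
    """Normalized discrete Shannon entropy of the posterior (step 32)."""
    nz = post[post > 0]
    if len(nz) <= 1:
        return 0.0                          # a certain posterior: no entropy
    return float(-(nz * np.log(nz)).sum() / np.log(len(nz)))

def reward(entropy_before, entropy_after, at_limit, stopped,
           C_R=1.5, C_stop=-0.01):
    """Piecewise reward of step 33); the constants are placeholders."""
    if stopped:
        return C_stop                       # stopping beats further control
    if at_limit:
        return C_R                          # a camera parameter hit a limit
    return entropy_before - entropy_after   # Delta I: uncertainty reduced
```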
Step 4): setting trigger conditions for active planetary landform perception, and analyzing the local saliency of on-board camera images in real time during the rover's roving patrol; when the local image saliency meets the trigger conditions, SURF local feature descriptor extraction is executed, the extracted SURF local feature descriptors are passed to the reinforcement learning training system as the observed quantity, and the control quantities in the reinforcement learning system are the camera pan-tilt adjustment angle $\theta_c$ and the camera focal length $f_c$.
The trigger conditions for active planetary landform perception are specifically (an illustrative implementation sketch follows this list):
41) performing saliency analysis on a single planetary image using the spectral residual method;
42) recording the pixel areas of the detected salient contours,

$$\mathbf{s} = \{s_1, s_2, \ldots, s_N\}$$

where $s_1 \sim s_N$ are the pixel areas of the N salient regions and $\mathbf{s}$ is the area set;
43) selecting from $\mathbf{s}$ the largest contour pixel area $S_{max1}$ and the second-largest contour pixel area $S_{max2}$; when $S_{max1}/S_{max2} > 1.5$, the current frame is considered to contain a landform worth observing, and active landform perception is triggered.
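A minimal sketch of the trigger test of steps 41)-43), assuming a grayscale frame as input: the saliency map follows the spectral-residual construction (log-amplitude spectrum minus its local mean, recombined with the phase and inverse-transformed), while the binarization threshold of 0.5 is an assumption, since the patent does not specify one.

```python
import numpy as np
import cv2

def spectral_residual_saliency(gray):
    """Spectral-residual saliency map, as used in step 41)."""
    f = np.fft.fft2(gray.astype(np.float32))
    log_amp = np.log1p(np.abs(f))
    phase = np.angle(f)
    # Spectral residual: log amplitude minus its 3x3 local mean.
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal.astype(np.float32), (9, 9), 2.5)
    return cv2.normalize(sal, None, 0, 1, cv2.NORM_MINMAX)

def perception_triggered(gray, sal_thresh=0.5, ratio=1.5):
    """Steps 42)-43): trigger when the largest salient contour area
    exceeds the second-largest by the factor `ratio` (here 1.5)."""
    sal = spectral_residual_saliency(gray)
    mask = (sal > sal_thresh).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    areas = sorted((cv2.contourArea(c) for c in contours), reverse=True)
    return len(areas) >= 2 and areas[1] > 0 and areas[0] / areas[1] > ratio
```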
Step 5): changing the policy iteration step in reinforcement learning to a finite-step-length mode, and training the on-board camera recognition action sequence by combining the reinforcement learning reward function established in step 3) with the feature knowledge base established in step 1) to complete the active landform recognition work;
The policy iteration step of the reinforcement learning method is modified in a targeted way as follows:
51) defining the camera parameter control strategy with the corresponding reward function:

$$\mathcal{A}_f = \{f_c^+, f_c^-\}, \qquad \mathcal{A}_\theta = \{\theta_c^+, \theta_c^-\}$$

where $\mathcal{A}_f$ and $\mathcal{A}_\theta$ are the action spaces for camera focal-length zooming and camera pan-tilt rotation, respectively; $f_c^+$ denotes magnifying the focal length by a factor of 1.2 and $f_c^-$ reducing it by a factor of 0.9 (repeated actions multiply the factors); $\theta_c^+$ denotes rotating the camera pan-tilt 5 degrees to the right and $\theta_c^-$ rotating it 5 degrees to the left;
52) defining the evaluation function in the policy iteration as:

$$v_\pi(x) = E_\pi\!\left[\sum_{i=1}^{H} \gamma^{\,i}\, R(x_i, a_i)\right]$$

where $R(\cdot)$ corresponds to the reward function $R_k(\cdot)$ in step 33); $x$ is the camera state quantity and $v_\pi(x)$ the evaluation function; $E_\pi[\cdot]$ is the expected return obtained after executing camera control policy $\pi$; $H$ is the total length of the control sequence and $h$ the length of each step; $\gamma \in (0,1)$ is a time penalty term intended to attenuate future reward terms; $p(x_i \mid x_{i-1}, a)$ is the state transition probability function, set to $p(x_i \mid x_{i-1}, a) = 0.99$, i.e. a 1% camera parameter adjustment failure rate is assumed;
53) within the reinforcement learning framework, the policy evaluation and policy update iteration is repeated until convergence.
In the policy update step, the processing capability of the on-board computer is taken into account by setting a finite-step-length iteration policy, i.e. the following termination criteria are introduced into the iteration process (an illustrative sketch of the resulting control loop follows):
the maximum number of iteration steps is 20, and the reinforcement learning task is terminated if effective convergence is not achieved within those 20 steps;
the image saliency is examined in real time during iteration, and the reinforcement learning task is terminated once the following conditions are met: the Euclidean distance from the centroid of the largest closed salient region in the current frame to the image center is less than 40 pixels (for a 1024 × 1024 resolution camera), and the ratio of the pixel area of the largest salient region to the area of the camera imaging plane is between 0.25 and 0.5;
the camera parameters have reached a limit value and no further parameter adjustment can be performed.
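The sketch below shows the shape of the finite-step control loop implied by steps 51)-53) and the termination criteria: a 20-step budget, the saliency-based success test, and termination at parameter limits. For brevity it scores actions by one-step lookahead rather than running full policy-evaluation/policy-improvement sweeps, and the `camera` interface (`target_centered`, `can_apply`, `apply`) together with the value of `gamma` are assumed stand-ins, not parts of the patent's specification.

```python
def active_perception_episode(camera, evaluate_reward,
                              max_steps=20, gamma=0.9):
    """Finite-step control loop reflecting the termination criteria:
    a 20-step cap, a saliency-based success test, and a stop once the
    camera parameters reach their limits."""
    # Action spaces of step 51): focal-length zoom and pan-tilt rotation.
    actions = [("zoom", 1.2), ("zoom", 0.9), ("pan", +5.0), ("pan", -5.0)]
    total_return = 0.0
    for step in range(max_steps):            # hard 20-step budget
        if camera.target_centered():         # centroid/area success test
            break
        # One-step lookahead in place of full policy-iteration sweeps.
        scored = [(evaluate_reward(camera, a), a) for a in actions
                  if camera.can_apply(a)]    # respect parameter limits
        if not scored:                       # all limits reached: terminate
            break
        best_reward, best_action = max(scored)
        if best_reward <= 0.0:               # stopping beats any control
            break
        camera.apply(best_action)
        total_return += (gamma ** step) * best_reward
    return total_return
```

Here `evaluate_reward` would wrap the reward of step 33), and the 0.99 success probability of a parameter adjustment would live inside the camera model.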
Step 6): storing the landform perception result, after which the rover continues its roving patrol task.
FIGS. 2-4 show a simulation example of the method of the present invention, which uses Rhino software to generate a three-dimensional planetary landscape, extracts SURF feature points from rendered planetary images to construct the planetary landform set, and places the planetary rover at random positions in the three-dimensional landscape to perform active landform recognition. FIG. 2 is a schematic diagram of the camera action planning result obtained by reinforcement learning, in which S denotes the camera parameters at the start time and G the camera parameters after reinforcement learning ends; FIG. 3 shows the relationship between the number of policy iterations and the number of camera action steps planned in each iteration, with the reinforcement learning method successfully converging at step 14; FIG. 4 shows the camera's current frame at the beginning and at the end of active landform perception. The result of reinforcement learning is to magnify the camera field of view by 1.4 times and rotate 5 degrees to the left. Compared with the original landform observation, the landform in the image can be distinguished more clearly after the parameter adjustment. In most of the remaining simulation groups, the landform observation results adjusted by the active recognition algorithm improved to varying degrees; at the same time, the effect of active recognition was found to be closely related to the construction quality of the landform set: if the landform currently to be identified is not fully described in the landform set constructed in the earlier stage (for example, the shooting angle is poor, or the distance is too far or too close), the active landform recognition effect is also poor. Therefore, the completeness of the satellite landform set prepared in the earlier work also determines how much the on-line landform perception effect can be improved.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A planet surface landform active perception method based on reinforcement learning is characterized by comprising the following steps:
step 1): extracting SURF local feature descriptors of the images from a series of planet landform image sets, and cataloguing feature descriptor sets corresponding to landforms one by one according to the landform categories, namely cataloguing SURF feature descriptors belonging to the same type of landforms in a set form;
step 2): checking the feature repetition degree of the SURF feature descriptor set, eliminating feature pairs with high similarity and features with undersized feature scales, reserving the rest SURF feature descriptors, and establishing a feature knowledge base;
step 3): describing landform perception in the form of the proportion of observed features present in the feature knowledge base, deriving the joint posterior probability distribution, and establishing a corresponding reward function in the reinforcement learning framework according to the posterior probability;
step 4): setting trigger conditions for active planetary landform perception, and analyzing the local saliency of on-board camera images in real time during the rover's roving patrol; when the local image saliency meets the trigger conditions, performing SURF local feature descriptor extraction and passing the extracted SURF local feature descriptors to the reinforcement learning training system as the observed quantity, the control quantities in the reinforcement learning training system being the camera pan-tilt adjustment angle $\theta_c$ and the camera focal length $f_c$;
step 5): changing the policy iteration step in reinforcement learning to a finite-step-length mode, and training the on-board camera recognition action sequence by combining the reinforcement learning reward function established in step 3) with the feature knowledge base established in step 1) to complete the active landform recognition work;
step 6): storing the landform perception result, the rover then continuing its roving patrol task;
the reward function in the step 3) is designed as follows:
31 Establishing the correlation between the feature observed quantity and the landform feature set, and describing by adopting a Bayes condition posterior probability model:
Figure FDA0004029012220000011
wherein the content of the first and second substances,
Figure FDA0004029012220000012
for a feature description subset corresponding to the kth feature in the feature knowledge base, based on the feature description value of the kth feature, for>
Figure FDA0004029012220000013
For the current observed quantity
Figure FDA0004029012220000014
And/or>
Figure FDA0004029012220000015
In conjunction with (a), or (b)>
Figure FDA0004029012220000016
Is the prior probability of the correlation of the kth landform with the observed quantity, and is described as:
Figure FDA0004029012220000017
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0004029012220000018
wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0004029012220000021
uniformly initializing the probability of the observed different types of landforms into 1K, wherein K is the total number of the landforms in the characteristic knowledge base;
32 After obtaining the correlation posterior probability, the entropy of the discrete fragrance concentration information is normalized to describe the completeness of the posterior probability distribution:
Figure FDA0004029012220000022
wherein N is m (k) Is a feature knowledge base
Figure FDA0004029012220000023
The number of landforms which have intersection with the SURF feature description subset extracted from the observed quantity is determined; />
Figure FDA0004029012220000024
Observation feature set for time k>
Figure FDA0004029012220000025
Feature set->
Figure FDA0004029012220000026
The intersection of (a); />
Figure FDA0004029012220000027
The system is used for describing the likelihood degree of a certain landform in the current observation characteristic set and the landform characteristic knowledge base;
33 Based on the posterior probability distribution description established in step 32), a reward function is established:
Figure FDA0004029012220000028
wherein R is k (. H) is a reward function; x is a radical of a fluorine atom k The state parameters of the camera at the moment k are obtained; a is a k =[θ c (k),f c (k)] T Controlling the quantity for the camera parameter;
Figure FDA0004029012220000029
entropy increment of posterior probability distribution after performing camera parameter adjustment; c R > Δ I is a reward constant that,C stop is a constant less than al.
2. The active perception method for planetary surface landforms based on reinforcement learning of claim 1, wherein the feature repetition degree check and the feature knowledge base construction in step 2) are specifically as follows:
21) adopting SURF feature descriptors to extract local features from satellite images of the target patrol area;
22) screening the extracted 64-dimensional SURF feature descriptors for repetition and removing feature pairs with high similarity, wherein similarity is judged by the dot product of the normalized feature description vectors and feature pairs whose descriptor dot product exceeds 0.9 are removed;
23) culling feature descriptors whose feature scale is smaller than 3 pixels;
24) retaining the feature descriptor subsets that survive the two rounds of screening to build the landform knowledge base.
3. The active perception method for planetary surface landforms based on reinforcement learning according to claim 1, wherein the SURF local feature descriptor extraction in step 3) is performed over the local image area within the saliency detection region.
4. The planetary surface topography active perception method based on reinforcement learning according to claim 1, wherein the triggering conditions for planetary topography active perception set in the step 4) are specifically:
41) performing saliency analysis on a single planetary image using the spectral residual method;
42) recording the pixel areas of the detected salient contours,

$$\mathbf{s} = \{s_1, s_2, \ldots, s_N\}$$

wherein $s_1 \sim s_N$ are the pixel areas of the N salient regions and $\mathbf{s}$ is the area set;
43) selecting from $\mathbf{s}$ the largest contour pixel area $S_{max1}$ and the second-largest contour pixel area $S_{max2}$; when $S_{max1}/S_{max2} > 1.5$, the current frame is considered to contain a landform worth observing, and active landform perception is triggered.
5. The active perception method for planetary surface landforms based on reinforcement learning of claim 1, wherein step 5) makes the following targeted modifications to the policy iteration step of the reinforcement learning method:
51) defining the camera parameter control strategy with the corresponding reward function:

$$\mathcal{A}_f = \{f_c^+, f_c^-\}, \qquad \mathcal{A}_\theta = \{\theta_c^+, \theta_c^-\}$$

wherein $\mathcal{A}_f$ and $\mathcal{A}_\theta$ are the action spaces for camera focal-length zooming and camera pan-tilt rotation, respectively; $f_c^+$ denotes magnifying the focal length by a factor of 1.2 and $f_c^-$ reducing it by a factor of 0.9; $\theta_c^+$ denotes rotating the camera pan-tilt 5 degrees to the right and $\theta_c^-$ rotating it 5 degrees to the left;
52) defining the evaluation function in the policy iteration as:

$$v_\pi(x) = E_\pi\!\left[\sum_{i=1}^{H} \gamma^{\,i}\, R(x_i, a_i)\right]$$

wherein $R(\cdot)$ corresponds to the reward function $R_k(\cdot)$ in step 33); $x$ is the camera state quantity and $v_\pi(x)$ the evaluation function; $E_\pi[\cdot]$ is the expected return obtained after executing camera control policy $\pi$; $H$ is the total length of the control sequence and $h$ the step length; $\gamma \in (0,1)$ is a time penalty term intended to attenuate future reward terms; $p(x_i \mid x_{i-1}, a)$ is the state transition probability function, set to $p(x_i \mid x_{i-1}, a) = 0.99$, i.e. a 1% camera parameter adjustment failure rate is assumed;
53) within the reinforcement learning framework, the policy evaluation and policy update iteration is repeated until convergence.
6. The active perception method for planetary surface landforms based on reinforcement learning of claim 5, wherein a finite-step-length iteration policy is set in the policy update step in consideration of the processing capability of the on-board computer, that is, the following termination criteria are introduced into the iteration process:
the maximum number of iteration steps is 20, and the reinforcement learning task is terminated if effective convergence is not achieved within those 20 steps;
the image saliency is examined in real time during iteration, and the reinforcement learning task is terminated once the following conditions are met: the Euclidean distance from the centroid of the largest closed salient region in the current frame to the image center is less than 40 pixels, and the ratio of the pixel area of the largest salient region to the area of the camera imaging plane is between 0.25 and 0.5;
the camera parameters have reached a limit value and no further parameter adjustment can be performed.
CN201910343241.9A 2019-04-26 2019-04-26 Planetary surface landform active perception method based on reinforcement learning Active CN110222697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910343241.9A CN110222697B (en) 2019-04-26 2019-04-26 Planetary surface landform active perception method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910343241.9A CN110222697B (en) 2019-04-26 2019-04-26 Planetary surface landform active perception method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110222697A CN110222697A (en) 2019-09-10
CN110222697B true CN110222697B (en) 2023-04-18

Family

ID=67820072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910343241.9A Active CN110222697B (en) 2019-04-26 2019-04-26 Planetary surface landform active perception method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110222697B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103645480B (en) * 2013-12-04 2015-11-18 北京理工大学 Based on the topography and landform character construction method of laser radar and fusing image data
CN107292339B (en) * 2017-06-16 2020-07-21 重庆大学 Unmanned aerial vehicle low-altitude remote sensing image high-resolution landform classification method based on feature fusion
CN108319693A (en) * 2018-02-01 2018-07-24 张文淑 A kind of geomorphic feature clustering method based on three-dimensional Remote Sensing Database

Also Published As

Publication number Publication date
CN110222697A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
Maggio et al. Loc-nerf: Monte carlo localization using neural radiance fields
Chen et al. Parallel planning: A new motion planning framework for autonomous driving
CN110874578B (en) Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning
Scorsoglio et al. Image-based deep reinforcement learning for autonomous lunar landing
CN108921893A (en) A kind of image cloud computing method and system based on online deep learning SLAM
Nubert et al. Self-supervised learning of lidar odometry for robotic applications
CN111240356B (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
CN114625151B (en) Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN111950873A (en) Satellite real-time guiding task planning method and system based on deep reinforcement learning
Yang et al. Real-time optimal navigation planning using learned motion costs
Scorsoglio et al. Safe Lunar landing via images: A Reinforcement Meta-Learning application to autonomous hazard avoidance and landing
CN113553943B (en) Target real-time detection method and device, storage medium and electronic device
Liu et al. A hierarchical reinforcement learning algorithm based on attention mechanism for uav autonomous navigation
Prasetyo et al. Spatial Based Deep Learning Autonomous Wheel Robot Using CNN
Ozaki et al. DNN-based self-attitude estimation by learning landscape information
Kulkarni et al. Semantically-enhanced deep collision prediction for autonomous navigation using aerial robots
CN110222697B (en) Planetary surface landform active perception method based on reinforcement learning
Goupilleau et al. Active learning for object detection in high-resolution satellite images
Piccinin et al. Deep reinforcement learning approach for small bodies shape reconstruction enhancement
Short et al. A bio-inspired algorithm in image-based path planning and localization using visual features and maps
Lu et al. Monocular semantic occupancy grid mapping with convolutional variational auto-encoders
Ribeiro et al. 3D monitoring of woody crops using an unmanned ground vehicle
Gao et al. Adaptability preserving domain decomposition for stabilizing sim2real reinforcement learning
Lyu et al. Ttr-based reward for reinforcement learning with implicit model priors
Qin et al. A path planning algorithm based on deep reinforcement learning for mobile robots in unknown environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant