WO2024041392A1 - Image processing method, device, storage medium and equipment

Image processing method, device, storage medium and equipment

Info

Publication number
WO2024041392A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
observation
distribution
semantic
Application number
PCT/CN2023/112209
Other languages
English (en)
French (fr)
Inventor
荆雅
孔涛
Original Assignee
北京有竹居网络技术有限公司
Application filed by 北京有竹居网络技术有限公司
Publication of WO2024041392A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/82: Arrangements using pattern recognition or machine learning using neural networks
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Definitions

  • Embodiments of the present disclosure relate to an image processing method, device, storage medium and equipment.
  • (1) The image data used for training is not a fixed data set collected from the Internet, but is collected by moving in a virtual or real three-dimensional (3D) space.
  • (2) A static perception model processes each training sample separately, whereas the robot's perception model observes the same object from different perspectives as the robot moves through space.
  • (3) How to effectively learn the exploration strategy and collect training samples is the key to the training task of the robot's perception model.
  • Embodiments of the present disclosure provide an image processing method, device, storage medium, equipment and program product that can measure semantic distribution differences based on a three-dimensional semantic distribution map and learn exploration trajectories based on semantic distribution inconsistency and class distribution uncertainty.
  • The perception model is finally fine-tuned based on annotated difficult sample images, which reduces the annotation cost and improves the perception accuracy of the perception model.
  • Embodiments of the present disclosure provide an image processing method, which includes: obtaining observation information collected by a target robot in a target observation space, where the observation information includes an observation image, a depth image and sensor pose information; obtaining a three-dimensional semantic distribution map according to the observation information; learning an exploration strategy of the target robot based on the three-dimensional semantic distribution map and on the conditions of semantic distribution inconsistency and class distribution uncertainty; moving the target robot according to the exploration strategy to obtain an exploration trajectory of the target robot, where the exploration trajectory includes target observation images collected during the movement of the target robot within the target observation space; obtaining a difficult sample image from the target observation images corresponding to the exploration trajectory based on at least one of the semantic distribution inconsistency condition and the class distribution uncertainty condition, where the difficult sample image is used to characterize an image in which the predicted semantic distribution result is inconsistent and/or the predicted class distribution result is uncertain; and adjusting a perception model of the target robot based on the difficult sample image.
  • The semantic distribution inconsistency indicates that the target robot obtains inconsistent prediction distribution results when observing the same target object from different perspectives during movement; the class distribution uncertainty indicates that when the target robot observes the same target object from the same perspective during movement, the category of the target object is predicted into multiple categories, and the predicted category probabilities of two of the multiple categories are similar and both are greater than a first preset threshold.
  • The observation image includes a first observation image and a second observation image.
  • The first observation image is an observation image collected when observing the same target object from different perspectives.
  • The second observation image is an observation image collected when observing the same target object from the same perspective.
  • Learning the exploration strategy of the target robot based on the three-dimensional semantic distribution map and on the conditions of semantic distribution inconsistency and class distribution uncertainty includes: obtaining, based on the first observation image, the current prediction results when the target robot observes the same target object from different perspectives during movement, and calculating a first semantic distribution inconsistency reward based on the current prediction results and the three-dimensional semantic distribution map; obtaining the first predicted category probabilities of all target objects in the second observation image, and calculating a first class distribution uncertainty reward based on the first predicted category probabilities of all target objects; and learning the exploration strategy of the target robot according to the first semantic distribution inconsistency reward and the first class distribution uncertainty reward.
  • the target observation image corresponding to the exploration trajectory includes a first target observation image and a second target observation image
  • the first target observation image is an observation image collected when observing the same target object from different perspectives
  • the second target observation image is an observation image collected when observing the same target object from the same perspective
  • Obtaining the difficult sample image from the target observation images corresponding to the exploration trajectory based on at least one of the semantic distribution inconsistency condition and the class distribution uncertainty condition includes:
  • based on the condition of the semantic distribution inconsistency, obtaining a first difficult sample image from the first target observation image corresponding to the exploration trajectory, where the first difficult sample image is used to represent an image in which the predicted semantic distribution result is inconsistent; and/or, based on the condition of the class distribution uncertainty, obtaining a second difficult sample image from the second target observation image corresponding to the exploration trajectory, where the second difficult sample image is used to represent an image in which the predicted class distribution result is uncertain.
  • Obtaining the second difficult sample image from the second target observation image corresponding to the exploration trajectory based on the condition of the class distribution uncertainty includes: obtaining the second target observation image corresponding to the exploration trajectory; calculating the second predicted category probabilities of all target objects in the second target observation image corresponding to the exploration trajectory; calculating the second class distribution uncertainty based on the second predicted category probabilities of all target objects in the second target observation image; and determining the image in the second target observation image corresponding to the exploration trajectory for which the second class distribution uncertainty is greater than the first preset threshold as the second difficult sample image.
  • Obtaining the first difficult sample image from the first target observation image corresponding to the exploration trajectory based on the condition of the semantic distribution inconsistency includes: obtaining the first target observation image corresponding to the exploration trajectory; obtaining, according to the first target observation image, the target prediction results when the target robot observes the same target object from different perspectives during movement, and calculating a second semantic distribution inconsistency based on the target prediction results and the three-dimensional semantic distribution map; and determining the image in the first target observation image for which the second semantic distribution inconsistency is greater than a second preset threshold as the first difficult sample image.
  • Moving the target robot according to the exploration strategy to obtain the exploration trajectory of the target robot includes: determining, according to the exploration strategy and the target observation information collected by the target robot at the current time t_i, the traveling direction of the target robot at the next time t_{i+1}, where the traveling direction is used to indicate the direction in which the target robot should move at the next time t_{i+1}, and the target observation information includes the target observation image, the target depth image and the target sensor pose information, i ≥ 0; and controlling the target robot to perform a moving operation based on the traveling direction to obtain the exploration trajectory of the target robot and the target observation image at each time step on the exploration trajectory.
  • Obtaining a three-dimensional semantic distribution map based on the observation information includes: inputting the observation image into a pre-trained perception model to obtain a semantic category prediction result of the observation image, where the semantic category prediction result is used to characterize the predicted probability distribution of each pixel in the observation image among C categories, and C represents the number of predicted target object categories; establishing a point cloud corresponding to the target observation space based on the depth image, where each point in the point cloud corresponds to a corresponding semantic category prediction result; converting the point cloud into three-dimensional space based on the sensor pose information to obtain a voxel representation; and aggregating, based on an exponential moving average formula, the voxel representations at the same position that change over time to obtain the three-dimensional semantic distribution map.
  • Adjusting the perception model of the target robot according to the difficult sample image includes: obtaining the difficult sample image and the semantic annotation information of the difficult sample image, where the semantic annotation information includes the bounding boxes of all target objects in each difficult sample image, the pixels corresponding to each target object, and the category to which each target object belongs; inputting the difficult sample image into the pre-trained perception model to obtain the semantic category prediction result corresponding to the difficult sample image; and adjusting the parameters of the pre-trained perception model based on the semantic category prediction result corresponding to the difficult sample image and the semantic annotation information to obtain the adjusted perception model.
  • Before learning the exploration strategy of the target robot, the method further includes: inputting the three-dimensional semantic distribution map into a global policy network to select a long-term goal, where the long-term goal is the x-y coordinates in the three-dimensional semantic distribution map; and sampling the long-term goal every preset number of local steps to obtain sampling data, which is used to learn the discrete actions of the target robot.
  • Obtaining the observation information collected by the target robot in the target observation space includes: obtaining, based on the shooting device of the target robot, the observation image and the depth image corresponding to each time step within a preset time period, where the observation image is a color image and the depth image is an image that uses the distance value of each point in the target observation space collected by the shooting device as the pixel value; and obtaining, based on the sensor of the target robot, the sensor pose information corresponding to each time step within the preset time period, where the sensor pose information includes at least three-degree-of-freedom pose information.
  • an image processing device which includes:
  • the first acquisition unit is used to acquire observation information collected by the target robot in the target observation space, where the observation information includes observation images, depth images and sensor pose information;
  • a second acquisition unit configured to acquire a three-dimensional semantic distribution map according to the observation information
  • a learning unit configured to learn the exploration strategy of the target robot based on the three-dimensional semantic distribution map and based on the conditions of semantic distribution inconsistency and class distribution uncertainty;
  • a determination unit configured to move the target robot according to the exploration strategy to obtain an exploration trajectory of the target robot, where the exploration trajectory includes target observation images collected while the target robot moves within the target observation space;
  • a third acquisition unit configured to acquire a difficult sample image from the target observation image corresponding to the exploration trajectory based on at least one condition of the semantic distribution inconsistency and the class distribution uncertainty, where the difficult sample image is used to characterize images in which the predicted semantic distribution results are inconsistent and/or the predicted class distribution results are uncertain;
  • An adjustment unit configured to adjust the perception model of the target robot according to the difficult sample image.
  • Embodiments of the present disclosure provide a computer-readable storage medium that stores a computer program, where the computer program is suitable for being loaded by a processor to execute the image processing method described in any of the above embodiments.
  • Embodiments of the present disclosure provide a computer device. The computer device includes a processor and a memory; a computer program is stored in the memory; and the processor calls the computer program stored in the memory to perform the image processing method described in any of the above embodiments.
  • embodiments of the present disclosure provide a computer program product, which includes a computer program.
  • When the computer program is executed by a processor, the image processing method described in any of the above embodiments is implemented.
  • Figure 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of an application scenario of the image processing method provided by an embodiment of the present disclosure
  • Figure 3 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • Embodiments of the present disclosure provide an image processing method, device, computer-readable storage medium, computer equipment, and computer program product.
  • the image processing method of the embodiment of the present disclosure can be directly applied to a robot, can also be applied to a server, or can be applied to a system including a terminal and a server, and is implemented through the interaction between the terminal and the server.
  • The robot in the embodiments of the present disclosure refers to a robot that needs to move in space. It is necessary to obtain the observation information collected by the target robot in the target observation space, where the observation information includes observation images, depth images and sensor pose information; a three-dimensional semantic distribution map is obtained according to the observation information; then, based on the three-dimensional semantic distribution map and on the conditions of semantic distribution inconsistency and class distribution uncertainty, the exploration trajectory of the target robot is learned; then, based on the condition of class distribution uncertainty, difficult sample images are obtained from the target observation images corresponding to the exploration trajectory, where the difficult sample images are used to characterize images with uncertain predicted class distribution results; and the robot's perception model is adjusted based on the difficult sample images.
  • This embodiment does not limit the specific type and model of the robot.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • Each embodiment of the present disclosure provides an image processing method, which can be executed by a robot or a server, or jointly by a robot and a server; the embodiments of the present disclosure take the image processing method being executed by a robot (computer device) as an example for illustration.
  • Figure 1 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure. The method is implemented through the following steps 110 to 160, specifically including:
  • Step 110 Obtain observation information collected by the target robot in the target observation space.
  • the observation information includes observation images, depth images, and sensor pose information.
  • the observation information collected while the target robot is moving within the target observation space is obtained.
  • the observation information includes observation images, depth images, and sensor pose information.
  • the target observation space can be an observation space that is relatively close to the real environment of the application of the target robot.
  • The target robot is a mobile robot. Depending on the application scenario, if the target robot is a household robot (such as a sweeping robot), the target observation space is an indoor home environment or an indoor office environment.
  • If the target robot is a logistics robot (such as a cargo handling robot), the target observation space is a real environment with logistics channels.
  • The observation information collected by the target robot in the target observation space contains an RGB observation image, a depth image and a three-degree-of-freedom sensor pose x_t ∈ R³.
  • The three-degree-of-freedom sensor pose is used to represent the x-y coordinates and the robot direction.
  • The robot has three discrete actions: moving forward, turning left and turning right. These three discrete actions can be mapped onto the x-y coordinates plus the robot direction. For example, if the current robot orientation is forward and the robot moves forward, the robot direction remains unchanged, and the coordinates after movement can be calculated based on the distance the robot moves in one step, as sketched in the example below.
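  • A minimal sketch of such a pose update, assuming a fixed step length and turn angle (the values and the function name are illustrative and not taken from the disclosure):

```python
import math

# Hypothetical step length (meters) and turn angle; the actual values depend on the robot.
STEP_LENGTH = 0.25
TURN_ANGLE = math.radians(30)

def update_pose(x, y, heading, action):
    """Update a 3-DoF pose (x, y, heading) for one discrete action.

    action: one of "forward", "turn_left", "turn_right".
    Moving forward keeps the heading unchanged and advances along it;
    turning changes only the heading (the x-y coordinates stay the same).
    """
    if action == "forward":
        x += STEP_LENGTH * math.cos(heading)
        y += STEP_LENGTH * math.sin(heading)
    elif action == "turn_left":
        heading += TURN_ANGLE
    elif action == "turn_right":
        heading -= TURN_ANGLE
    return x, y, heading
```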
  • The acquisition of observation information collected by the target robot in the target observation space includes:
  • obtaining, based on the shooting device of the target robot, the observation image and the depth image corresponding to each time step within the preset time period, where the observation image is a color image, and the depth image is an image that uses the distance value of each point in the target observation space collected by the shooting device as the pixel value;
  • obtaining, based on the sensor of the target robot, the sensor pose information corresponding to each time step within the preset time period, where the sensor pose information includes at least three-degree-of-freedom pose information.
  • the photographing device may be a device mounted on the target robot for acquiring images of the surrounding environment.
  • the photographing device continuously photographs to acquire consecutive frames of images of the surrounding environment.
  • the environment image can contain RGB observation images and depth images.
  • the shooting device can be an RGBD camera.
  • the RGBD camera is a camera based on structured light technology. Generally, there are two cameras. An RGB camera collects RGB observation images, and an IR camera collects infrared images. The infrared images can be used as depth images.
  • three-degree-of-freedom pose information can be obtained based on a three-degree-of-freedom sensor mounted on the target robot.
  • Step 120 Obtain a three-dimensional semantic distribution map based on the observation information.
  • obtaining a three-dimensional semantic distribution map based on the observation information includes:
  • the observation image is input into the pre-trained perception model to obtain the semantic category prediction result of the observation image.
  • the semantic category prediction result is used to characterize the predicted probability distribution of each pixel in the observation image among C categories.
  • the C represents the number of predicted target object types;
  • the voxel representations at the same position changing over time are aggregated based on an exponential moving average formula to obtain the three-dimensional semantic distribution map.
  • Embodiments of the present disclosure can use three-dimensional (3D) semantic distribution maps to fuse semantic predictions of different frames during the movement of the target robot.
  • A pre-trained perception model (for example, Mask R-CNN) performs semantic prediction to predict the semantic categories of the observed objects in the observation image and obtain the semantic category prediction result of the observation image, where the semantic category prediction result is the predicted probability distribution over C categories for each pixel in the observation image.
  • the pre-trained perception model can adopt the Mask R-CNN model.
  • Mask R-CNN is an instance segmentation algorithm, which mainly performs segmentation on the basis of target detection.
  • The observation image is input into the pre-trained perception model, and the semantic category prediction result of the observation image includes the bounding boxes of all target objects in the observation image, the pixels corresponding to each target object (segmentation mask), and the category to which each target object belongs.
  • the target object can be an object to be observed in the target observation space.
  • all target objects in the observation image can include chairs, sofas, potted plants, beds, bathrooms, and TVs.
  • the pixels (segmentation mask) corresponding to each target object are: Which pixels in the observation image belong to the chair, which pixels in the observation image belong to the sofa, which pixels in the observation image belong to the potted plant, which pixels in the observation image belong to the bed, which pixels in the observation image belong to the bathroom, and which pixels in the observation image belong to the TV.
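  • As a hedged illustration, the following sketch uses torchvision's Mask R-CNN as the pre-trained perception model (one possible choice, not necessarily the model of the disclosure) and aggregates its instance predictions into a per-pixel probability map over C categories; the score-weighted aggregation is likewise only one reasonable convention:

```python
import torch
import torchvision

# Assumption: torchvision's Mask R-CNN stands in for the pre-trained perception model.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def semantic_category_prediction(rgb, num_classes):
    """Turn instance predictions into a per-pixel distribution over C categories.

    rgb: float tensor of shape (3, H, W) with values in [0, 1].
    Returns a (C, H, W) tensor; channel c holds the predicted probability that a
    pixel belongs to category c.
    """
    with torch.no_grad():
        out = model([rgb])[0]               # dict with boxes, labels, scores, masks
    _, H, W = rgb.shape
    probs = torch.zeros(num_classes, H, W)
    for mask, label, score in zip(out["masks"], out["labels"], out["scores"]):
        c = int(label) - 1                  # COCO labels are 1-indexed
        if 0 <= c < num_classes:
            # keep the strongest score-weighted soft mask per category
            probs[c] = torch.maximum(probs[c], score * mask[0])
    return probs
```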
  • the depth image Dt is used to calculate the point cloud through mathematical transformation, where each point in the point cloud corresponds to the corresponding semantic category prediction result.
  • A depth image, also called a distance image, refers to an image in which the distance (depth) value from the image collector (camera device) to each point in the scene (target observation space) is used as the pixel value.
  • Methods for obtaining depth images include: lidar depth imaging method, computer stereo vision imaging, coordinate measuring machine method, moiré fringe method, structured light method, etc.
  • Point cloud: when a laser beam strikes the surface of an object, the reflected laser light carries information such as orientation and distance. If the laser beam is scanned along a certain trajectory, the reflected laser point information is recorded during scanning; because the scanning is extremely fine, a large number of laser points can be obtained, thus forming a laser point cloud.
  • Point cloud formats include *.las; *.pcd; *.txt, etc.
  • Depth images can be calculated into point cloud data after coordinate conversion; point cloud data with rules and necessary information can also be back-calculated into depth images.
  • converting the depth image into a point cloud may be a transformation of the coordinate system: converting the image coordinate system into a world coordinate system.
  • The constraints of the transformation are the camera intrinsic parameters, and the transformation maps each image pixel and its depth value to a point cloud coordinate, where:
  • (x, y, z) are the coordinates in the point cloud coordinate system;
  • (x’, y’) are the coordinates in the image coordinate system;
  • D is the depth value.
  • the distortion correction function undistort operation can be performed on (x’, y’) to correct the distortion.
  • The process of camera imaging is actually the process of converting points in the world coordinate system to the camera coordinate system, projecting them into the image coordinate system, and then converting them into the pixel coordinate system. Lens precision and manufacturing craftsmanship introduce distortion (distortion means that a straight line in the world coordinate system is no longer a straight line after being transformed into other coordinate systems). To solve this problem, the camera distortion correction model was introduced.
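  • As an illustration only (not the exact formula of the disclosure), the following sketch shows the standard pinhole back-projection from an undistorted depth image to a camera-frame point cloud, plus the pose-based conversion into the world (reference) coordinate system; fx, fy, cx, cy and the function names are assumed, illustrative names:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W) into a camera-frame point cloud (H*W, 3).

    For an (already undistorted) pixel (x', y') with depth value D, the standard
    pinhole relation gives x = (x' - cx) * D / fx, y = (y' - cy) * D / fy, z = D.
    """
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def to_world_frame(points, rotation, translation):
    """Map camera-frame points into the world (reference) frame using the sensor
    pose: rotation is a 3x3 matrix, translation a length-3 vector."""
    return points @ rotation.T + translation
```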
  • the point cloud is converted into 3D space to obtain the voxel representation.
  • L, W, and H represent length, width, and height respectively.
  • the multiple depth images used to construct the point cloud may be depth images collected by the target robot based on different perspectives. Therefore, the coordinate system of the point cloud may be different from the coordinate system of the target observation space (3D space).
  • the reference coordinate system of the target observation space can be represented by the world coordinate system (Cartesian coordinate system). Therefore, the point cloud obtained through the depth image needs to be converted into the reference coordinate system of the target observation space.
  • the sensor pose information can include position and attitude, corresponding to displacement and rotation.
  • Mapping represents the coordinate conversion of the same point between different coordinate systems. Mapping includes translation and rotation: translation is related to the position of the origin of the rigid body coordinate system, and rotation is related to the attitude of the rigid body coordinate system.
  • the rigid body coordinate system corresponding to the point cloud can be determined based on the differentiable geometric transformation of the sensor pose information, and the rigid body coordinate system can be converted into a world coordinate system to obtain a voxel representation.
  • The 3D semantic distribution map is initialized with all zeros at the beginning. The exponential moving average uses a hyperparameter to control the ratio of the currently predicted voxel representation m_t to the 3D semantic distribution map M_{t-1} obtained in the previous step; this hyperparameter can, for example, be set to 0.3.
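  • A minimal sketch of the aggregation step, assuming the standard exponential-moving-average form M_t = λ·m_t + (1 − λ)·M_{t−1} with λ = 0.3 (the disclosure only states that an EMA formula with a 0.3 hyperparameter is used, so the exact form is an assumption):

```python
import numpy as np

LAMBDA = 0.3  # hyperparameter controlling the weight of the current voxel prediction

def init_semantic_map(L, W, H, num_classes):
    # The 3D semantic distribution map is initialized with all zeros.
    return np.zeros((L, W, H, num_classes), dtype=np.float32)

def ema_update(semantic_map, voxel_pred, observed_mask):
    """Aggregate the current voxel representation m_t into the map M_{t-1}.

    Only voxels actually observed at time t (boolean observed_mask) are updated:
    M_t = LAMBDA * m_t + (1 - LAMBDA) * M_{t-1}.
    """
    semantic_map[observed_mask] = (
        LAMBDA * voxel_pred[observed_mask]
        + (1.0 - LAMBDA) * semantic_map[observed_mask]
    )
    return semantic_map
```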
  • Step 130 Learn the exploration strategy of the target robot based on the three-dimensional semantic distribution map and based on the conditions of semantic distribution inconsistency and class distribution uncertainty.
  • 3D semantic distribution maps can be used to calculate semantic distribution inconsistency rewards.
  • the semantic distribution inconsistency means that when the target robot observes the same target object from different perspectives during movement, it obtains inconsistent prediction distribution results
  • The class distribution uncertainty means that when the target robot observes the same target object from the same perspective during movement, the category of the target object is predicted into multiple categories, and the predicted category probabilities of two of the multiple categories are similar and both are greater than the first preset threshold.
  • For example, the categories can be divided into six categories: chair, couch (sofa), potted plant, bed, toilet (bathroom), and TV.
  • Suppose the target object is a TV. When observing the target object from the front: the probability of predicting a TV is 0.6, the probability of predicting a chair is 0.2, the probability of predicting a sofa is 0.1, the probability of predicting a potted plant is 0.05, the probability of predicting a bed is 0.02, and the probability of predicting a bathroom is 0.03.
  • When observing the target object from the side: the probability of predicting a TV is 0.2, the probability of predicting a chair is 0.1, the probability of predicting a sofa is 0.5, the probability of predicting a potted plant is 0.15, the probability of predicting a bed is 0.02, and the probability of predicting a bathroom is 0.03.
  • It can be seen that when the target object is viewed from the front it is predicted as a TV, but when viewed from the side it is predicted as a sofa; the prediction distribution results obtained from different perspectives are inconsistent, i.e., there is semantic distribution inconsistency.
  • As another example, suppose the target object is a TV and the first preset threshold is 0.3. When observing the target object: the probability of predicting a TV is 0.4, the probability of predicting a chair is 0.35, the probability of predicting a sofa is 0.15, the probability of predicting a potted plant is 0.05, the probability of predicting a bed is 0.02, and the probability of predicting a bathroom is 0.03.
  • It can be seen that the probability of predicting a TV (0.4) and the probability of predicting a chair (0.35) are relatively similar, and both are greater than the first preset threshold (0.3). Since the predicted probabilities of two categories (TV and chair) are relatively large and relatively close, it is determined that there is class distribution uncertainty in the predicted distribution result.
  • The observation image includes a first observation image and a second observation image.
  • The first observation image is an observation image collected when observing the same target object from different viewing angles.
  • The second observation image is an observation image collected when observing the same target object from the same perspective.
  • Learning the exploration strategy of the target robot based on the three-dimensional semantic distribution map and on the conditions of semantic distribution inconsistency and class distribution uncertainty includes:
  • obtaining, based on the first observation image, the current prediction results when the same target object is observed from different perspectives during the movement of the target robot, and calculating the first semantic distribution inconsistency reward based on the current prediction results and the three-dimensional semantic distribution map;
  • obtaining the first predicted category probabilities of all target objects in the second observation image, and calculating the first class distribution uncertainty reward based on these probabilities;
  • learning the exploration strategy of the target robot according to the first semantic distribution inconsistency reward and the first class distribution uncertainty reward.
  • The exploration strategy can be expressed as π(I_t, θ), where π represents the policy network that needs to be trained, I_t represents the observation image, and θ represents the parameters of the policy network.
  • This exploration strategy can be used to determine the exploration trajectory of the target robot. That is, the 3D semantic distribution map is used to learn the exploration strategy of the target robot through semantic distribution inconsistency and class distribution uncertainty in a self-supervised manner.
  • The purpose of maximizing the semantic distribution inconsistency and class distribution uncertainty during the movement of the target robot is that the semantic distribution inconsistency and class distribution uncertainty of the images in the exploration trajectory determined based on the learned exploration strategy are relatively high.
  • The first semantic distribution inconsistency reward r can be calculated based on the 3D semantic distribution map and the first observation images corresponding to different viewing angles; the first class distribution uncertainty reward can be calculated for each target object in the single-frame second observation image corresponding to the same viewing angle.
  • the first semantic distribution inconsistency reward is defined as the Kullback-Leibler divergence between the current prediction result corresponding to the first observation image and the 3D semantic distribution map
  • KL divergence can be used to measure the degree of difference between two distributions. If the difference between the two is smaller, the KL divergence is smaller, and vice versa; when the distribution of the two is consistent, the KL divergence is 0.
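  • A minimal sketch of the first semantic distribution inconsistency reward as a KL divergence between the current prediction and the distribution stored in the 3D semantic distribution map (variable names are illustrative; the exact per-voxel or per-object aggregation used in the disclosure may differ):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two categorical distributions over C categories.
    It is 0 when the two distributions agree and grows with their difference."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def semantic_inconsistency_reward(current_pred, map_dist):
    """First semantic distribution inconsistency reward: the KL divergence between
    the current prediction for an observed object and the distribution already
    accumulated in the 3D semantic distribution map."""
    return kl_divergence(current_pred, map_dist)
```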
  • the first type of distribution uncertainty reward is used to explore target objects in the second observation image that are predicted to be in multiple categories, and the confidence levels of two of the multiple categories are relatively close.
  • The first class distribution uncertainty reward is defined as u = SECmax(P_i), where P_i is the first predicted category probability of the i-th target object in the single-frame second observation image, and SECmax represents the second maximum value in P_i. When u is greater than the first preset threshold, the predicted class distribution result is considered uncertain.
  • The first preset threshold may be set to 0.1, for example; in another example, it may be set to 0.3.
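  • A sketch of the first class distribution uncertainty reward u = SECmax(P_i), together with the threshold check described above (function names are illustrative):

```python
import numpy as np

def class_uncertainty_reward(pred_probs):
    """u = SECmax(P_i): the second-largest predicted category probability of a
    target object in a single-frame observation. A large u means two categories
    receive similarly high probability, i.e. the prediction is uncertain."""
    sorted_probs = np.sort(np.asarray(pred_probs))[::-1]
    return float(sorted_probs[1])

def is_uncertain(pred_probs, threshold=0.3):
    # Example from the text: P = [0.4 (TV), 0.35 (chair), 0.15, 0.05, 0.02, 0.03]
    # SECmax = 0.35 > 0.3, so the predicted class distribution is uncertain.
    return class_uncertainty_reward(pred_probs) > threshold
```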
  • Before learning the exploration strategy of the target robot, the method further includes: inputting the three-dimensional semantic distribution map into the global policy network to select a long-term goal, and sampling the long-term goal based on a preset number of local steps to obtain sampling data, which is used to learn the discrete actions of the target robot.
  • The policy network can be divided into two parts. One is called the global policy network, which is used to predict possible x-y coordinates; the other is called the local policy network, which uses the fast marching method for path planning to predict the discrete actions of the target robot based on the coordinates.
  • the 3D semantic distribution map is first input into the global policy network to select the long-term goal, which represents the xy coordinates in the 3D semantic distribution map. Then, the long-term goal is input into the local policy network for path planning, and the predicted discrete actions of the target robot are obtained.
  • The predicted discrete actions include at least one of moving forward, turning left, and turning right.
  • The local policy network uses the fast marching method to perform path planning with low-dimensional navigation actions: based on the coordinates of the long-term goal, it predicts the target robot's discrete action, i.e., whether to move forward, turn left, or turn right.
  • The preset number is 25, and the long-term goal is sampled every 25 local steps to shorten the time range of reinforcement learning exploration and obtain sampled data. During training of the exploration strategy, the sampled data is input into the policy network to learn the discrete actions of the target robot, and the exploration trajectory is then learned based on the learned discrete actions of the target robot.
  • Specifically, the long-term goal (x-y coordinates) predicted by the global policy network is input into the local policy network (such as the Fast Marching Method network) for path planning, and the predicted discrete action of the target robot (one of moving forward, turning left, and turning right) is obtained. After one step is taken, the predicted discrete action corresponding to the next step is predicted, and so on until the 25th step is taken; then the global policy network is updated to predict a new long-term goal (x-y coordinates), and the new long-term goal is used to predict the discrete actions of the target robot for the next round of 25 local steps, as sketched below.
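  • A schematic, pseudocode-style sketch of this interaction between the global and local policies, with hypothetical policy and robot objects (none of these names come from the disclosure):

```python
NUM_LOCAL_STEPS = 25  # the long-term goal is re-sampled every 25 local steps

def explore(global_policy, local_policy, robot, semantic_map, num_rounds):
    """Alternate between the global policy (picks a long-term x-y goal from the
    3D semantic distribution map) and the local policy (e.g. Fast Marching Method
    path planning) that outputs discrete actions toward that goal.
    Returns the exploration trajectory of collected observations."""
    trajectory = []
    for _ in range(num_rounds):
        goal_xy = global_policy.select_goal(semantic_map)        # long-term goal
        for _ in range(NUM_LOCAL_STEPS):
            obs = robot.observe()                                # RGB, depth, pose
            action = local_policy.plan(obs, goal_xy)             # forward / left / right
            robot.step(action)
            trajectory.append(obs)
            semantic_map = robot.update_semantic_map(semantic_map, obs)
        # after 25 local steps the global policy is updated and a new goal sampled
    return trajectory
```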
  • Step 140 Move the target robot according to the exploration strategy to obtain an exploration trajectory of the target robot.
  • the exploration trajectory includes target observation images collected while the target robot moves within the target observation space.
  • moving the target robot according to the exploration strategy to obtain the exploration trajectory of the target robot includes:
  • determining, according to the exploration strategy and the target observation information collected by the target robot at the current time t_i, the traveling direction of the target robot at the next time t_{i+1}, where the traveling direction is used to indicate the direction in which the target robot should move at the next time t_{i+1}, and the target observation information includes the target observation image, the target depth image and the target sensor pose information, i ≥ 0;
  • controlling the target robot to perform a moving operation based on the traveling direction to obtain the exploration trajectory of the target robot and the target observation image at each time step on the exploration trajectory.
  • The learned exploration strategy can guide the robot to move so that the obtained exploration trajectory contains more samples with inconsistent semantic distributions and more samples with uncertain class distributions.
  • the target observation information includes the target observation image, the target depth image and the target sensor pose information
  • The policy network can directly output the traveling direction of the target robot at time t_1, which indicates which direction the target robot should go at time t_1.
  • Step 150 Based on at least one condition of the semantic distribution inconsistency and the class distribution uncertainty, obtain a difficult sample image from the target observation image corresponding to the exploration trajectory, where the difficult sample image is used to characterize images with inconsistent predicted semantic distribution results and/or uncertain predicted class distribution results.
  • the target observation image corresponding to the exploration trajectory includes a first target observation image and a second target observation image
  • the first target observation image is an observation image collected when observing the same target object from different perspectives
  • the second target observation image is an observation image collected when observing the same target object from the same perspective
  • Obtaining difficult sample images from target observation images corresponding to the exploration trajectory based on at least one condition of the semantic distribution inconsistency and the class distribution uncertainty includes:
  • based on the condition of the semantic distribution inconsistency, a first difficult sample image is obtained from the first target observation image corresponding to the exploration trajectory, where the first difficult sample image is used to represent an image in which the predicted semantic distribution result is inconsistent; and/or, based on the condition of the class distribution uncertainty, a second difficult sample image is obtained from the second target observation image corresponding to the exploration trajectory, where the second difficult sample image is used to represent an image in which the predicted class distribution result is uncertain.
  • Obtaining the second difficult sample image from the second target observation image corresponding to the exploration trajectory based on the condition of the class distribution uncertainty includes:
  • obtaining the second target observation image corresponding to the exploration trajectory; calculating the second predicted category probabilities of all target objects in the second target observation image corresponding to the exploration trajectory; calculating the second class distribution uncertainty based on the second predicted category probabilities of all target objects; and determining the image in the second target observation image corresponding to the exploration trajectory for which the second class distribution uncertainty is greater than the first preset threshold as the second difficult sample image.
  • the images sampled from the exploration trajectory are generally the second target observation image of a single frame corresponding to the same viewing angle (single viewing angle), and only the second type of distribution uncertainty corresponding to the same viewing angle can be considered.
  • Selecting the second difficult sample images through the second class distribution uncertainty corresponding to the same perspective is more helpful for perception model adjustment. By focusing on the class distribution uncertainty predicted from the same view, more difficult sample images can be selected.
  • Obtaining the first difficult sample image from the first target observation image corresponding to the exploration trajectory based on the condition of the semantic distribution inconsistency includes:
  • obtaining the first target observation image corresponding to the exploration trajectory; obtaining, according to the first target observation image, the target prediction results when the same target object is observed from different perspectives during the movement of the target robot, and calculating the second semantic distribution inconsistency based on the target prediction results and the three-dimensional semantic distribution map;
  • determining the image in the first target observation image for which the second semantic distribution inconsistency is greater than the second preset threshold as the first difficult sample image.
  • Since the target robot saves the exploration trajectory within the entire target observation space during movement, the first target observation images corresponding to different perspectives (multiple perspectives) can be sampled from the exploration trajectory; semantic category prediction is then performed on the first target observation images to obtain the target prediction results when the same target object is observed from different perspectives during the movement of the target robot, the second semantic distribution inconsistency is calculated based on the target prediction results and the three-dimensional semantic distribution map, and the first difficult sample images with inconsistent predicted semantic distribution results are selected from the first target observation images based on the second semantic distribution inconsistency. By focusing on the semantic distribution inconsistency predicted from different perspectives, more difficult sample images can be selected.
  • the difficult sample image includes the first difficult sample image and the second difficult sample image
  • More difficult sample images can be selected by paying attention to the uncertainty of the class distribution predicted from the same perspective and the inconsistency of the semantic distribution predicted from different perspectives, which highlights the importance of difficult sample images; a selection sketch follows below.
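  • A hedged sketch of the selection step, reusing the uncertainty and inconsistency helpers sketched above; the perception-model interface, the semantic_inconsistency helper, the img.pose attribute and the threshold values are all hypothetical:

```python
def select_difficult_samples(same_view_images, multi_view_images, semantic_map,
                             perception_model, unc_threshold=0.3, inc_threshold=1.0):
    """Keep only the target observation images the pre-trained perception model
    struggles with. Hypothetical interfaces: perception_model.predict_probs(img)
    returns per-object category probability vectors, and semantic_inconsistency(...)
    computes the second semantic distribution inconsistency against the 3D map."""
    difficult = []
    # second difficult sample images: uncertain class distribution (same view)
    for img in same_view_images:
        probs_per_object = perception_model.predict_probs(img)
        if any(class_uncertainty_reward(p) > unc_threshold for p in probs_per_object):
            difficult.append(img)
    # first difficult sample images: inconsistent semantic distribution (different views)
    for img in multi_view_images:
        pred = perception_model.predict_probs(img)
        if semantic_inconsistency(pred, semantic_map, img.pose) > inc_threshold:
            difficult.append(img)
    return difficult
```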
  • Step 160 Adjust the perception model of the target robot according to the difficult sample image.
  • adjusting the perception model of the target robot according to the difficult sample image includes:
  • obtaining the difficult sample image and the semantic annotation information of the difficult sample image, where the semantic annotation information includes the bounding boxes of all target objects in each difficult sample image, the pixels corresponding to each target object, and the category to which each target object belongs;
  • inputting the difficult sample image into the pre-trained perception model to obtain the semantic category prediction result corresponding to the difficult sample image; and adjusting the parameters of the pre-trained perception model based on the semantic category prediction result corresponding to the difficult sample image and the semantic annotation information to obtain the adjusted perception model.
  • the simplest method is to label all target observation images on the exploration trajectory as sample images.
  • Although the exploration trajectories learned by the trained exploration strategy can find more objects with semantic distribution inconsistency and class distribution uncertainty, there are still many target observation images that can be accurately recognized by the pre-trained perception model. Therefore, in order to fine-tune the perception model effectively, among all target observation images obtained from the exploration trajectory, the sample images that can already be accurately recognized by the pre-trained perception model can be ignored, and the difficult sample images that cannot be accurately recognized by the pre-trained perception model can be screened out to fine-tune the perception model.
  • Specifically, the second semantic distribution inconsistency can be calculated to select the first difficult sample images with inconsistent predicted semantic distribution results, and/or the second class distribution uncertainty can be calculated to select the second difficult sample images with uncertain predicted class distribution results; the selected first difficult sample images and/or second difficult sample images are then annotated, and all difficult sample images are used to fine-tune the perception model.
  • The semantic annotation information of the difficult sample images is annotated, specifically the bounding boxes of all target objects in each difficult sample image, the pixels corresponding to each target object, and the category to which each target object belongs. Then, all difficult sample images are input into the pre-trained perception model to obtain the semantic category prediction results corresponding to each difficult sample image. Finally, the parameters of the pre-trained perception model are adjusted based on the semantic category prediction results and the semantic annotation information corresponding to each difficult sample image, so that the semantic category prediction results output by the perception model for the difficult sample images are closer to the annotated semantic annotation information of the target objects.
  • For example, the parameters of the perception model are the parameters in Mask R-CNN. The adjusted model is tested on a randomly collected test sample set, and training is stopped when the accuracy on the test sample set no longer increases, yielding the adjusted perception model; a fine-tuning sketch follows below.
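  • A hedged sketch of such fine-tuning, assuming torchvision's Mask R-CNN stands in for the pre-trained perception model and that the annotated difficult sample images are provided in the torchvision detection target format (these are assumptions; the disclosure only states that the model parameters are adjusted until accuracy on a test sample set stops improving):

```python
import torch
import torchvision

# Assumption: the pre-trained perception model is torchvision's Mask R-CNN.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

def fine_tune(difficult_loader, num_epochs=5):
    """difficult_loader yields (images, targets), where targets follow the
    torchvision detection format: dicts with 'boxes', 'labels' and 'masks',
    i.e. the bounding boxes, categories and per-object pixels from the
    semantic annotation information of each difficult sample image."""
    for _ in range(num_epochs):
        for images, targets in difficult_loader:
            loss_dict = model(images, targets)   # detection/segmentation losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # in practice: evaluate on a held-out test sample set and stop when
        # accuracy (e.g. AP50) no longer increases
```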
  • the method (Ours) adopted in the embodiment of the present disclosure has achieved the best performance compared with related technologies on the Matterport3D data set.
  • This performance represents the performance of AP50 on object detection (Bbox) and instance segmentation (Segm), characterizing the accuracy of perception.
  • the optimal performance of AP50 is 100%.
  • Table 2 shows the perceptual prediction performance on the following target objects when the exploration strategy is iteratively trained based on the latest fine-tuned perception model: chair, sofa (couch), potted plant, bed, bathroom (toilet), and television (TV). According to Table 2, the performance can be further improved by iteratively training the exploration strategy based on the latest fine-tuned perception model.
  • When the number of iterations n is 1, the average performance of AP50 is 34.07%; when the number of iterations n is 2, the average performance of AP50 is 34.71%; when the number of iterations n is 3, the average performance of AP50 is 35.03%.
  • The embodiment of the present disclosure obtains observation information collected by the target robot in the target observation space, where the observation information includes observation images, depth images and sensor pose information; obtains a three-dimensional semantic distribution map according to the observation information; learns the exploration strategy of the target robot based on the three-dimensional semantic distribution map and on the conditions of semantic distribution inconsistency and class distribution uncertainty; moves the target robot according to the exploration strategy to obtain the exploration trajectory of the target robot, where the exploration trajectory includes the target observation images collected during the movement of the target robot within the target observation space; obtains difficult sample images from the target observation images corresponding to the exploration trajectory based on at least one of the semantic distribution inconsistency and class distribution uncertainty conditions, where the difficult sample images are used to characterize images with inconsistent predicted semantic distribution results and/or uncertain predicted class distribution results; and adjusts the target robot's perception model based on the difficult sample images.
  • Embodiments of the present disclosure learn exploration trajectories in a self-supervised manner through semantic distribution inconsistency and class distribution uncertainty by utilizing a three-dimensional semantic distribution map, and utilize at least one of the semantic distribution inconsistency and class distribution uncertainty conditions to collect difficult sample images on the learned exploration trajectory.
  • After semantically annotating the collected difficult sample images, the perception model is fine-tuned based on the annotated difficult sample images. The semantic distribution difference is measured based on the three-dimensional semantic distribution map, and semantic distribution inconsistency and class distribution uncertainty are combined to learn the exploration trajectory, so as to focus on the uncertainty of the class distribution predicted from the same perspective and the inconsistency of the semantic distribution predicted from different perspectives, and to highlight the importance of difficult sample images. Finally, fine-tuning the perception model based on the annotated difficult sample images reduces the annotation cost and improves the perception accuracy of the perception model.
  • the embodiment of the present disclosure also provides an image processing device.
  • FIG. 3 is a schematic structural diagram of an image processing device provided by an embodiment of the present disclosure.
  • the image processing device 200 may include:
  • the first acquisition unit 210 is used to acquire observation information collected by the target robot in the target observation space, where the observation information includes observation images, depth images, and sensor pose information;
  • the second acquisition unit 220 is used to acquire a three-dimensional semantic distribution map according to the observation information
  • the learning unit 230 is configured to learn the exploration strategy of the target robot based on the three-dimensional semantic distribution map and based on the conditions of semantic distribution inconsistency and class distribution uncertainty;
  • a determination unit 240, configured to move the target robot according to the exploration strategy to obtain an exploration trajectory of the target robot, where the exploration trajectory includes target observation images collected during the movement of the target robot within the target observation space;
  • a third acquisition unit 250, configured to acquire a difficult sample image from the target observation image corresponding to the exploration trajectory based on at least one condition of the semantic distribution inconsistency and the class distribution uncertainty, where the difficult sample image is used to characterize images in which the predicted semantic distribution results are inconsistent and/or the predicted class distribution results are uncertain;
  • the adjustment unit 260 is configured to adjust the perception model of the target robot according to the difficult sample image.
  • The semantic distribution inconsistency indicates that the target robot obtains inconsistent prediction distribution results when observing the same target object from different perspectives during movement; the class distribution uncertainty indicates that when the target robot observes the same target object from the same perspective during movement, the category of the target object is predicted into multiple categories, and the predicted category probabilities of two of the multiple categories are similar and both are greater than the first preset threshold.
  • The observation image includes a first observation image and a second observation image.
  • The first observation image is an observation image collected when observing the same target object from different viewing angles.
  • The second observation image is an observation image collected when observing the same target object from the same perspective.
  • The learning unit 230 is specifically configured to: obtain, according to the first observation image, the current prediction results when the target robot observes the same target object from different perspectives during movement, and calculate the first semantic distribution inconsistency reward based on the current prediction results and the three-dimensional semantic distribution map; obtain the first predicted category probabilities of all target objects in the second observation image, and calculate the first class distribution uncertainty reward based on the first predicted category probabilities of all target objects; and learn the exploration strategy of the target robot based on the first semantic distribution inconsistency reward and the first class distribution uncertainty reward.
  • the target observation image corresponding to the exploration trajectory includes a first target observation image and a second target observation image
  • the first target observation image is an observation image collected when observing the same target object from different perspectives
  • the second target observation image is an observation image collected when observing the same target object from the same perspective
  • The third acquisition unit 250 is specifically configured to: obtain a first difficult sample image from the first target observation image corresponding to the exploration trajectory based on the condition of the semantic distribution inconsistency, where the first difficult sample image is used to characterize an image in which the predicted semantic distribution result is inconsistent; and/or obtain a second difficult sample image from the second target observation image corresponding to the exploration trajectory based on the condition of the class distribution uncertainty, where the second difficult sample image is used to characterize an image in which the predicted class distribution result is uncertain.
  • When the third acquisition unit 250 obtains the second difficult sample image from the second target observation image corresponding to the exploration trajectory based on the condition of the class distribution uncertainty, it is specifically configured to: obtain the second target observation image corresponding to the exploration trajectory; calculate the second predicted category probabilities of all target objects in the second target observation image corresponding to the exploration trajectory; calculate the second class distribution uncertainty based on the second predicted category probabilities of all target objects in the second target observation image; and determine the image in the second target observation image for which the second class distribution uncertainty is greater than the first preset threshold as the second difficult sample image.
  • the third acquisition unit 250 when the third acquisition unit 250 acquires the first difficult sample image from the first target observation image corresponding to the exploration trajectory based on the condition of semantic distribution inconsistency, it is specifically used to: acquire The first target observation image corresponding to the exploration trajectory; according to the first target observation image, obtain the target prediction results when the target robot observes the same target object from different perspectives during movement, and based on the target prediction results Calculate the second semantic distribution inconsistency with the three-dimensional semantic distribution map; determine the image in the first target observation image for which the second semantic distribution inconsistency is greater than the second preset threshold as the first difficult sample image.
  • the determination unit 240 is specifically configured to: determine the traveling direction of the target robot at the next time ti+1 based on the exploration strategy and the target observation information collected by the target robot at the current time ti, where the traveling direction indicates the direction in which the target robot should move at the next time ti+1, the target observation information includes a target observation image, a target depth image and target sensor pose information, and i ≥ 0; and control the target robot to move along the traveling direction to obtain the exploration trajectory of the target robot and the target observation image at each time step on the exploration trajectory.
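The control loop below is only a schematic of this procedure; the `robot.observe`, `robot.step` and `exploration_policy` interfaces are placeholders introduced for illustration and are not part of the disclosure.

```python
def roll_out_trajectory(robot, exploration_policy, num_steps):
    """Move the robot with the learned exploration policy and record the
    exploration trajectory plus the target observation image at each time step."""
    trajectory, images = [], []
    obs = robot.observe()                    # target image, depth image, sensor pose
    for _ in range(num_steps):
        direction = exploration_policy(obs)  # heading for the next time step
        obs = robot.step(direction)          # move and collect the next observation
        trajectory.append(obs["pose"])
        images.append(obs["rgb"])
    return trajectory, images
```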
  • the second acquisition unit 220 is specifically configured to: input the observation image into a pre-trained perception model to obtain a semantic category prediction result of the observation image, where the semantic category prediction result characterizes the predicted probability distribution of each pixel in the observation image over C categories, and C is the number of categories of predicted target objects; establish a point cloud corresponding to the target observation space based on the depth image, where each point in the point cloud corresponds to its semantic category prediction result; convert the point cloud into three-dimensional space based on the sensor pose information to obtain a voxel representation; and aggregate the voxel representations of the same position over time using an exponential moving average formula to obtain the three-dimensional semantic distribution map.
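The two helpers below sketch, under stated assumptions, the geometric and temporal parts of this pipeline: a standard pinhole back-projection of the depth image into a camera-frame point cloud (the pose-based transform to world coordinates and the voxelization are omitted), and the exponential-moving-average fusion of per-step voxel representations with the example value λ = 0.3 mentioned in the description.

```python
import numpy as np

def backproject_depth(depth, fx, fy, u0, v0):
    """Standard pinhole back-projection of a depth image into a point cloud in
    camera coordinates; fx, fy, u0, v0 are the camera intrinsics."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = (u - u0) * depth / fx
    y = (v - v0) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def aggregate_semantic_map(voxel_reprs, lam=0.3):
    """Fuse the per-step voxel representations m_t into the 3D semantic
    distribution map with the exponential moving average
    M_t = lam * M_{t-1} + (1 - lam) * m_t, starting from an all-zero map."""
    M = np.zeros_like(voxel_reprs[0])
    for m_t in voxel_reprs:
        M = lam * M + (1.0 - lam) * m_t
    return M
```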
  • the adjustment unit 260 is specifically configured to: acquire the difficult sample image and the semantic annotation information of the difficult sample image, where the semantic annotation information includes the bounding boxes of all target objects in each difficult sample image, the pixels corresponding to each target object, and the category to which each target object belongs; input the difficult sample image into the pre-trained perception model to obtain the semantic category prediction result corresponding to the difficult sample image; and adjust the parameters of the pre-trained perception model based on the semantic category prediction result corresponding to the difficult sample image and the semantic annotation information, to obtain the adjusted perception model.
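Since the description names Mask R-CNN as the pre-trained perception model, a hedged fine-tuning sketch using torchvision's Mask R-CNN is shown below; the data-loader format, the hyperparameters, and the stopping criterion (the description suggests stopping when accuracy on a held-out test set no longer improves) are assumptions rather than the disclosed configuration.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

def finetune_perception_model(hard_sample_loader, lr=1e-4, epochs=3, device="cuda"):
    """Fine-tune a pre-trained Mask R-CNN on annotated difficult sample images.
    Each batch is assumed to yield (images, targets), where each target dict
    carries 'boxes', 'labels' and 'masks' from the semantic annotation info."""
    model = maskrcnn_resnet50_fpn(weights="DEFAULT")   # torchvision >= 0.13; older versions use pretrained=True
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in hard_sample_loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss_dict = model(images, targets)          # per-head training losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```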
  • before learning the exploration strategy of the target robot, the learning unit 230 is also configured to: input the three-dimensional semantic distribution map into a global policy network to select a long-term goal, where the long-term goal is an x-y coordinate in the three-dimensional semantic distribution map; input the long-term goal into a local policy network for path planning to obtain the predicted discrete actions of the target robot, where the predicted discrete actions include at least one of moving forward, turning left and turning right; and sample the long-term goal based on a preset number of local steps to obtain sampling data, where the sampling data is used to learn the discrete actions of the target robot.
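A schematic of this hierarchical global/local procedure; the 25-step re-sampling interval follows the example in the description, while the `global_policy`, `local_planner` and `robot` interfaces are placeholders, and the per-step update of the semantic map is omitted for brevity.

```python
def explore_hierarchically(semantic_map, robot, global_policy, local_planner,
                           num_global_steps, local_steps=25):
    """Pick a long-term goal (an x-y cell of the 3D semantic distribution map)
    with the global policy network, then run the local planner (e.g. a fast
    marching path planner) for a fixed number of local steps, emitting discrete
    actions (move forward / turn left / turn right) before re-sampling the goal."""
    samples = []
    for _ in range(num_global_steps):
        goal_xy = global_policy(semantic_map)        # long-term goal
        for _ in range(local_steps):                 # e.g. 25 local steps per goal
            action = local_planner(goal_xy, robot.pose())
            observation = robot.step(action)
            samples.append((observation, action))    # data used to learn discrete actions
    return samples
```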
  • the first acquisition unit 210 is specifically configured to: acquire, based on the shooting device of the target robot, the observation image and the depth image corresponding to each time step within a preset time period, where the observation image is a color image and the depth image is an image whose pixel values are the distance values of points in the target observation space collected by the shooting device; and acquire, based on the sensor of the target robot, the sensor pose information corresponding to each time step within the preset time period.
  • the sensor pose information includes at least three degrees of freedom pose information.
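For illustration only, the per-time-step observation information could be bundled as a simple record like the following; the field names and shapes are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    """One time step of observation information: an RGB observation image, a depth
    image whose pixel values are distances to points in the observation space,
    and a sensor pose with at least three degrees of freedom (x, y, orientation)."""
    rgb: np.ndarray     # H x W x 3 color image
    depth: np.ndarray   # H x W distance map
    pose: np.ndarray    # [x, y, orientation]
```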
  • Each unit in the above-mentioned image processing device 200 can be implemented in whole or in part by software, hardware, and combinations thereof.
  • Each of the above units may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each of the above units.
  • the image processing device 200 can be integrated into a terminal or server that is equipped with a memory and a processor and has computing capability, or the image processing device 200 may itself be such a terminal or server.
  • the present disclosure also provides a computer device including a memory and a processor, where a computer program is stored in the memory, and when the processor executes the computer program, the steps in the above method embodiments are implemented.
  • Figure 4 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • the computer device may be a terminal.
  • the computer device 300 includes a processor 301 with one or more processing cores, a memory 302 with one or more computer-readable storage media, and a computer program stored on the memory 302 and executable on the processor.
  • the processor 301 is electrically connected to the memory 302.
  • the structure of the computer equipment shown in the figures does not constitute a limitation on the computer equipment, and may include more or fewer components than shown in the figures, or combine certain components, or arrange different components.
  • the processor 301 is the control center of the computer device 300; it uses various interfaces and lines to connect the various parts of the entire computer device 300, and by running or loading the software programs and/or modules stored in the memory 302 and calling the data stored in the memory 302, it performs the various functions of the computer device 300 and processes data, thereby carrying out overall processing of the computer device 300.
  • the processor 301 in the computer device 300 loads the instructions corresponding to the processes of one or more application programs into the memory 302 according to the following steps, and the processor 301 runs the application programs stored in the memory 302 to implement various functions:
  • acquire observation information collected by the target robot in the target observation space, where the observation information includes observation images, depth images and sensor pose information; obtain a three-dimensional semantic distribution map according to the observation information; learn the exploration strategy of the target robot according to the three-dimensional semantic distribution map under the conditions of semantic distribution inconsistency and class distribution uncertainty; move the target robot according to the exploration strategy to obtain the exploration trajectory of the target robot, where the exploration trajectory includes target observation images collected while the target robot moves within the target observation space; based on at least one of the conditions of semantic distribution inconsistency and class distribution uncertainty, acquire difficult sample images from the target observation images corresponding to the exploration trajectory, where the difficult sample images are used to characterize images whose predicted semantic distribution results are inconsistent and/or whose predicted class distribution results are uncertain; and adjust the perception model of the target robot according to the difficult sample images.
  • the computer device 300 further includes: a touch display screen 303 , a radio frequency circuit 304 , an audio circuit 305 , an input unit 306 and a power supply 307 .
  • the processor 301 is electrically connected to the touch display screen 303, the radio frequency circuit 304, the audio circuit 305, the input unit 306 and the power supply 307 respectively.
  • the structure of the computer equipment shown in FIG. 4 does not constitute a limitation on the computer equipment, and may include more or fewer components than shown in the figure, or combine certain components, or arrange different components.
  • the touch display screen 303 can be used to display a graphical user interface and receive operation instructions generated by the user acting on the graphical user interface.
  • the touch display screen 303 may include a display panel and a touch panel.
  • the display panel can be used to display information input by the user or information provided to the user as well as various graphical user interfaces of the computer device. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
  • the touch panel can be used to collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel with a finger, stylus or any other suitable object or accessory), generate corresponding operation instructions, and execute the corresponding program according to those instructions.
  • the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact point coordinates, and then sends them to the processor 301, and it can also receive and execute commands sent by the processor 301.
  • the touch panel can cover the display panel.
  • when the touch panel detects a touch operation on or near it, the operation is sent to the processor 301 to determine the type of the touch event, and the processor 301 then provides a corresponding visual output on the display panel according to the type of the touch event.
  • the touch panel and the display panel can be integrated into the touch display 303 to realize input and output functions.
  • the touch panel and the display panel can also be used as two independent components to implement the input and output functions; that is, the touch display screen 303 can also serve as a part of the input unit 306 to implement the input function.
  • the radio frequency circuit 304 can be used to send and receive radio frequency signals so as to establish wireless communication with network equipment or other computer equipment and to exchange signals with them.
  • the audio circuit 305 may be used to provide an audio interface between the user and the computer device through speakers and microphones.
  • the audio circuit 305 can convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; conversely, the microphone converts collected sound signals into electrical signals, which the audio circuit 305 receives and converts into audio data; after the audio data is processed by the processor 301, it is sent via the radio frequency circuit 304 to, for example, another computer device, or output to the memory 302 for further processing.
  • Audio circuitry 305 may also include an earphone jack to provide communication of peripheral headphones to the computer device.
  • the input unit 306 can be used to receive input numbers, character information or object feature information (such as fingerprints, irises or facial information), and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
  • Power supply 307 is used to power various components of computer device 300 .
  • the power supply 307 can be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • Power supply 307 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
  • the computer device 300 may also include a camera, a sensor, a wireless fidelity module, a Bluetooth module, etc., which will not be described again here.
  • the present disclosure also provides a robot, which is equipped with a shooting device and a sensor.
  • the robot also includes a memory and a processor.
  • a computer program is stored in the memory, and when the processor executes the computer program, the steps in the above method embodiments are implemented.
  • the present disclosure also provides a computer-readable storage medium for storing a computer program.
  • the computer-readable storage medium can be applied to computer equipment, and the computer program causes the computer equipment to execute corresponding processes in the image processing method in the embodiments of the present disclosure. For the sake of brevity, details will not be described again here.
  • the present disclosure also provides a computer program product, which includes a computer program stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, causing the computer device to execute the corresponding process in the image processing method in the embodiment of the present disclosure.
  • For the sake of brevity, details will not be repeated here.
  • the present disclosure also provides a computer program, which is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, causing the computer device to execute the corresponding process in the image processing method in the embodiments of the present disclosure. For the sake of brevity, details will not be repeated here.
  • the processor in the embodiment of the present disclosure may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the steps of the method disclosed in conjunction with the embodiments of the present disclosure can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory in the embodiments of the present disclosure may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM) and direct rambus random access memory (Direct Rambus RAM, DR RAM).
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connections shown or discussed between components may be indirect coupling or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer or a server) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: USB flash drives, removable hard disks, ROM, RAM, magnetic disks, optical disks and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide an image processing method, apparatus, storage medium and device. The method includes: acquiring observation information collected by a target robot in a target observation space, the observation information including observation images, depth images and sensor pose information; obtaining a three-dimensional semantic distribution map according to the observation information; learning an exploration strategy of the target robot according to the three-dimensional semantic distribution map under the conditions of semantic distribution inconsistency and class distribution uncertainty, and determining an exploration trajectory of the target robot according to the exploration strategy; acquiring difficult sample images from the target observation images corresponding to the exploration trajectory based on at least one of the conditions of semantic distribution inconsistency and class distribution uncertainty; and adjusting the perception model of the target robot according to the difficult sample images, which reduces annotation cost and improves the perception accuracy of the perception model.

Description

图像处理方法、装置、存储介质及设备
本申请要求于2022年8月23日递交的中国专利申请第202211014186.7号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开的实施例涉及一种图像处理方法、装置、存储介质及设备。
背景技术
随着感知模型在机器人领域的广泛应用,如何有效地将感知模型推广到真实的三维环境的研究,已成为重要的研究课题。其中,训练机器人的感知模型与传统计算机视觉中基于从互联网上收集的图片训练静态感知模型存在以下不同点:(1)用于训练的图片数据不是从互联网上收集的固定数据集,而是需要在虚拟或真实的三维(3D)空间中移动来进行收集的;(2)静态感知模型是将每一个训练样本单独处理,机器人的感知模型是机器人在空间的移动过程中从不同的视角对同一个物体进行观测;(3)如何有效地学习探索策略和收集训练样本的方法,是机器人的感知模型的训练任务的关键之处。
发明内容
本公开实施例提供一种图像处理方法、装置、存储介质、设备及程序产品,可以基于三维语义分布图来衡量语义分布差异,并结合语义分布不一致性和类分布不确定性来学习探索轨迹,以关注同一个视角预测的类分布不确定性以及关注不同视角预测的语义分布不一致性,并突出难样本图像的重要性,最终基于标注的难样本图像微调感知模型,降低了标注成本,提升了感知模型的感知准确性。
一方面,本公开实施例提供一种图像处理方法,所述方法包括:获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测 图像、深度图像和传感器位姿信息;根据所述观测信息,获取三维语义分布图;根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略;根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;根据所述难样本图像调整所述目标机器人的感知模型。
在一些实施例中,所述语义分布不一致性表示所述目标机器人在移动过程中从不同视角观测同一个目标对象时,得到不一致的预测分布结果;所述类分布不确定性表示所述目标机器人在移动过程中从同一个视角观测同一个目标对象时,将所述目标对象的类别预测为多个类别、且所述多个类别中有两个类别得到的预测类别概率相近且均大于第一预设阈值的情况。
在一些实施例中,所述根据所述三维语义分布图,所述观测图像包括第一观测图像和第二观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;所述根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略,包括:根据所述第一观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的当前预测结果,并基于所述当前预测结果与所述三维语义分布图,计算第一语义分布不一致性奖励;获取所述第二观测图像中所有目标对象的第一预测类别概率,并基于所述所有目标对象的第一预测类别概率,计算第一类分布不确定性奖励;根据所述第一语义分布不一致性奖励与所述第一类分布不确定性奖励,学习所述目标机器人的探索策略。
在一些实施例中,所述探索轨迹对应的目标观测图像包括第一目标观测图像和第二目标观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;所述基于所述语义分布不一致 性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,包括:
基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,所述第一难样本图像用于表征预测的语义分布结果不一致的图像;和/或
基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,所述第二难样本图像用于表征预测的类分布结果不确定的图像。
在一些实施例中,所述基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,包括:获取所述探索轨迹对应的第二目标观测图像;计算所述探索轨迹对应的第二目标观测图像中所有目标对象的第二预测类别概率;基于所述第二目标观测图像中所有目标对象的第二预测类别概率,计算第二类分布不确定性;将所述探索轨迹对应的第二目标观测图像中所述第二类分布不确定性大于第一预设阈值对应的图像确定为第二难样本图像。
在一些实施例中,所述基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,包括:获取所述探索轨迹对应的第一目标观测图像;根据所述第一目标观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的目标预测结果,并基于所述目标预测结果与所述三维语义分布图,计算第二语义分布不一致性;将所述第一目标观测图像中所述第二语义分布不一致性大于第二预设阈值对于的图像确定为第一难样本图像。
在一些实施例中,所述根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,包括:根据所述探索策略以及所述目标机器人在当前时刻ti采集到的目标观测信息,确定所述目标机器人在下一时刻ti+1的行进方向,其中,所述行进方向用于指示所述目标机器人在下一时刻ti+1的应该移动的方向,所述目标观测信息包括目标观测图像、目标深度图像和目标传感器位姿信息,i≥0;控制所述目标机器人基于所述行进方向执行移动操作,以得到所述目标机器人的探索轨迹,以及所述探索轨迹上每个时间步的目标观测图像。
在一些实施例中,所述根据所述观测信息,获取三维语义分布图,包括:将所述观测图像输入预训练的感知模型,得到所述观测图像的语义类别预测结果,所述语义类别预测结果用于表征所述观测图像中每个像素在C个类别之间的预测概率分布,所述C表示预测的目标对象的种类数;基于所述深度图像建立所述目标观察空间对应的点云,其中,所述点云中的每个点对应相应的所述语义类别预测结果;基于所述传感器位姿信息,将所述点云转换到三维空间来获取体素表示;基于指数移动平均公式来聚合随着时间变化的同一位置的所述体素表示,以得到所述三维语义分布图。
在一些实施例中,所述根据所述难样本图像调整所述目标机器人的感知模型,包括:获取所述难样本图像以及所述难样本图像的语义标注信息,其中,所述语义标注信息包括每一所述难样本图像中所有目标对象的边界框、每个所述目标对象对应的像素以及每个所述目标对象所属的类别;将所述难样本图像输入所述预训练的感知模型,得到所述难样本图像对应的语义类别预测结果;基于所述难样本图像对应的语义类别预测结果与所述语义标注信息,调整所述预训练的感知模型的参数,以得到调整后的感知模型。
在一些实施例中,在所述学习所述目标机器人的探索策略之前,还包括:将所述三维语义分布图输入全局策略网络中选择长期目标,所述长期目标为所述三维语义分布图中的x-y坐标;将所述长期目标输入局部策略网络进行路径规划,得到所述目标机器人的预测离散动作,所述预测离散动作包括前移、左转和左转中的至少一种;基于预设个数的局部步长对所述长期目标进行采样,得到采样数据,所述采样数据用于学习所述目标机器人的离散动作。
在一些实施例中,所述获取目标机器人在目标观察空间内采集的观测信息,包括:基于所述目标机器人的拍摄装置获取预设时间段内每个时间步长对应的观测图像和深度图像,其中,所述观测图像为彩色图像,所述深度图像为将所述拍摄装置采集到的目标观察空间中各点的距离值作为像素值的图像;基于所述目标机器人的传感器获取预设时间段内每个时间步长对应的传感器位姿信息,所述传感器位姿信息至少包括三自由度的位姿信息。
另一方面,本公开实施例提供一种图像处理装置,所述装置包括:
第一获取单元,用于获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息;
第二获取单元,用于根据所述观测信息,获取三维语义分布图;
学习单元,用于根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略;
确定单元,用于根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;
第三获取单元,用于基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;
调整单元,用于根据所述难样本图像调整所述目标机器人的感知模型。
另一方面,本公开实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序适于处理器进行加载,以执行如上任一实施例所述的图像处理方法。
另一方面,本公开实施例提供一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有计算机程序,所述处理器通过调用所述存储器中存储的所述计算机程序,用于执行如上任一实施例所述的图像处理方法。
另一方面,本公开实施例提供一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如上任一实施例所述的图像处理方法。
附图说明
为了更清楚地说明本公开实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域技术人员来讲,在不付出创造性劳 动的前提下,还可以根据这些附图获得其他的附图。
图1为本公开实施例提供的图像处理方法的流程示意图;
图2为本公开实施例提供的图像处理方法的应用场景示意图;
图3为本公开实施例提供的图像处理装置的结构示意图;以及
图4为本公开实施例提供的计算机设备的结构示意图。
具体实施方式
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
本公开实施例提供一种图像处理方法、装置、计算机可读存储介质、计算机设备及计算机程序产品。具体地,本公开实施例的图像处理方法,可以直接应用于机器人中,也可以应用于服务器中,还可以应用于包括终端和服务器的系统,并通过终端和服务器的交互实现。其中,本公开实施例中的机器人指的是需要进行空间移动的机器人,需要获取目标机器人在目标观察空间内采集的观测信息,观测信息包括观测图像、深度图像和传感器位姿信息,并根据观测信息获取三维语义分布图,然后根据三维语义分布图,并基于语义分布不一致性和类分布不确定性的条件,学习目标机器人的探索轨迹,然后基于类分布不确定性的条件,从探索轨迹对应的目标观测图像中获取难样本图像,该难样本图像用于表征预测的类分布结果不确定的图像,并根据难样本图像调整机器人的感知模型。本实施例对机器人的具体类型以及型号不做限定。服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
以下分别进行详细说明。需说明的是,以下实施例的描述顺序不作为对实施例优先顺序的限定。
本公开各实施例提供了一种图像处理方法,该方法可以由机器人或服务器执行,也可以由机器人和服务器共同执行;本公开实施例以图像处理方法由机器人(计算机设备)执行为例来进行说明。
请参阅图1至图2,图1为本公开实施例提供的图像处理方法的流程示意图,图2为本公开实施例提供的应用场景示意图。该方法通过如下步骤110至步骤150实现,具体包括:
步骤110,获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息。
具体的,获取目标机器人在目标观察空间内移动过程中采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息。
例如,目标观察空间,可以为与目标机器人的应用的真实环境比较接近的观察空间。比如,目标机器人为移动机器人,按照应用场景区别,若目标机器人为家用机器人(如扫地机器人),则目标观察空间为室内家居环境或室内办公环境。比如,若目标机器人为物流机器人(如货物搬运机器人),则目标观察空间为具有物流通道的真实环境。
例如,在每个时间步长中,目标机器人在目标观察空间内采集的观测信息包含一个RGB观测图像一个深度图像和一个三自由度传感器姿态xt∈R3。其中,三自由度传感器姿态用于表示x-y坐标和机器人方向。
其中,机器人具有三个离散动作:前移、左转和右转。这三个离散动作可以与x-y坐标加上机器人方向相对应,比如在当前机器人朝向为前移时,根据机器人移动一步的距离,可以算出移动后的坐标和机器人方向,例如,若当前机器人朝向为前移时,则机器人方向不变。
在一些实施例中,所述获取目标机器人在目标观察空间内采集的观测信息,包括:
基于所述目标机器人的拍摄装置获取预设时间段内每个时间步长对应的观测图像和深度图像,其中,所述观测图像为彩色图像,所述深度图像为将所述拍摄装置采集到的目标观察空间中各点的距离值作为像素值的图像;
基于所述目标机器人的传感器获取预设时间段内每个时间步长对应的传感器位姿信息,所述传感器位姿信息至少包括三自由度的位姿信息。
例如,该拍摄装置可以为搭载于目标机器人上的用于获取周围环境图像的装置,拍摄装置连续拍摄以获取连续帧的周围环境图像,该周围环 境图像可以包含RGB观测图像和深度图像。例如,该拍摄装置可以是RGBD相机,RGBD相机是基于结构光技术的相机,一般会有两个摄像头,一个RGB摄像头采集RGB观测图像,一个IR摄像头采集红外图像,该红外图像可以作为深度图像。
例如,可以基于搭载于目标机器人上的三自由度传感器获取三自由度的位姿信息。
步骤120,根据所述观测信息,获取三维语义分布图。
在一些实施例中,所述根据所述观测信息,获取三维语义分布图,包括:
将所述观测图像输入预训练的感知模型,得到所述观测图像的语义类别预测结果,所述语义类别预测结果用于表征所述观测图像中每个像素在C个类别之间的预测概率分布,所述C表示预测的目标对象的种类数;
基于所述深度图像建立所述目标观察空间对应的点云,其中,所述点云中的每个点对应相应的所述语义类别预测结果;
基于所述传感器位姿信息,将所述点云转换到三维空间来获取体素表示;
基于指数移动平均公式来聚合随着时间变化的同一位置的所述体素表示,以得到所述三维语义分布图。
本公开实施例可以使用三维(3D)语义分布图去融合目标机器人移动过程中不同帧的语义预测。如图2所示,图2左侧显示了一个时间步长的语义映射过程。
首先,对于观测到的观测图像It,使用预训练的感知模型(例如:Mask RCNN)进行语义预测来预测出观测图像中观测到的对象的语义类别,以得到观测图像的语义类别预测结果,其中,语义类别预测结果是观测图像中每个像素在C个类别之间的预测概率分布。
例如,该预训练的感知模型可以采用Mask R-CNN模型。Mask R-CNN是一个实例分割(Instance segmentation)算法,主要是在目标检测的基础上再进行分割。例如,将观测图像输入预训练的感知模型,得到的观测图像的语义类别预测结果包括观测图像中所有目标对象的边界框(bounding box)、每个目标对象对应的像素(segmentation mask)以及每 个目标对象所属的类别。比如目标对象可以为目标观察空间中待观察的物体,比如观测图像中所有目标对象可以包括椅子、沙发、盆栽、床、卫生间和电视,则每个目标对象对应的像素(segmentation mask)即为:观测图像中哪些像素属于椅子、观测图像中哪些像素属于沙发、观测图像中哪些像素属于盆栽、观测图像中哪些像素属于床、观测图像中哪些像素属于卫生间、观测图像中哪些像素属于电视。
然后,使用深度图像Dt通过数学变换计算点云,其中,点云中的每个点都对应相应的语义类别预测结果。
深度图像:也叫距离影像,是指将从图像采集器(摄像装置)到场景(目标观察空间)中各点的距离(深度)值作为像素值的图像。获取深度图像的方法有:激光雷达深度成像法、计算机立体视觉成像、坐标测量机法、莫尔条纹法、结构光法等。
点云:当一束激光照射到物体表面时,所反射的激光会携带方位、距离等信息。若将激光束按照某种轨迹进行扫描,便会边扫描边记录到反射的激光点信息,由于扫描极为精细,则能够得到大量的激光点,因而就可形成激光点云。点云格式有*.las;*.pcd;*.txt等。
深度图像经过坐标转换可以计算为点云数据;有规则及必要信息的点云数据也可以反算为深度图像。
在一些实施例中,深度图像转为点云,可以是坐标系的变换:将图像坐标系转换为世界坐标系。变换的约束条件就是相机内参,变换公式如下:
其中(x,y,z)是点云坐标系,(x’,y’)是图像坐标系,D为深度值。
其中,相机内参为一般是4个:fx、fy、u0、v0;其中,fx=F/dx,fy=F/dy;其中,F表示焦距的长度;dx和dy表示:x方向和y方向的一个像素分别占多少长度单位,即一个像素代表的实际物理值的大小,dx和dy是实现图像物理坐标系与像素坐标系转换的关键;u0和v0表示: 图像的中心像素坐标和图像原点像素坐标之间相差的横向和纵向像素数。理论值应该是图像宽度、高度的一半,越好的摄像头,u0和v0越接近于分辨率的一半。
例如,在进行上述转换之前可以对(x’,y’)进行畸变矫正函数undistort运算,以进行畸变的矫正。相机成像的过程实际就是将世界坐标系的点,转换到相机坐标系,投影得到图像坐标系,进而转化为像素坐标系的过程。而由于透镜精度和工艺会引入畸变(所谓畸变,就是指在世界坐标系中的直线转化到其他坐标系不在是直线),从而导致失真,为了解决这个问题,从而引入了相机畸变校正模型。
然后,基于传感器位姿信息的可微几何变换,将点云转换到3D空间来获得体素表示其中,L、W、H分别表示长、宽、高。
例如,用于构建点云的多个深度图像,可能是目标机器人基于不同的视角采集到的深度图像,因此,该点云的坐标系可能跟目标观察空间(3D空间)的坐标系是不一样的,比如该目标观察空间的参考坐标系可以用世界坐标系(笛卡尔坐标系)表示。因此,需要将通过深度图像得到的点云转换到目标观察空间的参考坐标系中。其中,传感器位姿信息可以包括位置与姿态,对应位移与旋转。通常有两个坐标系,一个是用于参考的世界坐标系(笛卡尔坐标系),一个是以刚体(比如机器人)质心为原点的刚体坐标系,映射表示的是同一点在不同坐标系之间的坐标转换。其中,映射包括平移和旋转,平移与刚体坐标系原点的位置相关,旋转与刚体坐标系的姿态相关。在本公开实施例中,可以基于传感器位姿信息的可微几何变换来确定点云对应的刚体坐标系,并将刚体坐标系转换为世界坐标系,以得到体素表示。
随后,使用指数移动平均公式来聚合随着时间变化同一位置的体素表示来得到3D语义分布图Mt
Mt=Mt-1,t=1;
Mt=λ*Mt-1+(1-λ)*mt,t>1;
其中,3D语义分布图在开始时用全零初始化。λ是一个超参数,用于控制当前预测的体素表示mt与上一步得到的3D语义分布图Mt-1的比 例,比如λ可以设为0.3。
步骤130,根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略。
例如,3D语义分布图可以用来计算语义分布不一致性奖励。
在一些实施例中,所述语义分布不一致性表示所述目标机器人在移动过程中从不同视角观测同一个目标对象时,得到不一致的预测分布结果;
所述类分布不确定性表示所述目标机器人在移动过程中从同一个视角观测同一个目标对象时,将所述目标对象的类别预测为多个类别、且所述多个类别中有两个类别得到的预测类别概率相近且均大于第一预设阈值的情况。
例如,类别可以划分为椅子(chair)、沙发(couch)、盆栽(potted)、床(bed)、卫生间(toilet)、电视(TV)这6个类别。
例如,目标对象以电视为例,对于语义分布不一致性,从不同视角观察该目标对象时,比如从正面观测该目标对象时:预测为电视的概率是0.6,预测为椅子的概率是0.2,预测为沙发的概率是0.1,预测为盆栽的概率是0.05,预测为床的概率是0.02,预测为卫生间的概率是0.03;比如从侧面观测该目标对象时:预测为电视的概率是0.2,预测为椅子的概率是0.1,预测为沙发的概率是0.5,预测为盆栽的概率是0.15,预测为床的概率是0.02,预测为卫生间的概率是0.03;由此可知,从正面观测该目标对象时预测为电视的概率与从侧面观测该目标对象时预测为电视的概率存在语义分布不一致性。
例如,目标对象以电视为例,第一预设阈值为0.3,对于类分布不确定性,从同一个视角观察同该目标对象时:预测为电视的概率是0.4,预测为椅子的概率是0.35,预测为沙发的概率是0.15,预测为盆栽的概率是0.05,预测为床的概率是0.02,预测为卫生间的概率是0.03;由此可知,预测为电视的概率(0.4)与预测为椅子的概率(0.35)比较相近,且预测为电视的概率(0.4)大于第一预设阈值(0.3),预测为椅子的概率(0.35)大于第一预设阈值(0.3),说明预测分布结果中有预测为两个类别(电视和椅子)的概率较大且比较接近,则确定预测分布结果中存在类分布不确定性。
在一些实施例中,所述观测图像包括第一观测图像和第二观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;所述根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略,包括:
根据所述第一观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的当前预测结果,并基于所述当前预测结果与所述三维语义分布图,计算语义分布不一致性奖励;
获取所述第二观测图像中所有目标对象的第一预测类别概率,并基于所述所有目标对象的第一预测类别概率,计算第一类分布不确定性奖励;
根据所述第一语义分布不一致性奖励与所述第一类分布不确定性奖励,学习所述目标机器人的探索策略。
如图2所示,本公开实施例提出了两种新的基于分布的奖励方式,通过最大化该目标机器人在移动过程中的语义分布不一致性和类分布不确定性来训练探索策略at=π(It,θ),其中,π表示需要训练的策略网络,It表示观测图像,θ表示策略网络的参数。该探索策略可以用于确定目标机器人的探索轨迹。即利用3D语义分布图以自监督的方式通过语义分布不一致和类分布不确定性来学习目标机器人的探索策略。其中,最大化该目标机器人在移动过程中的语义分布不一致性和类分布不确定性的目的为,希望基于学习到的探索策略确定的探索轨迹中的图像的语义分布不一致性和类分布不确定性是比较高的。
其中,可以基于3D语义分布图与不同视角对应的第一观测图像计算第一语义分布不一致性奖励r;且可以基于同一个视角对应的单帧的第二观测图像中第i个针对每一目标对象的第一预测类别概率计算第一类分布不确定性奖励u;然后,基于第一语义分布不一致性奖励r与第一类分布不确定性奖励u的和,得到目标奖励reward,即reward=r+u;并使用强化学习PPO算法来训练探索策略,at=π(It,θ),其中,θ<-PPO[reward,π(θ)],PPO表示强化学习PPO算法。
其中,第一语义分布不一致性奖励,被定义为第一观测图像对应的当前预测结果与3D语义分布图之间的Kullback-Leibler散度,第一语义分 布不一致性奖励鼓励目标机器人不仅探索新的目标对象而且探索跨视角具有不同预测分布结果的对象:r=KL(mt,Mt-1),其中,r表示第一语义分布不一致性奖励,mt表示第一观测图像对应的当前预测的体素表示,Mt-1表示上一步得到的3D语义分布图。
其中,KL散度可以用来衡量两个分布之间的差异程度。若两者差异越小,KL散度越小,反之亦反;当两者分布一致时,其KL散度为0。
其中,第一类分布不确定性奖励,用于探索第二观测图像中预测为多个类别、且多个类别中有两个类别的置信度比较接近的目标对象。其中,第一类分布不确定性奖励u=SECmax(Pi),Pi是单帧的第二观测图像中第i个目标对象的第一预测类别概率,SECmax表示中的第二个最大值。若u大于第一预设阈值δ,则认为预测的类分布结果是不确定的。例如,该第一预设阈值δ可以设置为0.1。或者,该第一预设阈值δ可以设置为0.3。
在一些实施例中,在所述学习所述目标机器人的探索策略之前,还包括:
将所述三维语义分布图输入全局策略网络中选择长期目标,所述长期目标为所述三维语义分布图中的x-y坐标;
将所述长期目标输入局部策略网络进行路径规划,得到所述目标机器人的预测离散动作,所述预测离散动作包括前移、左转和左转中的至少一种;
基于预设个数的局部步长对所述长期目标进行采样,得到采样数据,所述采样数据用于学习所述目标机器人的离散动作。
例如,策略网络可以分为两个部分,一个称之为全局策略网络,用于预测可能的x-y坐标;另一个称之为局部策略网络,就是使用快速行进方法进行路径规划,以根据坐标预测目标机器人的预测离散动作。为了训练探索策略,首先将3D语义分布图输入到全局策略网络中来选择长期目标,该长期目标表示3D语义分布图中的x-y坐标。然后,将长期目标输入局部策略网络进路径规划,得到目标机器人的预测离散动作,预测离散动作包括前移、左转和左转中的至少一种,该局部策略网络就是快速行进方法,使用快速行进方法进行路径规划,该方法使用低维度导航动作来实 现目标,根据长期目标的坐标来预测目标机器人的预测离散动作是前移、左转、还是右转。其中,例如,预设个数为25个,以每25个局部步长对长期目标进行采样来缩短强化学习探索的时间范围,以得到采样数据,该采样数据用于在训练探测策略的过程中输入策略网络,来学习目标机器人的离散动作,进而根据学习到的目标机器人的离散动作学习探索轨迹。
例如,将全局策略网络预测的长期目标(x-y坐标),输入局部策略网络(比如Fast Marching Method网络)进行路径规划,得到目标机器人的预测离散动作(前进、左转、右转中的一种),走完一步后,继续预测下一步对应的目标机器人的预测离散动作(前进、左转、右转中的一种),一直到走第25步,再对全局策略网络进行更新,并预测新的长期目标(x-y坐标),并继续用新的长期目标重新预测下一轮的25个局部步长对应的目标机器人的预测离散动作。
步骤140,根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像。
在一些实施例中,所述根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,包括:
根据所述探索策略以及所述目标机器人在当前时刻ti采集到的目标观测信息,确定所述目标机器人在下一时刻ti+1的行进方向,其中,所述行进方向用于指示所述目标机器人在下一时刻ti+1的应该移动的方向,所述目标观测信息包括目标观测图像、目标深度图像和目标传感器位姿信息,i≥0;
控制所述目标机器人基于所述行进方向执行移动操作,以得到所述目标机器人的探索轨迹,以及所述探索轨迹上每个时间步的目标观测图像。
例如,通过语义分布不一致性和类分布不确定性学习探索策略后,可以使得该学习到的探索策略指引机器人移动得到的探索轨迹中可以出现更多语义分布不一致性的样本和更多类分布不确定性的样本。
例如,学习到探索策略后,基于学习到的探索策略,根据目标机器人在t0时刻对应的起始点的目标观测信息,该目标观测信息包括目标观测图像、目标深度图像和目标传感器位姿信息,策略网络直接可以输出目标 机器人在t1时刻的行进方向,该行进方向表示目标机器人在t1时刻应该向哪个方向走。控制目标机器人基于t1时刻的行进方向执行移动操作之后,继续基于学习到的探索策略以及当前时刻ti所在位置采集到的目标观测信息,利用策略网络输出目标机器人在下一时刻ti+1的行进方向,并控制目标机器人基于下一时刻ti+1的行进方向执行移动操作。进过上述操作,就会获得一条表示目标机器人运动路径的探索轨迹,以及该探索轨迹上每个时间步的目标观测图像。
例如,在采集探索轨迹上的目标观测图像时,除了保存同一个视角对应的第二目标观测图像,还需保存整个目标观察空间内不同视角对应的第一目标观测图像。
步骤150,基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像。
在一些实施例中,所述探索轨迹对应的目标观测图像包括第一目标观测图像和第二目标观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;
所述基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,包括:
基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,所述第一难样本图像用于表征预测的语义分布结果不一致的图像;和/或
基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,所述第二难样本图像用于表征预测的类分布结果不确定的图像。
在一些实施例中,所述基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,包括:
获取所述探索轨迹对应的第二目标观测图像;
计算所述探索轨迹对应的第二目标观测图像中所有目标对象的第二 预测类别概率;
基于所述第二目标观测图像中所有目标对象的第二预测类别概率,计算第二类分布不确定性;
将所述第二目标观测图像中所述第二类分布不确定性大于第一预设阈值对应的图像确定为第二难样本图像。
示例性地,计算探索轨迹对应的第二目标观测图像中所有目标对象的第二预测类别概率,并基于所有目标对象的第二预测类别概率,计算第二类分布不确定性,将探索轨迹对应的第二目标观测图像中第二类分布不确定性大于第一预设阈值对应的图像确定为第二难样本图像。例如,在实际应用中,从探索轨迹中采样的图像,一般是采样同一个视角(单视角)对应的单帧的第二目标观测图像,可以只考虑同一个视角对应的第二类分布不确定性来选出类分布结果不确定的第二难样本图像,在实际应用中,通过同一个视角对应的第二类分布不确定性来选出第二难样本图像对感知模型调整更有帮助。通过关注同一个视角预测的类分布不确定性,可以选出更多的难样本图像。
在一些实施例中,所述基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,包括:
获取所述探索轨迹对应的第一目标观测图像;
根据所述第一目标观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的目标预测结果,并基于所述目标预测结果与所述三维语义分布图,计算第二语义分布不一致性;
将所述第一目标观测图像中所述第二语义分布不一致性大于第二预设阈值对于的图像确定为第一难样本图像。
示例性地,若目标机器人在移动的过程中保存了整个目标观察空间内的探索轨迹,则可以从探索轨迹中采样不同视角(多视角)对应的第一目标观测图像,然后对第一目标观测图像进行语义类别预测,以获取目标机器人移动过程中从不同视角观测同一个目标对象时的目标预测结果,并基于目标预测结果与三维语义分布图,计算第二语义分布不一致性,并基于第二语义分布不一致性从第二目标预测图像中选出语义分布结果不一致性的第一难样本图像。通过关注不同视角预测的语义分布不一致性,可以 选出更多的难样本图像。
例如,若难样本图像包括第一难样本图像和第二难样本图像,则通过关注同一个视角预测的类分布不确定性以及关注不同视角预测的语义分布不一致性,可以选出更多的难样本图像,并突出难样本图像的重要性。
步骤160,根据所述难样本图像调整所述目标机器人的感知模型。
在一些实施例中,所述根据所述难样本图像调整所述目标机器人的感知模型,包括:
获取所述难样本图像以及所述难样本图像的语义标注信息,其中,所述语义标注信息包括每一所述难样本图像中所有目标对象的边界框、每个所述目标对象对应的像素以及每个所述目标对象所属的类别;
将所述难样本图像输入所述预训练的感知模型,得到所述难样本图像对应的语义类别预测结果;
基于所述难样本图像对应的语义类别预测结果与所述语义标注信息,调整所述预训练的感知模型的参数,以得到调整后的感知模型。
例如,在获得探索轨迹后,最简单的方法是标注探索轨迹上的所有目标观测图像作为样本图像。但是,尽管经过训练的探索策略学习到的探索轨迹能够找到更多语义分布不一致性和类分布不确定性的对象,但仍有许多目标观测图像能被预训练的感知模型准确识别。因此,为了有效地微调感知模型,在探索轨迹获取到的所有目标观测图像的基础上,可以忽略掉能被预训练的感知模型准确识别的样本图像,然后筛选出不能被预训练的感知模型准确识别的难样本图像来微调感知模型。例如,可以通过计算第二语义分布不一致性和/或第二类分布不确定性,基于第二语义分布不一致性选出预测的语义分布结果不一致的第一难样本图像,和/或基于第二类分布不确定性选出预测的类分布结果不确定的第二难样本图像,并对选定的语义分布结果不一致的第一难样本图像和/或类分布结果不确定的第二难样本图像进行标注,并使用所有难样本图像来微调感知模型。
具体的,在获取难样本图像后,标注难样本图像的语义标注信息,具体标注每一难样本图像中所有目标对象的边界框、每个目标对象对应的像素以及每个目标对象所属的类别。然后,将所有难样本图像输入预训练的感知模型,得到每一难样本图像对应的语义类别预测结果。然后,基于 每一难样本图像对应的语义类别预测结果与语义标注信息,调整预训练的感知模型的参数,使得感知模型针对该难样本图像输出的语义类别预测结果更接近已标注的语义标注信息中目标对象所属的类别,进而提升感知模型的感知准确性,其中,该感知模型的参数为Mask RCNN中的参数;并基于随机收集的测试样本集进行测试,并基于测试样本集对应的正确率不再增加时停止训练,以得到调整后的感知模型。
如下表1所示,本公开实施例采用的方法(Ours)在Matterport3D数据集上与相关技术相比,取得了最好的性能。该性能表示在物体检测(Bbox)和实例分割(Segm)上的AP50的性能,表征感知的准确性。其中,AP50的性能的最优是100%。
表1
如下表2所示,示出了通过基于最新微调过的感知模型迭代训练探索策略时,基于如下目标对象:椅子(chair)、沙发(couch)、盆栽(potted)、床(bed)、卫生间(toilet)、电视(Tv)等,进行感知预测的性能。根据表2可知,基于最新微调过的感知模型迭代训练探索策略,性能可以进一步提升。比如,当迭代次数n为1时,AP50的平均性能为34.07%;当迭代次数n为2时,AP50的平均性能为34.71%;当迭代次数n为3时,AP50的平均性能为35.03%。
表2
上述所有的技术方案,可以采用任意结合形成本公开的可选实施例, 在此不再一一赘述。
本公开实施例通过获取目标机器人在目标观察空间内采集的观测信息,观测信息包括观测图像、深度图像和传感器位姿信息;根据观测信息,获取三维语义分布图;根据三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习目标机器人的探索策略;根据探索策略移动目标机器人,以得到目标机器人的探索轨迹,探索轨迹包括目标机器人在目标观察空间内移动的过程中采集的目标观测图像;基于语义分布不一致性与类分布不确定性中的至少一种条件,从探索轨迹对应的目标观测图像中获取难样本图像,难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;根据难样本图像调整目标机器人的感知模型。本公开实施例通过利用三维语义分布图以自监督的方式通过语义分布不一致性和类分布不确定性来学习探索轨迹,并利用语义分布不一致性与类分布不确定性中的至少一种条件来收集学习到的探索轨迹上的难样本图像,在对收集到的难样本图像进行语义标注后,基于标注的难样本图像微调感知模型,基于三维语义分布图来衡量语义分布差异,并结合语义分布不一致性和类分布不确定性来学习探索轨迹,以关注同一个视角预测的类分布不确定性以及关注不同视角预测的语义分布不一致性,并突出难样本图像的重要性,最终基于标注的难样本图像微调感知模型,降低了标注成本,提升了感知模型的感知准确性。
为便于更好的实施本公开实施例的图像处理方法,本公开实施例还提供一种图像处理装置。请参阅图3,图3为本公开实施例提供的图像处理装置的结构示意图。其中,该图像处理装置200可以包括:
第一获取单元210,用于获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息;
第二获取单元220,用于根据所述观测信息,获取三维语义分布图;
学习单元230,用于根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略;
确定单元240,用于根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;
第三获取单元250,用于基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;
调整单元260,用于根据所述难样本图像调整所述目标机器人的感知模型。
在一些实施例中,所述语义分布不一致性表示所述目标机器人在移动过程中从不同视角观测同一个目标对象时,得到不一致的预测分布结果;所述类分布不确定性表示所述目标机器人在移动过程中从同一个视角观测同一个目标对象时,将所述目标对象的类别预测为多个类别、且所述多个类别中有两个类别得到的预测类别概率相近且均大于第一预设阈值的情况。
在一些实施例中,所述观测图像包括第一观测图像和第二观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;所述学习单元230,具体用于:根据所述第一观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的当前预测结果,并基于所述当前预测结果与所述三维语义分布图,计算第一语义分布不一致性奖励;获取所述第二观测图像中所有目标对象的第一预测类别概率,并基于所述所有目标对象的第一预测类别概率,计算第一类分布不确定性奖励;根据所述第一语义分布不一致性奖励与所述第一类分布不确定性奖励,学习所述目标机器人的探索策略。
在一些实施例中,所述探索轨迹对应的目标观测图像包括第一目标观测图像和第二目标观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;所述第三获取单元250,具体用于:
基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,所述第一难样本图像用于表征预测的语义分布结果不一致的图像;和/或
基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标 观测图像中获取第二难样本图像,所述第二难样本图像用于表征预测的类分布结果不确定的图像。
在一些实施例中,所述第三获取单元250在基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像时,具体用于:获取所述探索轨迹对应的第二目标观测图像;计算所述探索轨迹对应的第二目标观测图像中所有目标对象的第二预测类别概率;基于所述第二目标观测图像中所有目标对象的第二预测类别概率,计算第二类分布不确定性;将所述第二目标观测图像中所述第二类分布不确定性大于第一预设阈值对应的图像确定为第二难样本图像。
在一些实施例中,所述第三获取单元250在基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像时,具体用于:获取所述探索轨迹对应的第一目标观测图像;根据所述第一目标观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的目标预测结果,并基于所述目标预测结果与所述三维语义分布图,计算第二语义分布不一致性;将所述第一目标观测图像中所述第二语义分布不一致性大于第二预设阈值对于的图像确定为第一难样本图像。
在一些实施例中,所述确定单元240,具体用于:根据所述探索策略以及所述目标机器人在当前时刻ti采集到的目标观测信息,确定所述目标机器人在下一时刻ti+1的行进方向,其中,所述行进方向用于指示所述目标机器人在下一时刻ti+1的应该移动的方向,所述目标观测信息包括目标观测图像、目标深度图像和目标传感器位姿信息,i≥0;控制所述目标机器人基于所述行进方向执行移动操作,以得到所述目标机器人的探索轨迹,以及所述探索轨迹上每个时间步的目标观测图像。
在一些实施例中,所述第二获取单元220,具体用于:将所述观测图像输入预训练的感知模型,得到所述观测图像的语义类别预测结果,所述语义类别预测结果用于表征所述观测图像中每个像素在C个类别之间的预测概率分布,所述C表示预测的目标对象的种类数;基于所述深度图像建立所述目标观察空间对应的点云,其中,所述点云中的每个点对应相应的所述语义类别预测结果;基于所述传感器位姿信息,将所述点云转换到三维空间来获取体素表示;基于指数移动平均公式来聚合随着时间变化的同 一位置的所述体素表示,以得到所述三维语义分布图。
在一些实施例中,所述调整单元250,具体用于:获取所述难样本图像以及所述难样本图像的语义标注信息,其中,所述语义标注信息包括每一所述难样本图像中所有目标对象的边界框、每个所述目标对象对应的像素以及每个所述目标对象所属的类别;将所述难样本图像输入所述预训练的感知模型,得到所述难样本图像对应的语义类别预测结果;基于所述难样本图像对应的语义类别预测结果与所述语义标注信息,调整所述预训练的感知模型的参数,以得到调整后的感知模型。
在一些实施例中,所述学习单元230在所述学习所述目标机器人的探索轨迹之前,还用于:将所述三维语义分布图输入全局策略网络中选择长期目标,所述长期目标为所述三维语义分布图中的x-y坐标;将所述长期目标输入局部策略网络进行路径规划,得到所述目标机器人的预测离散动作,所述预测离散动作包括前移、左转和左转中的至少一种;基于预设个数的局部步长对所述长期目标进行采样,得到采样数据,所述采样数据用于学习所述目标机器人的离散动作。
在一些实施例中,所述第一获取单元210,具体用于:基于所述目标机器人的拍摄装置获取预设时间段内每个时间步长对应的观测图像和深度图像,其中,所述观测图像为彩色图像,所述深度图像为将所述拍摄装置采集到的目标观察空间中各点的距离值作为像素值的图像;基于所述目标机器人的传感器获取预设时间段内每个时间步长对应的传感器位姿信息,所述传感器位姿信息至少包括三自由度的位姿信息。
上述图像处理装置200中的各个单元可全部或部分通过软件、硬件及其组合来实现。上述各个单元可以以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行上述各个单元对应的操作。
图像处理装置200,可以集成在具备储存器并安装有处理器而具有运算能力的终端或服务器中,或者该图像处理装置200为该终端或服务器。
在一些实施例中,本公开还提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现上述各方法实施例中的步骤。
在一些实施例中,本公开还提供了一种计算机设备,包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现上述各方法实施例中的步骤。
如图4所示,图4为本公开实施例提供的计算机设备的结构示意图,该计算机设备可以是终端。该计算机设备300包括有一个或者一个以上处理核心的处理器301、有一个或一个以上计算机可读存储介质的存储器302及存储在存储器302上并可在处理器上运行的计算机程序。其中,处理器301与存储器302电性连接。本领域技术人员可以理解,图中示出的计算机设备结构并不构成对计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
处理器301是计算机设备300的控制中心,利用各种接口和线路连接整个计算机设备300的各个部分,通过运行或加载存储在存储器302内的软件程序和/或模块,以及调用存储在存储器302内的数据,执行计算机设备300的各种功能和处理数据,从而对计算机设备300进行整体处理。
在本公开实施例中,计算机设备300中的处理器301会按照如下的步骤,将一个或一个以上的应用程序的进程对应的指令加载到存储器302中,并由处理器301来运行存储在存储器302中的应用程序,从而实现各种功能:
获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息;根据所述观测信息,获取三维语义分布图;根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略;根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;根据所述难样本图像调整所述目标机器人的感知模型。
以上各个操作的具体实施可参见前面的实施例,在此不再赘述。
在一些实施例中,如图4所示,计算机设备300还包括:触控显示屏303、射频电路304、音频电路305、输入单元306以及电源307。其中,处 理器301分别与触控显示屏303、射频电路304、音频电路305、输入单元306以及电源307电性连接。本领域技术人员可以理解,图4中示出的计算机设备结构并不构成对计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
触控显示屏303可用于显示图形用户界面以及接收用户作用于图形用户界面产生的操作指令。触控显示屏303可以包括显示面板和触控面板。其中,显示面板可用于显示由用户输入的信息或提供给用户的信息以及计算机设备的各种图形用户接口,这些图形用户接口可以由图形、文本、图标、视频和其任意组合来构成。在一些实施例中,可以采用液晶显示器(LCD,Liquid Crystal Display)、有机发光二极管(OLED,Organic Light-Emitting Diode)等形式来配置显示面板。触控面板可用于收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板上或在触控面板附近的操作),并生成相应的操作指令,且操作指令执行对应程序。在一些实施例中,触控面板可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器301,并能接收处理器301发来的命令并加以执行。触控面板可覆盖显示面板,当触控面板检测到在其上或附近的触摸操作后,传送给处理器301以确定触摸事件的类型,随后处理器301根据触摸事件的类型在显示面板上提供相应的视觉输出。在本公开实施例中,可以将触控面板与显示面板集成到触控显示屏303而实现输入和输出功能。但是在某些实施例中,触控面板与触控面板可以作为两个独立的部件来实现输入和输出功能。即触控显示屏303也可以作为输入单元306的一部分实现输入功能。
射频电路304可用于收发射频信号,以通过无线通信与网络设备或其他计算机设备建立无线通讯,与网络设备或其他计算机设备之间收发信号。
音频电路305可以用于通过扬声器、传声器提供用户与计算机设备之间的音频接口。音频电路305可将接收到的音频数据转换后的电信号,传输到扬声器,由扬声器转换为声音信号输出;另一方面,传声器将收集的声音信号转换为电信号,由音频电路305接收后转换为音频数据,再将音 频数据输出处理器301处理后,经射频电路304以发送给比如另一计算机设备,或者将音频数据输出至存储器302以便进一步处理。音频电路305还可能包括耳塞插孔,以提供外设耳机与计算机设备的通信。
输入单元306可用于接收输入的数字、字符信息或对象特征信息(例如指纹、虹膜、面部信息等),以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。
电源307用于给计算机设备300的各个部件供电。在一些实施例中,电源307可以通过电源管理系统与处理器301逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源307还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。
尽管图4中未示出,计算机设备300还可以包括摄像头、传感器、无线保真模块、蓝牙模块等,在此不再赘述。
本公开还提供了一种机器人,该机器人上搭载有拍摄装置和传感器,该机器人还包括存储器和处理器,存储器中存储有计算机程序,该处理器执行计算机程序时实现上述各方法实施例中的步骤。
本公开还提供了一种计算机可读存储介质,用于存储计算机程序。该计算机可读存储介质可应用于计算机设备,并且该计算机程序使得计算机设备执行本公开实施例中的图像处理方法中的相应流程,为了简洁,在此不再赘述。
本公开还提供了一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得计算机设备执行本公开实施例中的图像处理方法中的相应流程,为了简洁,在此不再赘述。
本公开还提供了一种计算机程序,该计算机程序包括计算机程序,计算机程序存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得计算机设备执行本公开实施例中的图像处理方法中的相应流程,为了简洁,在此不再赘述。
应理解,本公开实施例的处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本公开实施例中的公开的各方法、步骤及逻辑框图。结合本公开实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
可以理解,本公开实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬 件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本公开的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能若以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不 局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。

Claims (15)

  1. 一种图像处理方法,包括:
    获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息;
    根据所述观测信息,获取三维语义分布图;
    根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略;
    根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;
    基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;
    根据所述难样本图像调整所述目标机器人的感知模型。
  2. 如权利要求1所述的图像处理方法,其中,所述语义分布不一致性表示所述目标机器人在移动过程中从不同视角观测同一个目标对象时,得到不一致的预测分布结果;
    所述类分布不确定性表示所述目标机器人在移动过程中从同一个视角观测同一个目标对象时,将所述目标对象的类别预测为多个类别、且所述多个类别中有两个类别得到的预测类别概率相近且均大于第一预设阈值的情况。
  3. 如权利要求2所述的图像处理方法,其中,所述观测图像包括第一观测图像和第二观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;
    所述根据所述三维语义分布图,基于语义分布不一致性和类分布不确定性的条件,学习所述目标机器人的探索策略,包括:
    根据所述第一观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的当前预测结果,并基于所述当前预测结果与所述三维语义分布图,计算第一语义分布不一致性奖励;
    获取所述第二观测图像中所有目标对象的第一预测类别概率,并基于所述所有目标对象的第一预测类别概率,计算第一类分布不确定性奖励;
    根据所述第一语义分布不一致性奖励与所述第一类分布不确定性奖励,学习所述目标机器人的探索策略。
  4. 如权利要求2或3所述的图像处理方法,其中,所述探索轨迹对应的目标观测图像包括第一目标观测图像和第二目标观测图像,所述第一目标观测图像为从不同视角观测同一个目标对象时采集的观测图像,所述第二目标观测图像为从同一个视角观测同一个目标对象时采集的观测图像;
    所述基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,包括:
    基于所述语义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,所述第一难样本图像用于表征预测的语义分布结果不一致的图像;和/或
    基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,所述第二难样本图像用于表征预测的类分布结果不确定的图像。
  5. 如权利要求4所述的图像处理方法,其中,所述基于所述类分布不确定性的条件,从所述探索轨迹对应的第二目标观测图像中获取第二难样本图像,包括:
    获取所述探索轨迹对应的第二目标观测图像;
    计算所述探索轨迹对应的第二目标观测图像中所有目标对象的第二预测类别概率;
    基于所述第二目标观测图像中所有目标对象的第二预测类别概率,计算第二类分布不确定性;
    将所述第二目标观测图像中所述第二类分布不确定性大于第一预设阈值对应的图像确定为第二难样本图像。
  6. 如权利要求4或5所述的图像处理方法,其中,所述基于所述语 义分布不一致性的条件,从所述探索轨迹对应的第一目标观测图像中获取第一难样本图像,包括:
    获取所述探索轨迹对应的第一目标观测图像;
    根据所述第一目标观测图像,获取所述目标机器人移动过程中从不同视角观测同一个目标对象时的目标预测结果,并基于所述目标预测结果与所述三维语义分布图,计算第二语义分布不一致性;
    将所述第一目标观测图像中所述第二语义分布不一致性大于第二预设阈值对应的图像确定为第一难样本图像。
  7. 如权利要求1-6任一项所述的图像处理方法,其中,所述根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,包括:
    根据所述探索策略以及所述目标机器人在当前时刻ti采集到的目标观测信息,确定所述目标机器人在下一时刻ti+1的行进方向,其中,所述行进方向用于指示所述目标机器人在下一时刻ti+1的应该移动的方向,所述目标观测信息包括目标观测图像、目标深度图像和目标传感器位姿信息,i≥0;
    控制所述目标机器人基于所述行进方向执行移动操作,以得到所述目标机器人的探索轨迹,以及所述探索轨迹上每个时间步的目标观测图像。
  8. 如权利要求1-7任一项所述的图像处理方法,其中,所述根据所述观测信息,获取三维语义分布图,包括:
    将所述观测图像输入预训练的感知模型,得到所述观测图像的语义类别预测结果,所述语义类别预测结果用于表征所述观测图像中每个像素在C个类别之间的预测概率分布,所述C表示预测的目标对象的种类数;
    基于所述深度图像建立所述目标观察空间对应的点云,其中,所述点云中的每个点对应相应的所述语义类别预测结果;
    基于所述传感器位姿信息,将所述点云转换到三维空间来获取体素表示;
    基于指数移动平均公式来聚合随着时间变化的同一位置的所述体素表示,以得到所述三维语义分布图。
  9. 如权利要求8所述的图像处理方法,其中,所述根据所述难样本 图像调整所述目标机器人的感知模型,包括:
    获取所述难样本图像以及所述难样本图像的语义标注信息,其中,所述语义标注信息包括每一所述难样本图像中所有目标对象的边界框、每个所述目标对象对应的像素以及每个所述目标对象所属的类别;
    将所述难样本图像输入所述预训练的感知模型,得到所述难样本图像对应的语义类别预测结果;
    基于所述难样本图像对应的语义类别预测结果与所述语义标注信息,调整所述预训练的感知模型的参数,以得到调整后的感知模型。
  10. 如权利要求1-9任一项所述的图像处理方法,其中,在所述学习所述目标机器人的探索策略之前,所述图像处理方法还包括:
    将所述三维语义分布图输入全局策略网络中选择长期目标,所述长期目标为所述三维语义分布图中的x-y坐标;
    将所述长期目标输入局部策略网络进行路径规划,得到所述目标机器人的预测离散动作,所述预测离散动作包括前移、左转和左转中的至少一种;
    基于预设个数的局部步长对所述长期目标进行采样,得到采样数据,所述采样数据用于学习所述目标机器人的离散动作。
  11. 如权利要求1-10任一项所述的图像处理方法,其中,所述获取目标机器人在目标观察空间内采集的观测信息,包括:
    基于所述目标机器人的拍摄装置获取预设时间段内每个时间步长对应的观测图像和深度图像,其中,所述观测图像为彩色图像,所述深度图像为将所述拍摄装置采集到的目标观察空间中各点的距离值作为像素值的图像;
    基于所述目标机器人的传感器获取预设时间段内每个时间步长对应的传感器位姿信息,所述传感器位姿信息至少包括三自由度的位姿信息。
  12. 一种图像处理装置,包括:
    第一获取单元,用于获取目标机器人在目标观察空间内采集的观测信息,所述观测信息包括观测图像、深度图像和传感器位姿信息;
    第二获取单元,用于根据所述观测信息,获取三维语义分布图;
    学习单元,用于根据所述三维语义分布图,基于语义分布不一致性 和类分布不确定性的条件,学习所述目标机器人的探索策略;
    确定单元,用于根据所述探索策略移动所述目标机器人,以得到所述目标机器人的探索轨迹,所述探索轨迹包括所述目标机器人在所述目标观察空间内移动的过程中采集的目标观测图像;
    第三获取单元,用于基于所述语义分布不一致性与所述类分布不确定性中的至少一种条件,从所述探索轨迹对应的目标观测图像中获取难样本图像,所述难样本图像用于表征预测的语义分布结果不一致和/或预测的类分布结果不确定的图像;
    调整单元,用于根据所述难样本图像调整所述目标机器人的感知模型。
  13. 一种计算机可读存储介质,存储有计算机程序,其中,所述计算机程序适于处理器进行加载,以执行如权利要求1-11任一项所述的图像处理方法。
  14. 一种计算机设备,包括处理器和存储器,其中,所述存储器中存储有计算机程序,所述处理器通过调用所述存储器中存储的所述计算机程序,用于执行权利要求1-11任一项所述的图像处理方法。
  15. 一种计算机程序产品,包括计算机程序,其中,所述计算机程序被处理器执行时实现权利要求1-11任一项所述的图像处理方法。
PCT/CN2023/112209 2022-08-23 2023-08-10 图像处理方法、装置、存储介质及设备 WO2024041392A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211014186.7A CN115471731B (zh) 2022-08-23 2022-08-23 图像处理方法、装置、存储介质及设备
CN202211014186.7 2022-08-23

Publications (1)

Publication Number Publication Date
WO2024041392A1 true WO2024041392A1 (zh) 2024-02-29

Family

ID=84367693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/112209 WO2024041392A1 (zh) 2022-08-23 2023-08-10 图像处理方法、装置、存储介质及设备

Country Status (2)

Country Link
CN (1) CN115471731B (zh)
WO (1) WO2024041392A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471731B (zh) * 2022-08-23 2024-04-09 北京有竹居网络技术有限公司 图像处理方法、装置、存储介质及设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102547A (zh) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 基于物体识别深度学习模型的机器人抓取位姿估计方法
CN110955466A (zh) * 2018-09-27 2020-04-03 罗伯特·博世有限公司 用于测定智能体的策略的方法、装置和计算机程序
CN111524187A (zh) * 2020-04-22 2020-08-11 北京三快在线科技有限公司 一种视觉定位模型的训练方法及装置
CN111814683A (zh) * 2020-07-09 2020-10-23 北京航空航天大学 一种基于语义先验和深度学习特征的鲁棒视觉slam方法
US20210118184A1 (en) * 2019-10-17 2021-04-22 Toyota Research Institute, Inc. Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation
CN113744301A (zh) * 2021-08-05 2021-12-03 深圳供电局有限公司 移动机器人的运动轨迹估计方法、装置和存储介质
CN115471731A (zh) * 2022-08-23 2022-12-13 北京有竹居网络技术有限公司 图像处理方法、装置、存储介质及设备

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330973A (zh) * 2017-07-03 2017-11-07 深圳市唯特视科技有限公司 一种基于多视角监督的单视角重建方法
CN111862213A (zh) * 2020-07-29 2020-10-30 Oppo广东移动通信有限公司 定位方法及装置、电子设备、计算机可读存储介质
CN113239629B (zh) * 2021-06-03 2023-06-16 上海交通大学 一种轨迹空间行列式点过程的强化学习探索和利用的方法
CN113593035A (zh) * 2021-07-09 2021-11-02 清华大学 一种运动控制决策生成方法、装置、电子设备及存储介质
CN113743417B (zh) * 2021-09-03 2024-02-23 北京航空航天大学 语义分割方法和语义分割装置
CN114089752A (zh) * 2021-11-11 2022-02-25 深圳市杉川机器人有限公司 机器人的自主探索方法、机器人及计算机可读存储介质
CN114372520A (zh) * 2021-12-29 2022-04-19 同济大学 一种基于双智能体竞争强化学习的机器人路径探索方法
CN114782530A (zh) * 2022-03-28 2022-07-22 杭州国辰机器人科技有限公司 室内场景下的三维语义地图构建方法、装置、设备及介质
CN114879660B (zh) * 2022-04-14 2023-08-15 海南大学 一种基于目标驱动的机器人环境感知方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102547A (zh) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 基于物体识别深度学习模型的机器人抓取位姿估计方法
CN110955466A (zh) * 2018-09-27 2020-04-03 罗伯特·博世有限公司 用于测定智能体的策略的方法、装置和计算机程序
US20210118184A1 (en) * 2019-10-17 2021-04-22 Toyota Research Institute, Inc. Systems and methods for self-supervised scale-aware training of a model for monocular depth estimation
CN111524187A (zh) * 2020-04-22 2020-08-11 北京三快在线科技有限公司 一种视觉定位模型的训练方法及装置
CN111814683A (zh) * 2020-07-09 2020-10-23 北京航空航天大学 一种基于语义先验和深度学习特征的鲁棒视觉slam方法
CN113744301A (zh) * 2021-08-05 2021-12-03 深圳供电局有限公司 移动机器人的运动轨迹估计方法、装置和存储介质
CN115471731A (zh) * 2022-08-23 2022-12-13 北京有竹居网络技术有限公司 图像处理方法、装置、存储介质及设备

Also Published As

Publication number Publication date
CN115471731A (zh) 2022-12-13
CN115471731B (zh) 2024-04-09

Similar Documents

Publication Publication Date Title
US20230281422A1 (en) Update of local features model based on correction to robot action
US11126257B2 (en) System and method for detecting human gaze and gesture in unconstrained environments
WO2019170164A1 (zh) 基于深度相机的三维重建方法、装置、设备及存储介质
WO2022262152A1 (zh) 地图构建方法及装置、电子设备、存储介质和计算机程序产品
WO2020098076A1 (zh) 跟踪目标的定位方法、装置、设备及存储介质
WO2024041392A1 (zh) 图像处理方法、装置、存储介质及设备
JP2017523487A (ja) 適応ホモグラフィ写像に基づく視線追跡
TW202115366A (zh) 機率性多機器人slam的系統及方法
CN116261706A (zh) 用于使用融合数据进行对象跟踪的系统和方法
US20240161254A1 (en) Information processing apparatus, information processing method, and program
US9919429B2 (en) Robot, control method, and program
CN110007764B (zh) 一种手势骨架识别方法、装置、系统及存储介质
TWI832032B (zh) 透過問答生成訓練資料的系統及其方法
WO2021164000A1 (en) Image processing method, apparatus, device and medium
WO2024021504A1 (zh) 人脸识别模型训练方法、识别方法、装置、设备及介质
WO2023160301A1 (zh) 物体信息确定方法、移动机器人系统及电子设备
CN114022570A (zh) 相机间外参的标定方法及电子设备
US10891755B2 (en) Apparatus, system, and method for controlling an imaging device
US20220345621A1 (en) Scene lock mode for capturing camera images
CN116309538B (zh) 绘图考试评定方法、装置、计算机设备及存储介质
WO2023201723A1 (zh) 目标检测模型的训练方法、目标检测方法及装置
CN115830110B (zh) 即时定位与地图构建方法、装置、终端设备及存储介质
US11630510B2 (en) System, method and storage medium for 2D on-screen user gaze estimation
WO2022153910A1 (ja) 検出システム、検出方法、及びプログラム
WO2023240583A1 (zh) 一种跨媒体对应知识的生成方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856482

Country of ref document: EP

Kind code of ref document: A1