WO2020238790A1 - Camera positioning - Google Patents

Camera positioning

Info

Publication number
WO2020238790A1
WO2020238790A1 (PCT/CN2020/091768, CN2020091768W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
processed
pose
pixel
Prior art date
Application number
PCT/CN2020/091768
Other languages
French (fr)
Chinese (zh)
Inventor
鲍虎军 (Hujun Bao)
章国锋 (Guofeng Zhang)
黄昭阳 (Zhaoyang Huang)
许龑 (Yan Xu)
Original Assignee
浙江商汤科技开发有限公司 (Zhejiang SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江商汤科技开发有限公司 (Zhejiang SenseTime Technology Development Co., Ltd.)
Priority: JP2021534170A (published as JP2022513868A); KR1020217019918A (published as KR20210095925A)
Published as WO2020238790A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the field of computer vision, in particular to a camera positioning method and device, and storage medium.
  • Visual positioning has a wide range of applications. In the actual application environment, factors such as object movement may affect the accuracy of visual positioning, and even directly cause visual positioning to fail.
  • the present disclosure provides a camera positioning method, device, and storage medium.
  • a camera positioning method including:
  • the absolute pose of the camera that collects the image to be processed in the world coordinate system is determined according to the target image.
  • a camera positioning device including:
  • the acquisition module is configured to acquire the prior probability of a movable object appearing at each of the multiple pixels included in the image template;
  • the execution module is configured to perform an operation of discarding some pixels for an image to be processed that is as large as the image template according to the prior probability to obtain a target image;
  • the positioning module is configured to determine, according to the target image, the absolute pose of the camera that collects the image to be processed in the world coordinate system.
  • a computer-readable storage medium stores a computer program, and the computer program is used to execute the camera positioning method described in the first aspect.
  • a camera positioning device comprising: a processor; and a memory for storing executable instructions of the processor.
  • the processor is configured to call executable instructions stored in the memory to implement the camera positioning method described in the first aspect.
  • the prior probability that a movable object appears at each of the multiple pixels included in the image template can be obtained first, and some pixels of an image to be processed that is as large as the image template are then discarded based on the prior probability.
  • Fig. 1 is a flowchart of a camera positioning method according to an exemplary embodiment of the present disclosure
  • Fig. 2 is a flowchart of step 110 according to an exemplary embodiment of the present disclosure
  • Fig. 3 is a schematic diagram showing an image template according to an exemplary embodiment of the present disclosure
  • Fig. 4 is a flowchart showing step 120 according to an exemplary embodiment of the present disclosure.
  • Fig. 5 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure.
  • Fig. 6 is a flowchart of step 150 according to an exemplary embodiment of the present disclosure.
  • Fig. 7 is a schematic diagram showing multiple absolute poses according to an exemplary embodiment of the present disclosure.
  • Fig. 8 is a schematic diagram showing a process of determining and correcting a pose according to an exemplary embodiment of the present disclosure
  • Fig. 9 is a schematic diagram showing an optimized pose graph according to an exemplary embodiment of the present disclosure.
  • Fig. 10 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure.
  • Fig. 11 is a flowchart showing step 230 according to an exemplary embodiment of the present disclosure.
  • 12A to 12B are schematic diagrams showing a self-attention mechanism according to an exemplary embodiment of the present disclosure
  • Fig. 13A is a schematic diagram showing an image to be processed according to an exemplary embodiment of the present disclosure
  • Fig. 13B is a schematic diagram showing a feature extraction image after weight value adjustment according to an exemplary embodiment of the present disclosure
  • Fig. 14 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure.
  • Fig. 15 is a frame diagram of a target neural network according to an exemplary embodiment of the present disclosure.
  • Fig. 16 is a block diagram showing a camera positioning device according to an exemplary embodiment of the present disclosure.
  • Fig. 17 is a block diagram showing an obtaining module according to an exemplary embodiment of the present disclosure.
  • Fig. 18 is a block diagram showing an execution module according to an exemplary embodiment of the present disclosure.
  • Fig. 19 is a block diagram showing a positioning module according to an exemplary embodiment of the present disclosure.
  • Fig. 20 is a block diagram showing a camera positioning device according to another exemplary embodiment of the present disclosure.
  • Fig. 21 is a block diagram showing a second determining module according to an exemplary embodiment of the present disclosure.
  • Fig. 22 is a block diagram showing a camera positioning device according to another exemplary embodiment of the present disclosure.
  • Fig. 23 is a block diagram showing an obtaining module according to an exemplary embodiment of the present disclosure.
  • Fig. 24 is a block diagram showing an execution module according to an exemplary embodiment of the present disclosure.
  • Fig. 25 is a block diagram showing a positioning module according to an exemplary embodiment of the present disclosure.
  • Fig. 26 is a block diagram showing a camera positioning device according to another exemplary embodiment of the present disclosure.
  • Fig. 27 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other.
  • first information may also be referred to as second information, and similarly, the second information may also be referred to as first information.
  • the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • the embodiments of the present disclosure provide a camera positioning method which, according to the prior probability of a movable object appearing at each of the multiple pixels included in the image template, discards some pixels of an image to be processed that is as large as the image template to obtain a target image, and then determines the absolute pose of the camera according to the target image. This reduces the influence of the movement of objects in the scene captured by the camera on the positioning result, and improves the accuracy and precision of camera positioning.
  • the camera positioning method provided by the embodiments of the present disclosure can be applied to movable machinery and equipment, to position a camera provided on that machinery and equipment.
  • Movable machinery and equipment include, but are not limited to, drones, unmanned vehicles, and robots with cameras.
  • the accuracy of camera positioning can improve the accuracy with which mobile machinery and equipment perform various tasks. For example, from the image of the environment ahead of the vehicle collected by the camera installed on an unmanned vehicle, the current location information of the camera can be determined, and the current location of the vehicle can be derived from the camera's location, so that the unmanned vehicle can perform at least one intelligent driving control task, such as path planning, trajectory tracking, or collision warning.
  • the camera positioning method provided by the embodiment of the present disclosure may include the following steps 110-130:
  • step 110 the prior probability of a movable object appearing at each of the multiple pixels included in the image template is obtained.
  • the image template may be a template that corresponds to the current scene and records the prior probability of a movable object appearing at each of multiple pixels of an image as large as the template.
  • Movable objects include, but are not limited to, various objects that can move on their own or under control, such as buses, cars, people, bicycles, trucks, motorcycles, animals, etc.
  • the prior probability refers to the probability that each pixel on the image is a movable object obtained by analyzing the image that is the same or similar to the current scene collected in the past.
  • if the prior probability corresponding to a certain pixel is high, a movable object is likely to appear at that pixel in images collected in the scene; conversely, if the prior probability corresponding to a certain pixel is low, a movable object is unlikely to appear at that pixel in images collected in the scene.
  • the image template can reflect the a priori possibility of movable objects appearing at different pixels in the collected image.
  • for a collection of images collected in a scene that is the same as or similar to the current scene, the probability of a movable object appearing at each pixel of each image in the collection can be analyzed, and this probability can be used as the prior probability, recorded in the image template corresponding to the current scene, that a movable object appears at each pixel.
  • for example, if the current scene is the main street of a city, the image collection collected in the same or similar scene may include at least one image of that main street.
  • step 120 an operation of discarding some pixels is performed on an image to be processed that is as large as the image template according to the prior probability to obtain a target image.
  • the image to be processed may be at least one image collected by a camera provided on the movable machine equipment during the movement of the movable machine equipment.
  • according to the prior probability corresponding to each pixel of the image template for the current scene, the mobile machinery and equipment can perform the operation of discarding some pixels on at least one image, as large as the image template, collected by the camera set on the machinery, to obtain the target image.
  • the operation of discarding some pixels includes, but is not limited to, discarding all pixels of the collected image whose sampled prior probability is greater than a preset value, or randomly discarding a part of those pixels.
  • step 130 the absolute pose of the camera collecting the image to be processed in the world coordinate system is determined according to the target image.
  • the mobile machine equipment can determine the absolute pose of the camera in the world coordinate system according to the target image, using a regression loss function.
  • the regression loss function can be a mean square error loss function (such as an L2 loss), a mean absolute error loss function (such as an L1 loss), a smooth mean absolute error loss function (such as a Huber loss), a log-cosh (logarithmic hyperbolic cosine) loss function, a quantile loss function, etc.
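As an illustrative sketch only (the patent gives no code), the smooth mean absolute error (Huber) loss named above can be written as follows; the function name and `delta` parameter are assumptions:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Smooth mean-absolute-error (Huber) loss for pose regression:
    quadratic like L2 for residuals below delta, linear like L1 above
    it, which keeps outlier frames from dominating the loss."""
    r = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    quadratic = 0.5 * r ** 2            # L2 branch near zero
    linear = delta * (r - 0.5 * delta)  # L1 branch for large residuals
    return float(np.where(r <= delta, quadratic, linear).mean())
```

The two branches meet smoothly at |r| = delta, which is why the Huber loss is a common compromise between the L1 and L2 options listed.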
  • the movable machinery and equipment can, in combination with the prior probability that a movable object appears at each of the multiple pixels of the image template corresponding to the current scene, discard some pixels of at least one collected image to obtain the target image, and use the target image to determine the absolute pose of the camera. This effectively reduces the negative impact of the movement of objects in the current scene on camera positioning, and improves the accuracy and precision of camera positioning.
  • for a camera set on movable machine equipment, its pose may change due to factors such as the movement of the machine and/or position adjustment of the camera, so the camera needs to be positioned.
  • the inventors of the present disclosure found that if an object moves within the field of view of the camera, the movement causes poor imaging quality in the corresponding part of the captured image, such as blur and jitter. These poor-quality parts degrade the overall features of the captured image, and in turn the accuracy and precision of camera positioning based on those features. By contrast, immovable or fixed objects in the captured image are genuinely useful for camera positioning.
  • therefore, the embodiments of the present disclosure use prior knowledge to determine the probability (that is, the prior probability) of a movable object appearing at each pixel of the captured image, and discard some pixels based on that probability, for example pixels with a higher prior probability of belonging to moving objects. This reduces the negative impact of those pixels on the overall quality of the image, improves the overall quality of the image after the discarding, and thereby improves the accuracy of positioning.
  • step 110 may be performed by an electronic device, which may be a mobile machine device, or an electronic device for training a neural network, such as a cloud platform, which is not limited in the present disclosure. As shown in Figure 2, step 110 may include steps 111-113:
  • step 111 pixel-level semantic segmentation is performed on each image in a predetermined image set associated with the current scene.
  • the predetermined image set associated with the current scene includes multiple pictures collected in the same or similar scene as the current scene.
  • the electronic device can obtain the pixel-level semantic segmentation result of each image by identifying the content present in each image in the predetermined image set.
  • the predetermined image set associated with the current scene may include images m1, m2,...mN as shown in FIG. 3.
  • step 112 the first pixel belonging to the movable object and the second pixel belonging to the background in each image are determined according to the result of pixel-level semantic segmentation.
  • the background is an immovable object in the image, for example, other objects in the image that are not determined to be movable objects, such as sky, buildings, trees, roads, etc.
  • step 113 the prior probability that a movable object appears at each of the multiple pixels included in the image template, which is the same size as the images in the predetermined image set, is determined based on the statistical distribution of the first pixels and the second pixels on each image in the predetermined image set.
  • based on the statistical distribution of the first pixels (movable objects) and the second pixels (background) in each image of the predetermined image set associated with the current scene, the electronic device obtains an image template corresponding to the current scene, such as the image template M in FIG. 3, which records the prior probability that a movable object appears at each pixel of an image, as large as the template, collected in the current scene.
  • the prior probability of a movable object at each pixel recorded on the image template is a statistical distribution range, rather than a fixed value.
  • when the subsequent operation of discarding some pixels of an image to be processed that is as large as the image template is performed according to the prior probability, different pixels can be discarded each time according to the statistical distribution range of the prior probability, so as to obtain different target images.
  • determining the absolute pose of the camera based on multiple different target images can yield better camera positioning results, especially in large-scale urban traffic scenes.
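Steps 111-113 amount to counting, per pixel, how often segmentation labels that pixel as movable. A minimal numpy sketch, assuming the segmentation results are available as equally sized boolean masks (the function name is illustrative, not the patent's implementation):

```python
import numpy as np

def build_prior_template(masks):
    """Per-pixel prior that a movable object appears.

    masks: list of equally sized H x W arrays, 1/True where pixel-level
    semantic segmentation labelled a movable object (the 'first'
    pixels), 0/False for background (the 'second' pixels).
    Returns (mu, var): the per-pixel expectation and the variance
    mu * (1 - mu) of the template's assumed distribution.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in masks])
    mu = stack.mean(axis=0)      # fraction of images where pixel is movable
    var = mu * (1.0 - mu)        # variance used as the Gaussian spread
    return mu, var
```

A pixel labelled movable in 2 of 4 images gets mu = 0.5 and var = 0.25; a pixel that is always background gets mu = var = 0.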
  • the prior probability that a movable object appears at each pixel included in the image template may conform to a Gaussian distribution, as shown in Formula 1:

    P(M(i,j)) ~ 𝒩(μ(i,j), σ²(i,j)),  with σ²(i,j) = μ(i,j)(1 − μ(i,j))   (Formula 1)

  • here i denotes the i-th row and j the j-th column of the image template, so (i,j) are the pixel coordinates; μ(i,j) is the mathematical expectation of pixel (i,j); N is the number of pixels; σ²(i,j) is the variance of pixel (i,j); and P(M(i,j)) is the prior probability of pixel (i,j).
  • step 120 may include:
  • step 121 a priori probability corresponding to at least some pixels included in the image to be processed is sampled.
  • the distribution of the prior probability that a movable object appears at each pixel on each to-be-processed image satisfies a Gaussian distribution.
  • the mobile machine and equipment can sample the prior probability corresponding to at least some of the pixels included in the image to be processed, obtaining, for this sampling, the sampled value of the prior probability corresponding to those pixels.
  • step 122 pixel points whose a priori probability sampling value is greater than a preset threshold are removed from the image to be processed to obtain a target image corresponding to this sampling.
  • in the above manner, the mobile machine equipment can remove from the image to be processed all pixels whose sampled prior probability is greater than the preset threshold, or randomly remove a part of those pixels, to obtain the target image corresponding to the current sampling.
  • across different samplings, the sampled values of the prior probability of the same pixel of the same image to be processed can differ, so that the multiple target images obtained after the operation of discarding some pixels differ from one another in at least one pixel.
  • for example, suppose that in the first sampling the sampled prior probability corresponding to pixel 1 of image 1 to be processed is P1, in the second sampling the sampled prior probability corresponding to pixel 1 of image 1 is P2, and the preset threshold is T, with P1 < T < P2. Then the target image obtained after the first sampling retains pixel 1, while the target image obtained after the second sampling removes pixel 1.
  • the mobile equipment can sample the prior probabilities corresponding to the pixels on the same image to be processed multiple times, and accordingly obtain multiple different target images for camera positioning, which is beneficial to guarantee the final result The accuracy of camera positioning.
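One stochastic discard pass (steps 121-122) can be sketched as below, assuming the template stores a per-pixel mean and variance, and that a discarded pixel is zeroed out (the patent does not specify how a removed pixel is represented):

```python
import numpy as np

def discard_pixels(image, mu, var, threshold, rng=None):
    """Sample a prior probability per pixel from N(mu, var) and zero
    out every pixel whose sampled value exceeds `threshold`.  Because
    the sampling is random, repeated calls produce different target
    images from the same image to be processed."""
    rng = np.random.default_rng() if rng is None else rng
    sampled = rng.normal(loc=mu, scale=np.sqrt(var))
    target = np.array(image, dtype=float, copy=True)
    target[sampled > threshold] = 0.0   # discard the pixel
    return target
```

Calling this several times on one image yields the multiple distinct target images used for positioning.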
  • step 130 may include: inputting the to-be-processed image into a target neural network to obtain the absolute pose of the camera in the world coordinate system.
  • the mobile machine equipment can input the image to be processed into the target neural network, and the target neural network directly outputs the absolute pose of the camera that collects the image to be processed in the world coordinate system.
  • the movable machine equipment discards at least some pixels of the image to be processed whose prior probability is greater than a preset value, according to the prior probability that each pixel of the image template belongs to a movable object, thereby improving the accuracy of camera positioning.
  • the image to be processed includes k frames of images (k is an integer greater than or equal to 2) that are acquired by the camera in time sequence; as shown in FIG. 5,
  • the method also includes steps 140-150:
  • step 140 the relative pose of the camera when shooting the k frames of images is determined according to the k frames of images.
  • the movable machine equipment can use the visual odometry method to determine the relative pose of the camera when acquiring the k-th frame image with respect to the (k−1)-th frame image.
  • step 150 the corrected pose of the camera is determined according to the relative pose and absolute pose of the camera.
  • the mobile machine equipment can use the camera's absolute pose in the world coordinate system when acquiring the first image in the sequence of k frames (also referred to as the first frame image) as a reference, and determine the corrected pose of the camera according to the relative pose and the absolute pose of the camera when the second frame image, adjacent to the first frame image, is collected.
  • Subsequent movable machinery and equipment can adjust the pose of the camera according to the corrected pose, thereby reducing the impact of the movement of objects in the scene on the positioning of the camera, which can help ensure the accuracy of the movable machinery and equipment in performing various tasks.
  • step 150 may specifically include steps 151-153:
  • step 151 the deterministic probability of the absolute pose is determined.
  • the deterministic probability is an evaluation of the accuracy of the absolute-pose result: the higher the deterministic probability, the more accurate the absolute pose; the lower it is, the less accurate the absolute pose.
  • the movable machinery and equipment can adopt a random sampling method, such as the Monte Carlo method, to sample multiple times the prior probabilities corresponding to the sequential k frames of images collected by the camera, obtaining multiple sampling results.
  • k is an integer greater than or equal to 2.
  • the current image can be sampled multiple times based on the prior probability of each pixel included in the image template M, and multiple absolute poses corresponding to the current image can be determined based on the target image corresponding to each sampling.
  • the deterministic probability of the absolute pose corresponding to the current image is determined according to the multiple absolute poses corresponding to the current image. For example, if the differences between the absolute poses corresponding to the current image are large, the deterministic probability of the absolute pose is determined to be low; otherwise, it is determined to be high.
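One simple way to turn the spread of the sampled absolute poses into a deterministic probability is an exponential decay of the average standard deviation; this mapping is an illustrative assumption, not the patent's formula:

```python
import numpy as np

def pose_certainty(poses, scale=1.0):
    """Certainty score in (0, 1] from N sampled absolute-pose
    estimates (rows of an N x D array, e.g. translation vectors):
    identical estimates give 1.0, widely scattered estimates give a
    value near 0."""
    poses = np.asarray(poses, dtype=float)
    spread = poses.std(axis=0).mean()   # average per-dimension std-dev
    return float(np.exp(-spread / scale))
```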
  • step 152 the first weight of the relative pose and the second weight of the absolute pose are determined according to the deterministic probability of the absolute pose.
  • the movable machine equipment can determine, according to the deterministic probability of the absolute pose corresponding to each frame of image, the first weight of the relative pose corresponding to that frame and the second weight of the absolute pose corresponding to that frame.
  • if the absolute pose corresponding to the current image has a high deterministic probability, the second weight of the absolute pose corresponding to the current image can be increased; if the absolute pose corresponding to the current image has a low deterministic probability, the first weight of the relative pose corresponding to the current image can be increased.
  • step 153 the corrected pose of the camera is determined according to the relative pose, the first weight, the absolute pose, and the second weight.
  • a sliding window is adopted and moved sequentially over the frames, and the corrected pose of the second frame image relative to the first frame image is determined according to the relative pose corresponding to the second frame image, the first weight, the absolute pose, and the second weight.
  • if the relative pose is more accurate, the weight of the relative pose can be increased; if the absolute pose is more accurate, the weight of the absolute pose can be increased. In this way, by giving the relative pose and the absolute pose different weights when determining the corrected pose, the corrected pose, and hence camera positioning, can be made more accurate.
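A translation-only sketch of the weighted correction (steps 152-153), assuming the second weight is the deterministic probability itself and the first weight is its complement; a full implementation would also blend rotations, e.g. via quaternion interpolation:

```python
import numpy as np

def corrected_pose(prev_pose, relative, absolute, certainty):
    """Blend the odometry prediction (previous pose composed with the
    relative pose) and the regressed absolute pose.  `certainty` in
    [0, 1] is the deterministic probability of the absolute pose: it
    serves as the second weight, and 1 - certainty as the first."""
    predicted = np.asarray(prev_pose, dtype=float) + np.asarray(relative, dtype=float)
    return (1.0 - certainty) * predicted + certainty * np.asarray(absolute, dtype=float)
```

With certainty 1 the corrected pose equals the absolute pose; with certainty 0 it is pure odometry.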
  • the optimized pose map can be shown in Figure 9.
  • the triangles in Figure 9 represent the absolute pose of the camera when collecting each frame of image, the arrowed lines represent the relative poses, and the circle represents the sliding window.
  • the corrected absolute pose and relative pose in Fig. 9 correspond to the absolute pose and relative pose in Fig. 8 from the upper left corner to the lower right corner in sequence according to the arrow direction.
  • the pose determined by the VO (Visual Odometry) method may be used as the relative pose corresponding to the image.
  • the VO method determines the position and orientation of the camera by analyzing the above k frames of images: it estimates the movement of the camera between adjacent frames by performing feature matching on the k frames, thereby obtaining the relative pose of the camera when a frame is collected compared to the previous frame.
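Once the VO front end has estimated frame-to-frame relative poses, they compose into camera poses by matrix chaining. A minimal sketch using 4x4 homogeneous transforms (the representation is an assumption for illustration):

```python
import numpy as np

def chain_relative_poses(start, relatives):
    """Compose per-frame relative transforms into a pose sequence.

    `start` is the 4x4 pose of the first frame; `relatives[k]` is the
    4x4 transform from frame k to frame k+1, as a visual-odometry
    front end would estimate from feature matches between adjacent
    frames.  Returns the list of all chained poses."""
    poses = [np.asarray(start, dtype=float)]
    for rel in relatives:
        poses.append(poses[-1] @ np.asarray(rel, dtype=float))
    return poses
```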
  • the absolute pose and relative pose are combined to perform pose correction, which further improves the accuracy of camera positioning.
  • the camera positioning method provided in the present disclosure can also be applied to electronic devices that train neural networks, such as cloud platforms, neural network training platforms, and so on.
  • the electronic device uses this method to train the neural network to obtain the target neural network. After the image is subsequently input to the target neural network, the absolute pose of the camera that collected the image in the world coordinate system can be obtained.
  • the camera positioning method provided by the embodiment of the present disclosure may include the following steps 210-230:
  • step 210 the prior probability that a movable object appears at each of the multiple pixels included in the image template is obtained.
  • the electronic device can analyze, based on each of the above images, the probability of a movable object appearing at each pixel of each image, and use this probability as the prior probability that a movable object appears at each pixel of an image template that is the same size as each image.
  • step 220 according to the prior probability, an operation of discarding part of pixels is performed on an image to be processed that is as large as the image template to obtain a target image.
  • the image to be processed may be at least one sample image, and the electronic device may perform the operation of discarding some pixels on the at least one sample image according to the prior probability corresponding to each pixel on the image template, so as to obtain the target image.
  • the operation of discarding some pixels includes, but is not limited to, discarding all pixels of at least one sample image whose sampled prior probability is greater than a preset value, or randomly discarding a part of those pixels.
  • step 230 the absolute pose of the camera that collects the image to be processed in the world coordinate system is determined according to the target image.
  • the electronic device can determine the absolute pose of the camera that collects at least one sample image in the world coordinate system through the regression loss function according to the obtained target image.
  • the regression loss function can be a mean square error loss function (such as an L2 loss), a mean absolute error loss function (such as an L1 loss), a smooth mean absolute error loss function (such as a Huber loss), a log-cosh loss function, a quantile loss function, etc.
  • step 210 may be performed by an electronic device that trains a neural network, and the execution process is the same as the execution process of step 110 in FIG. 2, and will not be repeated here.
  • step 220 may be performed by an electronic device that trains a neural network, and the execution process is the same as that of step 120 in FIG. 4, and will not be repeated here.
  • step 230 may be performed by an electronic device that trains a neural network.
  • step 230 may include steps 231-233:
  • step 231 the feature parameters in the target image are extracted through a neural network to obtain a feature extraction image.
  • the neural network can extract feature parameters of each target image from at least one target image, thereby obtaining a feature extraction image corresponding to each target image.
  • step 232, in the preset spatial dimension and/or the preset channel dimension of the neural network, the weight values corresponding to the second pixel points belonging to the background in the feature extraction image are increased.
  • the neural network can increase the weight values of the second pixel points belonging to the background in the feature extraction image in at least one of the preset spatial dimension and the preset channel dimension through a self-attention mechanism.
  • the neural network transforms a feature extraction image of H (height) × W (width) × C (channels) using the spatial self-attention mechanism to obtain an H × W × 1 map that is shared across channels.
  • the neural network transforms a feature extraction image of H × W × C using the channel self-attention mechanism to obtain a 1 × 1 × C map that is shared across the height and width.
  • in this way, the neural network ignores, as far as possible, the information of the first pixel points belonging to the movable object, and pays more attention to the information of the second pixel points belonging to the background.
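A minimal NumPy sketch of the two attention shapes described above (an H × W × 1 spatial map and a 1 × 1 × C channel map) follows. Softmax pooling over mean activations is an assumption; the disclosure only specifies the output shapes:

```python
import numpy as np

def spatial_attention(feat):
    """Collapse an (H, W, C) feature map to an (H, W, 1) attention map
    shared across channels (softmax over spatial positions)."""
    logits = feat.mean(axis=2)                 # (H, W)
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w[..., None]                        # (H, W, 1)

def channel_attention(feat):
    """Collapse an (H, W, C) feature map to a (1, 1, C) attention map
    shared across spatial positions (softmax over channels)."""
    logits = feat.mean(axis=(0, 1))            # (C,)
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w[None, None, :]                    # (1, 1, C)

def reweight(feat):
    """Apply both attention maps so that high-response (background)
    features receive larger weights; shape is preserved."""
    return feat * spatial_attention(feat) * channel_attention(feat)
```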
  • taking the image shown in FIG. 13A as an example, the pixels circled by the dashed box belong to the movable object (an automobile). Through the foregoing step 210, the prior probability that a movable object appears at each of the multiple pixels of an image template the same size as the image shown in FIG. 13A can be obtained; step 220 then discards all or part of the pixels in the image shown in FIG. 13A whose sampled prior probability is greater than the preset threshold, and the image shown in FIG. 13B is obtained.
  • in the image shown in FIG. 13B, the gray values of the retained background pixels are higher than the gray values of the pixels in the parts corresponding to the discarded movable object.
  • the weight values of pixels belonging to immovable objects are increased in the two dimensions, so that the neural network pays more attention to traffic signs, utility poles, and other immovable or rarely moving objects. This reduces the effect that the movement of objects in the scene where the camera collects images has on the result of positioning the camera on the movable machine equipment, improves the accuracy and precision of camera positioning by the neural network, and improves the robustness of the positioning result.
  • step 233 the feature extraction image adjusted by the weight value is analyzed by the neural network to obtain the absolute pose of the camera that collects the image to be processed in the world coordinate system.
  • the neural network can analyze the weight-adjusted feature extraction image through a regression loss function, such as the mean square error function or the absolute value error function, to obtain the absolute pose, in the world coordinate system, of the camera that collects the at least one sample image.
  • the above-mentioned camera positioning method further includes step 240:
  • step 240, the network parameters of the neural network are adjusted according to the difference between the absolute pose and the predetermined ground-truth pose of the camera that collects the image to be processed, so as to obtain the target neural network by training.
  • this step may be performed by an electronic device that trains a neural network.
  • for the camera that acquires the at least one sample image the same size as the image template, the ground-truth pose is known.
  • the electronic device can adjust the network parameters of the neural network according to the difference between the known ground-truth pose and the absolute pose, in the world coordinate system, of the camera that collects the at least one sample image as output by the neural network, so as to minimize the loss function of the neural network and finally train the desired target neural network.
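The parameter adjustment described above can be illustrated with a single gradient-descent step on a toy linear pose regressor (a stand-in for the neural network; all names and shapes here are hypothetical):

```python
import numpy as np

def pose_loss(pred_pose, true_pose):
    """L2 difference between the predicted absolute pose and the known
    ground-truth pose."""
    return np.mean((pred_pose - true_pose) ** 2)

def train_step(weights, features, true_pose, lr=0.1):
    """One gradient-descent update of a linear pose regressor, standing
    in for adjusting the network parameters to minimise the loss."""
    pred = features @ weights
    grad = 2.0 * features.T @ (pred - true_pose) / pred.size
    return weights - lr * grad
```

A single step provably decreases the loss for a small enough learning rate, which is the behaviour the training loop relies on.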
  • the embodiments of the present disclosure are based on the above-mentioned camera positioning method, and also provide a framework diagram of a target neural network.
  • for example, as shown in FIG. 15, the target neural network includes a Probabilistic Dropout Module (partial pixel discarding module), a Feature Extractor Module (feature extraction module), a Self-attention Module, and a Regressor Module (regression module).
  • At least one sample image may be used as the input value of the partial pixel discarding module, and the partial pixel discarding module may be composed of at least five sub-networks connected in sequence.
  • Each sub-network can be implemented separately by using network units, such as a convolutional layer, a ReLU layer, and a pooling layer, set in a preset order.
  • the first sub-network can perform pixel-level semantic segmentation on each image in the at least one sample image; the second sub-network can determine, according to the result of the pixel-level semantic segmentation, the first pixel points belonging to the movable object and the second pixel points belonging to the background in each sample image; the third sub-network can determine an image template the same size as the sample images based on the statistical distribution of the first pixel points and the second pixel points in each sample image.
  • the fourth sub-network may sample the prior probability corresponding to at least some of the pixels included in the at least one sample image to obtain the current sampling result; the fifth sub-network can remove, according to the current sampling result, the pixels in the at least one sample image whose sampled prior probability is greater than the preset threshold T, to obtain the target image.
  • the feature extraction module can be designed by stacking network units such as a convolutional layer, a ReLU layer, and a pooling layer in a preset order according to a preset structure, and extracts the feature parameters in the target image obtained by the Probabilistic Dropout Module to obtain a feature extraction image.
  • the self-attention module can be composed of at least two separate sub-networks: a fifth sub-network and a sixth sub-network.
  • Each sub-network includes network units such as a convolutional layer, a ReLU layer, and a pooling layer set in a preset order.
  • the fifth sub-network can focus on the preset spatial dimension, and the sixth sub-network can focus on the preset channel dimension. After passing through these two sub-networks, the weight values of the second pixel points belonging to the background in the feature extraction image can be adjusted.
  • the embodiment of the present disclosure does not limit the sequence of the fifth sub-network and the sixth sub-network.
  • the regression module may include a seventh sub-network, which may include network units such as a convolutional layer, a ReLU layer, and a pooling layer set in a preset order. The seventh sub-network takes the image output by the self-attention module as its input value and the known pose of the camera that collects the at least one sample image as its output value, and corresponds to a regression loss function.
  • the regression loss function can include a mean square error loss function (e.g., the L2 loss), a mean absolute error loss function (e.g., the L1 loss), a smooth mean absolute error loss function (e.g., the Huber loss), a log-hyperbolic-cosine loss function, a quantile loss function, etc.
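The four modules can be composed sequentially. The following toy sketch only illustrates the data flow of FIG. 15; the trivial stand-in stages below are hypothetical placeholders for the trained sub-networks:

```python
import numpy as np

def dropout_stage(image):     # Probabilistic Dropout Module (stub)
    return image              # a real stage would discard movable-object pixels

def extractor_stage(image):   # Feature Extractor Module (stub)
    return image.reshape(-1)  # flatten to a feature vector

def attention_stage(feat):    # Self-attention Module (stub)
    return feat               # a real stage would reweight background features

def regressor_stage(feat):    # Regressor Module (stub)
    return np.array([feat.mean(), feat.max()])  # toy stand-in "pose"

class TargetNetwork:
    """Composes the four modules named in FIG. 15 in order."""
    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, image):
        out = image
        for stage in self.stages:
            out = stage(out)
        return out  # predicted absolute pose

net = TargetNetwork(dropout_stage, extractor_stage,
                    attention_stage, regressor_stage)
pose = net(np.array([[1.0, 2.0], [3.0, 4.0]]))
```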
  • the target neural network finally obtained reduces its focus on the movable objects in the sample images and pays more attention to the background pixels, that is, the information of immovable or fixed objects. This reduces the impact of the pixel points corresponding to movable objects on the processing of the image as a whole, and improves the robustness of the target neural network.
  • the present disclosure also provides an embodiment of a camera positioning device.
  • the embodiments of the present disclosure also provide a camera positioning device, which can be applied to movable machine equipment. Since the movable machine equipment will move, the pose of the camera set on it will change accordingly; high accuracy of camera positioning can improve the accuracy of the movable machine equipment when performing various tasks.
  • FIG. 16 is a block diagram of a camera positioning device according to an exemplary embodiment of the present disclosure.
  • the device includes: an acquisition module 310, configured to acquire the prior probability that a movable object appears at each of the multiple pixels included in the image template; an execution module 320, configured to perform, according to the prior probability, the operation of discarding some pixels on an image to be processed the same size as the image template, to obtain a target image; and a positioning module 330, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.
  • the acquisition module 310 includes: a segmentation sub-module 311, configured to perform pixel-level semantic segmentation on each image in a predetermined image set; a first determination sub-module 312, configured to determine, according to the result of the pixel-level semantic segmentation, the first pixel points belonging to the movable object and the second pixel points belonging to the background in each image; and a second determination sub-module 313, configured to determine, based on the statistical distribution of the first pixel points and the second pixel points in each image, the prior probability that a movable object appears at each of the multiple pixels included in an image template the same size as the images in the predetermined image set.
  • the execution module 320 includes: a sampling sub-module 321, configured to sample the prior probability corresponding to at least some of the pixels included in the image to be processed; and an execution sub-module 322, configured to remove the pixels on the image to be processed whose sampled prior probability is greater than a preset threshold, to obtain the target image.
  • the positioning module 330 includes a second positioning sub-module 331, configured to input the image to be processed into the target neural network to obtain the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.
  • the image to be processed includes at least two frames of images with a time sequence captured by the camera; for example, as shown in FIG. 20, the device further includes: a first determining module 340, configured to determine, according to the at least two frames of images, the relative pose of the camera when shooting the at least two frames of images; and a second determining module 350, configured to determine the corrected pose of the camera according to the relative pose and the absolute pose of the camera.
  • the second determining module 350 further includes: a third determining sub-module 351, configured to determine the deterministic probability of the absolute pose; a fourth determining sub-module 352, configured to determine the first weight of the relative pose and the second weight of the absolute pose according to the deterministic probability; and a fifth determining sub-module 353, configured to determine the corrected pose of the camera according to the relative pose, the first weight, the absolute pose, and the second weight.
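A minimal sketch of the weighted fusion performed by sub-modules 351 to 353 is shown below. Poses are simplified to translation vectors and the complementary weighting rule is an assumption; the disclosure does not fix the exact formula, and a full implementation would also fuse rotations (e.g. via quaternion interpolation):

```python
import numpy as np

def corrected_pose(propagated_pose, absolute_pose, certainty):
    """Fuse the pose propagated by the camera's relative motion with the
    network's absolute pose estimate.

    `certainty` is the deterministic probability of the absolute pose:
    it gives the second weight, and its complement gives the first
    weight for the relative-pose branch.
    """
    w_abs = certainty           # second weight, for the absolute pose
    w_rel = 1.0 - certainty     # first weight, for the relative branch
    return (w_rel * np.asarray(propagated_pose)
            + w_abs * np.asarray(absolute_pose))
```

When the network is fully certain the corrected pose equals the absolute pose; when the certainty is zero, only the relative-motion branch contributes.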
  • the present disclosure also provides a camera positioning device that can be applied to an electronic device; the electronic device can train a neural network to obtain a target neural network. After an image is subsequently input to the target neural network, the absolute pose, in the world coordinate system, of the camera that collected the image can be obtained.
  • FIG. 22 is a block diagram of a camera positioning device according to an exemplary embodiment of the present disclosure.
  • the device includes: an acquisition module 410, configured to acquire the prior probability that a movable object appears at each of the multiple pixels included in the image template; an execution module 420, configured to perform, according to the prior probability, the operation of discarding some pixels on an image to be processed the same size as the image template, to obtain a target image; and a positioning module 430, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.
  • the acquisition module 410 includes: a segmentation sub-module 411, configured to perform pixel-level semantic segmentation on each image in a predetermined image set; a first determination sub-module 412, configured to determine, according to the result of the pixel-level semantic segmentation, the first pixel points belonging to the movable object and the second pixel points belonging to the background in each image; and a second determination sub-module 413, configured to determine, based on the statistical distribution of the first pixel points and the second pixel points in each image, the prior probability that a movable object appears at each of the multiple pixels included in an image template the same size as the images in the predetermined image set.
  • the execution module 420 includes: a sampling sub-module 421, configured to sample the prior probability corresponding to at least some of the pixels included in the image to be processed; and an execution sub-module 422, configured to remove the pixels on the image to be processed whose sampled prior probability is greater than a preset threshold, to obtain the target image.
  • the positioning module 430 includes: a first processing sub-module 431, configured to extract the feature parameters in the target image via a neural network to obtain a feature extraction image; a second processing sub-module 432, configured to increase, in the preset spatial dimension and/or the preset channel dimension of the neural network, the weight values corresponding to the second pixel points belonging to the background in the feature extraction image; and a first positioning sub-module 433, configured to analyze the weight-adjusted feature extraction image via the neural network to obtain the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.
  • the device further includes a training module 440, configured to adjust the network parameters of the neural network according to the difference between the absolute pose and the predetermined ground-truth pose of the camera that collects the image to be processed, so as to train the target neural network.
  • for the relevant parts, reference can be made to the description of the method embodiments.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units.
  • Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the present disclosure. Those of ordinary skill in the art can understand and implement it without creative work.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to execute any of the above-mentioned camera positioning methods.
  • the embodiment of the present disclosure also provides a camera positioning device, the device including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the executable instructions stored in the memory to implement any one of the aforementioned camera positioning methods.
  • the camera positioning device provided in the embodiments of the present disclosure can implement the method provided in any of the foregoing embodiments.
  • the camera positioning device can discard some pixels in an image to be processed that is the same size as the image template according to the prior probability that a movable object appears at each of the multiple pixels included in the image template, and then determine the absolute pose of the camera according to the obtained target image. This reduces the influence of the movement of objects in the scene where the camera collects images on the result of positioning the camera on the movable machine equipment, and improves the accuracy of camera positioning.
  • the camera positioning device provided by the embodiments of the present disclosure can be applied to movable machinery and equipment to locate cameras provided on the movable machinery and equipment. Since the movable machinery and equipment will move, the pose of the camera set on the equipment will change accordingly.
  • the accuracy of camera positioning can improve the accuracy of movable machine equipment when performing various tasks. For example, according to images of the forward environment collected by a camera installed on an unmanned vehicle, the current location information of the camera can be determined, and the current location of the vehicle can then be derived from the camera's location information, so that at least one kind of intelligent driving control, such as path planning, trajectory tracking, and collision warning, can be performed on the unmanned vehicle.
  • the camera positioning device provided by the present disclosure can also be used on electronic devices for training neural networks, such as cloud platforms, neural network training platforms, and the like.
  • the electronic device uses this method to train the neural network to obtain the target neural network. After the image is subsequently input to the target neural network, the absolute pose of the camera that collected the image in the world coordinate system can be obtained.
  • Fig. 27 is a schematic structural diagram of an electronic device 2700 according to an exemplary embodiment.
  • the electronic device 2700 may be a movable machine device or a cloud platform for training neural networks.
  • the electronic device 2700 includes a processing component 2722, which further includes one or more processors, and a memory resource represented by a memory 2732 for storing instructions executable by the processing component 2722, such as application programs.
  • the application program stored in the memory 2732 may include at least one module, and each module corresponds to a set of instructions.
  • the processing component 2722 is used to execute instructions to execute any of the aforementioned camera positioning methods.
  • the electronic device 2700 may further include a power component 2726 for performing power management of the electronic device 2700, a wired or wireless network interface 2750 for connecting the electronic device 2700 to a network, and an input output (I/O) interface 2758.
  • the electronic device 2700 can operate based on an operating system stored in the memory 2732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • when the electronic device 2700 is a movable machine device, the electronic device 2700 further includes a camera for capturing images.
  • when the electronic device 2700 is a cloud platform for training a neural network, the electronic device can communicate with a movable machine device through the input/output interface 2758.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A camera positioning method and device, and a storage medium. The method comprises: obtaining a prior probability of a movable object appearing at each of multiple pixels comprised in an image template (110); performing, according to the prior probability, an operation of discarding some pixels on an image to be processed that has the same size as the image template to obtain a target image (120); and determining an absolute pose of the camera in a world coordinate system according to the target image (130).

Description

Camera positioning

Technical Field

The present disclosure relates to the field of computer vision, and in particular to a camera positioning method and device, and a storage medium.

Background

Visual positioning has a wide range of applications. In practical application environments, factors such as object movement may reduce the accuracy of visual positioning, or even directly cause visual positioning to fail.
Summary of the Invention

The present disclosure provides a camera positioning method and device, and a storage medium.

According to a first aspect of the embodiments of the present disclosure, there is provided a camera positioning method, the method including:

obtaining the prior probability that a movable object appears at each of multiple pixels included in an image template;

performing, according to the prior probability, an operation of discarding some pixels on an image to be processed that is the same size as the image template, to obtain a target image; and

determining, according to the target image, the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.

According to a second aspect of the embodiments of the present disclosure, there is provided a camera positioning device, including:

an acquisition module, configured to obtain the prior probability that a movable object appears at each of multiple pixels included in an image template;

an execution module, configured to perform, according to the prior probability, an operation of discarding some pixels on an image to be processed that is the same size as the image template, to obtain a target image; and

a positioning module, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that collects the image to be processed.

According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, the computer program being used to execute the camera positioning method described in the first aspect.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a camera positioning device, including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the executable instructions stored in the memory to implement the camera positioning method described in the first aspect.

In the embodiments, the prior probability that a movable object appears at each of the multiple pixels included in the image template can first be obtained; based on the prior probability, an operation of discarding some pixels is performed on an image to be processed that is the same size as the image template to obtain a target image; and the absolute pose of the camera in the world coordinate system is determined according to the target image. This reduces the influence of the movement of objects in the scene where the camera collects images on the result of positioning the camera on movable machine equipment, and improves the accuracy of camera positioning.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Description of the Drawings

Fig. 1 is a flowchart of a camera positioning method according to an exemplary embodiment of the present disclosure;

Fig. 2 is a flowchart of step 110 according to an exemplary embodiment of the present disclosure;

Fig. 3 is a schematic diagram of an image template according to an exemplary embodiment of the present disclosure;

Fig. 4 is a flowchart of step 120 according to an exemplary embodiment of the present disclosure;

Fig. 5 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure;

Fig. 6 is a flowchart of step 150 according to an exemplary embodiment of the present disclosure;

Fig. 7 is a schematic diagram of multiple absolute poses according to an exemplary embodiment of the present disclosure;

Fig. 8 is a schematic diagram of a process of determining a corrected pose according to an exemplary embodiment of the present disclosure;

Fig. 9 is a schematic diagram of an optimized pose graph according to an exemplary embodiment of the present disclosure;

Fig. 10 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure;

Fig. 11 is a flowchart of step 230 according to an exemplary embodiment of the present disclosure;

Figs. 12A and 12B are schematic diagrams of a self-attention mechanism according to an exemplary embodiment of the present disclosure;

Fig. 13A is a schematic diagram of an image to be processed according to an exemplary embodiment of the present disclosure;

Fig. 13B is a schematic diagram of a feature extraction image after weight value adjustment according to an exemplary embodiment of the present disclosure;

Fig. 14 is a flowchart of a camera positioning method according to another exemplary embodiment of the present disclosure;

Fig. 15 is a framework diagram of a target neural network according to an exemplary embodiment of the present disclosure;

Fig. 16 is a block diagram of a camera positioning device according to an exemplary embodiment of the present disclosure;

Fig. 17 is a block diagram of an acquisition module according to an exemplary embodiment of the present disclosure;

Fig. 18 is a block diagram of an execution module according to an exemplary embodiment of the present disclosure;

Fig. 19 is a block diagram of a positioning module according to an exemplary embodiment of the present disclosure;

Fig. 20 is a block diagram of a camera positioning device according to another exemplary embodiment of the present disclosure;

Fig. 21 is a block diagram of a second determining module according to an exemplary embodiment of the present disclosure;

Fig. 22 is a block diagram of a camera positioning device according to another exemplary embodiment of the present disclosure;

Fig. 23 is a block diagram of an acquisition module according to an exemplary embodiment of the present disclosure;

Fig. 24 is a block diagram of an execution module according to an exemplary embodiment of the present disclosure;

Fig. 25 is a block diagram of a positioning module according to an exemplary embodiment of the present disclosure;

Fig. 26 is a block diagram of a camera positioning device according to another exemplary embodiment of the present disclosure;

Fig. 27 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Here, exemplary embodiments will be described in detail, and examples thereof are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
在本公开运行的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所运行的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中运行的术语“和/或”是指并包含一张或多张相关联的列出项目的任何或所有可能组合。The terms operating in the present disclosure are only for the purpose of describing specific embodiments, and are not intended to limit the present disclosure. The singular forms of "a", "said" and "the" used in this disclosure and the appended claims are also intended to include plural forms, unless the context clearly indicates other meanings. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items.
应当理解，尽管在本公开可能采用术语第一、第二、第三等来描述各种信息，但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如，在不脱离本公开范围的情况下，第一信息也可以被称为第二信息，类似地，第二信息也可以被称为第一信息。取决于语境，如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
本公开实施例提供了一种相机定位方法，可以根据图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率，丢弃与图像模板等大的待处理图像中的部分像素点以得到目标图像，再根据目标图像确定相机的绝对位姿，降低了相机采集图像所在的场景中物体的移动对相机定位结果的影响，提升了相机定位的准确性和精度。The embodiments of the present disclosure provide a camera positioning method, which can, according to the prior probability of a movable object appearing at each of the multiple pixels included in an image template, discard some pixels of a to-be-processed image of the same size as the image template to obtain a target image, and then determine the absolute pose of the camera according to the target image. This reduces the influence of object movement in the scene where the camera captures images on the camera positioning result, and improves the accuracy and precision of camera positioning.
本公开实施例提供的相机定位方法可以应用在可移动机器设备上，对可移动机器设备上设置的相机进行定位。可移动机器设备包括但不限于设置了相机的无人机、无人驾驶车辆、机器人等。The camera positioning method provided by the embodiments of the present disclosure can be applied to movable machine equipment to position a camera provided on the movable machine equipment. Movable machine equipment includes, but is not limited to, drones, unmanned vehicles, and robots equipped with cameras.
由于可移动机器设备会发生移动，从而会造成设备上设置的相机的位姿随之发生改变。相机定位的准确性可以提高可移动机器设备执行各种任务时的准确度。例如，根据无人驾驶车辆上设置的相机所采集的车辆前向环境的图像，可确定相机当前的定位信息，并根据相机的定位信息来定位车辆当前的定位信息，进而可对该无人驾驶车辆进行路径规划、轨迹跟踪、碰撞预警等至少一种智能驾驶控制。Since movable machine equipment moves, the pose of the camera provided on the equipment changes accordingly. Accurate camera positioning can improve the accuracy of the movable machine equipment when performing various tasks. For example, based on an image of the forward environment of an unmanned vehicle captured by a camera installed on the vehicle, the current positioning information of the camera can be determined, and the current positioning information of the vehicle can then be determined from the camera's positioning information; at least one type of intelligent driving control, such as path planning, trajectory tracking, or collision warning, can then be performed on the unmanned vehicle.
如图1所示,本公开实施例提供的相机定位方法可以包括以下步骤110-130:As shown in FIG. 1, the camera positioning method provided by the embodiment of the present disclosure may include the following steps 110-130:
在步骤110中,获取图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率。In step 110, the prior probability of a movable object appearing at each of the multiple pixels included in the image template is obtained.
本公开实施例中，图像模板可以是包括有与当前场景对应的、用于记录在与图像模板等大的图像上多个像素点中每个像素点处出现可移动物体的先验概率的模板。可移动物体包括但不限于各种可以自行移动或受控而移动的物体，例如巴士、小车、人、自行车、卡车、摩托车、动物等。先验概率是指通过对以往采集的与当前场景相同或相似的图像进行分析后，得到的该图像上每个像素点属于可移动物体的概率。如果某像素点对应的先验概率较高，说明针对场景采集的图像中在该像素点处出现可移动物体的可能性较高；反之，如果某像素点对应的先验概率较低，说明针对场景采集的图像中在该像素点处出现可移动物体的可能性较低。该图像模板可以反映出所采集的图像中不同的像素点处出现可移动物体的先验的可能性。In the embodiments of the present disclosure, the image template may be a template corresponding to the current scene, used to record the prior probability of a movable object appearing at each of multiple pixels on an image of the same size as the image template. Movable objects include, but are not limited to, various objects that can move on their own or under control, such as buses, cars, people, bicycles, trucks, motorcycles, and animals. The prior probability refers to the probability, obtained by analyzing previously captured images of scenes that are the same as or similar to the current scene, that each pixel of such an image belongs to a movable object. If the prior probability corresponding to a certain pixel is high, a movable object is likely to appear at that pixel in an image captured of the scene; conversely, if the prior probability corresponding to a certain pixel is low, a movable object is unlikely to appear at that pixel in an image captured of the scene. The image template can thus reflect the a priori likelihood of movable objects appearing at different pixels in the captured image.
可以针对与当前场景相同或相似的场景采集的图像集合，分析上述图像集合中每张图像上每个像素点处出现可移动物体的概率，并将这一概率作为当前场景对应的图像模板上每个像素点处出现可移动物体的先验概率。For a set of images captured in scenes that are the same as or similar to the current scene, the probability of a movable object appearing at each pixel of each image in the set can be analyzed, and this probability can be used as the prior probability of a movable object appearing at each pixel of the image template corresponding to the current scene.
例如，当前场景为无人驾驶车辆在城市主要街道行驶时，若对无人驾驶车辆上设置的相机进行定位，则在与当前场景相同或相似的场景采集的图像集合可以包括该城市主要街道的至少一张图像。For example, when the current scene is an unmanned vehicle driving on the main streets of a city, if the camera installed on the unmanned vehicle is to be positioned, the set of images captured in scenes that are the same as or similar to the current scene can include at least one image of the main streets of that city.
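The per-pixel statistic described above can be sketched as follows (an illustrative NumPy sketch, not the patent's implementation; it assumes each image in the set has already been reduced to a boolean segmentation mask marking movable-object pixels):

```python
import numpy as np

def movable_prior(masks):
    """Per-pixel prior probability of a movable object appearing.

    masks: list of H x W boolean arrays, one per image in the set;
    True marks pixels labeled as belonging to a movable object. The
    per-pixel frequency across the set serves as the prior probability
    recorded in the image template.
    """
    stack = np.stack(masks).astype(np.float64)   # N x H x W
    return stack.mean(axis=0)                    # frequency per pixel

# Two toy 2 x 2 segmentation masks: pixel (0, 0) is movable in both
# images, pixel (0, 1) in one of them, and the bottom row in neither.
m1 = np.array([[True, False], [False, False]])
m2 = np.array([[True, True], [False, False]])
M = movable_prior([m1, m2])   # the image template's prior probabilities
```

In this toy example `M` records a prior of 1.0 where both masks marked a movable object, 0.5 where only one did, and 0.0 for the background rows.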
在步骤120中,根据所述先验概率针对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作,得到目标图像。In step 120, an operation of discarding some pixels is performed on an image to be processed that is as large as the image template according to the prior probability to obtain a target image.
待处理图像可以是可移动机器设备上设置的相机在该可移动机器设备移动过程中所采集到的至少一张图像。可移动机器设备可以按照与当前场景对应的图像模板上每个像素点对应的先验概率，对可移动机器设备上设置的相机所采集的与图像模板等大的至少一张图像，执行丢弃部分像素点的操作，从而得到目标图像。The to-be-processed image may be at least one image captured by a camera provided on movable machine equipment while the equipment is moving. The movable machine equipment can, according to the prior probability corresponding to each pixel of the image template corresponding to the current scene, perform the operation of discarding some pixels on at least one image, of the same size as the image template, captured by the camera provided on the movable machine equipment, thereby obtaining the target image.
在本公开实施例中，丢弃部分像素点的操作包括但不限于，对相机所采集的与图像模板等大的至少一张图像上先验概率的采样值大于预设值的像素点全部丢弃或随机部分丢弃。In the embodiments of the present disclosure, the operation of discarding some pixels includes, but is not limited to, discarding all, or randomly discarding some, of the pixels whose sampled prior-probability value is greater than a preset value on at least one image, of the same size as the image template, captured by the camera.
在步骤130中,根据所述目标图像确定采集所述待处理图像的相机在世界坐标系下的绝对位姿。In step 130, the absolute pose of the camera collecting the image to be processed in the world coordinate system is determined according to the target image.
例如，可移动机器设备可以根据目标图像，通过回归损失函数，确定相机在世界坐标系下的绝对位姿。其中，回归损失函数可以是均方误差损失函数（例如L2损失函数）、平均绝对误差损失函数（例如L1损失函数）、平滑平均绝对误差损失函数（例如Huber损失函数）、对数双曲余弦损失函数或分位数损失函数等。For example, the movable machine equipment can determine the absolute pose of the camera in the world coordinate system from the target image through a regression loss function. The regression loss function may be a mean square error loss function (e.g., an L2 loss function), a mean absolute error loss function (e.g., an L1 loss function), a smooth mean absolute error loss function (e.g., a Huber loss function), a log-hyperbolic-cosine loss function, a quantile loss function, or the like.
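As an illustration of one of the listed losses, a minimal NumPy sketch of the smooth mean absolute error (Huber) loss; the function name and the delta value are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Smooth mean absolute error (Huber) loss: quadratic for errors
    up to delta, linear beyond it."""
    err = np.abs(pred - target)
    quad = 0.5 * err ** 2                 # used where |error| <= delta
    lin = delta * (err - 0.5 * delta)     # used where |error| >  delta
    return float(np.where(err <= delta, quad, lin).mean())

small = huber_loss(np.array([0.5]), np.array([0.0]))   # quadratic region
large = huber_loss(np.array([2.0]), np.array([0.0]))   # linear region
```

The quadratic part keeps gradients smooth near zero error, while the linear part limits the influence of outliers, which is why such losses are common choices for pose regression.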
上述实施例中，可移动机器设备可以结合与当前场景对应的图像模板上多个像素点中每个像素点处出现可移动物体的先验概率，对当前场景下可移动机器设备上设置的相机所采集的至少一张图像进行部分像素点的丢弃以得到目标图像，并利用目标图像确定相机的绝对位姿，可有效降低当前场景中物体的移动对相机定位的负面影响，提升了相机定位的准确性和精度。In the above embodiment, the movable machine equipment can, in combination with the prior probability of a movable object appearing at each of the multiple pixels of the image template corresponding to the current scene, discard some pixels of at least one image captured in the current scene by the camera provided on the movable machine equipment to obtain a target image, and use the target image to determine the absolute pose of the camera. This can effectively reduce the negative impact of object movement in the current scene on camera positioning, and improves the accuracy and precision of camera positioning.
对于设置在可移动机器设备上的相机,其位姿可由于可移动机器设备的移动和/或相机的位置调整等因素而改变,从而需要对相机进行定位。本公开的发明人发现,如果在相机采集图像的视场中存在物体的移动,则该物体的移动会造成相机所采集图像的相应部分的成像质量不佳,例如出现图像模糊、抖动等,这些质量不佳的部分会影响所采集图像的整体特征的质量,进而影响基于图像整体特征进行相机定位的准确性和精度。然而,所采集图像中某些不动或固定物体对相机定位反而有用。For a camera set on a movable machine equipment, its pose may be changed due to factors such as the movement of the movable machine equipment and/or the position adjustment of the camera, so that the camera needs to be positioned. The inventor of the present disclosure found that if there is movement of an object in the field of view of the image captured by the camera, the movement of the object will cause poor imaging quality of the corresponding part of the image captured by the camera, such as image blur, jitter, etc. Poor quality parts will affect the quality of the overall features of the captured image, and further affect the accuracy and precision of camera positioning based on the overall features of the image. However, some immobile or fixed objects in the captured image are actually useful for camera positioning.
为此，本公开实施例通过结合先验知识确定所采集图像中各个像素点处出现可移动物体的概率（即先验概率），并基于所确定的概率对所采集图像执行部分像素点的丢弃，如丢弃部分出现可移动物体的先验概率较高的像素点，由此可减少这些像素点对图像整体质量的负面影响，从而有利于改善基于局部像素点丢弃后的图像的整体质量进行相机定位的精度。To this end, the embodiments of the present disclosure determine, using prior knowledge, the probability of a movable object appearing at each pixel of the captured image (i.e., the prior probability), and, based on the determined probability, discard some pixels of the captured image, such as some of the pixels with a higher prior probability of a movable object appearing. This reduces the negative impact of these pixels on the overall quality of the image, which helps improve the precision of camera positioning based on the overall quality of the image after local pixel discarding.
在一些可选实施例中,步骤110可以由电子设备执行,该电子设备可以是可移动机器设备,也可以是对神经网络进行训练的电子设备,例如云平台等,本公开对此不作限定。如图2所示,步骤110可以包括步骤111-113:In some optional embodiments, step 110 may be performed by an electronic device, which may be a mobile machine device, or an electronic device for training a neural network, such as a cloud platform, which is not limited in the present disclosure. As shown in Figure 2, step 110 may include steps 111-113:
在步骤111中,对与当前场景关联的预定图像集合中的每张图像进行像素级语义分割。In step 111, pixel-level semantic segmentation is performed on each image in a predetermined image set associated with the current scene.
本公开实施例中,与当前场景关联的预定图像集合包括在与当前场景相同或相似的场景中采集的多张图片。电子设备可以通过查找预定图像集合中每张图像上存在的内容,来获得每张图像的像素级语义分割结果。例如,假设当前场景为无人驾驶车辆在城市主要街道行驶,则与当前场景关联的预定图像集合可包括如图3所示的图像m1、m2……mN。In the embodiment of the present disclosure, the predetermined image set associated with the current scene includes multiple pictures collected in the same or similar scene as the current scene. The electronic device can obtain the pixel-level semantic segmentation result of each image by searching for the content existing on each image in the predetermined image set. For example, assuming that the current scene is an unmanned vehicle driving on a main street in a city, the predetermined image set associated with the current scene may include images m1, m2,...mN as shown in FIG. 3.
在步骤112中,根据像素级语义分割的结果确定所述每张图像中属于可移动物体的第一像素点和属于背景的第二像素点。In step 112, the first pixel belonging to the movable object and the second pixel belonging to the background in each image are determined according to the result of pixel-level semantic segmentation.
可选地,背景是图像中不可移动的物体,例如图像中除了确定为可移动物体之外的其他物体,例如天空、建筑物、树木、道路等。Optionally, the background is an immovable object in the image, for example, other objects in the image that are not determined to be movable objects, such as sky, buildings, trees, roads, etc.
在步骤113中，基于预定图像集合中每张图像上第一像素点和第二像素点的统计分布，确定与所述预定图像集合中的图像等大的图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率。In step 113, based on the statistical distribution of the first pixels and the second pixels in each image of the predetermined image set, the prior probability of a movable object appearing at each of the multiple pixels included in an image template of the same size as the images in the predetermined image set is determined.
在本公开实施例中，电子设备基于与当前场景关联的预定图像集合中每张图像上属于可移动物体的第一像素点以及属于背景的第二像素点的统计分布，得到与当前场景对应的图像模板，例如图3中的图像模板M，以记录在当前场景下采集的与图像模板等大的图像中每个像素点处出现可移动物体的先验概率。In the embodiments of the present disclosure, based on the statistical distribution of the first pixels belonging to movable objects and the second pixels belonging to the background in each image of the predetermined image set associated with the current scene, the electronic device obtains the image template corresponding to the current scene, such as the image template M in Fig. 3, which records the prior probability of a movable object appearing at each pixel of an image, of the same size as the image template, captured in the current scene.
本公开实施例中，图像模板上记录的每个像素点处出现可移动物体的先验概率是一个统计分布范围，而非一个固定值。在后续根据所述先验概率对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作时，每次可以根据先验概率的统计分布范围丢弃不同的像素点，得到不同的目标图像。并且，根据多个不同的目标图像确定相机的绝对位姿，可得到更好的相机定位结果，尤其是在大规模城市交通场景中。In the embodiments of the present disclosure, the prior probability of a movable object appearing at each pixel recorded in the image template is a statistical distribution rather than a fixed value. When the operation of discarding some pixels is subsequently performed on a to-be-processed image of the same size as the image template according to the prior probability, different pixels can be discarded each time according to the statistical distribution of the prior probability, yielding different target images. Moreover, determining the absolute pose of the camera from multiple different target images can produce better camera positioning results, especially in large-scale urban traffic scenes.
可选地,图像模板包括的每个像素点处出现可移动物体的先验概率可以符合高斯分布,如公式1所示:Optionally, the prior probability that a movable object appears at each pixel included in the image template may conform to a Gaussian distribution, as shown in formula 1:
p(M(i,j)) ~ N(σ²(i,j), μ(i,j)),  公式1 Formula 1

其中，i表示图像模板上第i行的像素点，j表示图像模板上第j列的像素点，(i,j)对应像素点坐标，像素点(i,j)的数学期望为μ(i,j)：Here, i denotes a pixel in the i-th row of the image template and j a pixel in the j-th column, so (i,j) are the pixel coordinates; the mathematical expectation of pixel (i,j) is μ(i,j), given by:

[Formula image: PCTCN2020091768-appb-000001]

其中，N是像素点数目，像素点(i,j)的方差为σ²(i,j) = μ(i,j)(1-μ(i,j))，p(M(i,j))是像素点(i,j)的先验概率。Here, N is the number of pixels, the variance of pixel (i,j) is σ²(i,j) = μ(i,j)(1-μ(i,j)), and p(M(i,j)) is the prior probability of pixel (i,j).
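Reading Formula 1 as a normal distribution with mean μ(i,j) and variance σ²(i,j) = μ(i,j)(1-μ(i,j)), one sampling of the template's prior probabilities can be sketched as follows (an illustrative NumPy sketch; clipping to [0, 1] is an added assumption, since a Gaussian sample can fall outside the probability range):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(mu):
    """Draw one sample of the per-pixel prior probability.

    mu: H x W array of per-pixel expectations mu(i,j) (the image
    template). Each pixel is sampled from N(mu, mu * (1 - mu)) and
    the result is clipped back into [0, 1].
    """
    sigma = np.sqrt(mu * (1.0 - mu))            # std dev per pixel
    return np.clip(rng.normal(mu, sigma), 0.0, 1.0)

mu = np.array([[0.9, 0.1], [0.5, 0.0]])         # toy template
p = sample_prior(mu)   # differs between calls, so repeated sampling
                       # can discard different pixels each time
```

A pixel with μ = 0 or μ = 1 has zero variance and always keeps its expectation, while pixels with intermediate μ vary from sampling to sampling, which is what makes the later multi-sampling step produce different target images.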
在一些可选实施例中,例如图4所示,步骤120可以包括:In some optional embodiments, such as shown in FIG. 4, step 120 may include:
在步骤121中,对所述待处理图像所包括的至少部分像素点对应的先验概率进行采样。In step 121, a priori probability corresponding to at least some pixels included in the image to be processed is sampled.
对于相机采集的至少一个待处理图像,每个待处理图像上每个像素点处出现可移动物体的先验概率的分布满足高斯分布。For at least one to-be-processed image collected by the camera, the distribution of the prior probability that a movable object appears at each pixel on each to-be-processed image satisfies a Gaussian distribution.
对于至少一个待处理图像中的每一个待处理图像，可移动机器设备可以对该待处理图像所包括的至少部分像素点对应的先验概率进行采样，得到本次采样后该待处理图像上的至少部分像素点对应的先验概率的采样值。For each of the at least one to-be-processed image, the movable machine equipment can sample the prior probabilities corresponding to at least some of the pixels included in the to-be-processed image, obtaining, for this sampling, the sampled prior-probability values corresponding to at least some of the pixels of the to-be-processed image.
在步骤122中,在所述待处理图像上去除先验概率的采样值大于预设阈值的像素点,得到与本次采样对应的目标图像。In step 122, pixel points whose a priori probability sampling value is greater than a preset threshold are removed from the image to be processed to obtain a target image corresponding to this sampling.
在本次采样结果中，如果待处理图像1上像素点1的先验概率的采样值大于预设阈值，那么认为像素点1属于可移动物体，可移动机器设备可以在待处理图像1上去除像素点1，从而得到与待处理图像1的本次采样对应的目标图像。In this sampling result, if the sampled prior-probability value of pixel 1 on to-be-processed image 1 is greater than the preset threshold, pixel 1 is considered to belong to a movable object, and the movable machine equipment can remove pixel 1 from to-be-processed image 1, thereby obtaining the target image corresponding to this sampling of to-be-processed image 1.
对于至少一个待处理图像中的每一个待处理图像，可移动机器设备可以对该待处理图像按照上述方式去除全部先验概率的采样值大于预设阈值的像素点，或者随机去除部分先验概率的采样值大于预设阈值的像素点，得到与该待处理图像的本次采样对应的目标图像。For each of the at least one to-be-processed image, the movable machine equipment can, in the above manner, remove all pixels whose sampled prior-probability value is greater than the preset threshold, or randomly remove some of those pixels, to obtain the target image corresponding to this sampling of the to-be-processed image.
在一些可选实施例中，可移动机器设备如果对待处理图像上像素点对应的先验概率进行多次采样，那么可以让同一个待处理图像上同一个像素点对应的先验概率的每次采样值不同，使得执行丢弃部分像素点的操作之后得到的多个目标图像两两之间存在至少一个不同的像素点。In some optional embodiments, if the movable machine equipment samples the prior probabilities corresponding to the pixels of a to-be-processed image multiple times, the sampled values of the prior probability corresponding to the same pixel of the same to-be-processed image can differ between samplings, so that among the multiple target images obtained after the pixel-discarding operation, any two differ in at least one pixel.
例如，在第一次采样时，待处理图像1上像素点1对应的先验概率的采样值为P1，第二次采样时，待处理图像1上像素点1对应的先验概率的采样值为P2，预设阈值为T。其中，P1<T<P2。则在第一次采样之后得到的目标图像保留像素点1，在第二次采样之后得到的目标图像需要去除像素点1。For example, in the first sampling, the sampled prior-probability value corresponding to pixel 1 on to-be-processed image 1 is P1; in the second sampling, it is P2; and the preset threshold is T, where P1 < T < P2. The target image obtained after the first sampling therefore retains pixel 1, while the target image obtained after the second sampling removes pixel 1.
通过上述过程，可以让可移动机器设备对同一个待处理图像上像素点对应的先验概率进行多次采样，并相应得到多个不同的目标图像用于进行相机定位，有利于保障最终得到的相机定位的准确性。Through the above process, the movable machine equipment can sample the prior probabilities corresponding to the pixels of the same to-be-processed image multiple times and accordingly obtain multiple different target images for camera positioning, which helps guarantee the accuracy of the final camera positioning.
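Steps 121-122 above can be sketched as follows (an illustrative NumPy sketch; zeroing out discarded pixels and the threshold value 0.5 are assumptions, since the patent does not fix how removed pixels are represented):

```python
import numpy as np

def drop_pixels(image, sampled_prior, threshold=0.5):
    """Remove (here: zero out) pixels whose sampled prior probability
    of belonging to a movable object exceeds the threshold (step 122).

    image: H x W (or H x W x C) array; sampled_prior: H x W array of
    sampled prior-probability values for this sampling.
    """
    keep = sampled_prior <= threshold            # pixels that survive
    if image.ndim == 3:
        keep = keep[..., None]                   # broadcast over channels
    return np.where(keep, image, 0)

img = np.arange(4.0).reshape(2, 2)               # toy to-be-processed image
prior = np.array([[0.9, 0.2], [0.6, 0.1]])       # one sampling of the prior
target = drop_pixels(img, prior)                 # left column is discarded
```

Calling `drop_pixels` with a fresh sampling of the prior each time yields a different `target` image per call, matching the multi-sampling behavior described above.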
在一些可选实施例中,步骤130可以包括:将所述待处理图像输入目标神经网络,得到所述相机在世界坐标系下的绝对位姿。In some optional embodiments, step 130 may include: inputting the to-be-processed image into a target neural network to obtain the absolute pose of the camera in the world coordinate system.
可移动机器设备可以将待处理图像输入目标神经网络,由目标神经网络直接输出采集该待处理图像的相机在世界坐标系下的绝对位姿。The mobile machine equipment can input the image to be processed into the target neural network, and the target neural network directly outputs the absolute pose of the camera that collects the image to be processed in the world coordinate system.
上述实施例中，可移动机器设备根据图像模板上每个像素点属于可移动物体的先验概率，丢弃了待处理图像上先验概率大于预设值的至少部分像素点，从而提升了相机定位的准确性。In the above embodiment, the movable machine equipment discards, according to the prior probability that each pixel of the image template belongs to a movable object, at least some pixels of the to-be-processed image whose prior probability is greater than a preset value, thereby improving the accuracy of camera positioning.
在一些可选实施例中,如果待处理图像包括所述相机采集的具有时间先后性、也即时序性的k帧图像(k是大于或等于2的整数),则如图5所示,所述方法还包括步骤140-150:In some optional embodiments, if the image to be processed includes k frames of images (k is an integer greater than or equal to 2) that are acquired by the camera in time sequence, that is, time sequence, as shown in FIG. 5, The method also includes steps 140-150:
在步骤140中,根据所述k帧图像确定所述相机在拍摄所述k帧图像时的相对位姿。In step 140, the relative pose of the camera when shooting the k frames of images is determined according to the k frames of images.
本公开实施例中，可移动机器设备可以通过视觉里程计方法，确定相机在采集第k帧图像时，相对于采集第k-1帧图像时的相对位姿。In the embodiments of the present disclosure, the movable machine equipment can determine, through a visual odometry method, the relative pose of the camera when capturing the k-th frame with respect to when capturing the (k-1)-th frame.
在步骤150中,根据所述相机的相对位姿和绝对位姿,确定所述相机的修正位姿。In step 150, the corrected pose of the camera is determined according to the relative pose and absolute pose of the camera.
本公开实施例中，可移动机器设备可以将相机在采集k帧图像中时序上最靠前的一帧图像（也被称为第一帧图像）时在世界坐标系中的绝对位姿作为参照，根据在采集与第一帧图像相邻的第二帧图像时相机的相对位姿以及绝对位姿，确定出相机的修正位姿。In the embodiments of the present disclosure, the movable machine equipment can take, as a reference, the absolute pose of the camera in the world coordinate system when capturing the temporally earliest of the k frames (also called the first frame), and determine the corrected pose of the camera according to the relative pose and the absolute pose of the camera when capturing the second frame adjacent to the first frame.
后续可移动机器设备可以根据修正位姿调整相机的位姿,从而降低场景中物体的移动对相机定位的影响,可有利于保障可移动机器设备执行各种任务的准确度。Subsequent movable machinery and equipment can adjust the pose of the camera according to the corrected pose, thereby reducing the impact of the movement of objects in the scene on the positioning of the camera, which can help ensure the accuracy of the movable machinery and equipment in performing various tasks.
在一些可选实施例中,如图6所示,步骤150可具体包括步骤151-153:In some optional embodiments, as shown in FIG. 6, step 150 may specifically include steps 151-153:
在步骤151中,确定所述绝对位姿的确定性概率。In step 151, the deterministic probability of the absolute pose is determined.
本公开实施例中,确定性概率是对所述绝对位姿的结果的准确程度评价。如果确定性概率越高,说明绝对位姿的结果越准确,否则说明绝对位姿的结果越不准确。In the embodiment of the present disclosure, the deterministic probability is an evaluation of the accuracy of the result of the absolute pose. If the probability of certainty is higher, the result of the absolute pose is more accurate, otherwise the result of the absolute pose is less accurate.
可移动机器设备可以采用随机采样的方法，例如蒙特卡洛法，对相机所采集的具有时序性的k帧图像对应的先验概率进行采样，得到多次采样的采样结果。k是大于或等于2的整数。The movable machine equipment can use a random sampling method, such as the Monte Carlo method, to sample the prior probabilities corresponding to the k sequential frames captured by the camera, obtaining the results of multiple samplings. Here, k is an integer greater than or equal to 2.
例如图7所示，可基于图像模板M包括的每个像素点的先验概率，对当前图像进行多次采样，并基于每次采样对应的目标图像分别确定该当前图像对应的多个绝对位姿。For example, as shown in Fig. 7, the current image can be sampled multiple times based on the prior probability of each pixel included in the image template M, and multiple absolute poses corresponding to the current image can be determined from the target image corresponding to each sampling.
根据当前图像对应的多个绝对位姿来确定当前图像对应的绝对位姿的确定性概率。例如，若当前图像对应的多个绝对位姿两两之间差异较大，则可以确定当前图像对应的绝对位姿的确定性概率较低，反之则确定当前图像对应的绝对位姿的确定性概率较高。The deterministic probability of the absolute pose corresponding to the current image is determined from the multiple absolute poses corresponding to the current image. For example, if the multiple absolute poses corresponding to the current image differ greatly from one another, it can be determined that the deterministic probability of the absolute pose corresponding to the current image is low; otherwise, the deterministic probability is high.
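One way to map the spread among the sampled absolute poses to a certainty score is sketched below; the exponential mapping, the scale parameter, and restricting poses to translations are illustrative assumptions, since the text only requires that a larger spread yield a lower deterministic probability:

```python
import numpy as np

def pose_certainty(poses, scale=1.0):
    """Map the spread among several absolute-pose estimates of the
    same image to a certainty score in (0, 1].

    poses: list of 1-D arrays (here, hypothetical [x, y, z]
    translations) obtained from different samplings of the same
    to-be-processed image.
    """
    spread = np.stack(poses).std(axis=0).mean()  # large spread -> low score
    return float(np.exp(-spread / scale))

# Four identical estimates: no spread, maximal certainty.
consistent = pose_certainty([np.array([1.0, 2.0, 0.5])] * 4)
# Two widely differing estimates: lower certainty.
scattered = pose_certainty([np.array([1.0, 2.0, 0.5]),
                            np.array([4.0, -1.0, 2.5])])
```

Identical estimates give a certainty of 1, and the score decays monotonically as the estimates disagree more.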
在步骤152中,根据所述绝对位姿的确定性概率确定所述相对位姿的第一权重和所述绝对位姿的第二权重。In step 152, the first weight of the relative pose and the second weight of the absolute pose are determined according to the deterministic probability of the absolute pose.
本公开实施例中，对于相机采集的具有时序性的k帧图像，可移动机器设备可以根据每帧图像对应的绝对位姿的确定性概率来确定每帧图像对应的相对位姿的第一权重以及每帧图像对应的绝对位姿的第二权重。In the embodiments of the present disclosure, for the k sequential frames captured by the camera, the movable machine equipment can determine, according to the deterministic probability of the absolute pose corresponding to each frame, the first weight of the relative pose corresponding to that frame and the second weight of the absolute pose corresponding to that frame.
例如，如果当前图像对应的绝对位姿的确定性概率较高，则可以提高该当前图像对应的绝对位姿的第二权重；如果当前图像对应的绝对位姿的确定性概率较低，可以提高该当前图像对应的相对位姿的第一权重。For example, if the deterministic probability of the absolute pose corresponding to the current image is high, the second weight of the absolute pose corresponding to the current image can be increased; if the deterministic probability is low, the first weight of the relative pose corresponding to the current image can be increased.
在步骤153中,根据所述相对位姿、所述第一权重、所述绝对位姿和所述第二权重,确定所述相机的修正位姿。In step 153, the corrected pose of the camera is determined according to the relative pose, the first weight, the absolute pose, and the second weight.
本公开实施例中，例如图8所示，以具有时序性的k帧图像中第一帧图像对应的绝对位姿为参考，采用滑动窗口的方式依次进行移动，根据第二帧图像对应的相对位姿、第一权重、绝对位姿和第二权重，确定出第二帧图像相对于第一帧图像的修正位姿。In the embodiments of the present disclosure, for example as shown in Fig. 8, taking the absolute pose corresponding to the first of the k sequential frames as a reference and moving a sliding window in sequence, the corrected pose of the second frame relative to the first frame is determined according to the relative pose, the first weight, the absolute pose, and the second weight corresponding to the second frame.
本公开实施例中,如果相对位姿较为准确,则可以提高相对位姿的权重,如果绝对位姿较为准确,可以提高绝对位姿的权重。这样,通过使相对位姿和绝对位姿各自具有不同权重来确定修正位姿,可使得修正位姿更加准确,也就使得相机定位更加准确。In the embodiments of the present disclosure, if the relative pose is more accurate, the weight of the relative pose can be increased, and if the absolute pose is more accurate, the weight of the absolute pose can be increased. In this way, by making the relative pose and the absolute pose each have different weights to determine the corrected pose, the corrected pose can be made more accurate, and the camera positioning can be more accurate.
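The weighted correction of steps 152-153 can be sketched for the translation component as follows (an illustrative sketch; deriving the two weights directly from the certainty score, and restricting to translation rather than full pose, are assumptions not fixed by the patent):

```python
import numpy as np

def corrected_pose(prev_abs, relative, absolute, certainty):
    """Fuse the VO prediction with the network's absolute estimate.

    prev_abs:  absolute position at the previous (reference) frame.
    relative:  VO displacement from the previous frame to this one.
    absolute:  network's absolute position estimate for this frame.
    certainty: deterministic probability of the absolute pose, in [0, 1].
    A more certain absolute pose gets a larger second weight, as the
    text requires; the exact weighting scheme here is hypothetical.
    """
    w_abs = certainty                    # second weight (absolute pose)
    w_rel = 1.0 - certainty              # first weight (relative pose)
    vo_prediction = prev_abs + relative  # pose implied by the relative pose
    return w_rel * vo_prediction + w_abs * absolute

prev = np.array([0.0, 0.0])              # reference absolute position
rel = np.array([1.0, 0.0])               # VO relative displacement
net = np.array([1.2, 0.2])               # network's absolute estimate
fused = corrected_pose(prev, rel, net, certainty=0.5)
```

With certainty 1 the correction reduces to the network's absolute estimate; with certainty 0 it reduces to the VO prediction; intermediate values blend the two, matching the weighting behavior described above.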
通过修正位姿，对最终确定的相机的位姿图进行优化，优化后的位姿图可如图9所示，图9中的三角形代表了相机采集每一帧图像时的绝对位姿，带箭头的线段代表相对位姿，圆圈代表滑动窗口。图9中修正后的绝对位姿和相对位姿按照箭头方向依次对应图8中由左上角到右下角的绝对位姿和相对位姿。The corrected poses are used to optimize the final pose graph of the camera. The optimized pose graph can be as shown in Fig. 9, where the triangles represent the absolute pose of the camera when capturing each frame, the line segments with arrows represent relative poses, and the circles represent sliding windows. The corrected absolute and relative poses in Fig. 9 correspond, in arrow order, to the absolute and relative poses from the upper left corner to the lower right corner in Fig. 8.
上述实施例中，可以采用VO（Visual Odometry，视觉里程计）方法确定的位姿作为图像对应的相对位姿。其中，VO方法是通过分析上述k帧图像来确定相机的位置和姿态。通过对k帧图像进行特征匹配等方法估计相机在相邻帧间的运动，从而可获得相机在采集后一帧图像时相对于采集前一帧图像时的相对位姿。In the above embodiments, the pose determined by a VO (Visual Odometry) method may be used as the relative pose corresponding to an image. The VO method determines the position and attitude of the camera by analyzing the above k frames. By estimating the camera's motion between adjacent frames through feature matching and similar methods on the k frames, the relative pose of the camera when capturing a frame with respect to when capturing the preceding frame can be obtained.
进一步地,在本公开实施例中,结合绝对位姿和相对位姿进行位姿修正,进一步提升了相机定位的精确度。Further, in the embodiments of the present disclosure, the absolute pose and relative pose are combined to perform pose correction, which further improves the accuracy of camera positioning.
在一实施例中,本公开提供的相机定位方法还可以应用于对神经网络进行训练的电子设备上,例如云平台、神经网络训练平台等。由电子设备采用该方法对神经网络进行训练,得到目标神经网络。后续将图像输入目标神经网络之后,可以得到采集该图像的相机在世界坐标系下的绝对位姿。In an embodiment, the camera positioning method provided in the present disclosure can also be applied to electronic devices that train neural networks, such as cloud platforms, neural network training platforms, and so on. The electronic device uses this method to train the neural network to obtain the target neural network. After the image is subsequently input to the target neural network, the absolute pose of the camera that collected the image in the world coordinate system can be obtained.
如图10所示,本公开实施例提供的相机定位方法可以包括以下步骤210-230:As shown in FIG. 10, the camera positioning method provided by the embodiment of the present disclosure may include the following steps 210-230:
在步骤210中,获取图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率。In step 210, the prior probability that a movable object appears at each of the multiple pixels included in the image template is obtained.
在预定图像集合中的每张图像上，已知属于可移动物体的像素点。电子设备可以根据上述每张图像，分析每张图像每个像素点处出现可移动物体的概率，并将这一概率作为与每张图像等大的图像模板上每个像素点处出现可移动物体的先验概率。For each image in the predetermined image set, the pixels belonging to movable objects are known. Based on each of these images, the electronic device can analyze the probability of a movable object appearing at each pixel of the image, and use this probability as the prior probability of a movable object appearing at each pixel of an image template of the same size as each image.
在步骤220中,根据所述先验概率针对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作,得到目标图像。In step 220, according to the prior probability, an operation of discarding part of pixels is performed on an image to be processed that is as large as the image template to obtain a target image.
待处理图像可以是至少一张样本图像,电子设备可以按照图像模板上每个像素点对应的先验概率,对至少一张样本图像执行丢弃部分像素点的操作,从而得到目标图像。The image to be processed may be at least one sample image, and the electronic device may perform the operation of discarding some pixels on the at least one sample image according to the prior probability corresponding to each pixel on the image template, so as to obtain the target image.
在本公开实施例中,丢弃部分像素点的操作包括但不限于对至少一张样本图像上先验概率的采样值大于预设值的像素点进行全部丢弃或随机部分丢弃的操作。In the embodiment of the present disclosure, the operation of discarding some pixels includes but is not limited to the operation of discarding all pixels or randomly partially discarding pixels on at least one sample image whose a priori probability sampling value is greater than a preset value.
在步骤230中,根据所述目标图像确定采集所述待处理图像的相机在世界坐标系下的绝对位姿。In step 230, the absolute pose of the camera that collects the image to be processed in the world coordinate system is determined according to the target image.
电子设备可以根据得到的目标图像,通过回归损失函数,确定采集至少一个样本图像的相机在世界坐标系下的绝对位姿。The electronic device can determine the absolute pose of the camera that collects at least one sample image in the world coordinate system through the regression loss function according to the obtained target image.
其中,回归损失函数可以是均方误差损失函数(例如L2损失函数)、平均绝对误差(例如L1损失函数)、平滑平均绝对误差损失函数(例如Huber损失函数)、对数双曲余弦损失函数和分位数损失函数等。Among them, the regression loss function can be a mean square error loss function (such as L2 loss function), average absolute error (such as L1 loss function), smooth average absolute error loss function (such as Huber loss function), log hyperbolic cosine loss function, and Quantile loss function, etc.
在一些可选实施例中,步骤210可以由对神经网络进行训练的电子设备执行,执行过程与图2中步骤110的执行过程一致,在此不再赘述。In some optional embodiments, step 210 may be performed by an electronic device that trains a neural network, and the execution process is the same as the execution process of step 110 in FIG. 2, and will not be repeated here.
在一些可选实施例中,步骤220可以由对神经网络进行训练的电子设备执行,执行过程与图4中步骤120的执行过程一致,在此也不再赘述。In some optional embodiments, step 220 may be performed by an electronic device that trains a neural network, and the execution process is the same as that of step 120 in FIG. 4, and will not be repeated here.
在一些可选实施例中,步骤230可以由对神经网络进行训练的电子设备执行,例如图11所示,步骤230可以包括步骤231-233:In some optional embodiments, step 230 may be performed by an electronic device that trains a neural network. For example, as shown in FIG. 11, step 230 may include steps 231-233:
在步骤231中,经神经网络提取所述目标图像中的特征参数,得到特征提取图像。In step 231, the feature parameters in the target image are extracted through a neural network to obtain a feature extraction image.
神经网络可以从至少一个目标图像中提取出每个目标图像的特征参数,从而得到与每个目标图像对应的特征提取图像。The neural network can extract feature parameters of each target image from at least one target image, thereby obtaining a feature extraction image corresponding to each target image.
在步骤232中,在所述神经网络的预设空间维度和/或预设通道维度上,增加所述特征提取图像中属于背景的第二像素点所对应的权重值。In step 232, on the preset spatial dimension and/or preset channel dimension of the neural network, the weight value corresponding to the second pixel point belonging to the background in the feature extraction image is increased.
神经网络可以在预设空间维度和预设通道维度的至少一个维度上,通过自注意力机制增加特征提取图像中属于背景的第二像素点的权重值。The neural network can increase the weight value of the second pixel point belonging to the background in the feature extraction image in at least one of the preset space dimension and the preset channel dimension through a self-attention mechanism.
例如图12A所示，神经网络将H（高度）×W（宽度）×C（通道）的某个特征提取图像采用空间自注意力机制变换后，得到同一通道上的图像H×W×1。再例如图12B所示，神经网络将H×W×C的某个特征提取图像采用通道自注意力机制变换后，得到相同高度和宽度的图像1×1×C。For example, as shown in FIG. 12A, the neural network transforms a feature extraction image of size H (height) × W (width) × C (channels) with a spatial self-attention mechanism to obtain an H × W × 1 image on a single channel. As another example, as shown in FIG. 12B, the neural network transforms an H × W × C feature extraction image with a channel self-attention mechanism to obtain a 1 × 1 × C image of unit height and width.
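The two attention shapes in FIGS. 12A and 12B can be illustrated with a toy computation. This sketch only reproduces the output shapes (H × W × 1 for spatial attention, 1 × 1 × C for channel attention); the actual attention scores in the disclosed network are learned, whereas a simple mean-plus-softmax is assumed here as a stand-in.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention_map(feat):
    # H x W x C -> H x W x 1: one weight per spatial location.
    score = feat.mean(axis=-1, keepdims=True)      # H x W x 1
    flat = softmax(score.reshape(-1), axis=0)      # normalize over H*W
    return flat.reshape(score.shape)

def channel_attention_map(feat):
    # H x W x C -> 1 x 1 x C: one weight per channel.
    score = feat.mean(axis=(0, 1), keepdims=True)  # 1 x 1 x C
    return softmax(score, axis=-1)                 # normalize over C
```

Either map can then be multiplied back onto the H × W × C feature image to reweight background pixels.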
神经网络通过自注意力机制,尽可能忽略属于可移动物体的第一像素点的信息,更加关注属于背景的第二像素点的信息。Through the self-attention mechanism, the neural network ignores the information of the first pixel of the movable object as much as possible, and pays more attention to the information of the second pixel of the background.
在神经网络的预设空间维度和预设通道维度上，增加图13A所示的图像上用实线方框圈出的第二像素点的权重值后，得到图13B所示的图像。图13B所示图像中被方框圈出的像素点的灰度值高于图13B所示图像中其他部分的像素点的灰度值。In the preset spatial dimension and the preset channel dimension of the neural network, after the weight values of the second pixels circled by the solid-line box in the image shown in FIG. 13A are increased, the image shown in FIG. 13B is obtained. The gray values of the pixels circled by the box in the image shown in FIG. 13B are higher than those of the pixels in the other parts of that image.
本公开实施例中，在图13A所示图像中，用虚线方框圈出的像素点属于可移动物体汽车，可以通过之前的步骤210获取与图13A所示图像等大的图像模板中每个像素点处出现可移动物体的先验概率，再通过步骤220丢弃掉图13A所示图像中先验概率的采样值大于预设阈值的像素点的全部或部分。In the embodiments of the present disclosure, the pixels circled by the dashed box in the image shown in FIG. 13A belong to a movable object, a car. Through the foregoing step 210, the prior probability of a movable object appearing at each pixel of an image template the same size as the image in FIG. 13A can be obtained; then, through step 220, all or some of the pixels in the image of FIG. 13A whose sampled prior probability is greater than the preset threshold are discarded.
进一步地，通过步骤232在两个维度上增加属于不可移动物体的权重值，使得神经网络更关注交通标志、电线杆等这些不可移动或者移动概率较低的物体，降低了相机采集图像所在的场景中物体的移动对可移动机器设备上的相机进行定位的结果的影响，提升了神经网络对相机进行定位的准确性和精度，提升了定位检测结果的鲁棒性。Further, step 232 increases the weight values belonging to immovable objects in the two dimensions, so that the neural network pays more attention to objects that are immovable or unlikely to move, such as traffic signs and utility poles. This reduces the influence of object movement in the scene captured by the camera on the result of positioning the camera on the movable machine device, improves the accuracy and precision with which the neural network positions the camera, and improves the robustness of the positioning result.
在步骤233中,经神经网络对权重值调整后的特征提取图像进行分析,得到采集所述待处理图像的相机在世界坐标系下的所述绝对位姿。In step 233, the feature extraction image adjusted by the weight value is analyzed by the neural network to obtain the absolute pose of the camera that collects the image to be processed in the world coordinate system.
本公开实施例中，神经网络可以通过回归损失函数，例如均方误差函数、绝对值误差函数等，对权重值调整后的特征提取图像进行分析，得到采集至少一个样本图像的相机在世界坐标系下的绝对位姿。In the embodiments of the present disclosure, the neural network may analyze the weight-adjusted feature extraction image through a regression loss function, such as a mean square error function or an absolute error function, to obtain the absolute pose, in the world coordinate system, of the camera that captured the at least one sample image.
在一些可选实施例中,例如图14所示,在进行神经网络训练的过程中,上述相机定位方法还包括步骤240:In some optional embodiments, such as shown in FIG. 14, during the process of neural network training, the above-mentioned camera positioning method further includes step 240:
在步骤240中，根据所述绝对位姿和预先确定的所述待处理图像的所述相机的位姿真值的差异，调整神经网络的网络参数，训练得到目标神经网络。In step 240, the network parameters of the neural network are adjusted according to the difference between the absolute pose and the predetermined ground-truth pose of the camera for the image to be processed, and the target neural network is obtained through training.
本公开实施例中，本步骤可以由对神经网络进行训练的电子设备执行。相机在采集与图像模板等大的至少一张样本图像时的位姿真值已知，电子设备可以根据神经网络输出的采集至少一张样本图像的相机在世界坐标系中的绝对位姿和已知的位姿真值的差异，调整神经网络的网络参数，让该神经网络的损失函数最小，最终训练得到所需要的目标神经网络。In the embodiments of the present disclosure, this step may be performed by an electronic device that trains the neural network. The ground-truth pose of the camera when capturing the at least one sample image, which is the same size as the image template, is known. The electronic device can adjust the network parameters of the neural network according to the difference between this known ground-truth pose and the absolute pose, output by the neural network, of the camera that captured the at least one sample image in the world coordinate system, so as to minimize the loss function of the neural network and finally train the desired target neural network.
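The parameter adjustment in step 240 can be sketched with a toy linear regressor standing in for the network: predict a pose from features, compare it with the known ground-truth pose, and take a gradient step that reduces the L2 loss. The linear model, feature shapes, and learning rate here are illustrative assumptions, not the disclosed architecture.

```python
import numpy as np

def train_step(weights, features, pose_gt, lr=0.1):
    # One update: predict poses, measure the L2 difference against the
    # known ground-truth poses, and move the parameters downhill.
    pred = features @ weights
    grad = 2 * features.T @ (pred - pose_gt) / len(features)
    return weights - lr * grad
```

Iterating such steps drives the loss toward its minimum, which is the training objective described above.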
在一些可选实施例中，本公开实施例基于上述相机定位方法，还提供了一种目标神经网络的框架图，例如图15所示，包括Probabilistic Dropout Module（部分像素点丢弃模块）、Feature Extractor Module（特征提取模块）、Self-attention Module（自注意力模块）和Regressor Module（回归模块）。In some optional embodiments, based on the above camera positioning method, the embodiments of the present disclosure further provide a framework diagram of the target neural network, as shown for example in FIG. 15, including a Probabilistic Dropout Module, a Feature Extractor Module, a Self-attention Module, and a Regressor Module.
其中,在目标神经网络的训练过程中,可以将至少一个样本图像作为部分像素点丢弃模块的输入值,部分像素点丢弃模块可以由顺序连接的至少五个子网络组成。每个子网络可以采用卷积层、Relu层、池化层等按照预设顺序设置的网络单元单独实现。In the training process of the target neural network, at least one sample image may be used as the input value of the partial pixel discarding module, and the partial pixel discarding module may be composed of at least five sub-networks connected in sequence. Each sub-network can be implemented separately by using network units set in a preset order, such as a convolutional layer, a Relu layer, and a pooling layer.
第一子网络可以对至少一张样本图像中的每张图像分别进行像素级语义分割；第二子网络可以根据像素级语义分割的结果，确定每张样本图像中属于所述可移动物体的第一像素点和属于背景的第二像素点；第三子网络可以基于每张样本图像中所述第一像素点和所述第二像素点的统计分布，确定与样本图像等大的图像模板包括的多个像素点中每个像素点处出现所述可移动物体的先验概率；第四子网络可以对至少一张样本图像所包括的至少部分像素点对应的先验概率进行采样，得到本次采样的采样结果；第五子网络可以根据本次采样结果，在至少一张样本图像去除先验概率的采样值大于预设阈值T的像素点，得到所述目标图像。The first sub-network can perform pixel-level semantic segmentation on each of the at least one sample image; the second sub-network can determine, according to the result of the pixel-level semantic segmentation, the first pixels belonging to the movable object and the second pixels belonging to the background in each sample image; the third sub-network can determine, based on the statistical distribution of the first pixels and the second pixels in each sample image, the prior probability of the movable object appearing at each of the multiple pixels of an image template the same size as the sample images; the fourth sub-network can sample the prior probabilities corresponding to at least some of the pixels of the at least one sample image to obtain a sampling result; and the fifth sub-network can, according to the sampling result, remove from the at least one sample image the pixels whose sampled prior probability is greater than a preset threshold T, to obtain the target image.
特征提取模块可以采用卷积层、Relu层、池化层等按照预设顺序设置的网络单元按照预设的结构堆叠设计而得,提取Probabilistic Dropout Module得到的目标图像中的特征参数,得到特征提取图像。The feature extraction module can be designed by stacking network units set in a preset order such as convolutional layer, Relu layer, pooling layer, etc. according to the preset structure, and extract the feature parameters in the target image obtained by Probabilistic Dropout Module to obtain feature extraction image.
自注意力模块同样可以采用至少两个单独的第五子网络和第六子网络组成，每个子网络包括卷积层、Relu层、池化层等按照预设顺序设置的网络单元，其中第五子网络可以关注预设空间维度，第六子网络可以关注预设通道维度，经过上述两个子网络后可以调整特征提取图像中属于背景的第二像素点的权重值。本公开实施例不限定第五子网络和第六子网络的先后顺序。The self-attention module may likewise be composed of at least two separate sub-networks, a fifth sub-network and a sixth sub-network, each including network units such as convolutional layers, ReLU layers, and pooling layers arranged in a preset order. The fifth sub-network can attend to the preset spatial dimension, and the sixth sub-network can attend to the preset channel dimension; after passing through these two sub-networks, the weight values of the second pixels belonging to the background in the feature extraction image can be adjusted. The embodiments of the present disclosure do not limit the order of the fifth and sixth sub-networks.
回归模块可以包括第七子网络，第七子网络可以包括卷积层、Relu层、池化层等按照预设顺序设置的网络单元，第七子网络以自注意力模块输出的图像作为输入值，将已知的采集至少一张样本图像的相机的位姿作为输出值，第七子网络对应一回归损失函数。该回归损失函数可以包括均方误差损失函数（例如L2损失函数）、平均绝对误差损失函数（例如L1损失函数）、平滑平均绝对误差损失函数（例如Huber损失函数）、对数双曲余弦损失函数和分位数损失函数等。The regression module may include a seventh sub-network, which may include network units such as convolutional layers, ReLU layers, and pooling layers arranged in a preset order. The seventh sub-network takes the image output by the self-attention module as its input and the known pose of the camera that captured the at least one sample image as its output, and corresponds to a regression loss function. The regression loss function may include a mean square error loss function (e.g., the L2 loss), a mean absolute error loss function (e.g., the L1 loss), a smooth mean absolute error loss function (e.g., the Huber loss), a log-cosh loss function, a quantile loss function, and the like.
上述实施例中，最终得到的目标神经网络降低了对样本图像上可移动物体的关注，更多的关注样本图像上属于背景的像素点，即不动或固定物体的信息，通过减小可移动物体对应的像素点对图像整体的成像质量的影响，提升了目标神经网络的鲁棒性。In the above embodiments, the finally obtained target neural network pays less attention to the movable objects in the sample images and more attention to the pixels belonging to the background, i.e., to the information of stationary or fixed objects. By reducing the influence of the pixels corresponding to movable objects on the overall quality of the image, the robustness of the target neural network is improved.
与前述方法实施例相对应,本公开还提供了相机定位装置的实施例。Corresponding to the foregoing method embodiment, the present disclosure also provides an embodiment of a camera positioning device.
本公开实施例还提供了一种相机定位装置,可以应用于可移动机器设备,由于可移动电子设备会发生移动,从而会造成可移动机器设备上设置的相机的位姿随之发生改变。相机定位的高准确性可以提高可移动机器设备执行各种任务时的准确度。The embodiments of the present disclosure also provide a camera positioning device, which can be applied to movable machinery and equipment. Since the movable electronic equipment will move, the pose of the camera set on the movable machinery and equipment will change accordingly. The high accuracy of camera positioning can improve the accuracy of mobile machinery and equipment when performing various tasks.
如图16所示，图16是本公开根据一示例性实施例示出的一种相机定位装置框图，该装置包括：获取模块310，用于获取图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率；执行模块320，用于根据所述先验概率针对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作，得到目标图像；定位模块330，用于根据所述目标图像确定采集所述待处理图像的相机在世界坐标系下的绝对位姿。As shown in FIG. 16, FIG. 16 is a block diagram of a camera positioning apparatus according to an exemplary embodiment of the present disclosure. The apparatus includes: an acquisition module 310, configured to acquire the prior probability of a movable object appearing at each of the multiple pixels included in an image template; an execution module 320, configured to perform, according to the prior probability, an operation of discarding some pixels on an image to be processed that is the same size as the image template, to obtain a target image; and a positioning module 330, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
在一些实施例中，例如图17所示，所述获取模块310包括：分割子模块311，用于对预定图像集合中的每张图像进行像素级语义分割；第一确定子模块312，用于根据像素级语义分割的结果确定所述每张图像中属于可移动物体的第一像素点和属于背景的第二像素点；第二确定子模块313，用于基于所述每张图像中所述第一像素点和所述第二像素点的统计分布，确定与所述预定图像集合中的图像等大的图像模板包括的多个像素点中每个像素点处出现所述可移动物体的所述先验概率。In some embodiments, as shown for example in FIG. 17, the acquisition module 310 includes: a segmentation sub-module 311, configured to perform pixel-level semantic segmentation on each image in a predetermined image set; a first determination sub-module 312, configured to determine, according to the result of the pixel-level semantic segmentation, the first pixels belonging to the movable object and the second pixels belonging to the background in each image; and a second determination sub-module 313, configured to determine, based on the statistical distribution of the first pixels and the second pixels in each image, the prior probability of the movable object appearing at each of the multiple pixels of an image template the same size as the images in the predetermined image set.
在一些实施例中，例如图18所示，所述执行模块320包括：采样子模块321，用于对所述待处理图像所包括的至少部分像素点对应的所述先验概率进行采样；执行子模块322，用于在所述待处理图像上去除先验概率的采样值大于预设阈值的像素点，得到所述目标图像。In some embodiments, as shown for example in FIG. 18, the execution module 320 includes: a sampling sub-module 321, configured to sample the prior probabilities corresponding to at least some of the pixels included in the image to be processed; and an execution sub-module 322, configured to remove, from the image to be processed, the pixels whose sampled prior probability is greater than a preset threshold, to obtain the target image.
在一些实施例中,采样次数为多次时,执行丢弃部分像素点的操作之后得到的多个目标图像两两之间存在至少一个不同的像素点。In some embodiments, when the number of sampling times is multiple, there is at least one different pixel point between each of the multiple target images obtained after the operation of discarding some pixels.
在一些实施例中，如图19所示，所述定位模块330包括：第二定位子模块331，用于将所述待处理图像输入所述目标神经网络，得到所述待处理图像的相机在世界坐标系下的所述绝对位姿。In some embodiments, as shown in FIG. 19, the positioning module 330 includes a second positioning sub-module 331, configured to input the image to be processed into the target neural network to obtain the absolute pose, in the world coordinate system, of the camera of the image to be processed.
在一些实施例中，所述待处理图像包括所述相机采集的具有时序性的至少两帧图像；例如图20所示，所述装置还包括：第一确定模块340，用于根据所述至少两帧图像确定所述相机在拍摄所述至少两帧图像时的相对位姿；第二确定模块350，用于根据所述相机的相对位姿和所述绝对位姿，确定所述相机的修正位姿。In some embodiments, the image to be processed includes at least two sequential frames of images captured by the camera. As shown for example in FIG. 20, the apparatus further includes: a first determination module 340, configured to determine, according to the at least two frames of images, the relative pose of the camera when capturing the at least two frames of images; and a second determination module 350, configured to determine a corrected pose of the camera according to the relative pose and the absolute pose of the camera.
在一些实施例中，例如图21所示，所述第二确定模块350还包括：第三确定子模块351，用于确定所述绝对位姿的确定性概率；第四确定子模块352，用于根据所述确定性概率确定所述相对位姿的第一权重和所述绝对位姿的第二权重；第五确定子模块353，用于根据所述相对位姿、所述第一权重、所述绝对位姿和所述第二权重，确定所述相机的修正位姿。In some embodiments, as shown for example in FIG. 21, the second determination module 350 further includes: a third determination sub-module 351, configured to determine a certainty probability of the absolute pose; a fourth determination sub-module 352, configured to determine a first weight of the relative pose and a second weight of the absolute pose according to the certainty probability; and a fifth determination sub-module 353, configured to determine the corrected pose of the camera according to the relative pose, the first weight, the absolute pose, and the second weight.
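One plausible reading of the weighted fusion performed by sub-modules 351 to 353 is sketched below, with poses simplified to translation vectors and a linear blend assumed; the disclosure does not specify the fusion formula, so both simplifications are illustrative assumptions.

```python
import numpy as np

def fuse_pose(relative_pose, absolute_pose, certainty):
    # The certainty probability of the absolute pose sets the second
    # weight; its complement serves as the first weight for the
    # relative pose. Poses are simplified to translation vectors.
    w_abs = certainty
    w_rel = 1.0 - certainty
    return w_rel * relative_pose + w_abs * absolute_pose
```

With full certainty the corrected pose equals the absolute pose; with zero certainty it falls back to the relative pose.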
在一些可选实施例中,本公开还提供了一种相机定位装置,可以应用于电子设备,该电子设备可以对神经网络进行训练,得到目标神经网络。后续将图像输入目标神经网络之后,可以得到采集该图像的相机在世界坐标系下的绝对位姿。In some optional embodiments, the present disclosure also provides a camera positioning device, which can be applied to an electronic device, and the electronic device can train a neural network to obtain a target neural network. After the image is subsequently input to the target neural network, the absolute pose of the camera that collected the image in the world coordinate system can be obtained.
如图22所示，图22是本公开根据一示例性实施例示出的一种相机定位装置框图，该装置包括：获取模块410，用于获取图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率；执行模块420，用于根据所述先验概率针对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作，得到目标图像；定位模块430，用于根据所述目标图像确定采集所述待处理图像的相机在世界坐标系下的绝对位姿。As shown in FIG. 22, FIG. 22 is a block diagram of a camera positioning apparatus according to an exemplary embodiment of the present disclosure. The apparatus includes: an acquisition module 410, configured to acquire the prior probability of a movable object appearing at each of the multiple pixels included in an image template; an execution module 420, configured to perform, according to the prior probability, an operation of discarding some pixels on an image to be processed that is the same size as the image template, to obtain a target image; and a positioning module 430, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
在一些实施例中，例如图23所示，所述获取模块410包括：分割子模块411，用于对预定图像集合中的每张图像进行像素级语义分割；第一确定子模块412，用于根据像素级语义分割的结果确定所述每张图像中属于可移动物体的第一像素点和属于背景的第二像素点；第二确定子模块413，用于基于所述每张图像中所述第一像素点和所述第二像素点的统计分布，确定与所述预定图像集合中的图像等大的图像模板包括的多个像素点中每个像素点处出现所述可移动物体的所述先验概率。In some embodiments, as shown for example in FIG. 23, the acquisition module 410 includes: a segmentation sub-module 411, configured to perform pixel-level semantic segmentation on each image in a predetermined image set; a first determination sub-module 412, configured to determine, according to the result of the pixel-level semantic segmentation, the first pixels belonging to the movable object and the second pixels belonging to the background in each image; and a second determination sub-module 413, configured to determine, based on the statistical distribution of the first pixels and the second pixels in each image, the prior probability of the movable object appearing at each of the multiple pixels of an image template the same size as the images in the predetermined image set.
在一些实施例中，例如图24所示，所述执行模块420包括：采样子模块421，用于对所述待处理图像所包括的至少部分像素点对应的所述先验概率进行采样；执行子模块422，用于在所述待处理图像上去除先验概率的采样值大于预设阈值的像素点，得到所述目标图像。In some embodiments, as shown for example in FIG. 24, the execution module 420 includes: a sampling sub-module 421, configured to sample the prior probabilities corresponding to at least some of the pixels included in the image to be processed; and an execution sub-module 422, configured to remove, from the image to be processed, the pixels whose sampled prior probability is greater than a preset threshold, to obtain the target image.
在一些实施例中,采样次数为多次时,执行丢弃部分像素点的操作之后得到的多个目标图像两两之间存在至少一个不同的像素点。In some embodiments, when the number of sampling times is multiple, there is at least one different pixel point between each of the multiple target images obtained after the operation of discarding some pixels.
在一些实施例中，例如图25所示，所述定位模块430包括：第一处理子模块431，用于经神经网络提取所述目标图像中的特征参数，得到特征提取图像；第二处理子模块432，用于在所述神经网络的预设空间维度和/或预设通道维度上，增加所述特征提取图像中属于背景的第二像素点所对应的权重值；第一定位子模块433，用于经神经网络对权重值调整后的特征提取图像进行分析，得到采集所述待处理图像的相机在世界坐标系下的所述绝对位姿。In some embodiments, as shown for example in FIG. 25, the positioning module 430 includes: a first processing sub-module 431, configured to extract feature parameters in the target image via the neural network to obtain a feature extraction image; a second processing sub-module 432, configured to increase, in the preset spatial dimension and/or the preset channel dimension of the neural network, the weight values corresponding to the second pixels belonging to the background in the feature extraction image; and a first positioning sub-module 433, configured to analyze, via the neural network, the weight-adjusted feature extraction image to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
在一些实施例中，例如图26所示，所述装置还包括：训练模块440，用于根据所述绝对位姿和预先确定的采集所述待处理图像的所述相机的位姿真值的差异，调整神经网络的网络参数，训练得到目标神经网络。In some embodiments, as shown for example in FIG. 26, the apparatus further includes: a training module 440, configured to adjust the network parameters of the neural network according to the difference between the absolute pose and the predetermined ground-truth pose of the camera that captured the image to be processed, to train and obtain the target neural network.
对于装置实施例而言，由于其基本对应于方法实施例，所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的，其中作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本公开方案的目的。本领域普通技术人员在不付出创造性劳动的情况下，即可以理解并实施。Since the apparatus embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the present disclosure. Those of ordinary skill in the art can understand and implement them without creative effort.
本公开实施例还提供了一种计算机可读存储介质,存储介质存储有计算机程序,计算机程序用于执行上述任一的相机定位方法。The embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to execute any of the above-mentioned camera positioning methods.
本公开实施例还提供了一种相机定位装置，装置包括：处理器；用于存储处理器可执行指令的存储器；其中，处理器用于调用存储器中存储的可执行指令，实现上述任一的相机定位方法。The embodiments of the present disclosure further provide a camera positioning apparatus, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement any of the above camera positioning methods.
本公开实施例中提供的相机定位装置可以实现上述任一个实施例提供的方法。该相机定位装置，可以根据图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率，丢弃与图像模板等大的待处理图像中的部分像素点，再根据得到的目标图像去确定相机的绝对位姿，降低了相机采集图像所在的场景中物体的移动对可移动机器设备上的相机进行定位的结果的影响，提升了相机定位的准确性。The camera positioning apparatus provided in the embodiments of the present disclosure can implement the method provided in any of the above embodiments. The camera positioning apparatus can discard some pixels in the image to be processed, which is the same size as the image template, according to the prior probability of a movable object appearing at each of the multiple pixels included in the image template, and then determine the absolute pose of the camera according to the obtained target image. This reduces the influence of object movement in the scene captured by the camera on the result of positioning the camera on the movable machine device, and improves the accuracy of camera positioning.
本公开实施例提供的相机定位装置可以应用在可移动机器设备上，对可移动机器设备上设置的相机进行定位。由于可移动机器设备会发生移动，从而会造成设备上设置的相机的位姿随之发生改变。相机定位的准确性可以提高可移动机器设备执行各种任务时的准确度。例如，根据无人驾驶车辆上设置的相机所采集的车辆前向环境的图像，可确定相机当前的定位信息，并根据相机的定位信息来定位车辆当前的定位信息，进而可对该无人驾驶车辆进行路径规划、轨迹跟踪、碰撞预警等至少一种智能驾驶控制。The camera positioning apparatus provided by the embodiments of the present disclosure can be applied to a movable machine device to position the camera arranged on it. Since the movable machine device moves, the pose of the camera arranged on it changes accordingly. The accuracy of camera positioning can improve the accuracy with which the movable machine device performs various tasks. For example, the current positioning information of a camera arranged on an unmanned vehicle can be determined from images of the vehicle's forward environment captured by the camera, and the vehicle's current position can in turn be determined from the camera's positioning information, so that at least one type of intelligent driving control, such as path planning, trajectory tracking, or collision warning, can be performed on the unmanned vehicle.
本公开提供的相机定位装置还可以用于对神经网络进行训练的电子设备上,例如云平台、神经网络训练平台等。由电子设备采用该方法对神经网络进行训练,得到目标神经网络。后续将图像输入目标神经网络之后,可以得到采集该图像的相机在世界坐标系下的绝对位姿。The camera positioning device provided by the present disclosure can also be used on electronic devices for training neural networks, such as cloud platforms, neural network training platforms, and the like. The electronic device uses this method to train the neural network to obtain the target neural network. After the image is subsequently input to the target neural network, the absolute pose of the camera that collected the image in the world coordinate system can be obtained.
如图27所示,图27是根据一示例性实施例示出的电子设备2700的结构示意图。该电子设备2700包括可移动机器设备和对神经网络进行训练的云平台。As shown in Fig. 27, Fig. 27 is a schematic structural diagram of an electronic device 2700 according to an exemplary embodiment. The electronic device 2700 includes movable machinery and a cloud platform for training neural networks.
参照图27,电子设备2700包括处理组件2722,其进一步包括一个或多个处理器,以及由存储器2732所代表的存储器资源,用于存储可由处理组件2722的执行的指令,例如应用程序。存储器2732中存储的应用程序可以包括至少一个模块,各个模块对应于一组指令。此外,处理组件2722用于执行指令,以执行上述任一的相机定位方法。27, the electronic device 2700 includes a processing component 2722, which further includes one or more processors, and a memory resource represented by a memory 2732 for storing instructions executable by the processing component 2722, such as application programs. The application program stored in the memory 2732 may include at least one module, and each module corresponds to a set of instructions. In addition, the processing component 2722 is used to execute instructions to execute any of the aforementioned camera positioning methods.
电子设备2700还可以包括电源组件2726用于执行电子设备2700的电源管理，有线或无线网络接口2750用于将电子设备2700连接到网络，和输入输出（I/O）接口2758。电子设备2700可以操作基于存储在存储器2732的操作系统，例如Windows ServerTM、Mac OS XTM、UnixTM、LinuxTM、FreeBSDTM或类似。当电子设备2700为可移动机器设备时，电子设备2700还包括用于采集图像的相机。当电子设备2700为对神经网络进行训练的云平台时，电子设备可以通过该输入输出接口2758与一可移动机器设备通信。The electronic device 2700 may further include a power component 2726 for performing power management of the electronic device 2700, a wired or wireless network interface 2750 for connecting the electronic device 2700 to a network, and an input/output (I/O) interface 2758. The electronic device 2700 can operate based on an operating system stored in the memory 2732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like. When the electronic device 2700 is a movable machine device, the electronic device 2700 further includes a camera for capturing images. When the electronic device 2700 is a cloud platform for training a neural network, the electronic device can communicate with a movable machine device through the input/output interface 2758.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或者惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。Those skilled in the art will easily think of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptive changes of the present disclosure. These variations, uses, or adaptive changes follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field not disclosed in the present disclosure. . The description and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are pointed out by the following claims.
以上所述仅为本公开的较佳实施例而已,并不用以限制本公开,凡在本公开的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开保护的范围之内。The above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall be included in the present disclosure Within the scope of protection.

Claims (20)

  1. 一种相机定位方法,包括:A camera positioning method includes:
    获取图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率;Obtain the prior probability of a movable object appearing at each of the multiple pixels included in the image template;
    根据所述先验概率针对与所述图像模板等大的待处理图像执行丢弃部分像素点的操作,得到目标图像;Performing an operation of discarding some pixels for an image to be processed that is as large as the image template according to the prior probability to obtain a target image;
    根据所述目标图像确定采集所述待处理图像的相机在世界坐标系下的绝对位姿。The absolute pose of the camera that collects the image to be processed in the world coordinate system is determined according to the target image.
  2. 根据权利要求1所述的方法,其特征在于,获取所述图像模板包括的多个像素点中每个像素点处出现可移动物体的先验概率,包括:The method according to claim 1, wherein obtaining the prior probability of a movable object at each of the plurality of pixels included in the image template comprises:
    对预定图像集合中的每张图像进行像素级语义分割;Perform pixel-level semantic segmentation on each image in the predetermined image set;
    根据像素级语义分割的结果确定所述每张图像中属于可移动物体的第一像素点和属于背景的第二像素点;Determining, according to the result of pixel-level semantic segmentation, the first pixel that belongs to the movable object and the second pixel that belongs to the background in each image;
    基于所述每张图像中所述第一像素点和所述第二像素点的统计分布,确定与所述预定图像集合中的图像等大的图像模板包括的多个像素点中每个像素点处出现所述可移动物体的所述先验概率。Based on the statistical distribution of the first pixel point and the second pixel point in each image, determine each pixel point in a plurality of pixels included in an image template that is as large as an image in the predetermined image set The prior probability that the movable object appears at.
  3. 根据权利要求1或2所述的方法,其特征在于,根据所述先验概率对所述待处理图像执行丢弃部分像素点的操作,得到目标图像,包括:The method according to claim 1 or 2, wherein the step of discarding some pixels of the image to be processed according to the prior probability to obtain the target image comprises:
    对所述待处理图像所包括的至少部分像素点对应的先验概率进行采样;Sampling the prior probabilities corresponding to at least some of the pixels included in the image to be processed;
    在所述待处理图像上去除先验概率的采样值大于预设阈值的像素点,得到所述目标图像。Remove the pixel points with a priori probability sampling value greater than a preset threshold from the image to be processed to obtain the target image.
  4. 根据权利要求3所述的方法,其特征在于,采样次数为多次时,执行丢弃部分像素点的操作之后得到的多个目标图像两两之间存在至少一个不同的像素点。The method according to claim 3, wherein when the number of sampling times is multiple times, there is at least one different pixel point between the multiple target images obtained after the operation of discarding some pixels.
  5. The method according to any one of claims 1-4, wherein determining, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed comprises:
    extracting feature parameters from the target image through a neural network to obtain a feature extraction image;
    increasing, in a preset spatial dimension and/or a preset channel dimension of the neural network, the weight values corresponding to the second pixels belonging to the background in the feature extraction image; and
    analyzing, by the neural network, the weight-adjusted feature extraction image to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
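The background reweighting above resembles a spatial attention mask over the feature extraction image. A minimal numpy sketch of the spatial-dimension case, assuming the weights are applied multiplicatively and with an invented `gain` hyperparameter (the patent does not specify how the weights are computed):

```python
import numpy as np

def reweight_background(features, background_mask, gain=2.0):
    """Boost feature responses at background ("second") pixels so a
    downstream pose regressor relies on static scene content rather
    than movable objects.

    features:        (C, H, W) feature extraction image
    background_mask: (H, W) boolean, True where the pixel is background
    gain:            assumed multiplier for background locations
    """
    w = np.where(background_mask, gain, 1.0)  # (H, W) spatial weight map
    return features * w[None, :, :]           # broadcast over all C channels
```

A channel-dimension variant would instead scale whole feature channels by a per-channel weight vector, in the spirit of channel attention.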
  6. The method according to claim 5, wherein after the weight-adjusted feature extraction image is analyzed by the neural network to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed, the method further comprises:
    adjusting network parameters of the neural network according to the difference between the absolute pose and a predetermined ground-truth pose of the camera that captured the image to be processed, to obtain a target neural network through training.
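The supervision signal here, the difference between the predicted absolute pose and the ground-truth pose, is commonly expressed in pose-regression networks as a translation term plus a weighted orientation term. The patent does not specify the loss; the PoseNet-style form below (including the `beta` weighting factor) is an assumption:

```python
import numpy as np

def pose_loss(pred_t, pred_q, true_t, true_q, beta=100.0):
    """One common absolute-pose training loss (assumed, not from the
    patent): Euclidean translation error plus a beta-weighted distance
    between unit quaternions representing orientation.
    """
    t_err = np.linalg.norm(pred_t - true_t)
    q_err = np.linalg.norm(pred_q / np.linalg.norm(pred_q)
                           - true_q / np.linalg.norm(true_q))
    return t_err + beta * q_err
```

The network parameters would then be adjusted by backpropagating this loss; a perfect prediction yields zero loss.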
  7. The method according to claim 6, wherein determining, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed comprises:
    inputting the image to be processed into the target neural network to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
  8. The method according to any one of claims 1-7, wherein the image to be processed comprises at least two temporally ordered frames captured by the camera; and
    after determining, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed, the method further comprises:
    determining, according to the at least two frames, the relative pose of the camera when capturing the at least two frames; and
    determining a corrected pose of the camera according to the relative pose and the absolute pose of the camera.
  9. The method according to claim 8, wherein determining the corrected pose of the camera according to the relative pose and the absolute pose of the camera comprises:
    determining a certainty probability of the absolute pose;
    determining a first weight for the relative pose and a second weight for the absolute pose according to the certainty probability; and
    determining the corrected pose of the camera according to the relative pose, the first weight, the absolute pose, and the second weight.
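The weighted correction above can be sketched for the translation component alone. Representing each pose as a simple vector and deriving both weights from a single certainty value in [0, 1] are simplifying assumptions; the patent leaves the weighting function unspecified:

```python
import numpy as np

def fuse_poses(relative_pose, absolute_pose, certainty):
    """Blend a relative-pose estimate with an absolute-pose estimate.
    The more certain the absolute estimate, the more it dominates:
    the second weight is the certainty itself and the first weight is
    its complement (an assumed scheme).
    """
    w_abs = certainty          # second weight, for the absolute pose
    w_rel = 1.0 - certainty    # first weight, for the relative pose
    return w_rel * np.asarray(relative_pose) + w_abs * np.asarray(absolute_pose)
```

With certainty 1.0 the corrected pose equals the absolute pose; with certainty 0.5 it is the midpoint of the two estimates. A full implementation would blend rotations on SO(3) (e.g. via quaternion slerp) rather than averaging vectors.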
  10. A camera positioning apparatus, comprising:
    an acquisition module, configured to acquire the prior probability that a movable object appears at each of a plurality of pixels included in an image template;
    an execution module, configured to perform, according to the prior probability, an operation of discarding some pixels of an image to be processed having the same size as the image template, to obtain a target image; and
    a positioning module, configured to determine, according to the target image, the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
  11. The apparatus according to claim 10, wherein the acquisition module comprises:
    a segmentation submodule, configured to perform pixel-level semantic segmentation on each image in a predetermined image set;
    a first determining submodule, configured to determine, according to the result of the pixel-level semantic segmentation, first pixels belonging to a movable object and second pixels belonging to the background in each image; and
    a second determining submodule, configured to determine, based on the statistical distribution of the first pixels and the second pixels in each image, the prior probability that the movable object appears at each of a plurality of pixels of an image template having the same size as the images in the predetermined image set.
  12. The apparatus according to claim 10 or 11, wherein the execution module comprises:
    a sampling submodule, configured to sample the prior probabilities corresponding to at least some of the pixels included in the image to be processed; and
    an execution submodule, configured to remove, from the image to be processed, pixels whose sampled prior probability is greater than a preset threshold, to obtain the target image.
  13. The apparatus according to claim 12, wherein, when sampling is performed multiple times, any two of the multiple target images obtained after the operation of discarding some pixels differ from each other in at least one pixel.
  14. The apparatus according to any one of claims 10-13, wherein the positioning module comprises:
    a first processing submodule, configured to extract feature parameters from the target image through a neural network to obtain a feature extraction image;
    a second processing submodule, configured to increase, in a preset spatial dimension and/or a preset channel dimension of the neural network, the weight values corresponding to the second pixels belonging to the background in the feature extraction image; and
    a first positioning submodule, configured to analyze, by the neural network, the weight-adjusted feature extraction image to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
  15. The apparatus according to claim 14, further comprising:
    a training module, configured to adjust network parameters of the neural network according to the difference between the absolute pose and a predetermined ground-truth pose of the camera that captured the image to be processed, to obtain a target neural network through training.
  16. The apparatus according to claim 15, wherein the positioning module comprises:
    a second positioning submodule, configured to input the image to be processed into the target neural network to obtain the absolute pose, in the world coordinate system, of the camera that captured the image to be processed.
  17. The apparatus according to any one of claims 10-16, wherein the image to be processed comprises at least two temporally ordered frames captured by the camera; and
    the apparatus further comprises:
    a first determining module, configured to determine, according to the at least two frames, the relative pose of the camera when capturing the at least two frames; and
    a second determining module, configured to determine a corrected pose of the camera according to the relative pose and the absolute pose of the camera.
  18. The apparatus according to claim 17, wherein the second determining module further comprises:
    a third determining submodule, configured to determine a certainty probability of the absolute pose;
    a fourth determining submodule, configured to determine a first weight for the relative pose and a second weight for the absolute pose according to the certainty probability; and
    a fifth determining submodule, configured to determine the corrected pose of the camera according to the relative pose, the first weight, the absolute pose, and the second weight.
  19. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute the camera positioning method according to any one of claims 1-9.
  20. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to call the executable instructions stored in the memory to implement the camera positioning method according to any one of claims 1-9.
PCT/CN2020/091768 2019-05-27 2020-05-22 Camera positioning WO2020238790A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021534170A JP2022513868A (en) 2019-05-27 2020-05-22 Camera positioning
KR1020217019918A KR20210095925A (en) 2019-05-27 2020-05-22 camera positioning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910447759.7A CN112001968B (en) 2019-05-27 2019-05-27 Camera positioning method and device and storage medium
CN201910447759.7 2019-05-27

Publications (1)

Publication Number Publication Date
WO2020238790A1

Family

ID=73461260

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091768 WO2020238790A1 (en) 2019-05-27 2020-05-22 Camera positioning

Country Status (4)

Country Link
JP (1) JP2022513868A (en)
KR (1) KR20210095925A (en)
CN (1) CN112001968B (en)
WO (1) WO2020238790A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885134A (en) * 2021-01-24 2021-06-01 成都智慧赋能科技有限公司 Big data-based smart city traffic management method
CN114118367A (en) * 2021-11-16 2022-03-01 上海脉衍人工智能科技有限公司 Method and device for constructing an incremental neural radiance field
CN114693776A (en) * 2022-03-25 2022-07-01 广东电网有限责任公司 Cable position information determining method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978722A (en) * 2015-07-06 2015-10-14 天津大学 Multi-exposure image fusion ghosting removing method based on background modeling
CN105931275A (en) * 2016-05-23 2016-09-07 北京暴风魔镜科技有限公司 Monocular and IMU fused stable motion tracking method and device based on mobile terminal
CN108257177A (en) * 2018-01-15 2018-07-06 天津锋时互动科技有限公司深圳分公司 Alignment system and method based on space identification
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 Simultaneous localization and mapping method for mobile robots in indoor dynamic environments

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5333860B2 (en) * 2010-03-31 2013-11-06 アイシン・エィ・ダブリュ株式会社 Vehicle position detection system using landscape image recognition
JP5743849B2 (en) * 2011-10-27 2015-07-01 株式会社日立製作所 Video analysis apparatus and system
JP2016177388A (en) * 2015-03-18 2016-10-06 株式会社リコー Mobile object position/attitude measuring apparatus
JP6985897B2 (en) * 2017-01-06 2021-12-22 キヤノン株式会社 Information processing equipment and its control method, program
US11348274B2 (en) * 2017-01-23 2022-05-31 Oxford University Innovation Limited Determining the location of a mobile device
US10467756B2 (en) * 2017-05-14 2019-11-05 International Business Machines Corporation Systems and methods for determining a camera pose of an image
JP7043755B2 (en) * 2017-08-29 2022-03-30 ソニーグループ株式会社 Information processing equipment, information processing methods, programs, and mobiles
CN107833236B (en) * 2017-10-31 2020-06-26 中国科学院电子学研究所 Visual positioning system and method combining semantics under dynamic environment


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885134A (en) * 2021-01-24 2021-06-01 成都智慧赋能科技有限公司 Big data-based smart city traffic management method
CN112885134B (en) * 2021-01-24 2023-05-16 陕西合友网络科技有限公司 Smart city traffic management method based on big data
CN114118367A (en) * 2021-11-16 2022-03-01 上海脉衍人工智能科技有限公司 Method and device for constructing an incremental neural radiance field
CN114118367B (en) * 2021-11-16 2024-03-29 上海脉衍人工智能科技有限公司 Method and device for constructing an incremental neural radiance field
CN114693776A (en) * 2022-03-25 2022-07-01 广东电网有限责任公司 Cable position information determining method, device, equipment and storage medium

Also Published As

Publication number Publication date
KR20210095925A (en) 2021-08-03
JP2022513868A (en) 2022-02-09
CN112001968B (en) 2022-07-15
CN112001968A (en) 2020-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20812552

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021534170

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217019918

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20812552

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2022)
