WO2021115071A1

WO2021115071A1 - Three-dimensional reconstruction method and apparatus for monocular endoscope image, and terminal device

Info

Publication number: WO2021115071A1
Application number: PCT/CN2020/129546
Authority: WO
Inventors: 廖祥云; 孙寅紫; 王琼; 王平安
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2019-12-12
Filing date: 2020-11-17
Publication date: 2021-06-17
Also published as: CN111145238A; CN111145238B

Abstract

A three-dimensional reconstruction method for a monocular endoscope image. The method comprises: acquiring a plurality of distorted images, photographed by a monocular endoscope, of a checkerboard calibration target, and performing distortion correction on the plurality of distorted images of the checkerboard calibration target to obtain an image sequence (S101); determining a key frame from the image sequence (S102); acquiring a pose parameter of the key frame, and estimating a depth map of the key frame (S103); and performing image reconstruction on the basis of the pose parameter of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud (S104). Further provided are a three-dimensional reconstruction apparatus (300) for a monocular endoscope image, and a terminal device (400). An error caused by imaging distortion of a monocular endoscope is reduced, and the display effect of an image is also improved.

Description

Method, device and terminal equipment for three-dimensional reconstruction of monocular endoscope image

Technical field

This application belongs to the field of image processing technology, and in particular relates to a method, device and terminal device for three-dimensional reconstruction of monocular endoscopic images.

Background technique

Three-dimensional reconstruction is one of the research hotspots in computer vision. Its main purpose is to restore the three-dimensional structure of objects from two-dimensional images. It is widely used in augmented reality, virtual navigation and medical fields. The three-dimensional information of the image mainly depends on the visual real-time positioning and map construction (Simultaneous Localization and Mapping, SLAM) technology.

At present, the distortion of the monocular endoscope imaging will cause the increase of the pose error, and the endoscope is usually used with a cold light source, and its imaging will also be disturbed by the light, which may affect the feature matching result in the SLAM process. It is usually difficult to use a monocular endoscope to provide accurate samples for training. Combining the SLAM scheme and the depth prediction scheme can the two-dimensional image sequence be densely reconstructed. However, due to the above-mentioned pose and depth map errors and other factors, The accuracy and effect of 3D reconstruction are deteriorated.

Summary of the invention

The embodiments of the present application provide a method and device for three-dimensional reconstruction of a monocular endoscope, which can solve the problem of reducing errors caused by imaging distortion caused by the inherent parameters of the monocular endoscope, and the accuracy of the three-dimensional reconstruction of a two-dimensional image sequence is not high and The problem of poor results.

In the first aspect, an embodiment of the present application provides a three-dimensional reconstruction method of a monocular endoscopic image, including:

Acquiring the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and performing distortion correction on the distortion images of the checkerboard calibration boards to obtain an image sequence;

Determining key frames from the image sequence;

Acquiring the pose parameters of the key frame, and estimating the depth map of the key frame;

Perform image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.

Optionally, the image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud includes:

Acquiring the pixel coordinates of the key frame;

Calculating the target space coordinates according to the depth map, the pose parameters of the key frame, and the pixel coordinates of the key frame;

Obtain the color information of each pixel in the key frame, and perform point cloud fusion on the key frame according to the color information of each pixel in the key frame and the target space coordinates to obtain the three-dimensional point cloud.

Optionally, the acquiring the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and correcting the distortion images of the checkerboard calibration boards to obtain an image sequence includes:

Acquire the corner points of the chessboard in the distortion images of the checkerboard calibration boards, calibrate the monocular endoscope based on the corner points of the chessboard, and obtain the camera parameters and distortion parameters of the monocular endoscope ；

Determining an image to be corrected from the distorted image according to the camera parameter and the distortion parameter;

Performing distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence.

Optionally, the performing distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence includes:

Acquiring the preset coordinates of each pixel of the image to be corrected in the camera coordinates;

Projecting the camera coordinate system onto the plane where each pixel of the image to be corrected is located to obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

The pixel coordinates of the preset coordinates in the pixel coordinate system are mapped to the camera coordinate system to obtain the image sequence.

Optionally, the obtaining the pixel coordinates of the key frame includes:

Mapping the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence and the pixel coordinates corresponding to the image sequence;

Obtain the pixel coordinates of the key frame based on the pixel coordinates corresponding to the image sequence.

Optionally, the determining a key frame from the image sequence includes:

Acquiring local features of each image in the image sequence, and performing feature point matching on each image in the image sequence based on the local feature of each image, to obtain a matching result;

When the matching result is that the number of feature points matched by the first image and the second image is greater than or equal to a preset threshold, the first image is used as a key frame, where the first image and the second image are Any two adjacent frames of images in the image sequence.

Optionally, the obtaining the pose parameters of the key frame includes:

Performing pose initialization on the first image;

Estimate the pose parameters of the key frames in the image sequence.

Optionally, the estimating the depth map of the key frame includes:

Determining a reference frame image from the key frames, where the reference frame image is any frame image or multiple frames of images in the key frame;

Performing depth estimation processing on each pixel of the reference frame image based on the pose parameter to obtain the depth map of the key frame.

In the second aspect, an embodiment of the present application provides a three-dimensional reconstruction device for monocular endoscopic images, including:

The acquisition module is used to acquire the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and perform distortion correction on the distortion images of the checkerboard calibration boards to obtain an image sequence;

A determining module for determining key frames from the image sequence;

A calculation module for obtaining the pose parameters of the key frame and estimating the depth map of the key frame;

The generating module is used for image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.

In the third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, Realize the above-mentioned three-dimensional reconstruction method.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program that implements the above-mentioned three-dimensional reconstruction method when the computer program is executed by a processor.

Compared with the prior art, the embodiment of the present application has the following beneficial effects: by obtaining the distortion images of multiple checkerboard calibration boards taken by a monocular endoscope, and performing distortion correction on the distortion images of the multiple checkerboard calibration boards to obtain an image sequence, Determine the key frame from the image sequence, obtain the pose parameters of the key frame, estimate the depth map of the key frame, and reconstruct the image based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud. The above method uses the checkerboard calibration board image to achieve the calibration and distortion correction of the monocular endoscope to obtain the image sequence, which effectively reduces the imaging distortion error caused by the monocular endoscope itself, and determines multiple images that meet the requirements from the image sequence As a key frame, and determine the pose parameters of the key frame, it can avoid the interference of external factors such as light changes, can accurately estimate the pose parameters and depth map, and perform image reconstruction according to the position parameters and depth map of the key frame to get more The fine three-dimensional point cloud also improves the display effect of the image.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only of the present application. For some embodiments, for those of ordinary skill in the art, other drawings may be obtained based on these drawings without creative labor.

FIG. 1 is a schematic flowchart of a method for three-dimensional reconstruction of monocular endoscopic images provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the image distortion correction process provided by an embodiment of the present application;

3 is a schematic structural diagram of a three-dimensional reconstruction device for monocular endoscopic images provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Detailed ways

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.

It should be understood that when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude one or more other The existence or addition of features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination of one or more of the items listed in the associated and all possible combinations, and includes these combinations.

As used in the description of this application and the appended claims, the term "if" can be construed as "when" or "once" or "in response to determination" or "in response to detecting ". Similarly, the phrase "if determined" or "if detected [described condition or event]" can be interpreted as meaning "once determined" or "in response to determination" or "once detected [described condition or event]" depending on the context ]" or "in response to detection of [condition or event described]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

The reference to "one embodiment" or "some embodiments" described in the specification of this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described in combination with the embodiment. Therefore, the sentences "in one embodiment", "in some embodiments", "in some other embodiments", "in some other embodiments", etc. appearing in different places in this specification are not necessarily All refer to the same embodiment, but mean "one or more but not all embodiments" unless it is specifically emphasized otherwise. The terms "including", "including", "having" and their variations all mean "including but not limited to", unless otherwise specifically emphasized.

FIG. 1 shows a schematic flow chart of a three-dimensional reconstruction method of a monocular endoscopic image provided by the first application of the present application. As shown in FIG. 1, the three-dimensional reconstruction method of a monocular endoscope image provided by the present application, the three-dimensional reconstruction method includes S101 To S104, the details are as follows:

S101: Obtain distortion images of multiple checkerboard calibration boards taken by a monocular endoscope, and perform distortion correction on the distortion images of the multiple checkerboard calibration boards to obtain an image sequence;

In this embodiment, the distortion image of the checkerboard calibration board can be used for distortion correction of the monocular endoscope. The checkerboard calibration board is a binary image with two rows of black and white stripe intervals. The monocular endoscope can observe the calibration from different angles. The board obtains multiple monocular endoscope distortion images. The imaging process of the camera mainly involves the transformation between the image pixel coordinate system, the image physical coordinate system, the camera coordinate system and the world coordinate system. Due to the lens imaging principle, the camera imaging distortion occurs, and the distortion correction is to find the corresponding relationship between the point positions before and after the distortion. .

It should be noted that the imaging model of the monocular endoscope is different from the small hole imaging model, but is closer to the fisheye camera model. The checkerboard calibration board is a black and white grid arranged at intervals, also called a checkerboard calibration board. The calibration target is used in machine vision, image measurement, photogrammetry, three-dimensional reconstruction and other applications to correct lens distortion and determine the physical size and pixels The conversion relationship between the two, and to determine the relationship between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image requires the establishment of a geometric model of camera imaging. The camera's geometric model can be obtained by the camera with a fixed-pitch pattern array plate and calculation by the calibration algorithm, thereby obtaining high-precision measurement and reconstruction results. The flat plate with a fixed-pitch pattern array is the calibration plate.

It should be understood that by acquiring the distortion images of multiple checkerboard calibration boards taken by the monocular endoscope, the camera calibration of the monocular endoscope can be realized, and the distorted image can be corrected to obtain the image according to the calibration monocular endoscope. Sequence, that is, real image, can reduce the error caused by image distortion to image recognition.

Figure 2 shows a flow chart of the implementation of distortion correction provided by the present application. As shown in Figure 2, the acquisition of the distortion images of the multiple checkerboard calibration boards taken by the monocular endoscope, and the distortion images of the multiple checkerboard calibration boards Perform correction to obtain an image sequence, including S1011 to S1013:

S1011: Obtain the corner points of the chessboard in the distortion images of the multiple checkerboard calibration boards, and calibrate the monocular endoscope based on the corner points of the chessboard to obtain the camera parameters of the monocular endoscope and Distortion parameter

In this embodiment, 20 images with a checkerboard calibration board taken with a monocular endoscope at about different angles are acquired, the corner points of the checkerboard in the image are extracted, and the distorted image that meets the fitting conditions is selected. The Canny corner operator can be used to detect the distortion images obtained from all monocular endoscopes observing the checkerboard calibration board, and count the number of corner points in all the distortion images. The distortion image that meets the fitting conditions is preferably the number of corner points detected in the image No less than 6 meshes. Among them, the number of corner points can be selected according to actual conditions, and is not specifically limited here.

Specifically, the parameters of the ellipse equation are obtained by fitting the selected distortion image and the detected corner points. The ellipse equation can be a standard equation including 6 parameters. According to the corner points in the detected distortion image, the least square method is used to obtain the parameters. The parameters of the ellipse equation are obtained from the curved surface projection parameters, and the parameter fitting results of the ellipse equations of multiple distorted images are obtained by means of filtering. Construct a curved surface projection model, establish the parameters of the ellipse equation, even if the corresponding relationship between the image pixel coordinates and the real image pixel coordinates, then construct the curved surface projection model, and establish the curved surface model parameters according to the curved surface projection principle to obtain the distorted image point coordinates and the real The corresponding relationship between the coordinates of the image points is used to calibrate the monocular endoscope, and the camera parameters and distortion parameters of the monocular endoscope can be obtained by calibrating the monocular endoscope. Through calibration, the internal parameter matrix K and the distortion parameter matrix (k ₁ k ₂ k ₃ k ₄ ) of the monocular endoscope can be obtained, where K can be expressed as:

Where fx and fy are the focal lengths of the endoscope in pixels, and cx and cy are the principal point positions in pixels (that is, the center pixel position of the imaging).

It should be noted that the chessboard is a calibration board composed of black and white squares as the calibration object for camera calibration (mapped from the real world to the object in the digital image). Compared with a three-dimensional object, a two-dimensional object lacks some information. The checkerboard is used as a calibration object because the plane checkerboard mode is easier to handle. After changing the position of the checkerboard many times to capture the image, it can obtain richer coordinate information.

S1012: Determine an image to be corrected from the distorted image according to the camera parameter and the distortion parameter;

In this embodiment, calibrating the monocular endoscope can determine the pose of the camera to obtain the camera parameters and distortion parameters of the monocular endoscope. Taking an image as an example, the camera parameters and the distortion parameters are calculated Whether the image has not been distorted to the distorted image to be corrected, that is, it can judge whether the captured multiple images are distorted, or set a preset threshold, and compare the calculation result with the preset threshold to obtain a comparison result, where the comparison result is The ones with greater difference in the comparison result are regarded as distortion and those with little difference in the comparison result are regarded as no distortion, and vice versa.

It should be noted that various distortions (distortion) are often produced in the process of image acquisition or display. The common ones are geometric shape distortion, grayscale distortion, and color distortion. The causes of image distortion are the aberration and distortion of the imaging system. , Limited bandwidth, shooting status, scanning nonlinearity, relative motion, etc., non-uniform lighting conditions or point light source lighting, etc. According to the camera parameters and distortion parameters, the image to be corrected is determined from the multiple images taken, which is convenient to eliminate the error of distortion in image recognition and processing, and improves the accuracy of image processing to a certain extent.

S1013: Perform distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence.

In this embodiment, the straight line in the distortion space is generally no longer a straight line in the image space, but only the straight line passing through the center of symmetry is an exception. When performing distortion correction, the center of symmetry can be found, and then the general geometric distortion correction process can be performed. . The general steps of distortion correction are to first find the symmetry center of the distortion map, convert the address space relationship represented by the distortion map into a space coordinate system with the center of symmetry as the origin, and then space transformation, rearrange the pixels on the input image, that is, the distortion map to restore The original spatial relationship, that is, use the address mapping relationship to find their corresponding point in the distortion map space for each point in the correction map space, and finally the grayscale difference is to assign the corresponding grayscale value to the pixel after the space transformation to restore the original The gray value of the location. The correction of geometric distortion requires the use of coordinate transformation, including simple transformations such as parallel movement, rotation, enlargement and reduction.

It should be noted that the process of distortion correction can be understood as processing a distorted image into an undistorted image, that is, a real image. Different camera models display different images when taking pictures, and may or may not be distorted. The process of using distortion correction can be the same or different. Image distortion mainly includes radial distortion and tangential distortion. Radial distortion refers to the smallest distortion at the center position. As the radius increases, the distortion increases. Radial distortion can be divided into pincushion distortion and barrel distortion. Tangential distortion refers to when the lens is not parallel to the imaging plane, similar to perspective transformation. By performing distortion correction on the image to be corrected to obtain a corrected image sequence, the reliability of image processing is ensured to a certain extent.

Optionally, the performing distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence includes steps A1 to A3:

Step A1: Obtain the preset coordinates of each pixel of the image to be corrected in the camera coordinates;

In this embodiment, the camera coordinate system can be obtained by calibrating the monocular endoscope. According to the camera imaging model and the camera coordinate system, the world coordinate system and the camera coordinate system, the camera coordinate system and the image coordinate system, and the image coordinates can be realized. System to pixel coordinate system conversion. The conversion between the world coordinate system and the camera coordinate system is from one three-dimensional coordinate to another three-dimensional coordinate system. The pose parameters of the camera can be obtained through the rotation matrix and the translation vector, that is, the camera coordinate system. From the camera coordinate system to the image coordinate system, a three-dimensional coordinate is projected on a two-dimensional plane, and it is estimated based on the distance between the two coordinate systems, that is, the focal length of the camera. In other words, the preset coordinates in the camera coordinate system are corrected to obtain the coordinates in the undistorted camera coordinate system, and the coordinates in the undistorted camera coordinate system are mapped to the pixel coordinate system to obtain an undistorted image sequence.

Step A2: Project the camera coordinate system onto the plane where each pixel of the image to be corrected is located, and obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

In this embodiment, it is assumed that the coordinates of the pixel (u', v') of the image taken by the monocular endoscope in the camera coordinate system are (x, y, z), and the pixel point in the camera coordinate system is The coordinates are projected to the plane where the image is located, that is, the image coordinate system. According to the positional relationship of the origin of the image coordinate system relative to the origin of the pixel coordinate system, it can be regarded as projecting the coordinates of the pixel point in the camera coordinate system to the pixel coordinate system, which can be expressed as follows :

x′=x/z,y′=y/z,r ² =x′ ² +y′ ²

θ=arctan(r)

θ′=θ(1+k ₁ θ ² +k ₂ θ ⁴ +k ₃ θ ⁶ +k ₄ θ ⁸ )

x′=(θ′/r)x,y′=(θ′/r)y

u=f _x x′+c _x

v=f _y y′+c _y

Among them, (x', y') are the coordinates projected on the plane, r represents the distance between the point and the center on the projection plane (projection radius), and θ represents the angle of incidence. The corresponding relationship between the camera coordinate system and the image coordinate system can be determined through the above formula, which facilitates the subsequent determination of the pixel coordinate system and pixel coordinates.

Step A3: Map the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence.

In this embodiment, for N undistorted images, there are a total of 4 internal parameters + 6N external parameters to calibrate. There are 4 effective corner points on each chessboard graph, and 8 constraints can be provided, so 8N is required. >=4+6N, at least 2 undistorted images are needed to obtain the internal and external parameters of the camera. In fact, 10 or 20 images can be generally used to obtain a more accurate solution by using the least square method. After the internal and external parameters are added, the distortion-related parameters can be obtained according to the remaining point coordinates.

It should be noted that when performing distortion correction on the image to be corrected, at least two images may have a linear relationship. A remapping process can be used to convert the pixel coordinates of the distorted endoscopic image into the coordinates of the distorted camera coordinate system. The distorted camera coordinate system coordinates are transformed into the undistorted camera coordinate system coordinates, and finally the undistorted camera coordinate system coordinates are transformed into the pixel coordinates of the undistorted image, so that the corrected image sequence can be obtained, and the corresponding image can also be obtained The pixel coordinates of, it is convenient to determine the key frame and the pose parameters of the key frame later.

S102: Determine a key frame from the image sequence;

In this embodiment, the image after distortion correction is used as an image sequence, from which key frames are determined. ORB_SLAM2 is an embedded position recognition model, which has the characteristics of relocation, preventing tracking failure (such as occlusion), reinitialization of the mapped scene, loop detection, etc., using the same ORB features for tracking, mapping, and location recognition tasks. Features have good robustness in rotation and scale, and have good invariance to the camera's automatic gain, automatic exposure, and illumination changes. It can also quickly extract features and match features to meet the needs of real-time operation. This application uses ORB_SLAM2 to determine the key frame and pose estimation of monocular endoscopic images. You can use ORB to extract features of the image sequence, estimate the initial pose of the camera through the previous image frame, and initialize the position through global relocation. The four processes of posture, tracking the local map and the judgment standard of the new key frame are used to determine the key frame and the posture parameter of the key frame more accurately.

It should be noted that the key frame can be used as a mark of the image sequence and has a guiding effect. The distortion-corrected images in the image sequence are arranged in a preset order, and they can be arranged in sequence according to the shooting time order, which is convenient for feature extraction of each image. Processing to improve the efficiency of monocular endoscope image processing.

Optionally, the determining the key frame from the image sequence includes steps B1 to B2, which are specifically as follows:

Step B1: Obtain local features of each image in the image sequence, and perform feature point matching on each image in the image sequence based on the local feature of each image to obtain a matching result;

In this embodiment, extract the local features of each image in the image sequence, and perform feature point matching on each image in the image sequence with the local features of each image to extract the regions corresponding to the coordinates of each image for feature matching, or Extract all pixels in the rich area of the image, and match the feature points of the two images before and after in the preset order, that is, the number of feature points that are successfully matched with the same ORB feature in the two frames of the image is used as the matching result, and set the number of feature points that are successfully matched The threshold is between 50-100.

It should be noted that the peripheral edge area of the image imaged by the monocular endoscope is a black area with no information, and useful feature information cannot be extracted. Therefore, the information-rich area in the image is selected and the area can be defined as a region of interest. To extract the ORB features of the region, ORB (Oriented FAST and Rotated Brief) is a fast feature point extraction and description algorithm. The ORB algorithm includes feature point extraction and feature point description. The ORB algorithm has the characteristics of fast calculation speed and uses FAST detection The feature point, again, is to use the BRIEF algorithm to calculate the descriptor. The unique binary string representation of the descriptor not only saves storage space, but also greatly shortens the matching time.

It should be understood that by determining the key frame from the gradually corrected image sequence, the key frame can be used as a marker to quickly process the image sequence, which can improve the efficiency of monocular endoscope image processing.

Step B2: When the matching result is that the number of feature points matched by the first image and the second image is greater than or equal to a preset threshold, use the first image as a key frame, wherein the first image and the first image The two images are any two adjacent frames of images in the image sequence.

In this embodiment, the threshold for the number of feature points that are successfully matched is set to be between 50 and 100. When the number of feature points matched by the first image and the second image exceeds the threshold, it is determined that the first and the next two frames of images are matched successfully .

It should be noted that if the previous frame image is successfully tracked, the constant motion rate model can be used to predict the current camera position (that is, the camera is considered to be moving at a constant speed), and then the corresponding cloud of the feature points in the previous frame image can be searched for in the map The matching point between the point and the current frame image is finally used to further optimize the pose of the current camera by using the searched matching point, so as to obtain the image in the image sequence that meets the requirements to improve the accuracy of determining the key frame.

S103: Obtain the pose parameters of the key frame, and estimate the depth map of the key frame;

In this embodiment, ORB_SLAM2 based on the feature point method can obtain the pose parameters. For image sequences with relative pose parameters, that is, there is a linear relationship between the two images, the pose parameters describe the two images corresponding to the camera. Relative movement relationship. The depth map refers to the number of bits used to store each pixel, which is used to measure the color resolution of the image.

Optionally, obtaining the pose parameters of the key frame includes:

Performing pose initialization on the first image; estimating pose parameters of key frames in the image sequence.

In this embodiment, when the number of feature points matched by the first image and the second image exceeds the set threshold, the pose of the first image, that is, the image of the previous frame, is initialized to (R0, t0), and the key frame contains For multiple images with successful feature point matching, extract the ORB feature of each frame of the key frame according to the pose initialization of the first image, perform feature matching with the previous frame, and estimate its pose parameters (rotation matrix Ri, translation vector ti ), take the image of successful pose estimation as the key frame, obtain the pose parameter corresponding to the key frame, and store the key frame and the pose parameter corresponding to the key frame together, so as to perform depth estimation on all the key frames later.

It should be understood that, according to the pose initialization of the first image, other images in the key frame are also judged by the above-mentioned feature point matching process, and the pose parameters of the current image are estimated, and the image whose feature points are successfully matched is used as the key frame, and According to the initialized pose parameters, the pose parameters of each image in the image sequence, that is, the pose parameters of the key frame are obtained, which improves the accuracy of the estimation of the pose parameters.

Optionally, the estimating the depth map of the key frame includes:

In this embodiment, the photometric error is minimized according to the first depth map of the key image frame in the monocular video, and the current camera pose between the reference frame image and the key frame in the monocular endoscopic image is determined. According to The current camera pose triangulates the high gradient image points in the reference frame image and the key frame, determines the second depth map of the key frame, performs Gaussian fusion of the first depth map and the second depth map, and updates the first depth map of the key frame. A depth map. If the next camera pose between the next image frame and the key frame of the reference frame image exceeds the preset camera pose, the updated first depth map is determined as the dense depth map of the key frame.

It should be noted that for depth map estimation, one frame of image or multiple frames of images can be selected for estimation. When one frame of image in the key frame is selected as the reference frame, each pixel of each image in the key frame is triangulated. And Bayesian probability estimation strategy to get dense depth map. When multiple images in the key frame are selected for iterative calculation to obtain the depth value corresponding to each pixel, then the depth map is smoothed and filtered to eliminate some noise in the depth map, which can improve the efficiency and accuracy of depth estimation.

In this embodiment, the first depth map of the key frame can be a dense depth map obeying the Gaussian distribution obtained by initializing the depth values of the high gradient points in the key frame, or it can be the depth of the previous key frame of the key frame. The value is a dense depth map projected according to the camera pose. For example, if the key frame to be depth estimated is the first key frame in the image sequence, the first depth map of the key frame is the dense depth map obtained by initialization; if the key frame with depth estimation is the first key frame in the image sequence For key frames other than one key frame, the first depth map of the key frame is a dense depth map obtained by projecting the depth value of the previous key frame. Luminosity error refers to the measurement difference between the high gradient point in the projected image and the corresponding high gradient point in the reference frame image. The projected image is based on the initial camera pose between the reference frame and the key frame in the image sequence. The high gradient points corresponding to the pixels in the frame are projected to the reference frame image. The current camera pose includes the rotation and translation between the reference frame and the key frame. The second depth map of the key frame refers to the image sequence according to the The new dense depth map obtained by triangulating the current camera pose between the reference frame image and the key frame; the next frame image of the reference frame image refers to the next frame image adjacent to the pre-reference frame image in the image sequence , The posture of the latter camera includes the maximum threshold of the posture of the latter camera, which can be preset according to actual conditions and requirements, and there is no specific limitation here.

It should be noted that a dense depth map refers to an image that includes depth values corresponding to a large number of feature points, or an image that includes both high gradient points and depth values corresponding to low gradient points. The depth estimation obtains the depth map and the depth value, which is convenient for subsequent restoration of the spatial coordinates of the pixel.

S104: Perform image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.

In this embodiment, 3D reconstruction refers to the establishment of a 3D model from the input data. Each frame of data scanned by the depth camera not only contains the color RGB image of the point in the scene, but also includes each point to the vertical plane where the depth camera is located. This distance value is called the depth value, and these depth values together constitute the depth map of this frame. The depth map can be regarded as a grayscale image. The grayscale value of each point in the image represents the true distance from the position of the point in reality to the vertical plane where the camera is located. Each point in the RGB image corresponds to a point on the camera. A three-dimensional point in the local coordinate system.

It should be noted that the process of 3D reconstruction can be image acquisition, camera calibration, feature extraction, stereo matching, 3D reconstruction, etc., where stereo matching refers to establishing a correspondence between image pairs based on the extracted features. That is, the imaging points of the same physical space point in two different images are mapped one by one. When matching, pay attention to the interference of some factors in the scene, such as lighting conditions, noise interference, distortion of the geometric shape of the scene, surface physical characteristics, and camera characteristics, etc., in order to obtain a high-precision three-dimensional point cloud, and also enhance the vision effect.

Optionally, S104 may include steps C1 to C3, which are specifically as follows:

Step C1: Obtain the pixel coordinates of the key frame;

In this embodiment, by performing camera calibration on the monocular endoscope described above, the pixel coordinate system and the pixel coordinates of each image in the key frame can be determined. The pixel coordinates indicate the position of the pixel in the image, and the key frame can be determined The pixel position of each image is convenient for subsequent three-dimensional reconstruction of the image.

Step C2: calculating the target space coordinates according to the depth map, the pose parameters of the key frame, and the pixel coordinates of the key frame;

In this embodiment, the depth value corresponding to the depth map of each image in the key is obtained, and the depth value and the pose parameter of the key frame and the pixel coordinates of each image of the key frame are calculated to obtain the spatial coordinates of each image, namely The conversion from two-dimensional coordinates to three-dimensional coordinates, according to the depth value obtained by accurate depth estimation, also improves the accuracy of the calculated target space coordinates.

Step C3: Obtain the color information of each pixel in the key frame, and perform point cloud fusion on the key frame according to the color information of each pixel in the key frame and the target space coordinates to obtain the Describe the three-dimensional point cloud.

In this embodiment, the pixel coordinates [u, v] in the two-dimensional image, the corresponding point cloud contains color information and spatial position information, and the color information is represented by the RGB value of the pixel. According to the depth map, The pose parameters of the key frame and the pixel coordinates of the key frame are calculated to obtain the target space coordinates as [x, y, z], and the space coordinates are restored from the pixel coordinates [u, v] and its depth value d by the following formula Means:

z′=d

x′=z(uc _x )/f _x

y′=z(vc _y )/f _y

(x,y,z) ^T = (R _i ,t _i )(x′,y′,z′) ^T

Among them, d represents the depth of the pixel, which is derived from the depth estimation of the REMODE scheme, (x',y',z') is the coordinate value in the camera coordinate system, (Ri,ti) is the bit corresponding to the frame Pose parameters.

It should be noted that the point cloud is a set of discrete points. The point cloud stores the spatial coordinates and color information corresponding to the pixels of the frame. When multi-frame point cloud fusion is performed, the multi-frame point cloud Stored in a container, and then the repeated point cloud is removed by a filter, and a three-dimensional point cloud of multiple frames of images can be obtained. The above-mentioned three-dimensional reconstruction method may draw a point cloud of multiple frames of images during fusion to obtain finer three-dimensional information.

Optionally, when S101 includes S1011 to S1013, obtaining the pixel coordinates of the key frame in step C1 includes steps C11 to C13:

Step C11: Project the camera coordinate system onto the plane where each pixel of the image to be corrected is located, and obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

In this embodiment, the coordinates of the pixel points in the camera coordinate system are defined, and the correspondence between the camera coordinate system and the image coordinate system is calculated by projection, and then the pixel coordinate system is obtained through the correspondence between the image coordinate system and the pixel coordinate system. The pixel coordinates here are the same as the process of the pixel coordinates obtained in the above-mentioned distortion correction, and will not be repeated here.

Step C12: Map the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence and the pixel coordinates corresponding to the image sequence;

In this embodiment, the corrected image sequence and the pixel coordinates corresponding to the image sequence can be obtained through the coordinate system transformation method of the distortion correction. The specific processing process here is the same as the above-mentioned distortion correction process, and will not be repeated here.

Step C13: Obtain the pixel coordinates of the key frame based on the pixel coordinates corresponding to the image sequence.

In this embodiment, the key frame is determined from the image sequence, and the pixel coordinates of the key frame can be obtained. According to the pixel coordinates of each image of the key frame, the moving position relationship of each image relative to the camera can be determined, so as to improve the single The processing efficiency of the endoscopic image.

FIG. 3 shows a three-dimensional reconstruction device 300 for monocular endoscopic images provided by an embodiment of the present application. As shown in FIG. 3, the three-dimensional reconstruction device 300 for monocular endoscope images provided by the present application includes:

The acquiring module 310 is configured to acquire the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and perform distortion correction on the distortion images of the checkerboard calibration boards to obtain an image sequence;

The determining module 320 is configured to determine a key frame from the image sequence;

The calculation module 330 is configured to obtain the pose parameters of the key frame and estimate the depth map of the key frame;

The generating module 340 is configured to perform image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.

In this embodiment, the device for 3D reconstruction of monocular endoscopic images may be a terminal device, a server, or a device capable of human-computer interaction.

Optionally, the obtaining module 310 specifically includes:

The first acquisition unit is configured to acquire the corner points of the chessboard in the distortion images of the multiple chessboard calibration boards, and calibrate the monocular endoscope based on the corner points of the chessboard to obtain the monocular endoscope Camera parameters and distortion parameters of the mirror;

A first determining unit, configured to determine an image to be corrected from the distorted image according to the camera parameter and the distortion parameter;

The first processing unit is configured to perform distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence.

Optionally, the obtaining module 310 further includes:

A second acquiring unit, configured to acquire the preset coordinates of each pixel of the image to be corrected in the camera coordinates;

A second processing unit, configured to project the camera coordinate system onto the plane where each pixel of the image to be corrected is located, to obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

The third processing unit is configured to map the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence.

Optionally, the determining module 320 specifically includes:

The third acquiring unit is configured to acquire the local features of each image in the image sequence, and perform feature point matching on each image in the image sequence based on the local features of each image to obtain a matching result;

The second determining unit is configured to use the first image as a key frame when the matching result is that the number of feature points matched by the first image and the second image is greater than or equal to a preset threshold, wherein the first image And the second image are any two adjacent frames of images in the image sequence.

Optionally, the determining module 320 further includes:

A third determining unit, configured to use the first image as a key frame when the number of feature points matched by the first image and the second image is greater than or equal to a preset threshold;

The fourth processing unit is used to initialize the pose of the first image;

The first estimation unit is used to estimate the pose parameters of the key frames in the image sequence.

Optionally, the determining module 320 further includes:

A fourth determining unit, configured to determine a reference frame image from the key frame, wherein the reference frame image is any frame image or multiple frames of images in the key frame;

The second estimation unit is configured to perform depth estimation processing on each pixel of the reference frame image based on the pose parameter to obtain the depth map of the key frame.

Optionally, the generating module 340 includes:

The fourth acquiring unit is used to acquire the pixel coordinates of the key frame;

The third estimation unit is configured to calculate the target space coordinates according to the depth map, the pose parameters of the key frame, and the pixel coordinates of the key frame;

The first generating unit is used to obtain the color information of each pixel in the key frame, and perform a point cloud on the key frame according to the color information of each pixel in the key frame and the target space coordinates Fusion to obtain the three-dimensional point cloud.

Optionally, the generating module 340 further includes:

A first projection unit, projecting the camera coordinate system onto a plane where each pixel point of the image to be corrected is located, to obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

The second projection unit is configured to map the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence and the pixel coordinates corresponding to the image sequence;

The second generating unit is configured to obtain the pixel coordinates of the key frame based on the pixel coordinates corresponding to the image sequence.

Please refer to FIG. 4, which is a schematic structural diagram of a terminal device 400 provided by an embodiment of the present application. The terminal device 400 includes a memory 410, at least one processor 420, and is stored in the memory 410 and can be stored in the processor 420. When the processor 420 executes the computer program 430, the above-mentioned three-dimensional reconstruction method is implemented.

The terminal device 400 may be a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra mobile personal computer (ultra -Mobile personal computer (UMPC), netbook, personal digital assistant (personal digital assistant, PDA) and other terminal devices, the embodiment of this application does not impose any restrictions on the specific types of terminal devices.

The terminal device 400 may include but is not limited to a processor 420 and a memory 410. Those skilled in the art can understand that FIG. 4 is only an example of the terminal device 400, and does not constitute a limitation on the terminal device 400. It may include more or less components than those shown in the figure, or a combination of certain components, or different components. , For example, can also include input and output devices.

The so-called processor 420 may be a central processing unit (CPU), and the processor 420 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), and application specific integrated circuits (Application Specific Integrated Circuits). , ASIC), ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.

In some embodiments, the memory 410 may be an internal storage unit of the terminal device 400, such as a hard disk or a memory of the terminal device 400. In other embodiments, the memory 410 may also be an external storage device of the terminal device 400, such as a plug-in hard disk equipped on the terminal device 400, a smart media card (SMC), and a secure digital (Secure Digital). Digital, SD) card, flash card (Flash Card), etc. Further, the memory 410 may also include both an internal storage unit of the terminal device 400 and an external storage device. The memory 410 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, such as the program code of the computer program. The memory 410 may also be used to temporarily store data that has been output or will be output.

It should be noted that the information interaction and execution process between the above-mentioned generating devices/units are based on the same concept as the method embodiment of this application, and its specific functions and technical effects can be found in the method embodiment section. I won't repeat them here.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed. Module completion, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be hardware-based Formal realization can also be realized in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above-mentioned auxiliary photographing device, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.

The embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.

The embodiments of the present application provide a computer program product. When the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be realized when the mobile terminal is executed.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the implementation of all or part of the processes in the above-mentioned embodiment methods in the present application can be accomplished by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. The computer program can be stored in a computer-readable storage medium. When executed by the processor, the steps of the foregoing method embodiments can be implemented. Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms. The computer-readable medium may at least include: any entity or device capable of carrying computer program code to a terminal device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), and a random access memory (Random Access). Memory, RAM), electrical carrier signals, telecommunications signals, and software distribution media. For example, U disk, mobile hard disk, floppy disk or CD-ROM, etc. In some jurisdictions, according to legislation and patent practices, computer-readable media cannot be electrical carrier signals and telecommunication signals.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network equipment and method may be implemented in other ways. For example, the device/network device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation, such as multiple units. Or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still implement the foregoing The technical solutions recorded in the examples are modified, or some of the technical features are equivalently replaced; these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and should be included in Within the scope of protection of this application.

Claims

A method for three-dimensional reconstruction of monocular endoscopic images, characterized in that the three-dimensional reconstruction method includes:

Acquiring the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and performing distortion correction on the distortion images of the checkerboard calibration boards to obtain an image sequence;

Determining key frames from the image sequence;

Acquiring the pose parameters of the key frame, and estimating the depth map of the key frame;

Perform image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.
The 3D reconstruction method according to claim 1, wherein the image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a 3D point cloud comprises:

Acquiring the pixel coordinates of the key frame;

Calculating the target space coordinates according to the depth map, the pose parameters of the key frame, and the pixel coordinates of the key frame;

Obtain the color information of each pixel in the key frame, and perform point cloud fusion on the key frame according to the color information of each pixel in the key frame and the target space coordinates to obtain the three-dimensional point cloud.
The three-dimensional reconstruction method according to claim 1 or 2, characterized in that said acquiring the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and correcting the distortion images of the plurality of checkerboard calibration boards to obtain Image sequence, including:

Acquire the corner points of the chessboard in the distortion images of the checkerboard calibration boards, calibrate the monocular endoscope based on the corner points of the chessboard, and obtain the camera parameters and distortion parameters of the monocular endoscope ；

Determining an image to be corrected from the distorted image according to the camera parameter and the distortion parameter;

Performing distortion correction on the image to be corrected based on the camera coordinate system to obtain the image sequence.
The three-dimensional reconstruction method according to claim 3, wherein said performing distortion correction on said image to be corrected based on a camera coordinate system to obtain said image sequence comprises:

Acquiring the preset coordinates of each pixel of the image to be corrected in the camera coordinates;

Projecting the camera coordinate system onto the plane where each pixel of the image to be corrected is located to obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

The pixel coordinates of the preset coordinates in the pixel coordinate system are mapped to the camera coordinate system to obtain the image sequence.
The three-dimensional reconstruction method according to claim 3, wherein said obtaining the pixel coordinates of the key frame comprises:

Projecting the camera coordinate system onto the plane where each pixel of the image to be corrected is located to obtain the pixel coordinates of the preset coordinates in the pixel coordinate system;

Mapping the pixel coordinates of the preset coordinates in the pixel coordinate system to the camera coordinate system to obtain the image sequence and the pixel coordinates corresponding to the image sequence;

Obtain the pixel coordinates of the key frame based on the pixel coordinates corresponding to the image sequence.
The three-dimensional reconstruction method according to any one of claims 1-2 and 4-5, wherein said determining a key frame from the image sequence comprises:

Acquiring local features of each image in the image sequence, and performing feature point matching on each image in the image sequence based on the local feature of each image, to obtain a matching result;

When the matching result is that the number of feature points matched by the first image and the second image is greater than or equal to a preset threshold, the first image is used as a key frame, where the first image and the second image are Any two adjacent frames of images in the image sequence.
The 3D reconstruction method according to claim 6, wherein said obtaining the pose parameters of the key frame comprises:

Performing pose initialization on the first image;

Estimate the pose parameters of the key frames in the image sequence.
The 3D reconstruction method according to claim 1 or 7, wherein the estimating the depth map of the key frame comprises:

Determining a reference frame image from the key frames, where the reference frame image is any frame image or multiple frames of images in the key frame;

Performing depth estimation processing on each pixel of the reference frame image based on the pose parameter to obtain the depth map of the key frame.
A three-dimensional reconstruction device for monocular endoscopic images, which is characterized in that it comprises:

The acquisition module is used to acquire the distortion images of a plurality of checkerboard calibration boards taken by a monocular endoscope, and perform distortion correction on the distortion images of the checkerboard calibration boards to obtain an image sequence;

A determining module for determining key frames from the image sequence;

A calculation module for obtaining the pose parameters of the key frame and estimating the depth map of the key frame;

The generating module is used for image reconstruction based on the pose parameters of the key frame and the depth map of the key frame to obtain a three-dimensional point cloud.
A computer-readable storage medium storing a computer program, wherein the computer program implements the three-dimensional reconstruction method according to any one of claims 1 to 8 when the computer program is executed by a processor.