WO2010081094A2 - A system for registration and information overlay on deformable surfaces from video data


Info

Publication number
WO2010081094A2
Authority
WO
WIPO (PCT)
Prior art keywords
points
internal structure
surgical system
supplemental data
image
Application number
PCT/US2010/020649
Other languages
French (fr)
Other versions
WO2010081094A3 (en)
Inventor
Gregory D. Hager
Li-ming SU
Russell H. Taylor
Balazs Peter Vagvolgyi
Original Assignee
The Johns Hopkins University
Application filed by The Johns Hopkins University
Publication of WO2010081094A2
Publication of WO2010081094A3

Classifications

    • G06T19/00 Manipulating 3D models or images for computer graphics; G06T19/006 Mixed reality
    • G06T7/00 Image analysis; G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • A61B90/36 Image-producing devices or illumination devices not otherwise provided for; A61B2090/364 Correlation of different images or relation of image positions in respect to the body; A61B2090/365 Augmented reality, i.e. correlating a live optical image with another image
    • A61B34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery; A61B34/30 Surgical robots; A61B34/37 Master-slave robots
    • G06T2210/00 Indexing scheme for image generation or computer graphics; G06T2210/41 Medical


Abstract

A surgical system including an optical imaging device arranged to image an internal structure of a subject under observation, a processor in communication with the optical imaging device, and a visual display in communication with the processor. The optical imaging device provides a first image signal corresponding to at least a portion of the internal structure corresponding to a first time interval and a second image signal corresponding to at least a portion of the internal structure corresponding to a second time interval. The processor is operable to determine a first set of points corresponding to the at least a portion of the internal structure corresponding to the first time interval from the first image signal, determine a second set of points corresponding to the at least a portion of the internal structure corresponding to the second time interval from the second image signal based on the determined first set of points, receive supplemental data corresponding to the at least a portion of the internal structure, register the supplemental data with at least one of the first set of points or the second set of points, and output a display image signal, to the visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registered supplemental data.

Description

A SYSTEM FOR REGISTRATION AND INFORMATION OVERLAY ON DEFORMABLE SURFACES FROM VIDEO DATA
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to U.S. Provisional Application No. 61/143,468 filed January 9, 2009, the entire contents of which are hereby incorporated by reference.
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Grant No. W81XWH-06-1-0195, awarded by the Telemedicine & Advanced Technology Research Center (TATRC).
BACKGROUND
1. Field of Invention
The current invention relates to augmented reality systems, and more particularly to a system for information overlay during surgery.
2. Discussion of Related Art
The contents of all references, including articles, published patent applications and patents referred to anywhere in this specification are hereby incorporated by reference.
Minimally invasive surgery (MIS) is a technique whereby instruments are inserted into the body via small incisions (or in some cases natural orifices), and surgery is carried out under video guidance. While presenting great advantages for the patient, MIS presents numerous challenges for the surgeon due to the restricted field of view presented by the endoscope, the tool motion constraints imposed by the insertion point, and the loss of haptic feedback. One means of overcoming some of these limitations is to present the surgeon with additional visual information. There has been some prior work on the problem of fusing video and volume information with application to medicine. In W. Chang et al., "Intuitive intra-operative ultrasound guidance using the sonic flashlight, a novel ultrasound display system," in Op. Neurosurgery Supp. 2 56(4) (2005) 434-437, joint acquisition and display of the video and ultrasound data was possible due to a known, rigid calibration between the camera and the ultrasound head. In M. Kanbara et al., "A stereoscopic video see-through augmented reality system based on real-time vision-based registration," in Proc. Virt. Reality (2000) 255-262, video overlay was demonstrated using traditional image navigation techniques (which rely on an external tracking system) and rigid anatomy. In Mourgues, F., Devernay, F., and Coste-Maniere, E., "3D reconstruction of the operating field for image overlay in 3D-endoscopic surgery," in Proc. of IEEE and ACM International Symposium on Augmented Reality (ISAR 2001), 191-192 (2001), a traditional single frame reconstruction algorithm was applied to stereo endoscopic image data for the purposes of image overlay. In Stoyanov, D., Darzi, A., and Yang, G.Z., "Dense 3d depth recovery for soft tissue deformation during robotically assisted laparoscopic surgery," in MICCAI (2) (2004) 41-48, a traditional single-frame region matching stereo system was described and validated against CT, and in Paul, P., Fleig, O., and Jannin, P., "Augmented Virtuality Based on Stereoscopic Reconstruction in Multimodal Image-Guided Neurosurgery: Methods and Performance Evaluation," in IEEE Trans. on Medical Imaging, 24(11), 2005, augmented reality using a single stereo-frame region matching method is presented. In Stoyanov, D., Mylonas, G.P., Deligianni, F., Darzi, A., and Yang, G.Z., "Soft-tissue motion tracking and structure estimation for robotic assisted mis procedures," in MICCAI (2) (2005) 139-146, a real-time motion estimation system for discrete points was presented. However, none of these works provides for sequentially processing and deformably registering video information to pre-operative volumes and surfaces in substantially real time for the purposes of information overlay display during surgery. There is thus a need for an improved system for displaying anatomical information for the purposes of surgical guidance.
SUMMARY
A surgical system according to an embodiment of the current invention has an optical imaging device arranged to image an internal structure of a subject under observation, a processor in communication with the optical imaging device, and a visual display in communication with the processor. The optical imaging device provides a first image signal corresponding to at least a portion of the internal structure corresponding to a first time interval and a second image signal corresponding to at least a portion of the internal structure corresponding to a second time interval. The processor is operable to determine a first set of points corresponding to the at least a portion of the internal structure corresponding to the first time interval from the first image signal, determine a second set of points corresponding to the at least a portion of the internal structure corresponding to the second time interval from the second image signal based on the determined first set of points, receive supplemental data corresponding to the at least a portion of the internal structure, register the supplemental data with at least one of the first set of points or the second set of points, and output a display image signal, to the visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registered supplemental data.
A surgical system according to another embodiment of the current invention has an optical imaging device arranged to image an internal structure of a subject under observation, a processor in communication with the optical imaging device, and a visual display in communication with the processor. The optical imaging device provides a first image signal corresponding to at least a portion of the internal structure corresponding to a first time interval and a second image signal corresponding to at least a portion of the internal structure corresponding to a second time interval. The internal structure corresponding to the second time interval is a different shape from the internal structure corresponding to the first time interval. The processor is operable to determine a first set of points corresponding to the at least a portion of the internal structure corresponding to the first time interval from the first image signal, determine a second set of points corresponding to the at least a portion of the internal structure corresponding to the second time interval from the second image signal, receive supplemental data corresponding to the at least a portion of the internal structure, register the supplemental data with at least one of the first set of points or the second set of points, and output a display image signal, to the visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registered supplemental data.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention may be better understood by reading the following detailed description with reference to the accompanying figures, in which:
Figure 1 is a block diagram of a surgical system for imaging an internal structure according to an embodiment of the current invention;
Figures 2A-2D are diagrams of an internal structure and supplemental data according to an embodiment of the current invention;
Figure 3 is a flowchart of the process for imaging an internal structure according to an embodiment of the current invention;
Figure 4 is a flowchart of images illustrating the process for imaging an internal structure according to an embodiment of the current invention;
Figure 5 is a block diagram of the architecture of a surgical system for imaging an internal structure according to an embodiment of the current invention;
Figure 6 illustrates a comparison of stereo reconstructions of a surface of a phantom using the surgical system without structured light according to an embodiment of the current invention;
Figure 7 illustrates a stereo reconstruction of a surface of a phantom using the surgical system with structured light according to an embodiment of the current invention;
Figures 8A and 8B illustrate registration of a deformable structure according to an embodiment of the current invention;
Figure 9 illustrates a comparison of stereo reconstructions of an internal structure using the surgical system according to an embodiment of the current invention;
Figure 10 illustrates a comparison of rigid registration and deformable registration according to an embodiment of the current invention;
Figure 11 illustrates a sequence of images showing the result of using automatic registration to present an overlay; and
Figure 12 is an illustration of a surgical system for imaging an internal structure according to an embodiment of the current invention.
DETAILED DESCRIPTION
In describing embodiments of the present invention illustrated in the drawings, specific terminology is employed for the sake of clarity. However, the invention is not intended to be limited to the specific terminology so selected. It is to be understood that each specific element includes all technical equivalents which operate in a similar manner to accomplish a similar purpose.
Figure 1 is a schematic illustration of a surgical system 100 for imaging an internal structure according to an embodiment of the current invention. The surgical system 100 includes an optical imaging device 102. In an embodiment, the optical imaging device 102 is a stereo endoscope. The optical imaging device 102 is in communication with a processor 106. The processor 106 is a computer processing unit, but can also be any type of computing device. The processor 106 is in communication with a visual display 104. The visual display 104 is an optical display device, such as, e.g., but not limited to, a liquid crystal display (LCD) or a cathode ray tube (CRT), etc. The processor 106 is also in communication with a supplemental data source 108, such as, e.g., but not limited to, a network, an external device, a database, memory, etc. Communication between the optical imaging device 102, processor 106, visual display 104, and supplemental data source 108 is provided by wired communications, but in alternate embodiments communication is provided wirelessly. The optical imaging device 102 is arranged to image an internal structure of a subject under observation. The subject can be any object with an internal structure, such as, e.g., but not limited to, a person or an animal, and the internal structure can be any object within a subject, such as, e.g., but not limited to, tissue, an organ, bone, etc. In a surgical setting, the optical imaging device 102 images an intra-operative view of a surgical site. The optical imaging device 102 sends these images as image signals to the processor 106. The image signals correspond to at least a portion of an internal structure corresponding to a time interval. For example, the image signal can correspond to an image of a visible portion of an organ at a particular time during a surgery. The processor 106 receives the image signals from the optical imaging device
102 and processes the images to determine one or more sets of points from the images. The sets of points can define a surface. The surface is a two-dimensional manifold that may be represented in a variety of ways including but not limited to a triangulated surface, a spline, or a parametric implicit function. The processor 106 also receives supplemental data corresponding to at least a portion of the internal structure from the supplemental data source 108. The supplemental data can be based on at least one of pre-operative data or operative data. The processor 106 can receive the supplemental data from a variety of sources, such as, e.g., but not limited to, a network, an external device, a database, memory, etc. The processor 106 registers the supplemental data with a determined set of points from an image signal. The processor 106 then outputs a display image signal to the visual display 104 corresponding to an overlay of the supplemental data with the set of points based on the registration of the supplemental data.
The visual display 104 receives the display image signal and displays a display image corresponding to the display image signal for a user. The display image shows an image of at least a portion of an internal structure corresponding to the image signal provided by the optical imaging device with supplemental data corresponding to the at least a portion of the internal structure overlaid on the image of the at least a portion of the internal structure. The supplemental data is overlaid according to the registration of the supplemental data with the determined set of points.
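The end-to-end behavior just described can be summarized as an acquisition, reconstruction, registration, and display loop. The following is a minimal, hypothetical sketch of that loop; it is not an actual device or display API, and every callable here is an injected placeholder:

```python
# A minimal sketch of the processing loop described above: acquire an image
# signal, determine a set of points, register the supplemental data, and
# output an overlay. All callables are injected placeholders (assumptions),
# not part of any real system API.

def imaging_loop(grab_frame, reconstruct, register, render_overlay, show,
                 supplemental_data, n_frames):
    prev_points, registration = None, None
    for _ in range(n_frames):
        image = grab_frame()                          # image signal for one time interval
        points = reconstruct(image, prev_points)      # determined set of points (surface)
        registration = register(supplemental_data, points, registration)
        show(render_overlay(image, supplemental_data, registration))
        prev_points = points  # the next set of points is determined using this one
```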
Figures 2A-2D are diagrams of an internal surface and supplemental data according to an embodiment of the current invention. Figure 2A shows supplemental data 202 including a CT scan of a kidney. Figure 2B is an image 204 of an internal structure received from an optical imaging device 102. Figure 2C is an image 206 of the previous image 204 with a determined set of points 208 connected to form a triangular mesh. Figure 2D is an image 210 of the previous image 204 with supplemental data 202 overlaid on the image of the internal structure based on registering the supplemental data 202 with the determined set of points.
Figure 3 is a flowchart of the process for imaging an internal structure according to an embodiment of the current invention. The process 300 begins with 302 and immediately continues with 304. In 304, the surgical system 100 determines a first and second set of points. The processor 106 receives a first image signal corresponding to at least a portion of an internal structure corresponding to a first time interval. The processor 106 determines a first set of points corresponding to the at least a portion of the internal structure corresponding to the first time interval from the first image signal. In an embodiment, the first image signal includes signals corresponding to a pair of stereo images. The stereo image includes a two-dimensional left and right image of the at least a portion of the internal structure taken from a first and second location. The processor 106 determines the first set of points based on calculating three-dimensional information of the at least a portion of the internal structure. The processor 106 matches portions of the internal structure shown in the left image with portions of the internal structure shown in the right image to determine the distance of portions of the internal structure from the optical imaging device 102. The depths of the portions of the internal structure can be determined by stereo processing using at least one of: an area-based stereo matching method, a dynamic programming stereo matching method, or other stereo matching algorithms. The matching methods further match portions of the internal structure in a left and right image based on previous matches of a previous left and previous right image.
In an area-based stereo matching method, blocks of pixels in the left image are matched with blocks of pixels in the right image. Based on the difference in location of blocks of pixels between the left image and right image, the depth of the block of pixels is triangulated. Generally, the larger the difference in location of the matched blocks, the smaller the distance from the optical imaging device 102, as illustrated in the sketch below. In the system 100, the area-based stereo matching method is a hierarchical stereo matching method with iterative matching using changes in scale of the image. An area-based stereo matching method and example algorithm are described in greater detail in the examples below.
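As a concrete illustration of the area-based matching just described, the following is a minimal sketch of brute-force SAD block matching on a rectified grayscale stereo pair. The window size, disparity range, and use of NumPy/SciPy are illustrative assumptions, not parameters of the described system:

```python
import numpy as np
from scipy.signal import convolve2d

def sad_block_matching(left, right, max_disp=64, block=7):
    """Brute-force area-based matching on a rectified grayscale pair: for each
    pixel of the left image, aggregate |L - R| over a block window for every
    candidate disparity and keep the disparity with the lowest SAD."""
    h, w = left.shape
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    kernel = np.ones((block, block), dtype=np.float32)
    disp = np.zeros((h, w), dtype=np.int32)
    best = np.full((h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, : w - d]   # right image shifted by d pixels
        cost = convolve2d(np.abs(left - shifted), kernel, mode="same")
        better = cost < best
        disp[better], best[better] = d, cost[better]
    return disp
```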
A dynamic programming matching method combines image intensity matching error with smoothness constraints on underlying disparities. For any pixel location in the left image, a disparity value for a corresponding point in the right image is determined based on the difference in location of the pixel between the left image and right image. A dynamic programming stereo matching method and example algorithm are described in greater detail in the examples below.
In another embodiment, the processor 106 determines the set of points based on a global optimization method, such as, e.g., but not limited to, graph cuts, loopy belief propagation, or continuous optimization, etc. In an embodiment, the processor 106 combines stereo processing algorithms. The processor 106 uses a first stereo processing method to match portions of stereo images and then uses a second stereo processing method to match portions of the stereo images that could not be matched by the first stereo processing method. According to an embodiment, the processor
106 first performs an area-based stereo matching method and then uses a dynamic programming stereo matching method to match portions that could not be matched by the area-based stereo matching method.
The processor 106 also receives a second image signal corresponding to the at least a portion of the internal structure at a second time interval. The second time interval occurs after the first time interval. After determining the first set of points, the processor 106 determines a second set of points corresponding to the at least a portion of the internal structure corresponding to the second time interval. The processor 106 determines the second set of points in a similar manner to how the processor 106 determines the first set of points. However, the processor 106 also determines the second set of points based on the determined first set of points. In matching blocks of pixels between the left and right images, the processor 106 considers previous matches of blocks of pixels made between previous left and right images. For area-based stereo matching, the processor 106 presumes that for a given block of pixels in a left image, the matching block of pixels in the right image is in a location close to the matching block of pixels in the previous right image. For dynamic programming stereo matching, the processor 106 searches over a small bracket of disparities about a previous left and right image pair.
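The temporal constraint just described can be expressed directly as a restricted search: given a per-pixel cost volume for the current frame, only disparities within a small bracket of the previous frame's estimate are considered. A minimal sketch, assuming a precomputed cost volume:

```python
import numpy as np

def bracketed_search(prev_disp, cost_volume, bracket=2):
    """Restrict this frame's disparity search to a small bracket around the
    previous frame's estimate, as described above. cost_volume[d, u, v] is
    assumed to hold the matching cost of disparity d at pixel (u, v)."""
    n_disp, h, w = cost_volume.shape
    d_idx = np.arange(n_disp).reshape(-1, 1, 1)
    # mask out disparities farther than `bracket` from the previous estimate
    masked = np.where(np.abs(d_idx - prev_disp[None]) <= bracket,
                      cost_volume, np.inf)
    return masked.argmin(axis=0)
```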
According to another embodiment, the optical imaging device 102 is a monocular endoscope with range imaging. The monocular endoscope provides the processor 106 single images of at least a portion of an internal structure, and provides distances of portions of the internal structure based on timing the reflection of pulses of light sent from the monocular endoscope, triangulation by use of a projected light pattern, or other substantially similar means. Range imaging can also be based on other methods. Instead of utilizing stereo processing to determine depths, the processor 106 determines the set of points corresponding to the at least a portion of the internal structure based on the distances measured by the range imaging. In yet another embodiment, an optical imaging device 102 combines both stereo processing and range imaging to determine a set of points. In yet another embodiment, the stereo processing is combined with a projector to improve the quality of the reconstruction by use of a projected pattern.
According to an additional embodiment, the optical imaging device 102 is a monocular endoscope and determining a set of points is based on a plurality of image signals corresponding to at least a portion of the internal structure corresponding to different time intervals. In this embodiment, the spatial locations of a set of points are determined simultaneously with the viewing location of the endoscope. Methods for performing this process are well understood in the computer vision literature. For example, a process for this is described in Wang, H., Mirota, D., Ishii, M., and Hager, G., "Robust Motion Estimation and Structure Recovery from Endoscopic Image Sequences With an Adaptive Scale Kernel Consensus Estimator," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-7, 2008.
From 304, the process 300 continues with 306. In 306, the processor 106 registers the supplemental data corresponding to the at least a portion of the internal structure with at least one of the first set of points or second set of points. In an embodiment, the processor 106 registers the supplemental data with both the first set of points and the second set of points. In registration, the processor 106 matches locations in the set of points with locations corresponding to sets of points extracted from the supplemental data. The registration process includes an initial registration of a determined set of points with a set of points extracted from the supplemental data. Subsequent determined sets of points are registered based on one or more previous registrations.
Registration can be performed using at least one of rigid registration and non-rigid registration. Rigid registration can be based on the iterative closest point (ICP) algorithm. Iterative closest point is an algorithm employed to minimize the difference between two clouds of points. As used by the system 100, one of the point clouds corresponds to a determined set of points, and the other corresponds to the supplemental data. Based on minimizing the distances between points in the two point clouds, the supplemental data is registered with the determined set of points.
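The ICP alternation just described (closest-point matching followed by a closed-form rigid fit) can be sketched as follows. This is a generic illustration using the well-known Arun/Horn SVD solution, not the system's accelerated implementation, which is detailed in the examples below:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=20):
    """Minimal rigid ICP sketch: repeatedly match each source point to its
    closest target point, then solve for the least-squares rotation R and
    translation t in closed form via SVD."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        moved = source @ R.T + t
        _, idx = tree.query(moved)        # closest-point correspondences
        matched = target[idx]
        # closed-form alignment of the matched pairs (Arun et al., 1987)
        mu_s, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:         # guard against reflections
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        dt = mu_t - dR @ mu_s
        R, t = dR @ R, dR @ t + dt        # compose with the running estimate
    return R, t
```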
In an embodiment, the registration to the first set of points is performed in an offline manner, whereby substantial time is taken to establish this initial registration.
Offline registration can include non-real-time registration during operation or registration prior to operation of the system 100. In another embodiment, rigid registration is performed with the aid of an external tracking device attached to the optical imaging device 102 or with the aid of kinematics of a robot, such as, e.g., but not limited to, a da Vinci robot, holding the optical imaging device 102. Changes in the determined sets of points of the internal structure as imaged by the optical imaging device 102 can not only be caused by changes in the actual internal structure, but also can be caused by changes associated with the optical imaging device 102. A change in any one of the location, orientation, and zoom of the optical imaging device 102 will change the determined set of points of the internal structure as imaged by the optical imaging device 102. The processor 106 receives device data corresponding to at least one of the position, orientation, or zoom of the optical imaging device 102. Based on changes in the data, the processor 106 registers the determined set of points with the supplemental data.
According to an embodiment, after a rigid transformation using ICP, a non-rigid deformable set of points registration is computed by the processor 106. Internal structures can change in shape across different time intervals. Non-rigid deformation improves registering of determined sets of points when the determined set of points is based on at least a portion of an internal structure where the internal structure is a different shape than the shape of the internal structure corresponding to another set of points. Time intervals correspond to the time interval that an internal structure is imaged or the time interval that supplemental data is provided for. The processor 106 uses a spring-mass system to calculate deformations between the determined set of points and the supplemental data. Calculation of deformations is further described below in the examples. According to an embodiment, registration includes an initial registration. The initial registration is determined without the benefit of a previous registration, and may take substantially longer than subsequent registrations. According to an embodiment, registering supplemental data is based on a combination of information corresponding to the image signal and interactive user input. In an embodiment, the initial registration is one or more of a manual registration or a pre-operative registration. In manual registration, a user manually registers the supplemental data until it appropriately matches with the determined set of points. The user modifies a set of points, surface, or model corresponding with the supplemental data until the set of points, surface, or model overlays an image of the internal structure. The processor 106 then registers the supplemental data with the determined set of points based on the overlay provided by the user. According to an embodiment, the initial registration includes the manual identification of feature points, such as, e.g., but not limited to, obvious surface landmarks (e.g., surface vasculature), in one or more images. The user provides the system 100 one or more feature points, on which the system 100 places special emphasis in matching between determined sets of points and the supplemental data. According to another embodiment, the feature points are chosen automatically according to one of several known feature selection methods, including but not limited to the Harris corner detector, the SIFT feature detector, and corner detection based on a Laplacian of a Gaussian, as in the sketch below.
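For the automatic feature selection just mentioned, a minimal sketch using OpenCV's corner detector follows; the detector thresholds are illustrative assumptions, not values from the described system:

```python
import cv2
import numpy as np

def select_feature_points(image_bgr, max_points=50):
    """Automatic feature-point selection via OpenCV's goodFeaturesToTrack,
    here configured to use the Harris corner response."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_points,
                                      qualityLevel=0.01, minDistance=10,
                                      useHarrisDetector=True, k=0.04)
    return corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))
```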
Registrations subsequent to the initial registration are based on one or more previous registrations. The supplemental data can include data for locations that are visible and occluded in the determined set of points. By excluding data corresponding to non-visible locations, the processor 106 avoids attempting to match non-visible locations with the determined set of points. Using this process, the registration process excludes supplemental data corresponding to portions of the internal structure not visible in an image corresponding to an image signal.
In an embodiment, registration is dense registration relying on dense surface information. An example of dense registration is further described below in the examples. However, in another embodiment, registration is sparse registration relying on sparse surface information. The sparse surface information is feature locations chosen in the determined set of points with corresponding feature locations corresponding to the supplemental data. Determining a set of points can be based on a sparse set of feature-matched points. In an embodiment, determining a set of points can also be based on a sparse set of tracked features. The processor 106 registers supplemental data with determined sets of points based on matching feature locations between the supplemental data and the determined set of points. An example of sparse registration is further described below in the examples.
From 306, process 300 continues with 308. In 308, the processor 106 outputs a display image signal to the visual display 104. The visual display 104 displays a display image corresponding to the display image signal. The display image is an image corresponding to supplemental data overlaid on an image of at least a portion of the internal structure. In an embodiment, during registration the processor 106 renders a surface corresponding to the supplemental data. The supplemental data is registered with the determined set of points by registering the surface corresponding to the supplemental data with the determined surface. When outputting the display image signal, the processor 106 bases the output on re-using the surface corresponding to the supplemental data rendered during the registration process. Thus, for the output of the display image signal, the processor 106 retrieves the previously rendered surface corresponding to the supplemental data to overlay the surface on at least a portion of the internal structure. Based on the registration of supplemental data with determined sets of points, supplemental data is included in the display image so that the supplemental data moves along with the at least a portion of the internal structure shown in the display image. From 308, the process 300 can return to 304 to process a subsequent image signal from the optical imaging device 102. Alternatively, from 308 the process 300 ends with 310.
Figure 4 is a flowchart of images illustrating the process for imaging an internal structure according to an embodiment of the current invention. The flowchart 400 is split into two portions, an initialization phase 402 and a real-time tracking/registration phase 404. The flowchart 400 begins with loading supplemental data 406 in the initialization phase 402. From 406, flowchart 400 continues with 408. In 408, a user manually registers the supplemental data with a determined surface. From 408, flowchart 400 continues with 410. In 410, the processor 106 receives selections of feature points from the user. As seen in 410, a mesh for a determined surface and corresponding points is shown on the image. The user selects points as feature points to be used in performing subsequent registrations. From 410, flowchart 400 continues with 412. In 412, the feature points are tracked. Feature point tracking is provided by matching information in the first image to the second image. In an embodiment, this tracking method is based on minimizing the sum of the differences between image information related to the features, as is described in Hager, G. and Belhumeur, P., "Efficient Region Tracking With Parametric Models of Geometry and Illumination," in IEEE Trans. PAMI 20(10), pp. 1125-1139, 1998. From 412, flowchart 400 continues with 414. In 414, automatic surface tracking is provided by registering supplemental data based on the feature tracking. From 414, flowchart 400 continues with 416. In 416, a display image is presented depicting an overlay of a cutting margin on an image of the internal structure based on the automatic surface tracking. As the internal structure changes, these changes are automatically tracked and the overlay updated in a corresponding manner.
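The cited Hager-Belhumeur tracker minimizes a sum-of-squared-differences criterion over parametric warps. As a hedged stand-in, the following sketch tracks feature points between consecutive frames with OpenCV's pyramidal Lucas-Kanade tracker, which likewise minimizes an SSD criterion over local windows; it is an illustration, not the cited method itself:

```python
import cv2
import numpy as np

def track_features(prev_gray, cur_gray, prev_pts):
    """Track feature points from prev_gray to cur_gray; returns the subset of
    input points that were tracked successfully and their new locations."""
    pts = prev_pts.astype(np.float32).reshape(-1, 1, 2)
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts,
                                                  None, winSize=(21, 21),
                                                  maxLevel=3)
    good = status.ravel() == 1
    return prev_pts[good], cur_pts.reshape(-1, 2)[good]
```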
Another embodiment of the current invention is directed to a surgical system. The surgical system has an optical imaging device arranged to image an internal structure of a subject under observation, a processor in communication with the optical imaging device, and a visual display in communication with the processor. The optical imaging device provides a first image signal corresponding to at least a portion of the internal structure corresponding to a first time interval and a second image signal corresponding to at least a portion of the internal structure corresponding to a second time interval. The internal structure corresponding to the second time interval is a different shape from the internal structure corresponding to the first time interval. The processor is operable to determine a first set of points corresponding to the at least a portion of the internal structure corresponding to the first time interval from the first image signal, determine a second set of points corresponding to the at least a portion of the internal structure corresponding to the second time interval from the second image signal, receive supplemental data corresponding to the at least a portion of the internal structure, register the supplemental data with at least one of the first set of points or the second set of points, and output a display image signal, to the visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registered supplemental data.
FIRST EXAMPLE
The first example discusses the development of algorithms for computing registered stereoscopic video overlays that allow a surgeon to view pre-operative imagery during minimally invasive surgery. There are two elements to this approach. The first element is a real-time computational stereo system that operates on stereoscopic video acquired during minimally invasive surgery. The stereo algorithm implements a single-pass dynamic programming optimization specialized for low-texture video sequences with substantial specularities. In particular, the dynamic program employs multi-directional smoothing constraints and makes use of the previous frame stereo estimate to reduce the disparity search. The second element is an efficient deformable surface-to-surface ICP registration. By combining the two, the system is able to perform video to volume registration in real time. This in turn facilitates rendering of annotations and visualization of sub-surface information on structures within the surgical field. Results of the system on realistic organ phantoms, as well as reconstruction results for recorded surgical images, are presented.
A registered three-dimensional overlay of information tied to preoperative or intra-operative volumetric data can provide guidance and feedback on the location of subsurface structures not apparent in endoscopic video data. An integrated deformable stereoscopic registration and visualization methodology for in-vivo information overlay presents this data. This approach dynamically registers preoperative volume data to surfaces extracted from video, and thus permits the surgeon to view pre-operative high-resolution CT or MRI or intra-operative ultrasound scans directly on tissue surfaces. By doing so, operative targets and preoperative plans become clearly apparent.
This example targets minimally invasive surgery using the da Vinci surgical robot. The da Vinci system provides the surgeon with a stereoscopic view of the surgical field. The surgeon operates by moving two master manipulators which are linked to two (or more) patient-side manipulators. Thus, the surgeon is able to use his or her natural 3D hand-eye coordination skills to perform delicate manipulations in extremely confined areas of the body.
A robotic surgery system presents several advantages. First, it provides stereo (as opposed to the more common monocular) data from the surgical field. Second, through the da Vinci API, the system is able to acquire motion data from the master and slave manipulators, thus providing complete information on the motion of the surgical tools. Finally, the system is provided with motion information on the observing camera itself, making it possible to anticipate and compensate for ego-motion. The remainder of this example provides more detail on the form of surgical video and the video acquisition system, describes the computational stereo and surface registration algorithms used, and presents the system in operation.
1. DESCRIPTION OF THE SYSTEM
Figure 5 is a block diagram of the architecture of a surgical system for imaging an internal structure according to an embodiment of the current invention. The visualization system involves three major components as shown in Figure 5: a stereo engine for tracking deformable surfaces in endoscopic video, a registration engine performing deformable surface-to-volume registration, and a visualization engine for overlaying information tied to the 3D data into the video stream. This example concentrates on the first two elements.
In the vision literature, most recent work on stereo has either focused on improving the speed of computation, or on imposing constraints on the matching process to improve the accuracy of matching [(M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in Computational Stereo," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):993-1008, 2003), (D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," in International Journal of Computer Vision, 47(1):7-42, May 2002)].
Most real-time stereo systems make use of classical region matching methods operating either on gray-scale images, or on derived representations (J. Banks, M. Bennamoun, K. Kubik, and P. Corke, "Evaluation of new and existing confidence measures for stereo matching," in Proc. of the Image & Vision Computing NZ conference (IVCNZ98), 1998). Stereo systems that make use of hierarchical computations to improve speed and robustness have also been developed [(C. Chang and S. Chatterjee, "Multiresolution stereo: A bayesian approach," in ICPR, pages I:908-912, 1990), (G. van Meerbergen, M. Vergauwen, M. Pollefeys, and L. Van Gool, "A hierarchical symmetric stereo algorithm using dynamic programming," in IJCV, 47(1-3):275-285, April 2002)]. Both traditional and hierarchical region matching have been implemented on graphics hardware (R. Yang and M. Pollefeys, "Multi-resolution real-time stereo on commodity graphics hardware," in Proc. CVPR, pages 211-218, 2003). This example uses a hierarchical stereo system which uses structured light to produce reliable reconstructions.
In recent years, various global optimization methods have been developed to improve the accuracy of stereo. Original work focused on scan-line dynamic programming (I. J. Cox, S. L. Hingorani, S. Rao, and B. M. Maggs, "A maximum likelihood stereo algorithm," in Computer Vision and Image Understanding, 63(3):542-567, 1996). To avoid so-called streaking effects, other authors have introduced inter-scanline smoothness constraints (Y. Ohta and T. Kanade, "Stereo by intra- and interscanline search using dynamic programming," in IEEE PAMI, 7(2):139-154, 1985) and hierarchical methods (G. van Meerbergen, M. Vergauwen, M. Pollefeys, and L. Van Gool, "A hierarchical symmetric stereo algorithm using dynamic programming," in IJCV, 47(1-3):275-285, April 2002). Recently, Hirschmuller (H. Hirschmuller, "Stereo vision in structured environments by consistent semi-global matching," in Proc. CVPR, pages 2386-2393, 2006) introduced an impressive system using dynamic programming with multiple smoothness constraints. Other authors have turned to global optimizations such as graph cuts (V. Kolmogorov and R. Zabih, "Graph cut algorithms for binocular stereo with occlusions," in Mathematical Models in Computer Vision: The Handbook, Springer-Verlag, 2005). However, all of these approaches are more computationally intensive than simple region-based matching and do not operate in real time. An exception is (V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, "Probabilistic fusion of stereo with color and contrast for bi-layer segmentation," in IEEE PAMI, 28(9):1480-1492, 2006), where a modified graph-cuts optimization was exhibited for the specific purpose of foreground-background segmentation. In order to achieve the reliability necessary for the system, the system uses a modified dynamic programming method that operates in real time, detailed below.
2. METHODS
In the following, fully rectified stereo images L (the left image) and R (the right image) are assumed. In general, L(u, v, b, t) (resp. R(u, v, b, t)) is written to denote the pixel value at location (u, v), image band b, at time t in the left (resp. right) image. When time is not an issue, the final argument is dropped with the understanding that all references refer to the same time point. For any pixel location (u, v) in the left camera, the disparity value for a corresponding point (u, v') in the right image is defined as D(u, v) = v - v'. For the purposes of exposition, pixel values are taken as RGB vectors with components ranging from 0 to 1.
2.1 HIERARCHICAL STEREO
Block matching techniques have been widely used for finding corresponding points in stereo vision, visual tracking, and video compression [(M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in Computational Stereo," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):993-1008, 2003), (J. Banks, M. Bennamoun, K. Kubik, and P. Corke, "Evaluation of new and existing confidence measures for stereo matching," in Proc. of the Image & Vision Computing NZ conference (IVCNZ98), 1998), (D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," in International Journal of Computer Vision, 47(1):7-42, May 2002)]. Typical challenges for block matching include illumination compensation and tolerance to geometric distortions introduced by the differing viewpoints of the cameras. The former can be ameliorated by filtering the images with a Laplacian of a Gaussian before processing; the influence of the latter is reduced by using a robust matching method. In this example, the well-known sum of absolute differences (SAD) metric is used.
The choice of match window size is important to region-based stereo. In this application, the variability of surface texture necessitates a very large match window. However, large match windows are known to create large artifacts in the resulting disparity maps, and exacerbate geometric effects. An alternative is to perform hierarchical stereo using an image pyramid. This example implements such an approach as a baseline against which other algorithms are compared. A hierarchical SAD can be briefly described as follows. Both left and right images are downscaled S levels, where each change in scale reduces image size by a factor of 2. The notation L_s and R_s, s ∈ {1, ..., S}, is used to denote the image at scale s, which is of size 1/2^(s-1) relative to the original image. At each scale the algorithm computes the disparity map D_s(u, v) using a fixed region of size R×R using SAD. SAD is used because it has low complexity, high speed, and easy implementation with special SIMD CPU instructions. The disparity map is first computed for the coarsest scale S using a global search over a predefined disparity set 𝒟, then the disparity value is propagated recursively towards the finer levels, where each scale refines the previous results. The disparities are thus given by the following recursive equations:

SAD_s(u, v, d) = Σ_{i=-R..R} Σ_{j=-R..R} Σ_b | L_s(u + i, v + j + d, b) - R_s(u + i, v + j, b) |

D_S(u, v) = argmin_{d ∈ 𝒟} SAD_S(u, v, d)

D_s(u, v) = argmin_{|d - 2 D_{s+1}(u/2, v/2)| ≤ Δd} SAD_s(u, v, d)

In the implementation, Δd ≤ 2.
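A minimal sketch of this coarse-to-fine recursion follows, assuming rectified grayscale inputs; the pyramid construction, window size, and OpenCV/NumPy helpers are illustrative choices rather than the described implementation:

```python
import numpy as np
import cv2

def hierarchical_sad(left, right, scales=4, max_disp=64, block=5, dd=2):
    """Coarse-to-fine SAD: a full disparity search over the predefined range
    at the coarsest pyramid level, then at each finer level a search limited
    to +/- dd around twice the coarser estimate."""
    pyr = [(left.astype(np.float32), right.astype(np.float32))]
    for _ in range(scales - 1):
        pyr.append((cv2.pyrDown(pyr[-1][0]), cv2.pyrDown(pyr[-1][1])))

    def sad_cost(l, r, d):
        # block-aggregated sum of |L(u, v+d) - R(u, v)| over a block window
        shifted = np.full_like(l, np.inf)
        shifted[:, : l.shape[1] - d] = l[:, d:]
        return cv2.boxFilter(np.abs(shifted - r), -1, (block, block))

    disp = None
    for s in range(scales - 1, -1, -1):
        l, r = pyr[s]
        if disp is not None:  # propagate the coarser estimate to this scale
            disp = 2.0 * cv2.resize(disp, (l.shape[1], l.shape[0]),
                                    interpolation=cv2.INTER_NEAREST)
        best = np.full(l.shape, np.inf, dtype=np.float32)
        out = np.zeros(l.shape, dtype=np.float32)
        for d in range(max_disp // 2 ** s + 1):
            cost = sad_cost(l, r, d)
            if disp is not None:  # search only a small bracket about 2*D_{s+1}
                cost = np.where(np.abs(d - disp) <= dd, cost, np.inf)
            better = cost < best
            best[better], out[better] = cost[better], d
        disp = out
    return disp
```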
2.2 DYNAMIC PROGRAMMING STEREO
Dynamic programming provides an efficient method for combining image intensity matching error with smoothness constraints on the underlying disparities. It is well known that the problems with periodic patterns and the lack of texture, the areas where block matching fails, are addressed by the smoothness assumption. In order to define the objective function, the following definitions are used. The match cost of a location (u, v) for disparity d is defined as

e(u, v, d) = Σ_b | L(u, v + d, b) - R(u, v, b) |    (1)
This match cost can be extended over a window of pixels; a window size of 1×5 pixels has been found to work well in practice. In one embodiment, the regularization cost between two disparities d1 and d2 is defined as

r(d1, d2) = λ | d1 - d2 |    (2)

where the λ parameter controls the balance between the two parts of the objective function. In a subsequent embodiment, it was found that the following criterion provides superior performance:

r(d1, d2) = P_s √( max(2.5, | d1 - d2 |) )    (2b)

where P_s is the smoothness parameter. Typically P_s = 10.0.
Using the standard formulation of dynamic programming (I. J. Cox, S. L. Hingorani, S. Rao, and B. M. Maggs, "A maximum likelihood stereo algorithm," in Computer Vision and Image Understanding, 63(3):542-567, 1996), the objective function C(u, v, d) and the memoization buffer M(u, v, d) computed from the cost transitions C_tr at location (u, v) and disparity d are formulated as

C_tr(u, v, d, d') = C(u, v - 1, d') + r(d, d') + e(u, v, d)    (3)

C(u, v, d) = min_{d'} C_tr(u, v, d, d')

M(u, v, d) = argmin_{d'} C_tr(u, v, d, d')
In this classical case, the optimization is done scanline by scanline by enforcing only horizontal smoothness constraints. In order to ensure vertical smoothness, the cost transition function can be extended as

C_tr(u, v, d, d') = (1/2) ( C(u - 1, v, d') + C(u, v - 1, d') ) + r(d, d') + e(u, v, d)    (4)
It is noted that this is an approximation in the following sense. At a given location (u, v), it is possible that the left and upper neighbor could have differing disparities. Thus, in principle the regularization function should include separate terms for both neighbors, and the minimization in (4) should operate on two independent disparity values. However, this leads to a quadratic (in disparity) complexity. In practice, the depth resolution of stereo is far less than the lateral (pixel) resolution. As a result, in most cases there are large patches of constant disparity. Thus, the approximation of constant local disparity is quite good, and is well worth the computational savings. Once the memoization buffer is computed for the entire image, the disparity map satisfying both the horizontal and vertical smoothness criteria can be read out recursively as
D(u_max, v_max) = argmin_d M(u_max, v_max, d)

D(u, v) = (1/2) [ M(u + 1, v, D(u + 1, v)) + M(u, v + 1, D(u, v + 1)) ]    (5)
In order to improve performance, two additional modifications are made. First, a scale s (typically s = 1 or s = 2) is chosen to perform computations. Note that each factor of two reduction improves performance by a factor of eight. Second, rather than searching over the complete disparity range 𝒟 in every image, only a small bracket of disparities about the previous image's estimate is searched over. With camera motion there are areas of the image (primarily at discontinuities) that occasionally violate this assumption of small change. In practice these areas converge to the correct answer within a small number of frames.
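For illustration, the following sketch implements the classical single-scanline case of equations (1)-(3) with the linear smoothness penalty of equation (2); it omits the multi-directional extension of equation (4) and the bracketed temporal search, and it assumes the per-pixel match costs have already been computed:

```python
import numpy as np

def dp_scanline(costs, lam=2.0):
    """Single-scanline dynamic programming: costs[v, d] holds the match cost
    e(u, v, d) along one image row, and the smoothness term is the linear
    penalty r(d1, d2) = lam * |d1 - d2|. Returns the disparity per column
    minimizing total matching cost plus smoothness."""
    n_v, n_d = costs.shape
    C = np.empty((n_v, n_d), dtype=np.float64)       # objective C(v, d)
    M = np.zeros((n_v, n_d), dtype=np.int32)         # memoization buffer
    C[0] = costs[0]
    d = np.arange(n_d)
    r = lam * np.abs(d[:, None] - d[None, :])        # r(d, d') for all pairs
    for v in range(1, n_v):
        trans = C[v - 1][None, :] + r                # C(v-1, d') + r(d, d')
        M[v] = trans.argmin(axis=1)                  # best d' for each d
        C[v] = trans[d, M[v]] + costs[v]             # + e(u, v, d)
    # read the optimal disparities back out of the memoization buffer
    disp = np.empty(n_v, dtype=np.int32)
    disp[-1] = C[-1].argmin()
    for v in range(n_v - 2, -1, -1):
        disp[v] = M[v + 1][disp[v + 1]]
    return disp
```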
2.3 SURFACE REGISTRATION
This section describes how an estimate of the rigid registration (R_t, T_t) of images taken at time t to a preoperative surface is computed, given a previous estimate (R_{t-1}, T_{t-1}).
Preoperative CT images are segmented to produce organ surfaces. In addition, obvious surface landmarks (e.g. surface vasculature) are marked in the images.
The surface landmarks are located within the stereo video data, providing a set of 3D-3D reference points. Given at least two points plus dense surface information, an initial rigid registration (R1, T1) is computed using standard methods (B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, "Closed-form solution of absolute orientation using orthonormal matrices," in J. Opt. Soc. Amer., A-5:1127-1135, 1988).
After this, a special version of the classical ICP algorithm (P. Besl and N. McKay, "A method for registration of 3D shapes," in PAMI, 14(2):239-256, 1992) is used for finding the rigid transformation between the two surfaces. Given the depth map computed from the stereo endoscopic video stream as one point cloud (P_stereo) and the 3D model of the anatomy placed in the field of view as the other point cloud (P_model), the proposed algorithm registers them as follows. While P_stereo is a surface mesh that contains only the visible 3D details of the anatomy, P_model contains all the visible and occluded anatomical details. It is assumed that P_stereo is a small subset of the surface points of P_model; thus, before finding the point correspondence, the algorithm renders the z-buffer of the pre-operative model P_model using R_{t-1}, T_{t-1} and extracts those points that are visible to the virtual camera:

P'_modelsurface = { v ∈ P_model : v is visible in the rendered z-buffer }

The resulting P'_modelsurface point cloud is a surface mesh similar to P_stereo, so finding the point correspondence with P_stereo is now possible.
The proposed method for finding the point matches is the closest point algorithm, accelerated by using a k-d tree (J. P. Williams, R. H. Taylor, and L. B. Wolff, "Augmented k-d techniques for accelerated registration and distance measurement of surfaces," in Computer Aided Surgery: Computer-Integrated Surgery of the Head and Spine, pages P01-21, Linz, Austria, 1997). In this case, for each v ∈ P'_modelsurface it finds the closest surface point CP(v) on P_stereo. For finding the rigid transformation, the closed-form solution technique employing SVD (K. S. Arun, T. S. Huang, and S. D. Blostein, "Least squares fitting of two 3-D point sets," in IEEE Trans. Pat. Anal. Machine Intell., 9:698-700, 1987) is used. Since both point clouds are surfaces, the convergence of the iterative registration process is fast; the translation vector and the rotations around the x and y axes are computed accurately in the first couple of iterations.
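A minimal sketch of the visibility-filtering step follows. Instead of rendering the mesh z-buffer, it approximates visibility with a point-splatted depth buffer; the intrinsics K, image size, and depth tolerance are assumed inputs rather than quantities specified by the described system:

```python
import numpy as np

def visible_points(model_pts, R_prev, t_prev, K, im_shape, tol=2.0):
    """Project all model points with the previous pose (R_prev, t_prev) and
    pinhole intrinsics K, keep the nearest depth per pixel, and return only
    the points whose own depth matches that buffer (the visible points)."""
    h, w = im_shape
    cam = model_pts @ R_prev.T + t_prev          # model points in camera frame
    z = cam[:, 2]
    ok = z > 1e-6                                # keep points in front of the camera
    u = np.zeros(len(z), dtype=int)
    v = np.zeros(len(z), dtype=int)
    u[ok] = np.round(K[0, 0] * cam[ok, 0] / z[ok] + K[0, 2]).astype(int)
    v[ok] = np.round(K[1, 1] * cam[ok, 1] / z[ok] + K[1, 2]).astype(int)
    ok &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    zbuf = np.full((h, w), np.inf)
    np.minimum.at(zbuf, (v[ok], u[ok]), z[ok])   # nearest depth per pixel
    vis = ok.copy()
    vis[ok] = z[ok] <= zbuf[v[ok], u[ok]] + tol  # keep only the front points
    return model_pts[vis]
```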
After finding the estimated rigid transformation using ICP, a deformable surface registration is computed. For these purposes, a set of points is defined below the surface in the CT volume, and a spring-mass system is defined as reported in (K. Montgomery et al., "Spring: A general framework for collaborative real-time surgical simulation," in Proc. MMVR, 2002). Springs are also defined between the landmark points, and between the video and CT surfaces. A numerical solution minimizing the sum of the stereo reconstruction error and the spring tension is computed. The result provides the volume deformation that best fits the observed stereoscopic video.
The implementation computes the forces between the reconstructed surface and the CT surface. Given the point correspondence computed by the rigid transformation, the forces F(v) between the corresponding surface points can be computed.
For each v ∈ P'_modelsurface:

F(v) = P_stereo(CP(v)) - P_modelsurface(v)    (6)
Since the surface model is represented by a 2D mesh where the values of P_modelsurface(v) determine the elevations of the mesh at the surface points, the force between P_stereo and P_modelsurface is perpendicular to the surface. As a result, F can be represented as a 2D surface as well, which at each vertex position has the value of the force between the surfaces. In order to eliminate the noise of the reconstructed surface, the force field is filtered before deformation (where N(v) is the set of neighbors of v):
For each v ∈ P'_modelsurface:

F_filtered(v) = (1 / |N(v)|) Σ_{w ∈ N(v)} F(w)    (7)
Finally, the forces are applied on the model surface to get the deformed surface (P_deformedsurface):

P_deformedsurface(v) = P_modelsurface(v) + F_filtered(v)    (8)
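Equations (6)-(8) can be sketched compactly as follows, assuming the point correspondences are found with a k-d tree and that the mesh neighborhoods N(v) are supplied as index lists:

```python
import numpy as np
from scipy.spatial import cKDTree

def deform_surface(model_surface, stereo_surface, neighbors):
    """Compute per-vertex forces from the model surface toward the closest
    stereo-reconstructed points, average each force over the vertex's mesh
    neighborhood to suppress reconstruction noise, and apply the filtered
    forces to deform the model surface. neighbors[i] lists the indices of
    the vertices adjacent to vertex i."""
    tree = cKDTree(stereo_surface)
    _, cp = tree.query(model_surface)           # CP(v): closest stereo point
    F = stereo_surface[cp] - model_surface      # equation (6)
    F_filt = np.stack([F[n].mean(axis=0) for n in neighbors])  # equation (7)
    return model_surface + F_filt               # equation (8)
```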
3. RESULTS
Here are presented results comparing the performance and output of the previously described stereo methods. In all cases, prior to stereo computation, stereo image rectification was first performed using bilinear interpolation. The algorithm then adjusted the global brightness and color balance values of the left video channel to the values measured on the right channel. The balancing is performed by computing the average value for each image band (red, green, and blue), and then multiplying the values of the left image by a factor that makes these averages equal. The images are then processed with a digital unsharp filter based on a Gaussian low-pass filter. The amount of the filter is 600% (high frequencies are boosted by a factor of 6), the radius is 10 pixels, and the minimum brightness change threshold is 3. These preprocessing steps have been found to enhance the effectiveness of the SAD image matching metric on surgical images.
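A hedged sketch of this preprocessing follows; it interprets the 10-pixel radius as the Gaussian sigma and omits the minimum-brightness-change threshold for brevity:

```python
import numpy as np
import cv2

def preprocess_pair(left, right, amount=6.0, radius=10):
    """Per-band brightness and color balancing of the left channel against
    the right, followed by a Gaussian unsharp filter boosting high
    frequencies by `amount` (600%)."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    # scale each band of the left image so its mean matches the right image
    for b in range(left.shape[2]):
        left[:, :, b] *= right[:, :, b].mean() / max(left[:, :, b].mean(), 1e-6)
    def unsharp(img):
        blur = cv2.GaussianBlur(img, (0, 0), sigmaX=radius)
        return np.clip(img + amount * (img - blur), 0, 255)
    return unsharp(left), unsharp(right)
```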
3.1 PHANTOM RESULTS
3.1.1 SAD
Although computing generally plausible disparities, if the block matching at any level fails to select the right disparity level, then the error is propagated to the finer scales and the algorithm remains stuck in a local minimum. Figure 6 illustrates a comparison of stereo reconstructions of a surface of a phantom using the surgical system without structured light according to an embodiment of the current invention. Figure 6 shows stereo reconstruction of the phantom sequence without structured light. The top two images show a hierarchical SAD disparity map and 3D mesh (number of scales: S = 4; SAD block size: R = 2). The bottom two images show a dynamic programming disparity map and 3D mesh. Figure 6 demonstrates a difficult case for the SAD method because it features large stationary and periodic regions and specularities where even the coarser scales are unable to find the accurate disparity level. Figure 7 illustrates a stereo reconstruction of a surface of a phantom using the surgical system with structured light according to an embodiment of the current invention. Figure 7 shows hierarchical SAD reconstruction of a phantom image using structured light (uniform color noise). The left image shows a disparity map. The right image shows a 3D mesh. As shown in Figure 7, most of the problems with the hierarchical block matching approach can be resolved by adding more details to the image content using structured light.
3.1.2 Dynamic Programming
The DP algorithm provides accurate and stable disparity map computations, even on poorly detailed and periodic textures. Figure 6 clearly demonstrates that the DP implementation is less sensitive to specularities and is superior to the hierarchical SAD on images containing almost completely homogeneous regions, owing to the smoothness constraint in the optimization process. On the other hand, assuming a smooth surface somewhat limits the method's applicability as a generic computational stereo algorithm in the case of large disparity discontinuities. However, the targeted application is an area where the smoothness assumption is reasonable because of the nature of the scene.

3.1.3 Registration Results
Figures 8A and 8B illustrate registration of a deformable structure according to an embodiment of the current invention. Figure 8A shows rigid registration of the deformable phantom. The left image shows the phantom without deformation (right channel). The right image shows the registered model on overlay. As seen in Figure 8A, the model of the scene was a detailed mesh (5000 triangles) of the phantom and the nearby surroundings. The tests demonstrate that the registration is accurate despite the significant differences between the actual reconstructed mesh and the model.
Figure 8B shows simple deformable registration of the deformed phantom. The left image shows a reconstructed 3D mesh of the phantom (stereo by DP). The right image shows a model surface deformed to the reconstructed surface and rendered on overlay. The deformed surface shown in Figure 8B is smoothed by the deformable registration, however, so it has lost some detail, especially at the edges of the phantom. On the other hand, the deformations are clearly observable on the model's surface, which indicates that the method provides a reliable virtual "force field" for the volumetric spring-mass model.
3.2 RESULTS FROM LIVE IMAGES

3.2.1 SAD
In this example the same parameters as for the phantom images are used (number of scales: S = 4; SAD block size: R = 2). Figure 9 illustrates a comparison of stereo reconstructions of an internal structure using the surgical system according to an embodiment of the current invention. Figure 9 shows stereo reconstruction of an intra-operative sequence without structured light. The top image shows a hierarchical SAD disparity map and 3D mesh. The bottom image shows a dynamic programming disparity map and 3D mesh. The hierarchical SAD performed reasonably well on the intra-operative sequence (Figure 9), although the amplitude of the noise implies that scales 2-3 did not contain enough information to completely resolve the depths at the far left and the lower right side of the scene. On the other hand, as seen in the tool area, the algorithm works well at discontinuities because it accommodates sharp transitions between large disparity steps.
3.2.2 Dynamic Programming

The dynamic programming method provides improved results on the intra-operative sequence as well, as shown in Figure 9. The high depth resolution and the fine details demonstrate that the algorithm had no difficulties dealing with the discontinuities of the anatomical surface. The only cases where significant discontinuities may occur are the areas of the surgical tools in the field of view. Fortunately, modern surgical systems such as the Intuitive Surgical da Vinci system are often capable of providing the accurate 3D positions of the tools relative to the camera frame, which makes it possible to mask out the image regions covered by the tools before applying DP.
This example presents a system for performing deformable registration of geometric surfaces to a stereo video stream. The system operates at near real-time rates, and produces good results even under the challenging conditions found in intraoperative video.
The registration and visualization system can also incorporate an intensity-based registration method to improve algorithm stability in regions with poor geometric constraint. The algorithm can be parallelized to improve the speed of both stereo processing and registration. Finally, online calibration correction can deal with the changing optical parameters of the system.
SECOND EXAMPLE

This example describes algorithms for computing registered stereoscopic video overlays that allow a surgeon to view pre-operative imagery during minimally invasive surgery. There are three elements to the approach. The first element is a real-time computer vision system that operates on stereoscopic video acquired during minimally invasive surgery to extract geometric information. Two variations on this system are presented: a dense stereo algorithm and a sparse point-based method. The second element is an efficient deformable surface-to-surface ICP registration. The final element is a display system that has been customized to operate well with stereo vision. By combining these elements, the system is able to perform video-to-volume registration and display in real time. This in turn facilitates rendering of annotations and visualization of subsurface information on structures within the surgical field. This example describes a system that provides the surgeon with a three-dimensional information overlay registered to pre-operative or intra-operative volumetric data. The novelty of the system lies in its use of stereo video data to perform the registration without recourse to an external tracking system. This example of the system specifically augments the surgical view during laparoscopic kidney procedures.
1. METHODS
Briefly, the implemented system provides three general functions: 1) extraction of 3D information from stereo video data; 2) registration of video data to preoperative images; and 3) rendering and information display. Two methods for computing depth information and performing registration are described: a dense stereo matching algorithm, and a local point-based tracking algorithm. In all that follows, it is assumed that the endoscope has been calibrated to determine the corresponding 2D projection parameters (Zhang, Z., "A flexible new technique for camera calibration," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11) (2000) 1330-1334). Using these parameters the video frames can be rectified to simulate a perfect perspective projection.
1.1 VIDEO TO CT REGISTRATION

1.1.1 Extracting Surfaces From Stereo
The approach is a highly optimized dynamic programming stereo algorithm, which provides a desirable tradeoff between reliability and speed. In the following discussion, fully rectified color stereo images L(u,v) (the left image) and R(u,v) (the right image) are assumed. For any pixel location (u,v) in the left camera, the disparity value for a corresponding point (u,v′) in the right image is defined as D(u,v) = v − v′. Given a known camera calibration, it is well known how to convert a dense disparity representation into a set of 3D points (Brown, M.Z., Burschka, D., Hager, G.D., "Advances in Computational Stereo," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8) (2003) 993-1008). The dynamic programming stereo algorithm optimizes the following objective function:
C(u,v,d′) = min_d { ½ [C(u−1,v,d) + C(u,v−1,d)] + r(d,d′) } + e(u,v,d′)    (1)
where e is an image match cost of two color pixels and r is the cost of disparities differing from their neighbors. The image matching function is the sum of absolute differences (SAD) of a pair of color pixels. A cost limit of 25 gray values, which has been experimentally shown to improve matching performance (Scharstein, D., Szeliski, R., "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," in International Journal of Computer Vision, 47(1) (2002) 7-42), is imposed. The smoothness term is a linear function of the disparity values. It is noted that (1) is an approximation in the following sense. At a given location (u,v), it is possible that the left and upper neighbors could have differing disparities. Thus, in principle the regularization function should include separate terms for both neighbors, and the minimization in (1) should operate on two independent disparity values. However, this leads to a quadratic (in disparity) complexity. In practice, the depth resolution of stereo is far less than the lateral (pixel) resolution. As a result, in most cases there are large patches of consistent disparity. Thus, the approximation of constant local disparity is quite good, and is well worth the computational savings. Once (1) is computed for the entire image, the disparity map satisfying both the horizontal and vertical smoothness criteria can be read out recursively as

D(u,v) = argmin_d { C(u,v,d) + r(d, D(u+1,v)) + r(d, D(u,v+1)) }    (2)
In order to improve performance, two additional modifications are made. First, a reduced scale (typically a factor of two to four) is chosen to perform computations. Each factor of two reduction improves performance by a factor of eight. Second, rather than searching over the complete disparity range in every image, only a small bracket of disparities around the values computed for the previous stereo pair in the image sequence is searched. With camera motion there are areas of the image (primarily at discontinuities) that occasionally violate this assumption of small change. In practice these areas converge to the correct answer within a small number of frames. As an optional feature, the algorithm is capable of computing sub-pixel disparity estimates by fitting a parabola to the costs associated with the neighbors of the winning discrete disparity. The location of the apex of the resulting parabola determines the estimate of the sub-pixel disparity value.
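A simplified, runnable sketch of the dynamic program follows. It restricts the recurrence of Eq. (1) to the horizontal neighbor so that a single scanline is solved at a time, whereas the full method also carries the vertical term; the truncated SAD cost, the 25-gray-value cost limit, the linear smoothness penalty, and the parabola-based sub-pixel refinement follow the text, and everything else (names, array shapes) is an illustrative assumption.

```python
# Scanline dynamic programming stereo: a simplified sketch of Eq. (1).
import numpy as np

def scanline_dp(left_row, right_row, dmax, cost_limit=25.0, beta=1.0):
    """Disparity along one rectified color scanline pair (D = column shift)."""
    left_row = left_row.astype(np.float32)
    right_row = right_row.astype(np.float32)
    w = left_row.shape[0]

    # e(u, d): truncated SAD matching cost of color pixels L(u) and R(u - d).
    e = np.full((w, dmax + 1), cost_limit, np.float32)
    for d in range(dmax + 1):
        sad = np.abs(left_row[d:] - right_row[:w - d]).sum(axis=1)
        e[d:, d] = np.minimum(sad, cost_limit)

    # r(d, d'): linear smoothness penalty between neighboring disparities.
    disp = np.arange(dmax + 1, dtype=np.float32)
    r = beta * np.abs(disp[:, None] - disp[None, :])

    # Forward pass: C(u, d') = min_d [ C(u-1, d) + r(d, d') ] + e(u, d').
    C = np.zeros((w, dmax + 1), np.float32)
    back = np.zeros((w, dmax + 1), np.int32)
    C[0] = e[0]
    for u in range(1, w):
        total = C[u - 1][:, None] + r
        back[u] = total.argmin(axis=0)
        C[u] = total.min(axis=0) + e[u]

    # Backward pass: read the optimal disparity path out recursively.
    D = np.zeros(w, np.int32)
    D[-1] = int(C[-1].argmin())
    for u in range(w - 2, -1, -1):
        D[u] = back[u + 1][D[u + 1]]

    # Optional sub-pixel refinement: fit a parabola to the winning cost and
    # its two neighbors; the apex gives the fractional disparity offset.
    Df = D.astype(np.float32)
    for u in range(w):
        d = D[u]
        if 0 < d < dmax:
            c0, c1, c2 = C[u, d - 1], C[u, d], C[u, d + 1]
            denom = c0 - 2.0 * c1 + c2
            if denom > 1e-6:
                Df[u] += 0.5 * (c0 - c2) / denom
    return Df
```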
To reduce the effects of illumination, the video data was preprocessed by first globally adjusting the brightness and color values of the left video channel to the values measured on the right channel, and then applying a Laplacian high-pass filter to increase fine-detail contrast.
1.1.2 Dense Registration Method
Given a 3D point cloud from stereo, a CT surface segmentation, and a good starting point (typically available based on prior knowledge of the procedure), the system computes a rigid registration (Rt, Tt) of images taken at time t to a preoperative surface given a previous estimate (Rt−1, Tt−1).
For this purpose, the system uses a modified version of the classical ICP algorithm (Besl, P., McKay, N., "A method for registration of 3D shapes," in PAMI, 14(2) (1992) 239-256) applied to the depth map computed from the stereo endoscopic video stream as one point cloud (Pstereo) and the 3D model of the anatomy placed in the FOV as the other point cloud (Pmodel). While Pstereo is a surface mesh that contains only the visible 3D details of the anatomy, Pmodel contains all the visible and occluded anatomical details. It is assumed that Pstereo is a small subset of the surface points of Pmodel; thus, before finding the point correspondence, the algorithm renders the z-buffer of the pre-operative model Pmodel using Rt−1, Tt−1 and extracts those points that are visible to the virtual camera (Pmodelsurface). The resulting Pmodelsurface point cloud is a surface mesh similar to Pstereo, so finding the point correspondence with Pstereo is now possible. The implemented method for finding the point matches is accelerated by using a k-d tree (Williams, J.P., Taylor, R.H., Wolff, L.B., "Augmented k-d techniques for accelerated registration and distance measurement of surfaces," in Computer Aided Surgery: Computer-Integrated Surgery of the Head and Spine, Linz, Austria (1997) PO1-21). In this case, for each v ∈ Pmodelsurface it finds the closest surface point (CP(v)) on Pstereo. For finding the rigid transformation, the closed-form solution technique employing SVD (Arun, K.S., Huang, T.S., Blostein, S.D., "Least-squares fitting of two 3-D point sets," in IEEE Trans. Pat. Anal. Machine Intell. 9 (1987) 698-700) is used. Since both point clouds are surfaces, the convergence of the iterative registration process is quite rapid.
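One iteration of this rigid stage might be sketched as follows, assuming the z-buffer visibility culling has already produced Pmodelsurface and that both point clouds are (N, 3) arrays; the k-d tree lookup and the closed-form SVD fit follow the cited techniques, while names and library choices are illustrative:

```python
# One ICP iteration: closest points via a k-d tree, then the closed-form
# least-squares rigid fit of Arun, Huang, and Blostein (1987).
import numpy as np
from scipy.spatial import cKDTree

def icp_step(P_modelsurface, P_stereo):
    """One rigid ICP step; both inputs are (N, 3) arrays of 3D points."""
    tree = cKDTree(P_stereo)
    _, idx = tree.query(P_modelsurface)   # CP(v) for each model surface vertex
    matched = P_stereo[idx]

    # Closed-form SVD fit of R, T minimizing sum ||R v + T - CP(v)||^2.
    mu_m = P_modelsurface.mean(axis=0)
    mu_s = matched.mean(axis=0)
    H = (P_modelsurface - mu_m).T @ (matched - mu_s)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard reflections
    R = Vt.T @ S @ U.T
    T = mu_s - R @ mu_m
    return R, T  # transform mapping the model surface toward the stereo cloud
```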
After finding the estimated rigid transformation using ICP, a deformable surface registration is computed. For this purpose, a set of points is defined below the surface in the CT volume, and a spring-mass system is defined as reported in (Montgomery, K., et al., "Spring: A general framework for collaborative real-time surgical simulation," in Proc. MMVR (2002)). For efficiency, the current implementation computes just the forces between the reconstructed surface and the CT surface. Given the point correspondence computed by the rigid transformation, the strain (F(v)) between the corresponding surface points is computed as
F(v) = γ (Pstereo(CP(v)) − Pmodelsurface(v))    (3)

where the parameter γ ∈ [0,1] determines the strength of deformation.
The force field that the reconstructed surface exerts on the model can now be defined. In an ideal case, where ICP always finds the perfect matching point pairs, the strain vectors could be applied to deform the model directly. However, ICP is a rigid registration algorithm, so the point correspondence between the model and the deformed surface will always be somewhat incorrect. In order to overcome this difficulty, the algorithm filters the strain field (Ffilt) before applying the deformation. The filtering is done with a Gaussian kernel on the neighboring strain vectors. The neighborhood is defined in 2D on the visible surface mesh of the model. Finally, the force field is applied on the model surface to yield the deformed surface (Pdefsurface):
Pdefsurface(v) = (1 − λ)(Pmodelsurface(v) + Ffilt(v)) + λ Pdefsurface(v, t − 1)    (4)

where λ ∈ [0,1] allows adjusting the temporal consistency of the deformation.
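A minimal sketch of Eqs. (3)-(4) follows, assuming the strain field is laid out on the 2D visible-surface mesh as an (H, W, 3) array; the Gaussian kernel width and all names are illustrative assumptions:

```python
# Sketch of the deformable update: Gaussian filtering of the strain field,
# then temporally blended application to the model surface (Eq. (4)).
import numpy as np
from scipy.ndimage import gaussian_filter

def deformable_update(P_modelsurface, F, P_defsurface_prev,
                      lam=0.5, sigma=2.0):
    # Smooth the strain vectors over neighboring vertices to mask residual
    # errors in the rigid ICP point correspondence.
    F_filt = gaussian_filter(F, sigma=(sigma, sigma, 0))
    # Eq. (4): apply the filtered field, blending with the previous frame's
    # deformed surface; lam in [0,1] sets the temporal consistency.
    return (1.0 - lam) * (P_modelsurface + F_filt) + lam * P_defsurface_prev
```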
1.1.3 Sparse Registration Methods

As shown in the results section, there are many cases where surface geometry alone is inadequate for stable registration. For such cases, a stronger point-based registration method is implemented. To use this method, it is assumed that the model has been brought into registration with the video, either using a dense registration or by other manual means.
Once a registration is known, a set of image feature locations p1, p2, ..., pn in one image is chosen. A disparity map is calculated as described above. With this, the corresponding points in the second image are known, and the 3D locations of those points in CT coordinates are given by the registration. Thus, a direct 3D-to-3D point registration can be performed using (Arun, K.S., Huang, T.S., Blostein, S.D., "Least-squares fitting of two 3-D point sets," in IEEE Trans. Pat. Anal. Machine Intell. 9 (1987) 698-700). To maintain the registration, a simple brute-force template tracking algorithm has been implemented to recompute the feature points in each image. This brute-force algorithm simply finds the best match to the chosen feature point by evaluating all points in a 40x40 region about the previous point location, and chooses the point that has the lowest sum of absolute differences (SAD). In every frame of the video, the new feature locations are used to recompute the reconstructed 3D points, and the surface model is re-registered using these points.
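The tracker might be sketched as follows; the 40x40 search region and the SAD criterion follow the text, while the template layout and names are illustrative assumptions:

```python
# Brute-force SAD template tracking over a 40x40 search window.
import numpy as np

def track_feature(image, template, prev_uv, search=40):
    """Return the (row, col) in image minimizing SAD against template."""
    img = image.astype(np.float32)
    tmpl = template.astype(np.float32)
    th, tw = tmpl.shape[:2]
    r0 = max(prev_uv[0] - search // 2, 0)
    c0 = max(prev_uv[1] - search // 2, 0)
    best, best_uv = np.inf, prev_uv
    for r in range(r0, min(r0 + search, img.shape[0] - th + 1)):
        for c in range(c0, min(c0 + search, img.shape[1] - tw + 1)):
            sad = np.abs(img[r:r + th, c:c + tw] - tmpl).sum()
            if sad < best:
                best, best_uv = sad, (r, c)
    return best_uv
```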
1.2 RENDERING AND DISPLAY
One of the major challenges is to perform the video processing, registration, and stereoscopic rendering of the 3D overlay in real time. In order to eliminate the redundancies in the display and registration pipelines, a special-purpose 3D rendering engine has been developed. This rendering engine incorporates all of the functionality of a typical graphics pipeline, including a full geometrical transformation engine with Z-buffering, several lighting models, and various forms of transparent display. The graphics pipeline supports fast stereo rendering with no redundancy in the lighting and transformation phases, and shared texture and model memories. By sharing the same memory layout for representing the 3D geometry and having direct access to the intermediate processing steps, the list of visible triangles and the Z-buffer can be extracted from the 3D rendering pipeline and reused during the dense 3D-to-3D registration. The gains in memory efficiency and computational complexity are significant. The final system can render 5 million stereo triangles per second with texture, lighting, and transparency on a dual 3.2 GHz Pentium 4.
The visual appearance of the 3D models is designed so that they are visible but not obtrusive. Moreover, other 3D models are designed that provide additional intra-operative visual guidance for dissecting the tumor. In particular, the system also displays the kidney collecting system to help the surgeon understand the underlying anatomy relative to the video view. Figure 2 also shows the final display used for partial nephrectomy.
2. RESULTS
Here, details of the performance of the previously described stereo and registration methods are presented. In terms of performance, the speed of the stereo and display engines is more than adequate for the models used in the example. In the case of the former, dense stereo to 1/4-pixel resolution on 240x320 images (one half VGA resolution) can be computed at over 10 frames/second. With respect to the latter, there are typically no more than 30,000 triangles in the 3D scenes, which the engine can render in stereo at over 100 frames per second. Both of these operate in parallel on separate CPU cores. Feature tracking is about as fast as dense stereo (10 frames/second) because a large template size is required for high-accuracy feature tracking on endoscopic video data.
2.1 STEREO AND REGISTRATION ON IN-VIVO ANIMAL DATA

2.1.1 Dynamic Programming Stereo
The dynamic programming method demonstrates very stable 3D reconstruction on an intra-operative sequence. The high depth resolution and the fine details demonstrate that the algorithm had no difficulties dealing with the discontinuities of the anatomical surface.
2.1.2 Registration Results
A video frame where the anatomy was not covered by any surgical tools is selected, and a 3D surface model is created from the reconstructed surface mesh. This model is used for rigid and deformable registration. The rigid registration gave a perfect match for the video frame from which the model was created. For the rest of the video, the rigid registration provided a good approximation of the motion of the corresponding anatomical feature. ICP was configured to stop at 2 mm accuracy and process no more than 5 iterations. Enabling deformable registration improved the surface matching significantly. Figure 10 illustrates a comparison of rigid registration of a surface and deformable registration of the surface according to an embodiment of the current invention. The top row of Figure 10 shows images based on rigid registration of the anatomical surface model. The bottom row of Figure 10 shows images based on deformable registration of the anatomical surface model. The left side images of Figure 10 show the deformed wireframe model. The right side images of Figure 10 show the deformed surface model rendered with depth shading. The deformed surface behaves like a latex surface: stretching, shrinking, and sticking to the reconstructed surface (see Figure 10). For rigid registration, the average error measured by ICP was below 2 mm per vertex in the video segment where the surgical tool was out of the work area (successful registration in 1 iteration). The deformable registration reduced the average registration error below 0.5 mm for most of the same video segment.

2.1.3 Evaluation on In-Vivo Patient Data
In the first case, video data is recorded during a laparoscopic partial nephrectomy carried out using a surgical-grade stereoscopic endoscope (Scholly America, West Boylston, MA). A segment of the video is chosen where the kidney surface has been exposed prior to surgical excision of the tumor. The corresponding CT image for this patient is segmented manually by a surgeon, producing 3D models of the kidney surface, the tumor, and the collecting system in VTK file format.
Figure 2 shows the final display used for partial nephrectomy. Figure 2 shows laparoscopic partial nephrectomy of a tumor (sequence 1, left to right): segmented CT model; source image (left channel); after manual registration and feature point selection; automatic registration and augmented reality overlay of the safety margin of dissection (red ring). The ring model represents the cutting margins on the kidney surface around the tumor. This example shows automated full-surface registration as well as manual registration followed by feature point selection and tracking. Due to the limited amount of kidney surface appearing in the video, manual registration followed by "pinning" with surface feature points had superior stability as well as better overall performance. Figure 11 illustrates a sequence of images showing the result of using automatic registration of a surface to present an overlay. Figure 11 shows several examples from a second sequence taken for the same case. The second case is the surgical removal of a large kidney stone. The data is again recorded with a surgical-grade stereoscopic endoscope, this time in the context of a robotic surgery carried out with the da Vinci system (Intuitive Surgical, Sunnyvale, CA). The CT segmentation employed does not contain the collecting system, but does contain both the stone and the kidney surface. This segmentation is also performed manually. As before, both the pure surface-based registration and the registration using feature points can be used; however, the latter may be much more stable.
This example has presented a system for performing deformable registration and display on solid organ surfaces observed with a stereo video endoscope. The system operates at near real-time rates, and produces good results even under the challenging conditions found in intra-operative video. The results of the stereo processing and registration system on both phantom and real video data are presented, and the displays in two human cases are evaluated. In an embodiment, surface and local feature tracking registration can be combined, and selection of points for the latter can be automated.
THIRD EXAMPLE
In a third example, the optical imaging device 102 is a da Vinci stereo endoscope attached to an arm of a da Vinci surgical robot, the visual display 104 is a da Vinci master console, and the processor 106 is an external processing unit interfaced to the da Vinci stereo endoscope and the da Vinci master console. Figure 12 is an illustration of a surgical system for imaging an internal structure according to an embodiment of the current invention. The combination of the invention with the da Vinci surgical system provides additional advantages. The user is able to naturally interact with and visualize surfaces and volumes using the da Vinci master console; this is advantageous during the initial registration phase. The da Vinci robot is able to measure the motion of the surgical tools in the scene, making it simpler to account for the scene occlusions introduced by the tools. Also, the da Vinci master console is a stereoscopic display, making it possible to provide the user with an impression of depth in the overlaid information.
The current invention is not limited to the specific embodiments of the invention illustrated herein by way of example, but is defined by the claims. One of ordinary skill in the art would recognize that various modifications and alternatives to the examples discussed herein are possible without departing from the scope and general concepts of this invention.

Claims

WE CLAIM:
1. A surgical system, comprising:
an optical imaging device arranged to image an internal structure of a subject under observation;
a processor in communication with said optical imaging device; and
a visual display in communication with said processor,
wherein said optical imaging device provides:
a first image signal corresponding to at least a portion of said internal structure corresponding to a first time interval; and
a second image signal corresponding to at least a portion of said internal structure corresponding to a second time interval,
wherein said processor is operable to:
determine a first set of points corresponding to the at least a portion of said internal structure corresponding to the first time interval from the first image signal;
determine a second set of points corresponding to the at least a portion of said internal structure corresponding to the second time interval from the second image signal based on the determined first set of points;
receive supplemental data corresponding to the at least a portion of said internal structure;
register the supplemental data with at least one of the first set of points or the second set of points; and
output a display image signal, to said visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registering of the supplemental data.
2. The surgical system of claim 1, wherein said register the supplemental data comprises registering the supplemental data with the second set of points based on registration of the supplemental data with the first set of points.
3. The surgical system of claim 1, wherein said register supplemental data further comprises: excluding supplemental data corresponding to portions of said internal structure not visible in an image corresponding to at least one of the first image signal or the second image signal.
4. The surgical system of claim 1, wherein said register supplemental data is based on a combination of information corresponding to the image signal and interactive user input.
5. The surgical system of claim 1, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on a dynamic programming matching algorithm.
6. The surgical system of claim 1, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on an area-based matching algorithm.
7. The surgical system of claim 1, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on a global optimization method comprising at least one of: graph cuts, loopy belief propagation, or continuous optimization.
8. The surgical system of claim 1, wherein said determine the second set of points is based on a sparse set of feature-matched points.
9. The surgical system of claim 1, wherein said determine the second set of points is based on a sparse set of tracked features.
10. The surgical system of claim 1, wherein said register supplemental data is based on rigid registration and non-rigid registration.
11. The surgical system of claim 10, wherein said non-rigid registration comprises non-rigid registration based on a spring-mass model.
12. The surgical system of claim 1, wherein said optical imaging device comprises a stereo endoscope.
13. The surgical system of claim 1, wherein said optical imaging device comprises a monocular endoscope, wherein said determine the second set of points is based on a plurality of image signals corresponding to at least a portion of said internal structure corresponding to different time intervals.
14. The surgical system of claim 1, wherein the supplemental data comprises data from at least one of: a computerized tomography (CT) image; a magnetic resonance (MR) image; an ultrasound image; or a laser range scanner.
15. A surgical system, comprising:
an optical imaging device arranged to image an internal structure of a subject under observation;
a processor in communication with said optical imaging device; and
a visual display in communication with said processor,
wherein said optical imaging device provides:
a first image signal corresponding to at least a portion of said internal structure corresponding to a first time interval; and
a second image signal corresponding to at least a portion of said internal structure corresponding to a second time interval,
wherein said internal structure corresponding to the second time interval is a different shape from said internal structure corresponding to the first time interval,
wherein said processor is operable to:
determine a first set of points corresponding to the at least a portion of said internal structure corresponding to the first time interval from the first image signal;
determine a second set of points corresponding to the at least a portion of said internal structure corresponding to the second time interval from the second image signal;
receive supplemental data corresponding to the at least a portion of said internal structure;
register the supplemental data with at least one of the first set of points or the second set of points; and
output a display image signal, to said visual display, corresponding to an overlay of the supplemental data with at least one of the first set of points or the second set of points based on the registering of the supplemental data.
16. The surgical system of claim 15, wherein said register the supplemental data comprises registering the supplemental data with the second set of points based on registration of the supplemental data with the first set of points.
17. The surgical system of claim 15, wherein said register supplemental data further comprises: excluding supplemental data corresponding to portions of said internal structure not visible in an image corresponding to at least one of the first image signal or the second image signal.
18. The surgical system of claim 15, wherein said register supplemental data is based on a combination of information corresponding to the image signal and interactive user input.
19. The surgical system of claim 15, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on a dynamic programming matching algorithm.
20. The surgical system of claim 15, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on an area-based matching algorithm.
21. The surgical system of claim 15, wherein the image signals correspond to stereo images, wherein said determine the second set of points comprises determining the second set of points based on a global optimization method comprising at least one of: graph cuts, loopy belief propagation, or continuous optimization.
22. The surgical system of claim 15, wherein said determine the second set of points is based on a sparse set of feature-matched points.
23. The surgical system of claim 15, wherein said determine the second set of points is based on a sparse set of tracked features.
24. The surgical system of claim 15, wherein said register supplemental data is based on rigid registration and non-rigid registration.
25. The surgical system of claim 24, wherein said non-rigid registration comprises non-rigid registration based on a spring-mass model.
26. The surgical system of claim 15, wherein said optical imaging device comprises a stereo endoscope.
27. The surgical system of claim 15, wherein said optical imaging device comprises a monocular endoscope with range imaging.
28. The surgical system of claim 15, wherein said optical imaging device comprises a monocular endoscope, wherein said determine the second set of points is based on a plurality of image signals corresponding to at least a portion of said internal structure corresponding to different time intervals.
29. The surgical system of claim 15, wherein the supplemental data comprises data from at least one of: a computerized tomography (CT) image; a magnetic resonance (MR) image; an ultrasound image; or a laser range scanner.
PCT/US2010/020649 2009-01-09 2010-01-11 A system for registration and information overlay on deformable surfaces from video data WO2010081094A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14346809P 2009-01-09 2009-01-09
US61/143,468 2009-01-09

Publications (2)

Publication Number Publication Date
WO2010081094A2 true WO2010081094A2 (en) 2010-07-15
WO2010081094A3 WO2010081094A3 (en) 2010-10-21

Family

ID=42317195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/020649 WO2010081094A2 (en) 2009-01-09 2010-01-11 A system for registration and information overlay on deformable surfaces from video data

Country Status (1)

Country Link
WO (1) WO2010081094A2 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078477A1 (en) * 2001-10-18 2003-04-24 Korea Electrotechnology Research Institute Fluorescence endoscope apparatus and method for imaging tissue within a body using the same
US20040239760A1 (en) * 2003-05-27 2004-12-02 Olympus Corporation Medical image recording apparatus, endoscopic image display method, endoscopic image capture method, and portable storage medium therefor
JP2005013409A (en) * 2003-06-25 2005-01-20 Olympus Corp Endoscope apparatus or endoscope system
JP2006280921A (en) * 2005-03-07 2006-10-19 Hitachi Medical Corp Magnetic resonance imaging apparatus

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9610063B2 (en) 2010-03-26 2017-04-04 The Johns Hopkins University Methods and apparatus for ultrasound strain imaging
US11315529B2 (en) 2011-02-28 2022-04-26 Varian Medical Systems International Ag Systems and methods for interactive control of window/level parameters of multi-image displays
US10152951B2 (en) 2011-02-28 2018-12-11 Varian Medical Systems International Ag Method and system for interactive control of window/level parameters of multi-image displays
US10854173B2 (en) 2011-02-28 2020-12-01 Varian Medical Systems International Ag Systems and methods for interactive control of window/level parameters of multi-image displays
US9982995B2 (en) 2011-05-24 2018-05-29 Koninklijke Philips N.V. 3D scanner using structured lighting
DE102011080260B4 (en) 2011-08-02 2021-07-15 Siemens Healthcare Gmbh Method and arrangement for the computer-aided display and evaluation of medical examination data
US10092213B2 (en) 2011-08-02 2018-10-09 Siemens Aktiengesellschaft Method and arrangement for computer-assisted representation and/or evaluation of medical examination data
DE102011080260A1 (en) * 2011-08-02 2013-02-07 Siemens Aktiengesellschaft Method and arrangement for the computer-aided display or evaluation of medical examination data
CN105979900A (en) * 2014-02-04 2016-09-28 皇家飞利浦有限公司 Visualization of depth and position of blood vessels and robot guided visualization of blood vessel cross section
US11116383B2 (en) 2014-04-02 2021-09-14 Asensus Surgical Europe S.à.R.L. Articulated structured light based-laparoscope
CN110248603A (en) * 2016-12-16 2019-09-17 通用电气公司 3D ultrasound and computer tomography are combined for guiding intervention medical protocol
CN110248603B (en) * 2016-12-16 2024-01-16 通用电气公司 3D ultrasound and computed tomography combined to guide interventional medical procedures
US10499997B2 (en) 2017-01-03 2019-12-10 Mako Surgical Corp. Systems and methods for surgical navigation
US11707330B2 (en) 2017-01-03 2023-07-25 Mako Surgical Corp. Systems and methods for surgical navigation
CN112149495A (en) * 2020-08-07 2020-12-29 中国矿业大学(北京) Video key frame extraction method based on parallax tracking
CN112149495B (en) * 2020-08-07 2023-07-28 中国矿业大学(北京) Video key frame extraction method based on parallax tracking

Also Published As

Publication number Publication date
WO2010081094A3 (en) 2010-10-21

Similar Documents

Publication Publication Date Title
US9646423B1 (en) Systems and methods for providing augmented reality in minimally invasive surgery
WO2010081094A2 (en) A system for registration and information overlay on deformable surfaces from video data
Lin et al. Video‐based 3D reconstruction, laparoscope localization and deformation recovery for abdominal minimally invasive surgery: a survey
JP2022527360A (en) Registration between spatial tracking system and augmented reality display
US8831310B2 (en) Systems and methods for displaying guidance data based on updated deformable imaging data
Stoyanov et al. A practical approach towards accurate dense 3D depth recovery for robotic laparoscopic surgery
Yip et al. Tissue tracking and registration for image-guided surgery
Hu et al. Reconstruction of a 3D surface from video that is robust to missing data and outliers: Application to minimally invasive surgery using stereo and mono endoscopes
Hong et al. 3D reconstruction of virtual colon structures from colonoscopy images
Stoyanov et al. Dense 3D depth recovery for soft tissue deformation during robotically assisted laparoscopic surgery
US20090010507A1 (en) System and method for generating a 3d model of anatomical structure using a plurality of 2d images
WO2020131880A1 (en) System and methods for a trackerless navigation system
Totz et al. Dense surface reconstruction for enhanced navigation in MIS
US20130170726A1 (en) Registration of scanned objects obtained from different orientations
Collins et al. Computer-assisted laparoscopic myomectomy by augmenting the uterus with pre-operative MRI data
Mountney et al. Dynamic view expansion for minimally invasive surgery using simultaneous localization and mapping
WO2016178690A1 (en) System and method for guidance of laparoscopic surgical procedures through anatomical model augmentation
Merritt et al. Real-time CT-video registration for continuous endoscopic guidance
Wang et al. 3-D tracking for augmented reality using combined region and dense cues in endoscopic surgery
Kim et al. Tracking by detection for interactive image augmentation in laparoscopy
Xia et al. A robust edge-preserving stereo matching method for laparoscopic images
Groch et al. 3D surface reconstruction for laparoscopic computer-assisted interventions: comparison of state-of-the-art methods
Reichard et al. Intraoperative on-the-fly organ-mosaicking for laparoscopic surgery
Zhou et al. Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF
Vagvolgyi et al. Video to CT registration for image overlay on solid organs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10729632

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10729632

Country of ref document: EP

Kind code of ref document: A2