WO2024042468A1 - Surgical robotic system and method for intraoperative fusion of different imaging modalities


Info

Publication number
WO2024042468A1
Authority
WO
WIPO (PCT)
Prior art keywords
ultrasound
laparoscopic
tissue
probe
image processing
Application number
PCT/IB2023/058368
Other languages
French (fr)
Inventor
Faisal I. Bashir
Meir Rosenberg
Original Assignee
Covidien Lp
Application filed by Covidien Lp
Publication of WO2024042468A1

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/08Detecting organic movements or changes, e.g. tumours, cysts, swellings
    • A61B8/0833Detecting organic movements or changes, e.g. tumours, cysts, swellings involving detecting or locating foreign bodies or organic structures
    • A61B8/085Detecting organic movements or changes, e.g. tumours, cysts, swellings involving detecting or locating foreign bodies or organic structures for locating body or organic structures, e.g. tumours, calculi, blood vessels, nodules
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/42Details of probe positioning or probe attachment to the patient
    • A61B8/4245Details of probe positioning or probe attachment to the patient involving determining the position of the probe, e.g. with respect to an external reference frame or to the patient
    • A61B8/4263Details of probe positioning or probe attachment to the patient involving determining the position of the probe, e.g. with respect to an external reference frame or to the patient using sensors not mounted on the probe, e.g. mounted on an external reference frame
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/48Diagnostic techniques
    • A61B8/483Diagnostic techniques involving the acquisition of a 3D volume of data
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/52Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/5215Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data
    • A61B8/5238Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data for combining image data of patient, e.g. merging several images from different acquisition modes into one image
    • A61B8/5246Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data for combining image data of patient, e.g. merging several images from different acquisition modes into one image combining images from the same or different imaging techniques, e.g. color Doppler and B-mode
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/52Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/5215Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data
    • A61B8/5238Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data for combining image data of patient, e.g. merging several images from different acquisition modes into one image
    • A61B8/5246Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data for combining image data of patient, e.g. merging several images from different acquisition modes into one image combining images from the same or different imaging techniques, e.g. color Doppler and B-mode
    • A61B8/5253Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving processing of medical diagnostic data for combining image data of patient, e.g. merging several images from different acquisition modes into one image combining images from the same or different imaging techniques, e.g. color Doppler and B-mode combining overlapping images, e.g. spatial compounding
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B8/00Diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/52Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves
    • A61B8/5269Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving detection or reduction of artifacts
    • A61B8/5276Devices using data or image processing specially adapted for diagnosis using ultrasonic, sonic or infrasonic waves involving detection or reduction of artifacts due to motion
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B90/00Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
    • A61B90/36Image-producing devices or illumination devices not otherwise provided for
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/10Computer-aided planning, simulation or modelling of surgical operations
    • A61B2034/101Computer-aided simulation of surgical operations
    • A61B2034/105Modelling of the patient, e.g. for ligaments or bones
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/20Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A61B2034/2046Tracking techniques
    • A61B2034/2051Electromagnetic tracking systems
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/20Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A61B2034/2046Tracking techniques
    • A61B2034/2059Mechanical position encoders
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B90/00Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
    • A61B90/36Image-producing devices or illumination devices not otherwise provided for
    • A61B2090/364Correlation of different images or relation of image positions in respect to the body
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B90/00Instruments, implements or accessories specially adapted for surgery or diagnosis and not covered by any of the groups A61B1/00 - A61B50/00, e.g. for luxation treatment or for protecting wound edges
    • A61B90/36Image-producing devices or illumination devices not otherwise provided for
    • A61B90/37Surgical systems with images on a monitor during operation
    • A61B2090/378Surgical systems with images on a monitor during operation using ultrasound
    • A61B2090/3782Surgical systems with images on a monitor during operation using ultrasound transmitter or receiver in catheter or minimal invasive instrument
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30Surgical robots
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30Surgical robots
    • A61B34/37Master-slave robots

Definitions

  • Surgical robotic systems are currently being used in a variety of surgical procedures, including minimally invasive medical procedures.
  • Some surgical robotic systems include a surgeon console controlling a surgical robotic arm and a surgical instrument having an end effector (e.g., forceps or grasping instrument) coupled to and actuated by the robotic arm.
  • In operation, the robotic arm is moved to a position over a patient and then guides the surgical instrument into a small incision via a surgical port or a natural orifice of a patient to position the end effector at a work site within the patient’s body.
  • an imaging system includes a laparoscopic ultrasound probe configured to be inserted through an access port and to obtain a plurality of 2D ultrasound images of a tissue.
  • the system also includes a laparoscopic camera configured to capture a video stream of the tissue.
  • the system further includes an image processing device configured to receive a volumetric image of the tissue formed from images of a first modality, generate an ultrasound volume from the plurality of 2D ultrasound images, register the ultrasound volume with the volumetric image of the tissue, and generate an overlay of the volumetric image of the tissue and a 2D ultrasound image of the plurality of 2D ultrasound images.
  • the system additionally includes a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay extending from the laparoscopic ultrasound probe.
  • the ultrasound images may be obtained using an ultrasound probe that is localized without using physical fiducial markers, i.e., using a vision-only approach.
  • the vision-based approach may utilize a deep learning model to estimate the degrees of freedom (DoF) pose of a rigid object (i.e., the ultrasound probe) from stereo or monocular laparoscopic camera images.
  • the probe may have any number of DoF, which may be 6 DoF.
  • Realistic training data for probe localization for the deep learning model may be provided by a custom synthetic data generation pipeline.
  • a synthetic 3D anatomically accurate surgical site may be developed based on real data from surgical procedures.
  • the ultrasound probe may be rendered on the surgical site using the 3D virtual (e.g., computer aided drafting) model of the probe and the stereo laparoscopic camera geometry from camera calibration.
  • the data may include a plurality of synthetic images, e.g., around 100,000, to develop the deep learning network to estimate the 6 DoF pose of the ultrasound probe directly from images without modifying the probe in any way, i.e., no physical fiducial markers on the probe.
  • This deep learning model can be trained on each image of a pair of stereoscopic images separately, rather than in pairs.
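  • As a rough illustration of the two-stage approach described above (a key point detector followed by a pose regressor), the following sketch shows how such a network could be structured in PyTorch. The layer sizes, key point count, and pose parameterization are placeholders and are not taken from the disclosure.

```python
# Minimal sketch of a two-stage probe-pose network: a key-point heatmap
# detector followed by a 6 DoF pose regressor. Hypothetical architecture.
import torch
import torch.nn as nn

class KeyPointDetector(nn.Module):
    """Predicts one heatmap per virtual fiducial marker on the probe."""
    def __init__(self, num_keypoints: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, num_keypoints, 1)

    def forward(self, image):                    # image: (B, 3, H, W)
        return self.head(self.backbone(image))   # heatmaps: (B, K, H/4, W/4)

class PoseRegressor(nn.Module):
    """Regresses translation (3) plus an axis-angle rotation (3) from heatmaps."""
    def __init__(self, num_keypoints: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(num_keypoints * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 6),
        )

    def forward(self, heatmaps):
        return self.net(heatmaps)                # (B, 6) pose vector

# The same network can be run on each channel of the stereo pair separately,
# consistent with training on each image of the pair rather than on pairs.
detector, regressor = KeyPointDetector(), PoseRegressor()
left_image = torch.randn(1, 3, 480, 640)         # one channel of the stereo pair
pose_6dof = regressor(detector(left_image))
```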
  • the laparoscopic or robotic ultrasound probe may be localized in the field of view of the laparoscopic monocular or stereo camera through a vision-only approach without modifying the laparoscopic or robotic ultrasound probe to include any physical fiducial markers.
  • the image processing device may be further configured to localize the laparoscopic ultrasound probe in the video stream based on the key points or virtual fiducial markers.
  • the image processing device may be further configured to estimate multiple DoF (e.g., 6) pose and orientation of the laparoscopic or robotic ultrasound probe from the video stream based on the key points or fiducial markers.
  • the image processing device may be further configured to estimate the articulated pose and orientation of the grasper instrument holding the ultrasound probe.
  • the image processing device may be configured to estimate the pose and orientation of the ultrasound probe by combining the pose and orientation of the probe as well as the pose and orientation of the grasper holding the probe.
  • the image processing device may be further configured to generate a dense depth map of the surgical site from the laparoscopic monocular or stereo camera to estimate the 3D location of instruments, probe, and anatomy in the laparoscopic camera frame of reference.
  • the image processing device may also be configured to implement a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline to localize the laparoscopic camera in 3D space at every acquired image frame over time with respect to a World Coordinate System (WCS).
  • the WCS may be tied to the trocar through which the laparoscopic endoscope camera is inserted, in which case the location of the trocar on the patient anatomy is estimated from multiple external cameras mounted on robot carts or towers.
  • the WCS may also be tied to one of the instruments, e.g., the grasper instrument manipulating the laparoscopic ultrasound probe, in which case the location of the trocar on the patient anatomy is estimated by segmenting the shaft of the grasper instrument in a plurality of images captured from the laparoscopic camera and computing the intersection between lines fit to the shafts, hence localizing the remote center of motion (RCM) of the grasper instrument trocar.
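  • As an illustration of the RCM localization just described, the sketch below (an assumed formulation, not necessarily the disclosed one) estimates the trocar location as the point closest, in a least-squares sense, to the 3D lines fit to the grasper shaft across several frames.

```python
# Minimal sketch: estimate the remote center of motion (RCM) as the point
# closest to a set of 3D shaft lines observed in multiple frames.
import numpy as np

def closest_point_to_lines(points, directions):
    """points[i] is a point on line i; directions[i] is its direction."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(points, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to d
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

# Hypothetical shaft lines from three frames; all pass near (0, 0, 2).
pts  = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
dirs = np.array([[0.0, 0.0, 1.0], [-0.05, 0.0, 1.0], [0.0, -0.05, 1.0]])
rcm = closest_point_to_lines(pts, dirs)   # estimated trocar / RCM location
```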
  • the image processing system may be configured to dispose the location and orientation of ultrasound probe in each laparoscopic camera image with respect to the WCS, hence transferring all images with respect to a fixed frame of reference.
  • the image processing device may be also configured to generate the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system.
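  • A minimal sketch of that compounding step is shown below; it approximates the interpolation by splatting each slice's pixels into the voxel grid via the slice's pixel-to-world transform and averaging overlapping samples. Function and parameter names are illustrative only.

```python
# Minimal sketch of ultrasound volume compounding from posed 2D slices.
import numpy as np

def compound_volume(slices, poses, pixel_spacing, voxel_size, grid_shape, origin):
    """slices: list of (H, W) arrays; poses: list of 4x4 pixel-to-world transforms."""
    accum = np.zeros(grid_shape, dtype=np.float32)
    count = np.zeros(grid_shape, dtype=np.float32)
    for img, T in zip(slices, poses):
        h, w = img.shape
        v, u = np.mgrid[0:h, 0:w]
        # Pixel (u, v) lies in the slice plane (z = 0 in slice coordinates).
        pts = np.stack([u * pixel_spacing, v * pixel_spacing,
                        np.zeros_like(u, float), np.ones_like(u, float)], axis=-1)
        world = pts.reshape(-1, 4) @ T.T                    # to world coordinates
        idx = np.round((world[:, :3] - origin) / voxel_size).astype(int)
        ok = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
        idx, vals = idx[ok], img.reshape(-1)[ok]
        np.add.at(accum, (idx[:, 0], idx[:, 1], idx[:, 2]), vals)
        np.add.at(count, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return np.divide(accum, count, out=np.zeros_like(accum), where=count > 0)
```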
  • the laparoscopic camera is localized with respect to the WCS using depth mapping from a stereo or a pair of monocular images and DV-SLAM.
  • localization of the camera includes depth mapping from stereo pair followed by DV-SLAM on the successive stereo image pairs over time.
  • Depth mapping provides a single frame snapshot of how far objects are from camera, whereas DV-SLAM provides for localization of the camera in WCS.
  • Depth mapping can be performed for either monocular or stereo cameras. Stereo camera depth estimation is easier and more reliable than monocular camera depth estimation.
  • DV-SLAM can be performed using either monocular camera input or stereo camera input. DV-SLAM with monocular camera input cannot reliably resolve the scale factor (how far away the camera is from the scene) because there are multiple solutions along the same viewing ray to a 3D point.
  • Deformable Visual SLAM in combination with depth map from stereo reconstruction provides the most reliable method of localizing the camera with respect to WCS.
  • the default mode of operation may include: 1) depth estimation through stereo reconstruction using calibrated stereo endoscope or through monocular depth estimation using monocular laparoscope, i.e., depth mapping; and 2) DV-SLAM at every frame with stereo pair from calibrated stereo camera pair images.
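  • For the depth-mapping half of that default mode, a minimal sketch using a standard block-matching approach on a rectified stereo pair is shown below; the matcher settings, focal length, and baseline are placeholders for the actual calibration.

```python
# Minimal sketch: dense disparity from a rectified stereo pair, converted to
# metric depth using the stereo calibration (focal length and baseline).
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px=1000.0, baseline_m=0.004):
    matcher = cv2.StereoSGBM_create(
        minDisparity=0, numDisparities=128, blockSize=5,
        P1=8 * 5 * 5, P2=32 * 5 * 5, uniquenessRatio=10)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    return np.where(disparity > 0, focal_px * baseline_m / disparity, 0.0)
```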
  • Ultrasound volume may be generated using a method that relies on the 6 Degrees of Freedom (DoF) probe pose estimation from calibrated stereo endoscope images.
  • the ultrasound probe is localized in 3D space by 6 DoF probe pose estimation from stereo endoscope images in the stereo endoscope frame of reference.
  • the stereo endoscope camera itself may be localized in 3D space in the WCS of reference tied to the trocar into which the camera is inserted.
  • Stereo reconstruction and DV-SLAM are used to update the position of the camera from the images provided by the camera itself.
  • the 3D position of the probe is tracked in the stationary WCS of reference tied to a landmark, e.g., one or more trocars.
  • virtual fiducial markers on the probe are estimated from both images of stereo pair at each time step, i.e., frame. Then pose regression on the detected key points is run by the deep learning model.
  • the neural network based on the deep learning model is executed on each channel of stereo video streams separately to estimate DoF pose from each stream.
  • the estimated pose from each channel (i.e., left or right image of the pair of images) may be combined using stereo calibration to obtain the final 6 DoF pose of the probe.
  • alternatively, the neural network might be trained directly on the stereo pair to estimate the 6 DoF pose of the probe end-to-end from rectified stereo pair images. This localizes the probe's 6 DoF pose in 3D.
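  • One way the two channels might be combined through the stereo calibration (an assumed formulation, complementary to combining per-channel pose estimates) is to triangulate the detected key points and then fit a rigid transform from the probe's CAD-model key points to the triangulated points:

```python
# Minimal sketch: triangulate per-channel key-point detections, then recover a
# 6 DoF probe pose with a Kabsch rigid fit to the CAD-model key points.
import cv2
import numpy as np

def probe_pose_from_stereo(kp_left, kp_right, P_left, P_right, model_pts):
    """kp_*: (N, 2) pixel coordinates; P_*: 3x4 projection matrices;
    model_pts: (N, 3) key points in the probe CAD-model frame."""
    homog = cv2.triangulatePoints(P_left, P_right, kp_left.T, kp_right.T)
    cam_pts = (homog[:3] / homog[3]).T                 # (N, 3) in the camera frame
    mu_m, mu_c = model_pts.mean(0), cam_pts.mean(0)
    H = (model_pts - mu_m).T @ (cam_pts - mu_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_c - R @ mu_m
    return R, t                                        # rotation and translation
```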
  • the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images.
  • the image processing device may be additionally configured to generate a 3D model based on the volumetric image of tissue formed from the first modality images and deform the 3D model to conform to the ultrasound volume.
  • the 3D model may be deformed as follows: segmenting of the 2D/3D anatomical target surface and tracking of the instrument to isolate motion of instrument from anatomy; compensation for motion of the patient, e.g., breathing motion compensation, by tracking target anatomy and estimating movement of anatomy while masking out the movement of instruments; and biomechanical modeling to estimate the physically-realistic movement of the organ of interest along with the anatomy around the tissue being tracked.
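  • As a rough sketch of the motion-compensation idea (an assumed approach, not the disclosed biomechanical model), dense optical flow between consecutive frames can be masked by the instrument segmentation so that only anatomy motion, e.g., breathing, is summarized:

```python
# Minimal sketch: estimate tissue-surface motion while masking out instruments.
import cv2
import numpy as np

def anatomy_motion(prev_gray, curr_gray, instrument_mask):
    """instrument_mask: boolean (H, W) array, True where an instrument was segmented."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    anatomy = ~instrument_mask
    mean_motion = flow[anatomy].mean(axis=0)     # average (dx, dy) of the tissue
    return flow, mean_motion
```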
  • the image processing device may be configured to segment all instruments at the surgical site in order to mask out non-anatomical regions of interest from organ surface deformation estimation.
  • the image processing device may further be configured to generate an instance segmentation mask of the organ in laparoscopic camera images for every frame to estimate breathing motion as well as surface deformation.
  • the surface deformation may further involve generating a depth map to estimate ultrasound probe pressure on tissue and the resulting deformation in tissue surface.
  • the breathing motion estimation component may also involve an interface to a pulse oximetry system to predict breathing-cycle-related tissue surface and sub-tissue structure deformation.
  • the image processing device may be further configured to identify sub-tissue landmarks that are common between ultrasound images and the raw preoperative images or the corresponding 3D model and compute the dense displacement field registration map between the ultrasound volume and the 3D model.
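  • A minimal sketch of such a dense displacement field (assuming a thin-plate-spline style interpolation of the landmark displacements, which is one common choice) is shown below.

```python
# Minimal sketch: interpolate a dense displacement field from matched landmarks
# to map 3D-model coordinates into the ultrasound volume.
import numpy as np
from scipy.interpolate import RBFInterpolator

def dense_displacement_field(landmarks_model, landmarks_us, query_points):
    """landmarks_*: (N, 3) corresponding landmark coordinates in the 3D model and
    the ultrasound volume; query_points: (M, 3) model coordinates to map."""
    displacements = landmarks_us - landmarks_model              # (N, 3)
    field = RBFInterpolator(landmarks_model, displacements,
                            kernel="thin_plate_spline")
    return query_points + field(query_points)                   # warped coordinates
```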
  • the image processing device may be also further configured to transfer a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
  • pre-operative imaging model registration includes the following: identification of anatomical landmarks in CT (organ surface, sub-surface internal critical structures, e.g., vessels, tumor); identification of surface anatomical landmarks in stereo endoscope images; and segmentation of sub-surface internal critical structures, e.g., vessels, tumor.
  • a surgical robotic system includes a first robotic arm controlling a laparoscopic ultrasound probe that is configured to be inserted through an access port and to obtain intraoperatively a plurality of 2D ultrasound images of a tissue, and a second robotic arm controlling a laparoscopic camera configured to capture a video stream of the tissue.
  • the system also includes an image processing device configured to receive preoperative images of tissue, generate a 3D model of the tissue from the preoperative images, and generate an ultrasound volume from the plurality of 2D ultrasound images.
  • the image processing device is further configured to register the ultrasound volume with the 3D model and generate an overlay of the 3D model and a 2D ultrasound image of the plurality of 2D ultrasound images.
  • the system also includes a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay including the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasonic probe.
  • Implementations of the above embodiment may further include one or more of the following features.
  • the laparoscopic or robotic ultrasound probe may include a plurality of physical fiducial markers on the probe in order to robustly estimate its pose and orientation from laparoscopic camera images. This may be useful for probes lacking any discernable visual features on the outside for pose and orientation estimation.
  • the laparoscopic or robotic ultrasound probe may be devoid of any physical fiducial markers obviating the need for any modification of the ultrasound probe.
  • the image processing device may be further configured to localize the laparoscopic ultrasound probe in the video stream based on the plurality of key points or virtual fiducial markers.
  • This configuration may involve a training phase where a plurality of stereo or monocular laparoscopic camera images are generated from a combination of real surgical sites or in a synthetic environment using computer graphics generated synthetic images.
  • the training set of images are used to train a neural network that has a key point detection subnetwork and a pose regressor subnetwork.
  • the neural network is used to process either monocular laparoscopic camera images or each channel of the stereo laparoscopic camera images to generate the pose and orientation of the laparoscopic ultrasound probe in the laparoscopic camera images.
  • the image processing device may be configured to combine the estimated pose from each channel (i.e., left or right image of the stereo image pair) using stereo calibration for the final 6 DoF pose of the ultrasound probe.
  • the neural network may be alternatively trained directly on the stereo image pairs and trained for end-to-end 6 DoF pose estimation directly from stereo pair input images.
  • the image processing device may be also configured to localize the laparoscopic ultrasound probe based on kinematic data of the first robotic arm.
  • the image processing device may be additionally configured to estimate a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the key points or virtual fiducial markers.
  • the pose and orientation estimation of the laparoscopic ultrasound probe may be accomplished through a combination of kinematics data of the first robotic arm as well as the localization of the plurality of key points or virtual fiducial markers from the laparoscopic camera video stream.
  • the image processing device may be further configured to generate the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system.
  • the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images.
  • the image processing device may be also further configured to transfer a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
  • a method for intraoperative imaging of tissue includes generating a 3D model of tissue from a plurality of preoperative images, generating and updating a depth-based surface map of tissue using monocular or stereo laparoscopic camera, and generating an ultrasound volume from a plurality of 2D ultrasound images obtained from a laparoscopic ultrasonic probe.
  • the method further includes a visual localization and mapping pipeline that places the laparoscopic camera in a world coordinate system from every image of the camera.
  • the method further includes ultrasound probe pose and orientation estimation from the monocular or stereo laparoscopic camera in the world coordinate system to generate an ultrasound volume from a plurality of registered ultrasound image slices.
  • the method further includes registering the ultrasound volume with the 3D model and generating an overlay of the 3D model and a 2D ultrasound image of the plurality of 2D ultrasound images.
  • the method additionally includes displaying a video stream obtained from a laparoscopic video camera and the overlay.
  • the video stream includes the laparoscopic ultrasound probe, and the overlay includes the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasound probe.
  • Implementations of the above embodiment may additionally include one or more of the following features.
  • the method may further include localizing the laparoscopic ultrasound probe in the video stream based on a plurality of key points or virtual fiducial markers disposed on the laparoscopic ultrasound probe.
  • the method may further include moving the laparoscopic ultrasound probe by a robotic arm and localizing the laparoscopic ultrasound probe based on kinematic data of the robotic arm.
  • the method may additionally include estimating a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the combination of key points or virtual fiducial markers, robotic arm kinematic data, stereo reconstruction, and visual simultaneous localization and mapping of laparoscopic camera with respect to a world coordinate system.
  • the method may further include generating the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system.
  • the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images.
  • the method may further include transferring a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
  • FIG. 1 is a schematic illustration of a surgical robotic system including a control tower, a console, and one or more surgical robotic arms each disposed on a movable cart according to an embodiment of the present disclosure
  • FIG. 2 is a perspective view of a surgical robotic arm of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 3 is a perspective view of a movable cart having a setup arm with the surgical robotic arm of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a computer architecture of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure
  • FIG. 5 is a plan schematic view of movable carts of FIG. 1 positioned about a surgical table according to an embodiment of the present disclosure
  • FIG. 6 is a method for intraoperative fusion of different imaging modalities according to an embodiment of the present disclosure
  • FIG. 7 is a method of obtaining and registering multiple imaging modalities according to an embodiment of the present disclosure
  • FIG. 8A is a computed tomography image according to an embodiment of the present disclosure
  • FIG. 8B is a 3D model image according to an embodiment of the present disclosure.
  • FIG. 8C is an endoscopic video image according to an embodiment of the present disclosure.
  • FIG. 8D is an ultrasound image according to an embodiment of the present disclosure.
  • FIG. 9 is a perspective view of a laparoscopic ultrasound probe according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram showing multiple coordinate systems and the transformation matrices to convert between them according to an embodiment of the present disclosure
  • FIG. 11 shows ultrasound segmentation images according to an embodiment of the present disclosure
  • FIG. 12 shows an augmented laparoscopic video stream overlaid with a 3D model and an ultrasound imaging plane extending from the ultrasound probe according to an embodiment of the present disclosure
  • FIGS. 13A and 13B are schematic diagrams of the laparoscopic ultrasound probe identifying margins of a tumor in tissue and identifying a path of resection according to an embodiment of the present disclosure
  • FIG. 14 is a schematic diagram illustrating generation of an ultrasound volume from a plurality of 2D ultrasound slices according to an embodiment of the present disclosure
  • FIG. 15 is a schematic flow chart illustrating registration between an intra-operative ultrasound volume and a pre-operative CT volume according to an embodiment of the present disclosure.
  • A surgical robotic system is provided which includes a surgeon console, a control tower, and one or more movable carts having a surgical robotic arm coupled to a setup arm.
  • the surgeon console receives user input through one or more interface devices, which input is processed by the control tower as movement commands for moving the surgical robotic arm and an instrument and/or camera coupled thereto.
  • the surgeon console enables teleoperation of the surgical arms and attached instruments/camera.
  • the surgical robotic arm includes a controller, which is configured to process the movement commands and to generate torque commands for activating one or more actuators of the robotic arm, which would, in turn, move the robotic arm in response to the movement command.
  • a surgical robotic system 10 includes a control tower 20, which is connected to all of the components of the surgical robotic system 10 including a surgeon console 30 and one or more movable carts 60.
  • Each of the movable carts 60 includes a robotic arm 40 having a surgical instrument 50 removably coupled thereto.
  • the robotic arms 40 also couple to the movable carts 60.
  • the robotic system 10 may include any number of movable carts 60 and/or robotic arms 40.
  • the surgical instrument 50 is configured for use during minimally invasive surgical procedures.
  • the surgical instrument 50 may be configured for open surgical procedures.
  • the surgical instrument 50 may be an electrosurgical forceps configured to seal tissue by compressing tissue between jaw members and applying electrosurgical current thereto.
  • the surgical instrument 50 may be a surgical stapler including a pair of jaws configured to grasp and clamp tissue while deploying a plurality of tissue fasteners, e.g., staples, and cutting stapled tissue.
  • the surgical instrument 50 may be a surgical clip applier including a pair of jaws configured to apply a surgical clip onto tissue.
  • One of the robotic arms 40 may include a laparoscopic camera 51 configured to capture video of the surgical site.
  • the laparoscopic camera 51 may be a stereoscopic endoscope configured to capture two side-by-side (i.e., left and right) images of the surgical site to produce a video stream of the surgical scene.
  • the laparoscopic camera 51 is coupled to an image processing device 56, which may be disposed within the control tower 20.
  • the image processing device 56 may be any computing device as described below configured to receive the video feed from the laparoscopic camera 51 and output the processed video stream.
  • the surgeon console 30 includes a first screen 32, which displays a video feed of the surgical site provided by camera 51 of the surgical instrument 50 disposed on the robotic arm 40, and a second screen 34, which displays a user interface for controlling the surgical robotic system 10.
  • the first screen 32 and second screen 34 may be touchscreens allowing for displaying various graphical user inputs.
  • the surgeon console 30 also includes a plurality of user interface devices, such as foot pedals 36 and a pair of hand controllers 38a and 38b which are used by a user to remotely control robotic arms 40.
  • the surgeon console further includes an armrest 33 used to support the clinician’s arms while operating the hand controllers 38a and 38b.
  • the control tower 20 includes a screen 23, which may be a touchscreen, for displaying various graphical user interfaces (GUIs).
  • the control tower 20 also acts as an interface between the surgeon console 30 and one or more robotic arms 40.
  • the control tower 20 is configured to control the robotic arms 40, such as to move the robotic arms 40 and the corresponding surgical instrument 50, based on a set of programmable instructions and/or input commands from the surgeon console 30, in such a way that robotic arms 40 and the surgical instrument 50 execute a desired movement sequence in response to input from the foot pedals 36 and the hand controllers 38a and 38b.
  • the foot pedals 36 may be used to enable and lock the hand controllers 38a and 38b, to reposition the camera, and to activate/deactivate electrosurgical energy.
  • the foot pedals 36 may be used to perform a clutching action on the hand controllers 38a and 38b. Clutching is initiated by pressing one of the foot pedals 36, which disconnects (i.e., prevents movement inputs) the hand controllers 38a and/or 38b from the robotic arm 40 and corresponding instrument 50 or camera 51 attached thereto. This allows the user to reposition the hand controllers 38a and 38b without moving the robotic arm(s) 40 and the instrument 50 and/or camera 51. This is useful when reaching control boundaries of the surgical space.
  • Each of the control tower 20, the surgeon console 30, and the robotic arm 40 includes a respective computer 21, 31, 41.
  • the computers 21, 31, 41 are interconnected to each other using any suitable communication network based on wired or wireless communication protocols.
  • Suitable protocols include, but are not limited to, transmission control protocol/internet protocol (TCP/IP), user datagram protocol/internet protocol (UDP/IP), and/or datagram congestion control protocol (DCCP).
  • Wireless communication may be achieved via one or more wireless configurations, e.g., radio frequency, optical, Wi-Fi, Bluetooth (an open wireless protocol for exchanging data over short distances, using short-wavelength radio waves, from fixed and mobile devices, creating personal area networks (PANs)), and ZigBee® (a specification for a suite of high-level communication protocols using small, low-power digital radios based on the IEEE 802.15.4-2003 standard for wireless personal area networks (WPANs)).
  • the computers 21, 31, 41 may include any suitable processor (not shown) operably connected to a memory (not shown), which may include one or more of volatile, non-volatile, magnetic, optical, or electrical media, such as read-only memory (ROM), random access memory (RAM), electrically-erasable programmable ROM (EEPROM), non-volatile RAM (NVRAM), or flash memory.
  • the processor may be any suitable processor (e.g., control circuit) adapted to perform the operations, calculations, and/or set of instructions described in the present disclosure including, but not limited to, a hardware processor, a field programmable gate array (FPGA), a digital signal processor (DSP), a central processing unit (CPU), a microprocessor, and combinations thereof.
  • each of the robotic arms 40 may include a plurality of links 42a, 42b, 42c, which are interconnected at joints 44a, 44b, 44c, respectively.
  • the joint 44a is configured to secure the robotic arm 40 to the movable cart 60 and defines a first longitudinal axis.
  • the movable cart 60 includes a lift 67 and a setup arm 61, which provides a base for mounting of the robotic arm 40.
  • the lift 67 allows for vertical movement of the setup arm 61.
  • the movable cart 60 also includes a display 69 for displaying information pertaining to the robotic arm 40.
  • the robotic arm 40 may include any type and/or number of joints.
  • the setup arm 61 includes a first link 62a, a second link 62b, and a third link 62c, which provide for lateral maneuverability of the robotic arm 40.
  • the links 62a, 62b, 62c are interconnected at joints 63a and 63b, each of which may include an actuator (not shown) for rotating the links 62a and 62b relative to each other and the link 62c.
  • the links 62a, 62b, 62c are movable in their corresponding lateral planes that are parallel to each other, thereby allowing for extension of the robotic arm 40 relative to the patient (e.g., surgical table).
  • the robotic arm 40 may be coupled to the surgical table (not shown).
  • the setup arm 61 includes controls 65 for adjusting movement of the links 62a, 62b, 62c as well as the lift 67.
  • the setup arm 61 may include any type and/or number of joints.
  • the third link 62c may include a rotatable base 64 having two degrees of freedom.
  • the rotatable base 64 includes a first actuator 64a and a second actuator 64b.
  • the first actuator 64a is rotatable about a first stationary arm axis which is perpendicular to a plane defined by the third link 62c and the second actuator 64b is rotatable about a second stationary arm axis which is transverse to the first stationary arm axis.
  • the first and second actuators 64a and 64b allow for full three-dimensional orientation of the robotic arm 40.
  • the actuator 48b of the joint 44b is coupled to the joint 44c via the belt 45a, and the joint 44c is in turn coupled to the joint 46b via the belt 45b.
  • Joint 44c may include a transfer case coupling the belts 45a and 45b, such that the actuator 48b is configured to rotate each of the links 42b, 42c and a holder 46 relative to each other. More specifically, links 42b, 42c, and the holder 46 are passively coupled to the actuator 48b which enforces rotation about a pivot point “P” which lies at an intersection of the first axis defined by the link 42a and the second axis defined by the holder 46. In other words, the pivot point “P” is a remote center of motion (RCM) for the robotic arm 40.
  • the actuator 48b controls the angle θ between the first and second axes allowing for orientation of the surgical instrument 50. Due to the interlinking of the links 42a, 42b, 42c, and the holder 46 via the belts 45a and 45b, the angles between the links 42a, 42b, 42c, and the holder 46 are also adjusted in order to achieve the desired angle θ. In embodiments, some or all of the joints 44a, 44b, 44c may include an actuator to obviate the need for mechanical linkages.
  • the joints 44a and 44b include an actuator 48a and 48b configured to drive the joints 44a, 44b, 44c relative to each other through a series of belts 45a and 45b or other mechanical linkages such as a drive rod, a cable, or a lever and the like.
  • the actuator 48a is configured to rotate the robotic arm 40 about a longitudinal axis defined by the link 42a.
  • the holder 46 defines a second longitudinal axis and is configured to receive an instrument drive unit (IDU) 52 (FIG. 1).
  • the IDU 52 is configured to couple to an actuation mechanism of the surgical instrument 50 and the camera 51 and is configured to move (e.g., rotate) and actuate the instrument 50 and/or the camera 51.
  • the IDU 52 transfers actuation forces from its actuators to the surgical instrument 50 to actuate components of an end effector 49 of the surgical instrument 50.
  • the holder 46 includes a sliding mechanism 46a, which is configured to move the IDU 52 along the second longitudinal axis defined by the holder 46.
  • the holder 46 also includes a joint 46b, which rotates the holder 46 relative to the link 42c.
  • the instrument 50 may be inserted through an endoscopic access port 55 (FIG. 3) held by the holder 46.
  • the holder 46 also includes a port latch 46c for securing the access port 55 to the holder 46 (FIG. 2).
  • the robotic arm 40 also includes a plurality of manual override buttons 53 (FIG. 1) disposed on the IDU 52 and the setup arm 61, which may be used in a manual mode. The user may press one or more of the buttons 53 to move the component associated with the button 53.
  • each of the computers 21, 31, 41 of the surgical robotic system 10 may include a plurality of controllers, which may be embodied in hardware and/or software.
  • the computer 21 of the control tower 20 includes a controller 21a and safety observer 21b.
  • the controller 21a receives data from the computer 31 of the surgeon console 30 about the current position and/or orientation of the hand controllers 38a and 38b and the state of the foot pedals 36 and other buttons.
  • the controller 21a processes these input positions to determine desired drive commands for each joint of the robotic arm 40 and/or the IDU 52 and communicates these to the computer 41 of the robotic arm 40.
  • the controller 21a also receives the actual joint angles measured by encoders of the actuators 48a and 48b and uses this information to determine force feedback commands that are transmitted back to the computer 31 of the surgeon console 30 to provide haptic feedback through the hand controllers 38a and 38b.
  • the safety observer 21b performs validity checks on the data going into and out of the controller 21a and notifies a system fault handler if errors in the data transmission are detected to place the computer 21 and/or the surgical robotic system 10 into a safe state.
  • the computer 41 includes a plurality of controllers, namely, a main cart controller 41a, a setup arm controller 41b, a robotic arm controller 41c, and an instrument drive unit (IDU) controller 41d.
  • the main cart controller 41a receives and processes joint commands from the controller 21a of the computer 21 and communicates them to the setup arm controller 41b, the robotic arm controller 41c, and the IDU controller 41d.
  • the main cart controller 41a also manages instrument exchanges and the overall state of the movable cart 60, the robotic arm 40, and the IDU 52.
  • the main cart controller 41a also communicates actual joint angles back to the controller 21a.
  • Each of joints 63a and 63b and the rotatable base 64 of the setup arm 61 are passive joints (i.e., no actuators are present therein) allowing for manual adjustment thereof by a user.
  • the joints 63a and 63b and the rotatable base 64 include brakes that are disengaged by the user to configure the setup arm 61.
  • the setup arm controller 41b monitors slippage of each of the joints 63a and 63b and the rotatable base 64 of the setup arm 61 when the brakes are engaged; when the brakes are disengaged, these joints can be freely moved by the operator but do not impact the controls of the other joints.
  • the robotic arm controller 41c controls each joint 44a and 44b of the robotic arm 40 and calculates desired motor torques required for gravity compensation, friction compensation, and closed loop position control of the robotic arm 40.
  • the robotic arm controller 41c calculates a movement command based on the calculated torque.
  • the calculated motor commands are then communicated to one or more of the actuators 48a and 48b in the robotic arm 40.
  • the actual joint positions are then transmitted by the actuators 48a and 48b back to the robotic arm controller 41c.
  • the IDU controller 41d receives desired joint angles for the surgical instrument 50, such as wrist and jaw angles, and computes desired currents for the motors in the IDU 52.
  • the IDU controller 41d calculates actual angles based on the motor positions and transmits the actual angles back to the main cart controller 41a.
  • the robotic arm 40 is controlled in response to a pose of the hand controller controlling the robotic arm 40, e.g., the hand controller 38a, which is transformed into a desired pose of the robotic arm 40 through a hand eye transform function executed by the controller 21a.
  • the hand eye transform function, as well as other functions described herein, is embodied in software executable by the controller 21a or any other suitable controller described herein.
  • the pose of one of the hand controllers 38a may be embodied as a coordinate position and roll-pitch-yaw (RPY) orientation relative to a coordinate reference frame, which is fixed to the surgeon console 30.
  • the desired pose of the instrument 50 is relative to a fixed frame on the robotic arm 40.
  • the pose of the hand controller 38a is then scaled by a scaling function executed by the controller 21a.
  • the coordinate position may be scaled down and the orientation may be scaled up by the scaling function.
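  • A minimal sketch of such a scaling function is shown below; the scale factors and the roll-pitch-yaw parameterization are placeholders, not values from the disclosure.

```python
# Minimal sketch: scale down the translational component of the hand-controller
# pose and scale up its rotational component.
import numpy as np
from scipy.spatial.transform import Rotation

def scale_hand_pose(position, rpy, pos_scale=0.4, rot_scale=1.5):
    """position: (3,) metres; rpy: (roll, pitch, yaw) in radians, both relative
    to the surgeon-console reference frame."""
    scaled_position = pos_scale * np.asarray(position)
    rotvec = Rotation.from_euler("xyz", rpy).as_rotvec()    # axis-angle form
    scaled_rpy = Rotation.from_rotvec(rot_scale * rotvec).as_euler("xyz")
    return scaled_position, scaled_rpy
```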
  • the controller 21a may also execute a clutching function, which disengages the hand controller 38a from the robotic arm 40.
  • the controller 21a stops transmitting movement commands from the hand controller 38a to the robotic arm 40 if certain movement limits or other thresholds are exceeded and in essence acts like a virtual clutch mechanism, e.g., limits mechanical input from effecting mechanical output.
  • the desired pose of the robotic arm 40 is based on the pose of the hand controller 38a and is then processed by an inverse kinematics function executed by the controller 21a.
  • the inverse kinematics function calculates angles for the joints 44a, 44b, 44c of the robotic arm 40 that achieve the scaled and adjusted pose input by the hand controller 38a.
  • the calculated angles are then passed to the robotic arm controller 41c, which includes a joint axis controller having a proportional-derivative (PD) controller, the friction estimator module, the gravity compensator module, and a two-sided saturation block, which is configured to limit the commanded torque of the motors of the joints 44a, 44b, 44c.
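  • The joint-axis control law can be pictured as in the sketch below: a PD term on the joint position error plus gravity and friction compensation, with a two-sided saturation on the commanded torque. Gains and limits are placeholders only.

```python
# Minimal sketch of a saturated PD joint controller with feedforward terms.
import numpy as np

def joint_torque_command(q_des, q, qd, gravity_torque, friction_torque,
                         kp=50.0, kd=2.0, tau_limit=10.0):
    """q_des, q: desired and measured joint angles; qd: measured joint velocities."""
    tau = kp * (q_des - q) - kd * qd + gravity_torque + friction_torque
    return np.clip(tau, -tau_limit, tau_limit)   # two-sided saturation block
```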
  • the surgical robotic system 10 is setup around a surgical table 90.
  • the system 10 includes movable carts 60a-d, which may be numbered “1” through “4.”
  • each of the carts 60a-d are positioned around the surgical table 90.
  • Position and orientation of the carts 60a-d depends on a plurality of factors, such as placement of a plurality of access ports 55a-d, which in turn, depends on the surgery being performed.
  • the access ports 55a-d are inserted into the patient, and carts 60a-d are positioned to insert instruments 50 and the laparoscopic camera 51 into corresponding ports 55a-d.
  • each of the robotic arms 40a-d is attached to one of the access ports 55a-d that is inserted into the patient by attaching the latch 46c (FIG. 2) to the access port 55 (FIG. 3).
  • the IDU 52 is attached to the holder 46, followed by the SIM 43 being attached to a distal portion of the IDU 52.
  • the instrument 50 is attached to the SIM 43.
  • the instrument 50 is then inserted through the access port 55 by moving the IDU 52 along the holder 46.
  • the SIM 43 includes a plurality of drive shafts configured to transmit rotation of individual motors of the IDU 52 to the instrument 50 thereby actuating the instrument 50.
  • the SIM 43 provides a sterile barrier between the instrument 50 and the other components of robotic arm 40, including the IDU 52.
  • the SIM 43 is also configured to secure a sterile drape (not shown) to the IDU 52.
  • a method for intraoperative fusion of different imaging modalities includes combining preoperative imaging and intraoperative imaging to provide a combined 3D image of an organ, tumor, or any other tissue as well as overlays of the different modalities.
  • Preoperative imaging includes any suitable imaging modality such as computed tomography (CT), magnetic resonance imaging (MRI), or any other imaging modality capable of obtaining 3D images as shown in FIG. 8A.
  • Intraoperative imaging may be ultrasound imaging.
  • CT imaging is well-suited for preoperative use since CT provides high quality images. However, intraoperative CT use is undesirable due to radiation exposure and supine positioning of the patient.
  • Ultrasound imaging is well-suited for intraoperative use since it is safe for frequent imaging regardless of the position of the patient, even though ultrasound provides noisy images with limited perspective.
  • other imaging modalities may also be used, such as gamma radiation, Raman spectroscopy, multispectral imaging, a time-resolved fluorescence spectroscopy (ms-TRFS) probe, and autofluorescence.
  • the image processing device 56 receives preoperative images, which may be done by obtaining a plurality of 2D images and reconstructing a 3D volumetric image therefrom.
  • preoperative images may be provided to any other computing device (e.g., outside the operating room) to perform the image processing steps described herein.
  • FIG. 7 provides additional sub steps for each of the main steps of the method of FIG. 6.
  • Step 100 includes multiple segmentation steps 100a-d, namely, segmentation of the organ surface, vasculature, landmarks, and tumor.
  • segmentation denotes obtaining a plurality of 2D slices or segments of an object.
  • the image processing device 56 or another computing device generates a 3D model shown in FIG. 8B, which may be a wire mesh model based on the preoperative image.
  • the image processing device 56 may generate the 3D model including a plurality of points or vertices interconnected by line segments based on the segmentations and include a surface texture over the vertices and segments.
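  • A minimal sketch of turning a binary organ segmentation into such a vertex-and-face mesh (using a marching-cubes iso-surface extraction, one common way to do this) is shown below.

```python
# Minimal sketch: extract a triangular surface mesh from a CT organ segmentation.
import numpy as np
from skimage import measure

def segmentation_to_mesh(organ_mask, voxel_spacing=(1.0, 1.0, 1.0)):
    """organ_mask: 3D binary array from the preoperative segmentation."""
    verts, faces, normals, _ = measure.marching_cubes(
        organ_mask.astype(np.float32), level=0.5, spacing=voxel_spacing)
    return verts, faces, normals   # vertices (in mm) and triangle indices
```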
  • Steps 100 and 102 are performed preoperatively, with subsequent steps being performed once the surgical procedure has commenced, which includes setting up the robotic system 10 as shown in FIG. 5.
  • the method of the present disclosure may be implemented using a stand-alone imaging system 80 (FIG. 10), the laparoscopic camera 51, and a laparoscopic ultrasound probe 70, which are coupled to the image processing device 56, and one or more screens 32 and 34 (FIG. 1).
  • the ultrasonic probe 70 is inserted through one of the access ports 55a-d and may be controlled by one of the robotic arms 40a-d and corresponding IDU 52.
  • the ultrasound probe 70 includes an ultrasound transducer 72 configured to output ultrasound waves.
  • the laparoscopic camera 51 is positioned such that the surgical site is within its field of view, and the ultrasound probe 70 is then also moved into the field of view of the laparoscopic camera 51.
  • the image processing device 56 localizes the ultrasound probe 70.
  • the image processing device 56 may store (in memory or storage) dimensions of the ultrasound probe 70, and positions and distances between key points or virtual fiducial markers 74.
  • the image processing device 56 analyzes all stereo pair frames (left and right channel images) from the video stream to identify a plurality of the key points or virtual fiducial markers 74 (FIG. 9), which then enables the image processing device 56 to determine 3D dimensions in the video stream, e.g., depth mapping.
  • the virtual fiducial markers 74 are generated by a machine learning image processing algorithm configured to identify geometry of the ultrasound probe 70.
  • the image processing device 56 is configured to execute the image processing algorithm, which may include a deep learning model to estimate the 6 degrees of freedom (DoF) pose of a rigid object (i.e., the ultrasound probe 70) from stereo or monocular laparoscopic camera images.
  • Realistic training data for probe localization for the deep learning model is provided by a custom synthetic data generation pipeline.
  • A synthetic 3D anatomically accurate surgical site is developed based on actual data from surgical procedures.
  • the ultrasound probe may be rendered on surgical site using the 3D virtual (e.g., computer aided drafting) model of the probe and stereo laparoscopic camera geometry from camera calibration.
  • the data may include a plurality of synthetic images, e.g., around 100,000, to develop the deep learning network to estimate 6 DoF pose of ultrasound probe directly from images without modifying the probe in any way, i.e., no physical fiducial markers on probe.
  • This deep learning model can be trained on each image of a pair of stereoscopic images separately, rather than in pairs.
  • the probe 70 may include physical fiducial markers in addition to virtual fiducial markers 74, which may be formed from any visually distinctive (e.g., white, fluorescent, etc.) paint, dye, etching, and/or objects (e.g., dots, blocks, instrument components, markings, etc.).
  • Localization may be based on image processing by the image processing device 56 as described above, or may additionally also include kinematics data of the robotic arm 40 moving the ultrasound probe 70.
  • Kinematics data includes position, velocity, pose, orientation, joint angles, and other data based on the movement commands provided to the robotic arm 40 and execution of the commands by the robotic arm.
  • other tracking techniques may also be used, such as electromagnetic tracking.
  • the image processing device 56 treats the ultrasound probe 70 as a rigid object and uses that to estimate the pose and location in real time based on the images from the laparoscopic camera 51.
  • pose estimation may be performed using machine learning image processing algorithms.
  • the algorithms may be trained on a plurality (e.g., about 100,000) of synthetic stereoscopic images having rendered ultrasound probe and surgical scene with blood, smoke, and other realistic artifacts to improve generalization.
  • Machine learning may be implemented using neural networks in a two-stage process having a key point detector and a pose regressor.
  • the final pose of the ultrasound probe 70 is determined by initially using kinematics to estimate a rough localization pose, followed by fine pose estimation through vision, i.e., image processing by the image processing device 56.
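  • One simple way to picture this coarse-to-fine combination (an assumed formulation) is to treat the kinematic pose as an initial guess and move it toward the vision-based estimate, interpolating the rotation along the geodesic between the two:

```python
# Minimal sketch: blend a coarse kinematic probe pose with a fine vision-based
# pose estimate; the weighting is a placeholder for a confidence measure.
import numpy as np
from scipy.spatial.transform import Rotation

def fuse_probe_pose(t_kin, R_kin, t_vis, R_vis, vision_weight=0.8):
    """t_*: (3,) translations; R_*: scipy Rotation objects."""
    t = (1.0 - vision_weight) * np.asarray(t_kin) + vision_weight * np.asarray(t_vis)
    delta = (R_vis * R_kin.inv()).as_rotvec()        # kinematics-to-vision correction
    R = Rotation.from_rotvec(vision_weight * delta) * R_kin
    return t, R                                      # fused probe pose
```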
  • the ultrasound probe 70 may be held by the instrument 50 as shown in FIGS. 8C and 10.
  • the image processing device localizes the instrument 50 using the same deep learning algorithm for identifying virtual fiducial markers as described above with respect to step 106.
  • the image processing device 56 may be further configured to estimate the articulated pose and orientation of the instrument 50 holding the ultrasound probe 70.
  • the image processing device 56 may be configured to estimate the pose and orientation of the ultrasound probe 70 by combining the pose and orientation of the probe 70 as well as the pose and orientation of the instrument 50 holding the probe 70.
  • the image processing device 56 generates a depth map of a surgical site from the camera 51 to estimate 3D location of instrument 50, probe 70, and anatomy in the laparoscopic camera frame of reference.
  • the image processing device 56 localizes the camera 51. This may be done by using a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline at every acquired image frame over time with respect to a World Coordinate System (WCS) (FIG. 10).
  • the WCS may be tied to the access port 55 from which camera 51 is inserted, in which case, the location of access port 55 on patient anatomy is estimated from one or more external cameras (not shown), which may be mounted on the movable carts 60a-d and/or system tower 10 or other suitable locations.
  • localization of the camera includes depth mapping from the stereo pair followed by DV-SLAM on the successive stereo image pairs over time. Depth mapping provides a single frame snapshot of how far objects are from the camera, whereas DV-SLAM provides for localization of the camera in the WCS.
  • the ultrasound probe 70 and the instrument 50 are localized in the video feed by the image processing device 56.
  • the ultrasound probe 70 and the instrument 50 are localized in the WCS.
  • the WCS may be tied to the access port 55 from which camera 51 is inserted, as explained above.
  • the WCS can also be tied to the access port 55 from which one of the instruments 50, i.e., the grasper instrument 50 manipulating the ultrasound probe 70, is inserted.
  • the location of access port 55 in the patient is estimated by segmenting the shaft of the instrument 50 in a plurality of images captured by the camera 51 and computing the intersection between lines fit to the shafts, hence localizing the remote center of motion (RCM) of the access port 55 of the instrument 50.
  • the image processing device 56 is configured to dispose the location and orientation of ultrasound probe 70 in each image with respect to the WCS, hence transferring all images with respect to a fixed frame of reference.
  • the ultrasound probe 70 is used to obtain ultrasound images of the tissue at step 118.
  • the ultrasound probe 70 is used to perform segmentation of vessels, landmarks, tumor at steps 118a, 118b, 118c, respectively.
  • the image processing device 56 may include ultrasound image processors and other components for displaying the ultrasound images alongside the video images from the laparoscopic camera 51 in any suitable fashion e.g., on separate screens 32 and 34, picture-in-picture, overlays, etc.
  • image processing device 56 is also configured to construct a 3D ultrasound volume from the segmented ultrasound images.
  • FIG. 11 shows exemplary segmented ultrasound image slices and volumes reconstructed based on sub-tissue landmarks (e.g., critical structures, tumor, veins, arteries, etc.). Segmented landmarks may be displayed as different colored overlays on ultrasound images displayed on the screen 32 and/or 34.
  • Ultrasound volume may be generated using a method that relies on the 6 DoF probe pose estimation from calibrated stereo endoscope images.
  • the ultrasound probe is localized in 3D space by 6 DoF probe pose estimation from stereo endoscope images in the stereo endoscope frame of reference.
  • the stereo endoscope camera itself may be localized in 3D space in the WCS of reference tied to the access port 55 into which the camera 51 is inserted, or in the WCS of reference tied to the access port 55 into which the grasper instrument 50 is inserted.
  • Stereo reconstruction and DV-SLAM are used to update the position of the camera 51 from the images provided by the camera itself.
  • once the ultrasound probe is localized in endoscope images, its 3D position is tracked in the stationary WCS of reference tied to a landmark, e.g., one or more trocars.
  • virtual key points on the probe are estimated from both images of the stereo pair at each time step, i.e., frame.
  • pose regression on the detected key points is then run by the deep learning model.
  • the neural network based on the deep learning model is executed on each channel of the stereo video stream separately to estimate the DoF pose from each channel.
  • the estimated pose from each channel (i.e., the left or right image of the pair of images) is combined using stereo calibration.
  • the image processing device may be also configured to generate the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system.
  • the 2D ultrasound images may be represented by their planar equations using three points, and the value of a 3D ultrasound voxel between slices may be computed from a distance-weighted orthogonal projection.
  • the image processing device 56 is configured to generate the ultrasound volume by registering individual ultrasound slices together based on localization of the ultrasound probe 70 at step 122 (i.e., using kinematics for rough localization and rigid object pose estimation for fine pose estimation). Stitching of ultrasound images may be done by matching key points obtained at different orientations as the ultrasound probe 70 is moved in multiple directions over the tissue being subjected to the ultrasound.
  • the image registration process may include detecting invariant key points using the scale invariant feature transform (SIFT), speeded up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB), or any other suitable image matching technique.
  • the image processing device 56 extracts robust feature descriptors from the detected key points in each successive ultrasound slice image and computes the distances between the descriptors of each of the key points in successive ultrasound images. Further, the image processing device 56 selects the top matches for each descriptor of successive ultrasound images, estimates the homography matrix, and aligns the successive images to create a stitched volumetric ultrasound panorama (a minimal stitching sketch is provided after this list).
  • the image processing device 56 enhances or refines the ultrasound volume from ultrasound image segments by tracking across ultrasound images and by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images.
  • the image processing device may be additionally configured to generate a 3D model based on the volumetric image of tissue formed from the first modality images and deform the 3D model to conform to the ultrasound volume.
  • the image processing device 56 compensates for breathing motion of the patient since breathing imparts movement on the tissue and connected instruments 50 and camera 51.
  • the image processing device 56 is configured to receive breathing data from the patient, which may be from a pulse oximetry device configured to measure respiration rate, pulse, etc.
  • the 3D model is deformed, which includes: segmentation of the 2D/3D anatomical target surface and tracking of the instrument to isolate motion of the instrument from anatomy; compensation for motion of the patient, e.g., breathing motion compensation, by tracking target anatomy and estimating movement of anatomy while ignoring the movement of instruments; and biomechanical modeling to estimate the physically-realistic movement of the organ of interest along with the anatomy around the tissue being tracked.
  • an ultrasound slide projection overlay 92 (FIGS. 10 and 12) is shown over the endoscopic image at steps Ic-e (FIG. 7).
  • the image processing device 56 registers the constructed ultrasound volume with the 3D model constructed in step 102.
  • the image processing device 56 utilizes cross-modal transfer learning for this purpose, which may employ generative adversarial networks (GAN) to learn the appearance mapping between the preoperative image or model (e.g., CT) and the intraoperative image (e.g., ultrasound).
  • a transition dataset, which is a synthetic ultrasound dataset constructed from CT data, is used to pre-train the segmentation network.
  • transfer learning may be used to fine-tune the neural network on the real ultrasound dataset.
  • the trained neural network may then be used to register the constructed ultrasound volume with the 3D model.
  • registration is performed between a 3D model based on the pre-operative CT volume and the intra-op ultrasound volume that is generated by registering multiple ultrasound images at arbitrary orientations and overlaps.
  • prior to registration of the 3D model to the ultrasound images/volume, common landmarks in the preoperative (e.g., CT/MRI) and intraoperative (e.g., ultrasound and laparoscopic video feed) images are identified. This involves ultrasound sub-tissue structure identification (vessels, tumor, etc.) through ultrasound segmentation, followed by matching and aligning the corresponding landmarks in pre-operative imaging/model (i.e., same vessel, tumor, etc.).
  • Pre-operative model registration includes the following: identification of anatomical landmarks in pre-operative imaging/model (organ surface, sub-surface internal critical structures, e.g., vessels, tumor); identification of surface anatomical landmarks in stereo endoscope images; and segmentation of sub-surface internal critical structures, e.g., vessels, tumor.
  • the image processing device 56 deforms the 3D model to align with the ultrasound model.
  • the image processing device 56 may use another neural network trained on the ultrasound and pre-operative imaging/model of the same tissue in order to learn ultrasound to CT/MRI registration, i.e., by learning the appearance mapping between CT/MRI image of the tissue of interest (e.g., tumor) and counterpart ultrasound image and by learning the appearance mapping between CT/MRI critical structure landmarks (vessels, arteries, etc.) and the ultrasound critical structure landmarks.
  • the image processing device 56 is further configured to deform the 3D model after registering vessels, landmarks, and tumor between preoperative (e.g., CT/MRI) images and intraoperative (e.g., ultrasound) images.
  • the image processing device 56 displays the deformed 3D model as an overlay on the endoscopic video stream of the laparoscopic camera 51, which may be shown on the first screen 32.
  • the image processing device 56 may also display the deformed 3D model as an overlay on the ultrasound feed of the ultrasound probe 70, which may be shown on the second screen 34.
  • the video and ultrasound streams may be side-by-side displays on the same screen 32 or 34, picture-in-picture, or any other manner.
  • a video stream from the laparoscopic camera 51 is augmented with a 3D model 91 of the tissue of interest (e.g., tumor) along with the ultrasound slide projection overlay 92 extending from the ultrasound probe 70.
  • the ultrasound slide projection overlay 92 is continuously generated along an imaging plane of the ultrasound probe 70.
  • the overlays may be used to identify margins of the tissue of interest, e.g., tumor “T”, in an organ “O” by moving, pivoting, rotating, etc. the ultrasound probe 70 to the edges of the tumor and commencing the incision of the tissue to resect the tumor.
  • Systems and methods of the present disclosure advantageously allow the margins to be located continuously and contemporaneously during resection.
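The following is a minimal sketch of the key-point based slice stitching summarized above, using ORB features and a RANSAC homography as one possible realization; OpenCV is assumed, and the function name, match count, and threshold are illustrative rather than part of the disclosure.

```python
import cv2
import numpy as np

def register_slice_pair(prev_slice: np.ndarray, next_slice: np.ndarray) -> np.ndarray:
    """Return a 3x3 homography aligning next_slice onto prev_slice (identity on failure)."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_slice, None)
    kp2, des2 = orb.detectAndCompute(next_slice, None)
    if des1 is None or des2 is None:
        return np.eye(3)

    # Hamming distance for binary ORB descriptors; keep the best matches only.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]
    if len(matches) < 4:
        return np.eye(3)

    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H if H is not None else np.eye(3)
```

In practice the pairwise alignments would be combined with the probe pose in the WCS before compounding the slices into a volume.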

Abstract

An imaging system includes a laparoscopic camera configured to capture a video stream of tissue and an intraoperative imaging device configured to be inserted through an access port and to obtain a plurality of signals from a tissue. The system also includes an image processing device configured to: generate a 3D reconstruction of a surgical site from the laparoscopic camera video stream to estimate 3D location of the intraoperative imaging device in a frame of reference of the laparoscopic camera and localize the laparoscopic camera and the intraoperative imaging device in a world coordinate system based on the 3D reconstruction of a surgical site. The image processing device is further configured to receive a volumetric image of tissue formed from a pre-operative imaging modality and generate a multi-frame representation from a plurality of signals from the intraoperative imaging device. The image processing device is also configured to register the multi-frame representation with the volumetric image of the tissue; deform the volumetric image of the tissue according to the multi-frame representation; and generate an overlay of the volumetric image of tissue and the multi-frame representation. The system further includes a screen configured to display the video stream showing data based on the plurality of signals from the intraoperative imaging device and the overlay extending from the intraoperative imaging device.

Description

SURGICAL ROBOTIC SYSTEM AND METHOD FOR INTRAOPERATIVE FUSION OF DIFFERENT IMAGING MODALITIES
BACKGROUND
[0001] Surgical robotic systems are currently being used in a variety of surgical procedures, including minimally invasive medical procedures. Some surgical robotic systems include a surgeon console controlling a surgical robotic arm and a surgical instrument having an end effector (e.g., forceps or grasping instrument) coupled to and actuated by the robotic arm. In operation, the robotic arm is moved to a position over a patient and then guides the surgical instrument into a small incision via a surgical port or a natural orifice of a patient to position the end effector at a work site within the patient’s body. There is a need for a system to combine preoperative imaging with intraoperative imaging, e.g., an endoscopic video feed and another intraoperative imaging modality, e.g., ultrasound imaging, gamma probe, Raman spectroscopy, etc., to provide the surgeon with precise instrument placement and tissue margin identification.
SUMMARY
[0002] According to one embodiment of the present disclosure, an imaging system is disclosed. The imaging system includes a laparoscopic ultrasound probe configured to be inserted through an access port and to obtain a plurality of 2D ultrasound images of a tissue. The system also includes a laparoscopic camera configured to capture a video stream of the tissue. The system further includes an image processing device configured to receive a volumetric image of the tissue formed from first modality images, generate an ultrasound volume from the plurality of 2D ultrasound images, register the ultrasound volume with the volumetric image of tissue, and generate an overlay of the volumetric image of tissue and a 2D ultrasound image of the plurality of 2D ultrasound images. The system additionally includes a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay extending from the laparoscopic ultrasound probe.
[0003] In addition to 2D intraoperative ultrasound imaging other intraoperative imaging modalities are also contemplated, which include gamma radiation, Raman spectroscopy, multispectral, time-resolved fluorescence spectroscopy (ms-TRFS), auto fluorescence, and the like. The corresponding probes may be drop-in or tool-integrated. [0004] Implementations of the above embodiment may also include one or more of the following features. According to one aspect, the ultrasound images may be obtained using an ultrasound probe that is localized without using physical fiducial markers, i.e., using vision-only approach. The vision-based approach may utilize a deep learning model to estimate degrees of freedom (DoF) pose of a rigid object (i.e., ultrasound probe) from stereo or monocular laparoscopic camera images. The probe may have any number of DoF, which may be 6 DoF. Realistic training data for probe localization for the deep learning model may be provided by a custom synthetic data generation pipeline. A synthetic 3D anatomically accurate surgical site may be developed based on real data from surgical procedures. The ultrasound probe may be rendered on surgical site using the 3D virtual (e.g., computer aided drafting) model of the probe and stereo laparoscopic camera geometry from camera calibration. The data may include a plurality of synthetic images, e.g., around 100,000, to develop the deep learning network to estimate 6 DoF pose of ultrasound probe directly from images without modifying the probe in any way, i.e., no physical fiducial markers on the probe. This deep learning model can be trained on each image of a pair of stereoscopic images separately, rather than in pairs.
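As a hedged illustration of the synthetic data generation described above, the sketch below projects known 3D key points of the probe CAD model into a rendered camera view using the calibrated camera geometry, producing 2D key-point labels for training images; the function name and argument layout are assumptions rather than the disclosed pipeline.

```python
import cv2
import numpy as np

def project_probe_keypoints(cad_keypoints_3d, rvec, tvec, camera_matrix, dist_coeffs):
    """Project 3D CAD-model key points of the probe into a synthetic camera view.

    cad_keypoints_3d: (N, 3) key points in the probe CAD frame (hypothetical input).
    rvec, tvec: rendered pose of the probe in the camera frame (Rodrigues vector, translation).
    Returns (N, 2) pixel coordinates usable as key-point training labels.
    """
    pts_2d, _ = cv2.projectPoints(
        np.asarray(cad_keypoints_3d, dtype=np.float64), rvec, tvec,
        camera_matrix, dist_coeffs)
    return pts_2d.reshape(-1, 2)
```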
[0005] According to one aspect of the above embodiment, the laparoscopic or robotic ultrasound probe may be localized in the field of view of the laparoscopic monocular or stereo camera through a vision-only approach without modifying the laparoscopic or robotic ultrasound probe to include any physical fiducial markers. The image processing device may be further configured to localize the laparoscopic ultrasound probe in the video stream based on the key points or virtual fiducial markers. The image processing device may be further configured to estimate multiple DoF (e.g., 6) pose and orientation of the laparoscopic or robotic ultrasound probe from the video stream based on the key points or fiducial markers.
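One conventional way to recover a 6 DoF pose from detected key points and the stored probe geometry is a perspective-n-point solve, sketched below with assumed names; the disclosure describes a learned pose regressor, so this is only an illustrative alternative for the same step.

```python
import cv2
import numpy as np

def estimate_probe_pose(keypoints_2d, probe_keypoints_3d, camera_matrix, dist_coeffs):
    """Estimate the 6 DoF probe pose from detected 2D key points and known probe geometry.

    keypoints_2d: (N, 2) detections from the key point detector in one camera channel.
    probe_keypoints_3d: (N, 3) matching key point locations in the probe model frame.
    Returns (R, t): rotation matrix and translation of the probe in the camera frame.
    """
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(probe_keypoints_3d, dtype=np.float64),
        np.asarray(keypoints_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed; not enough well-conditioned key points")
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```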
[0006] The image processing device may be further configured to estimate the articulated pose and orientation of the grasper instrument holding the ultrasound probe. Finally, the image processing device may be configured to estimate the pose and orientation of the ultrasound probe by combining the pose and orientation of the probe as well as the pose and orientation of the grasper holding the probe.
[0007] Additionally, the image processing device may be further configured to generate dense depth map of surgical site from laparoscopic monocular or stereo camera to estimate 3D location of instruments, probe, and anatomy in the laparoscopic camera frame of reference. Furthermore, the image processing device may also be configured to implement a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline to localize the laparoscopic camera in 3D space at every acquired image frame over time with respect to a World Coordinate System (WCS). The WCS may be tied to the trocar from which laparoscopic endoscope camera is inserted, in which case, the location of trocar on patient anatomy is estimated from multiple external cameras mounted on robot carts or towers. The WCS may be also tied to one of the instruments, e.g., the grasper instrument manipulating the laparoscopic ultrasound probe, in which case, the location of trocar on patient anatomy is estimated from segmenting the shaft of the grasper instrument in plurality of images captured from laparoscopic camera and computing the intersection between lines fit to the shafts, hence localizing the remote center of motion (RCM) of the grasper instrument trocar. The image processing system may be configured to dispose the location and orientation of ultrasound probe in each laparoscopic camera image with respect to the WCS, hence transferring all images with respect to a fixed frame of reference.
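A minimal sketch of localizing the RCM from lines fit to the instrument shafts, computing the least-squares point of closest approach to all lines; the function name and the use of NumPy are assumptions.

```python
import numpy as np

def estimate_rcm(points, directions):
    """Least-squares 'intersection' of shaft lines to localize the remote center of motion.

    points: (N, 3) array, one point on each fitted shaft line.
    directions: (N, 3) array, the corresponding line directions (need not be unit length).
    Returns the 3D point minimizing the summed squared distance to all lines.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(np.asarray(points, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line direction
        A += M
        b += M @ p
    return np.linalg.solve(A, b)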
[0008] The image processing device may be also configured to generate the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system.
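The sketch below illustrates one simple compounding scheme consistent with the paragraph above: each localized slice is placed into the world coordinate system and overlapping pixel values are averaged per voxel. It uses nearest-voxel accumulation rather than the distance-weighted orthogonal projection mentioned elsewhere, and all names, units, and the assumption that the slice-to-world transforms already include pixel spacing are illustrative.

```python
import numpy as np

def compound_volume(slices, slice_to_world, voxel_size, grid_shape, origin):
    """Accumulate localized 2D ultrasound slices into a voxel grid by averaging overlaps.

    slices: list of (H, W) intensity images.
    slice_to_world: list of 4x4 transforms mapping homogeneous pixel coordinates
                    (col, row, 0, 1), already scaled to millimetres, into the WCS.
    """
    acc = np.zeros(grid_shape, dtype=np.float64)
    cnt = np.zeros(grid_shape, dtype=np.int64)
    for img, T in zip(slices, slice_to_world):
        h, w = img.shape
        cols, rows = np.meshgrid(np.arange(w), np.arange(h))
        # Pixels lie in the slice plane (z = 0 in the slice frame).
        pix = np.stack([cols.ravel(), rows.ravel(),
                        np.zeros(h * w), np.ones(h * w)], axis=0)
        world = (T @ pix)[:3].T
        idx = np.floor((world - origin) / voxel_size).astype(int)
        ok = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
        idx, vals = idx[ok], img.ravel()[ok]
        np.add.at(acc, tuple(idx.T), vals)
        np.add.at(cnt, tuple(idx.T), 1)
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```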
[0009] As noted above, prior to ultrasound volume generation, the laparoscopic camera is localized with respect to the WCS using depth mapping from a stereo or a pair of monocular images and DV-SLAM. In particular, localization of the camera includes depth mapping from stereo pair followed by DV-SLAM on the successive stereo image pairs over time. Depth mapping provides a single frame snapshot of how far objects are from camera, whereas DV-SLAM provides for localization of the camera in WCS.
[0010] Depth mapping can be performed for either monocular or stereo cameras. Stereo camera depth estimation is easier and more reliable than monocular camera depth estimation. DV-SLAM can be performed using either monocular camera input or stereo camera input. DV-SLAM with monocular camera input cannot reliably resolve the scale factor (i.e., how far away the camera is from the scene) because multiple 3D solutions exist along the same projection ray.
[0011] Deformable Visual SLAM in combination with depth map from stereo reconstruction provides the most reliable method of localizing the camera with respect to WCS. The default mode of operation may include: 1) depth estimation through stereo reconstruction using calibrated stereo endoscope or through monocular depth estimation using monocular laparoscope, i.e., depth mapping; and 2) DV-SLAM at every frame with stereo pair from calibrated stereo camera pair images.
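A minimal depth-mapping sketch for mode 1) above, assuming a rectified, calibrated stereo endoscope pair and OpenCV's semi-global block matcher; the parameter values are illustrative.

```python
import cv2
import numpy as np

def stereo_depth(rect_left, rect_right, focal_px, baseline_m, num_disp=128):
    """Dense depth from a rectified stereo pair using semi-global block matching.

    rect_left, rect_right: rectified 8-bit grayscale images from the calibrated stereo endoscope.
    focal_px: focal length in pixels; baseline_m: stereo baseline in metres.
    """
    matcher = cv2.StereoSGBM_create(
        minDisparity=0, numDisparities=num_disp, blockSize=5,
        P1=8 * 5 * 5, P2=32 * 5 * 5, uniquenessRatio=10)
    disp = matcher.compute(rect_left, rect_right).astype(np.float32) / 16.0
    depth = np.full_like(disp, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]   # Z = f * B / d
    return depth
```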
[0012] Ultrasound volume may be generated using a method that relies on the 6 Degrees of Freedom (DoF) probe pose estimation from calibrated stereo endoscope images. The ultrasound probe is localized in 3D space by 6 DoF probe pose estimation from stereo endoscope images in the stereo endoscope frame of reference. The stereo endoscope camera itself may be localized in 3D space in the WCS of reference tied to the trocar into which the camera is inserted. Stereo reconstruction and DV-SLAM are used to update the position of the camera from the images provided by the camera itself. Once the ultrasound probe is localized in endoscope images, the 3D position of the probe is tracked in the stationary WCS of reference tied to a landmark, e.g., one or more trocars. For 6 DoF ultrasound probe localization, virtual fiducial markers on the probe are estimated from both images of the stereo pair at each time step, i.e., frame. Then pose regression on the detected key points is run by the deep learning model. At run time, the neural network based on the deep learning model is executed on each channel of the stereo video streams separately to estimate the DoF pose from each stream. The estimated pose from each channel (i.e., left or right image of the pair of images) is combined using stereo calibration. In one embodiment, the neural network might be trained directly on the stereo pair to directly estimate the 6 DoF pose of the probe from rectified stereo pair images. This localizes the probe 6 DoF pose in 3D. Furthermore, the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images. The image processing device may be additionally configured to generate a 3D model based on the volumetric image of tissue formed from the first modality images and deform the 3D model to conform to the ultrasound volume.
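As a hedged sketch of combining per-channel detections using the stereo calibration, the snippet below triangulates corresponding left/right key points into 3D with the calibrated projection matrices; the disclosed approach may instead fuse per-channel pose estimates, so this is only one possible fusion step with assumed names.

```python
import cv2
import numpy as np

def triangulate_keypoints(kp_left, kp_right, P_left, P_right):
    """Fuse per-channel key point detections into 3D using the stereo calibration.

    kp_left, kp_right: (N, 2) corresponding detections in the rectified left/right images.
    P_left, P_right: 3x4 projection matrices from stereo calibration.
    Returns (N, 3) key point positions in the camera frame.
    """
    pts4 = cv2.triangulatePoints(P_left, P_right,
                                 np.asarray(kp_left, np.float64).T,
                                 np.asarray(kp_right, np.float64).T)
    return (pts4[:3] / pts4[3]).T
```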
[0013] The 3D model may be deformed as follows: segmentation of the 2D/3D anatomical target surface and tracking of the instrument to isolate motion of the instrument from anatomy; compensation for motion of the patient, e.g., breathing motion compensation, by tracking target anatomy and estimating movement of anatomy while masking out the movement of instruments; and biomechanical modeling to estimate the physically-realistic movement of the organ of interest along with the anatomy around the tissue being tracked. [0014] In the context of 3D model deformation, the image processing device may be configured to segment all instruments at the surgical site in order to mask out non-anatomical regions of interest from organ surface deformation estimation. The image processing device may further be configured to generate an instance segmentation mask of the organ in laparoscopic camera images for every frame to estimate breathing motion as well as surface deformation. The surface deformation may further involve generating a depth map to estimate ultrasound probe pressure on tissue and the resulting deformation in tissue surface. The breathing motion estimation component may also involve an interface into a pulse oximetry system to predict breathing cycle related tissue surface and sub-tissue structure deformation. The image processing device may be further configured to identify sub-tissue landmarks that are common between ultrasound images and the raw preoperative images or the corresponding 3D model and compute the dense displacement field registration map between the ultrasound volume and the 3D model. The image processing device may be also further configured to transfer a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
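A rough sketch, under assumed names, of isolating tissue motion from instrument motion as described above: instrument pixels from the segmentation mask are excluded and a coarse per-frame displacement is taken from dense optical flow. This stands in for, and does not reproduce, the disclosed breathing-motion estimation and biomechanical modeling.

```python
import cv2
import numpy as np

def organ_surface_motion(prev_gray, next_gray, instrument_mask):
    """Estimate per-frame tissue motion while ignoring instrument pixels.

    instrument_mask: boolean array, True where instruments were segmented.
    Returns the median (dx, dy) flow over organ pixels as a coarse motion estimate.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=21,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    organ = ~np.asarray(instrument_mask, bool)
    return np.median(flow[organ], axis=0)
```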
[0015] Prior to registration of the 3D model to ultrasound images/volume, common landmarks in the preoperative (e.g., CT/MRI) and intraoperative images (e.g., ultrasound and laparoscopic video feed) are identified. This involves ultrasound sub-tissue structure identification (vessels, tumor, etc.) through ultrasound segmentation followed by matching and aligning the corresponding landmarks in pre-operative imaging model (i.e., same vessel, tumor, etc.). Pre-operative imaging model registration includes the following: identification of anatomical landmarks in CT (organ surface, sub-surface internal critical structures, e.g., vessels, tumor); identification of surface anatomical landmarks in stereo endoscope images; and segmentation of sub-surface internal critical structures, e.g., vessels, tumor.
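Once corresponding landmarks have been identified in both modalities, a rigid initialization can be computed before any deformable refinement; the sketch below uses the standard Kabsch/SVD alignment and is an assumption about one reasonable first step, not the disclosed dense displacement field method.

```python
import numpy as np

def rigid_align(ct_landmarks, us_landmarks):
    """Rigidly align matched pre-operative (CT/MRI) and intra-operative (ultrasound) landmarks.

    Returns (R, t) such that R @ ct + t best matches us in a least-squares sense,
    serving as the initial alignment before deformable registration.
    """
    ct = np.asarray(ct_landmarks, float)
    us = np.asarray(us_landmarks, float)
    ct_c, us_c = ct.mean(axis=0), us.mean(axis=0)
    H = (ct - ct_c).T @ (us - us_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = us_c - R @ ct_c
    return R, t
```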
[0016] According to another embodiment of the present disclosure, a surgical robotic system is disclosed. The surgical robotic system includes a first robotic arm controlling a laparoscopic ultrasound probe that is configured to be inserted through an access port and to obtain intraoperatively a plurality of 2D ultrasound images of a tissue, and a second robotic arm controlling a laparoscopic camera configured to capture a video stream of the tissue. The system also includes an image processing device configured to receive preoperative images of tissue, generate a 3D model of the tissue from the preoperative images, and generate an ultrasound volume from the plurality of 2D ultrasound images. The image processing device is further configured to register the ultrasound volume with the 3D model and generate an overlay of the 3D model and a 2D ultrasound image of the plurality of 2D ultrasound images. The system also includes a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay including the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasonic probe.
[0017] Implementations of the above embodiment may further include one or more of the following features. According to one aspect of the above embodiment, the laparoscopic or robotic ultrasound probe may include plurality of physical fiducial markers on the probe in order to robustly estimate its pose and orientation from laparoscopic camera images. This may be used to identify probes lacking any discernable visual features on the outside for pose and orientation estimation. According to another aspect of the above embodiment, the laparoscopic or robotic ultrasound probe may be devoid of any physical fiducial markers obviating the need for any modification of the ultrasound probe. The image processing device may be further configured to localize the laparoscopic ultrasound probe in the video stream based on the plurality of key points or virtual fiducial markers. This configuration may involve a training phase where a plurality of stereo or monocular laparoscopic camera images are generated from a combination of real surgical sites or in a synthetic environment using computer graphics generated synthetic images. The training set of images are used to train a neural network that has a key point detection subnetwork and a pose regressor subnetwork. The neural network is used to process either monocular laparoscopic camera images or each channel of the stereo laparoscopic camera images to generate the pose and orientation of the laparoscopic ultrasound probe in the laparoscopic camera images. In this case, the image processing device may be configured to combine the estimated pose from each channel (i.e., left or right image of the stereo image pair) using stereo calibration for the final 6 DoF pose of the ultrasound probe. The neural network may be alternatively trained directly on the stereo image pairs and trained for end-to-end 6 DoF pose estimation directly from stereo pair input images.
[0018] The image processing device may be also configured to localize the laparoscopic ultrasound probe based on kinematic data of the first robotic arm. The image processing device may be additionally configured to estimate a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the key points or virtual fiducial markers. The pose and orientation estimation of the laparoscopic ultrasound probe may be accomplished through a combination of kinematics data of the first robotic arm as well as the localization of the plurality of key points or virtual fiducial markers from the laparoscopic camera video stream. The image processing device may be further configured to generate the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system. Furthermore, the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images. The image processing device may be also further configured to transfer a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
[0019] According to a further embodiment of the present disclosure, a method for intraoperative imaging of tissue is disclosed. The method includes generating a 3D model of tissue from a plurality of preoperative images, generating and updating a depth-based surface map of tissue using monocular or stereo laparoscopic camera, and generating an ultrasound volume from a plurality of 2D ultrasound images obtained from a laparoscopic ultrasonic probe. The method further includes visual localization and mapping pipeline that places the laparoscopic camera in a world coordinate system from every image of the camera. The method further includes ultrasound probe pose and orientation estimation from monocular or stereo laparoscopic camera in the world coordinate system to generate an ultrasound volume from plurality of registered ultrasound image slices. The method further includes registering the ultrasound volume with the 3D model and generating an overlay of the 3D model and a 2D ultrasound image of the plurality of 2D ultrasound images. The method additionally includes displaying a video stream obtained from a laparoscopic video camera and the overlay. The video stream includes the laparoscopic ultrasound probe, and the overlay includes the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasound probe.
[0020] Implementations of the above embodiment may additionally include one or more of the following features. According to one aspect of the above embodiment, the method may further include localizing the laparoscopic ultrasound probe in the video stream based on a plurality of key points or virtual fiducial markers disposed on the laparoscopic ultrasound probe. The method may further include moving the laparoscopic ultrasound probe by a robotic arm and localizing the laparoscopic ultrasound probe based on kinematic data of the robotic arm. The method may additionally include estimating a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the combination of key points or virtual fiducial markers, robotic arm kinematic data, stereo reconstruction, and visual simultaneous localization and mapping of laparoscopic camera with respect to a world coordinate system. The method may further include generating the ultrasound volume by computing the value of each ultrasound voxel by interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system. Furthermore, the image processing device may also be configured to enhance the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images. The method may further include transferring a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] Various embodiments of the present disclosure are described herein with reference to the drawings wherein:
[0022] FIG. 1 is a schematic illustration of a surgical robotic system including a control tower, a console, and one or more surgical robotic arms each disposed on a movable cart according to an embodiment of the present disclosure;
[0023] FIG. 2 is a perspective view of a surgical robotic arm of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure;
[0024] FIG. 3 is a perspective view of a movable cart having a setup arm with the surgical robotic arm of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure; [0025] FIG. 4 is a schematic diagram of a computer architecture of the surgical robotic system of FIG. 1 according to an embodiment of the present disclosure;
[0026] FIG. 5 is a plan schematic view of movable carts of FIG. 1 positioned about a surgical table according to an embodiment of the present disclosure;
[0027] FIG. 6 is a method for intraoperative fusion of different imaging modalities according to an embodiment of the present disclosure;
[0028] FIG. 7 is a method of obtaining and registering multiple imaging modalities according to an embodiment of the present disclosure; [0029] FIG. 8A is a computed tomography image according to an embodiment of the present disclosure;
[0030] FIG. 8B is a 3D model image according to an embodiment of the present disclosure;
[0031] FIG. 8C is an endoscopic video image according to an embodiment of the present disclosure;
[0032] FIG. 8D is an ultrasound image according to an embodiment of the present disclosure;
[0033] FIG. 9 is a perspective view of a laparoscopic ultrasound probe according to an embodiment of the present disclosure;
[0034] FIG. 10 is schematic diagram showing multiple coordinate systems and the transformation matrices to convert between them according to an embodiment of the present disclosure;
[0035] FIG. 11 shows ultrasound segmentation images according to an embodiment of the present disclosure;
[0036] FIG. 12 shows an augmented laparoscopic video stream overlaid with a 3D model and an ultrasound imaging plane extending from the ultrasound probe according to an embodiment of the present disclosure;
[0037] FIGS. 13A and 13B are schematic diagrams of the laparoscopic ultrasound probe identifying margins of a tumor in tissue and identifying a path of resection according to an embodiment of the present disclosure;
[0038] FIG. 14 is a schematic diagram illustrating generation of an ultrasound volume from a plurality of 2D ultrasound slices according to an embodiment of the present disclosure; and [0039] FIG. 15 is a schematic flow chart illustrating registration between an intra-operative ultrasound volume and a pre-operative CT volume according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0040] Embodiments of the presently disclosed surgical robotic system are described in detail with reference to the drawings, in which like reference numerals designate identical or corresponding elements in each of the several views.
[0041] As will be described in detail below, the present disclosure is directed to a surgical robotic system, which includes a surgeon console, a control tower, and one or more movable carts having a surgical robotic arm coupled to a setup arm. The surgeon console receives user input through one or more interface devices, which are processed by the control tower as movement commands for moving the surgical robotic arm and an instrument and/or camera coupled thereto. Thus, the surgeon console enables teleoperation of the surgical arms and attached instruments/camera. The surgical robotic arm includes a controller, which is configured to process the movement commands and to generate torque commands for activating one or more actuators of the robotic arm, which would, in turn, move the robotic arm in response to the movement command.
[0042] With reference to FIG. 1, a surgical robotic system 10 includes a control tower 20, which is connected to all of the components of the surgical robotic system 10 including a surgeon console 30 and one or more movable carts 60. Each of the movable carts 60 includes a robotic arm 40 having a surgical instrument 50 removably coupled thereto. The robotic arms 40 also couple to the movable carts 60. The robotic system 10 may include any number of movable carts 60 and/or robotic arms 40.
[0043] The surgical instrument 50 is configured for use during minimally invasive surgical procedures. In embodiments, the surgical instrument 50 may be configured for open surgical procedures. In further embodiments, the surgical instrument 50 may be an electrosurgical forceps configured to seal tissue by compressing tissue between jaw members and applying electrosurgical current thereto. In yet further embodiments, the surgical instrument 50 may be a surgical stapler including a pair of jaws configured to grasp and clamp tissue while deploying a plurality of tissue fasteners, e.g., staples, and cutting stapled tissue. In yet further embodiments, the surgical instrument 50 may be a surgical clip applier including a pair of jaws configured to apply a surgical clip onto tissue.
[0044] One of the robotic arms 40 may include a laparoscopic camera 51 configured to capture video of the surgical site. The laparoscopic camera 51 may be a stereoscopic endoscope configured to capture two side-by-side (i.e., left and right) images of the surgical site to produce a video stream of the surgical scene. The laparoscopic camera 51 is coupled to an image processing device 56, which may be disposed within the control tower 20. The image processing device 56 may be any computing device as described below configured to receive the video feed from the laparoscopic camera 51 and output the processed video stream.
[0045] The surgeon console 30 includes a first screen 32, which displays a video feed of the surgical site provided by camera 51 of the surgical instrument 50 disposed on the robotic arm 40, and a second screen 34, which displays a user interface for controlling the surgical robotic system 10. The first screen 32 and second screen 34 may be touchscreens allowing for displaying various graphical user inputs.
[0046] The surgeon console 30 also includes a plurality of user interface devices, such as foot pedals 36 and a pair of hand controllers 38a and 38b which are used by a user to remotely control robotic arms 40. The surgeon console further includes an armrest 33 used to support clinician’s arms while operating the hand controllers 38a and 38b.
[0047] The control tower 20 includes a screen 23, which may be a touchscreen, and outputs on the graphical user interfaces (GUIs). The control tower 20 also acts as an interface between the surgeon console 30 and one or more robotic arms 40. In particular, the control tower 20 is configured to control the robotic arms 40, such as to move the robotic arms 40 and the corresponding surgical instrument 50, based on a set of programmable instructions and/or input commands from the surgeon console 30, in such a way that robotic arms 40 and the surgical instrument 50 execute a desired movement sequence in response to input from the foot pedals 36 and the hand controllers 38a and 38b. The foot pedals 36 may be used to enable and lock the hand controllers 38a and 38b, repositioning camera movement and electrosurgical activation/deactivation. In particular, the foot pedals 36 may be used to perform a clutching action on the hand controllers 38a and 38b. Clutching is initiated by pressing one of the foot pedals 36, which disconnects (i.e., prevents movement inputs) the hand controllers 38a and/or 38b from the robotic arm 40 and corresponding instrument 50 or camera 51 attached thereto. This allows the user to reposition the hand controllers 38a and 38b without moving the robotic arm(s) 40 and the instrument 50 and/or camera 51. This is useful when reaching control boundaries of the surgical space.
[0048] Each of the control tower 20, the surgeon console 30, and the robotic arm 40 includes a respective computer 21, 31, 41. The computers 21, 31, 41 are interconnected to each other using any suitable communication network based on wired or wireless communication protocols. The term “network,” whether plural or singular, as used herein, denotes a data network, including, but not limited to, the Internet, Intranet, a wide area network, or a local area network, and without limitation as to the full scope of the definition of communication networks as encompassed by the present disclosure. Suitable protocols include, but are not limited to, transmission control protocol/internet protocol (TCP/IP), user datagram protocol/internet protocol (UDP/IP), and/or datagram congestion control protocol (DCCP). Wireless communication may be achieved via one or more wireless configurations, e.g., radio frequency, optical, Wi-Fi, Bluetooth (an open wireless protocol for exchanging data over short distances, using short length radio waves, from fixed and mobile devices, creating personal area networks (PANs)), and ZigBee® (a specification for a suite of high level communication protocols using small, low-power digital radios based on the IEEE 802.15.4-2003 standard for wireless personal area networks (WPANs)).
[0049] The computers 21, 31, 41 may include any suitable processor (not shown) operably connected to a memory (not shown), which may include one or more of volatile, non-volatile, magnetic, optical, or electrical media, such as read-only memory (ROM), random access memory (RAM), electrically-erasable programmable ROM (EEPROM), non-volatile RAM (NVRAM), or flash memory. The processor may be any suitable processor (e.g., control circuit) adapted to perform the operations, calculations, and/or set of instructions described in the present disclosure including, but not limited to, a hardware processor, a field programmable gate array (FPGA), a digital signal processor (DSP), a central processing unit (CPU), a microprocessor, and combinations thereof. Those skilled in the art will appreciate that the processor may be substituted for by using any logic processor (e.g., control circuit) adapted to execute algorithms, calculations, and/or set of instructions described herein.
[0050] With reference to FIG. 2, each of the robotic arms 40 may include a plurality of links 42a, 42b, 42c, which are interconnected at joints 44a, 44b, 44c, respectively. Other configurations of links and joints may be utilized as known by those skilled in the art. The joint 44a is configured to secure the robotic arm 40 to the movable cart 60 and defines a first longitudinal axis. With reference to FIG. 3, the movable cart 60 includes a lift 67 and a setup arm 61, which provides a base for mounting of the robotic arm 40. The lift 67 allows for vertical movement of the setup arm 61. The movable cart 60 also includes a display 69 for displaying information pertaining to the robotic arm 40. In embodiments, the robotic arm 40 may include any type and/or number of joints.
[0051] The setup arm 61 includes a first link 62a, a second link 62b, and a third link 62c, which provide for lateral maneuverability of the robotic arm 40. The links 62a, 62b, 62c are interconnected at joints 63a and 63b, each of which may include an actuator (not shown) for rotating the links 62a and 62b relative to each other and the link 62c. In particular, the links 62a, 62b, 62c are movable in their corresponding lateral planes that are parallel to each other, thereby allowing for extension of the robotic arm 40 relative to the patient (e.g., surgical table). In embodiments, the robotic arm 40 may be coupled to the surgical table (not shown). The setup arm 61 includes controls 65 for adjusting movement of the links 62a, 62b, 62c as well as the lift 67. In embodiments, the setup arm 61 may include any type and/or number of joints.
[0052] The third link 62c may include a rotatable base 64 having two degrees of freedom. In particular, the rotatable base 64 includes a first actuator 64a and a second actuator 64b. The first actuator 64a is rotatable about a first stationary arm axis which is perpendicular to a plane defined by the third link 62c and the second actuator 64b is rotatable about a second stationary arm axis which is transverse to the first stationary arm axis. The first and second actuators 64a and 64b allow for full three-dimensional orientation of the robotic arm 40.
[0053] The actuator 48b of the joint 44b is coupled to the joint 44c via the belt 45a, and the joint 44c is in turn coupled to the joint 46b via the belt 45b. Joint 44c may include a transfer case coupling the belts 45a and 45b, such that the actuator 48b is configured to rotate each of the links 42b, 42c and a holder 46 relative to each other. More specifically, links 42b, 42c, and the holder 46 are passively coupled to the actuator 48b which enforces rotation about a pivot point “P” which lies at an intersection of the first axis defined by the link 42a and the second axis defined by the holder 46. In other words, the pivot point “P” is a remote center of motion (RCM) for the robotic arm 40. Thus, the actuator 48b controls the angle θ between the first and second axes allowing for orientation of the surgical instrument 50. Due to the interlinking of the links 42a, 42b, 42c, and the holder 46 via the belts 45a and 45b, the angles between the links 42a, 42b, 42c, and the holder 46 are also adjusted in order to achieve the desired angle θ. In embodiments, some or all of the joints 44a, 44b, 44c may include an actuator to obviate the need for mechanical linkages.
[0054] The joints 44a and 44b include an actuator 48a and 48b configured to drive the joints 44a, 44b, 44c relative to each other through a series of belts 45a and 45b or other mechanical linkages such as a drive rod, a cable, or a lever and the like. In particular, the actuator 48a is configured to rotate the robotic arm 40 about a longitudinal axis defined by the link 42a.
[0055] With reference to FIG. 2, the holder 46 defines a second longitudinal axis and is configured to receive an instrument drive unit (IDU) 52 (FIG. 1). The IDU 52 is configured to couple to an actuation mechanism of the surgical instrument 50 and the camera 51 and is configured to move (e.g., rotate) and actuate the instrument 50 and/or the camera 51. The IDU 52 transfers actuation forces from its actuators to the surgical instrument 50 to actuate components of an end effector 49 of the surgical instrument 50. The holder 46 includes a sliding mechanism 46a, which is configured to move the IDU 52 along the second longitudinal axis defined by the holder 46. The holder 46 also includes a joint 46b, which rotates the holder 46 relative to the link 42c. During endoscopic procedures, the instrument 50 may be inserted through an endoscopic access port 55 (FIG. 3) held by the holder 46. The holder 46 also includes a port latch 46c for securing the access port 55 to the holder 46 (FIG. 2).
[0056] The robotic arm 40 also includes a plurality of manual override buttons 53 (FIG. 1) disposed on the IDU 52 and the setup arm 61, which may be used in a manual mode. The user may press one or more of the buttons 53 to move the component associated with the button 53.
[0057] With reference to FIG. 4, each of the computers 21, 31, 41 of the surgical robotic system 10 may include a plurality of controllers, which may be embodied in hardware and/or software. The computer 21 of the control tower 20 includes a controller 21a and safety observer 21b. The controller 21a receives data from the computer 31 of the surgeon console 30 about the current position and/or orientation of the hand controllers 38a and 38b and the state of the foot pedals 36 and other buttons. The controller 21a processes these input positions to determine desired drive commands for each joint of the robotic arm 40 and/or the IDU 52 and communicates these to the computer 41 of the robotic arm 40. The controller 21a also receives the actual joint angles measured by encoders of the actuators 48a and 48b and uses this information to determine force feedback commands that are transmitted back to the computer 31 of the surgeon console 30 to provide haptic feedback through the hand controllers 38a and 38b. The safety observer 21b performs validity checks on the data going into and out of the controller 21a and notifies a system fault handler if errors in the data transmission are detected to place the computer 21 and/or the surgical robotic system 10 into a safe state.
[0058] The computer 41 includes a plurality of controllers, namely, a main cart controller 41a, a setup arm controller 41b, a robotic arm controller 41c, and an instrument drive unit (IDU) controller 41d. The main cart controller 41a receives and processes joint commands from the controller 21a of the computer 21 and communicates them to the setup arm controller 41b, the robotic arm controller 41c, and the IDU controller 41d. The main cart controller 41a also manages instrument exchanges and the overall state of the movable cart 60, the robotic arm 40, and the IDU 52. The main cart controller 41a also communicates actual joint angles back to the controller 21a. [0059] Each of joints 63a and 63b and the rotatable base 64 of the setup arm 61 are passive joints (i.e., no actuators are present therein) allowing for manual adjustment thereof by a user. The joints 63a and 63b and the rotatable base 64 include brakes that are disengaged by the user to configure the setup arm 61. The setup arm controller 41b monitors slippage of each of the joints 63a and 63b and the rotatable base 64 of the setup arm 61 when the brakes are engaged; these joints can be freely moved by the operator when the brakes are disengaged and do not impact controls of other joints. The robotic arm controller 41c controls each joint 44a and 44b of the robotic arm 40 and calculates desired motor torques required for gravity compensation, friction compensation, and closed loop position control of the robotic arm 40. The robotic arm controller 41c calculates a movement command based on the calculated torque. The calculated motor commands are then communicated to one or more of the actuators 48a and 48b in the robotic arm 40. The actual joint positions are then transmitted by the actuators 48a and 48b back to the robotic arm controller 41c.
[0060] The IDU controller 41d receives desired joint angles for the surgical instrument 50, such as wrist and jaw angles, and computes desired currents for the motors in the IDU 52. The IDU controller 41d calculates actual angles based on the motor positions and transmits the actual angles back to the main cart controller 41a.
[0061] The robotic arm 40 is controlled in response to a pose of the hand controller controlling the robotic arm 40, e.g., the hand controller 38a, which is transformed into a desired pose of the robotic arm 40 through a hand eye transform function executed by the controller 21a. The hand eye function, as well as other functions described herein, is/are embodied in software executable by the controller 21a or any other suitable controller described herein. The pose of one of the hand controllers 38a may be embodied as a coordinate position and roll-pitch-yaw (RPY) orientation relative to a coordinate reference frame, which is fixed to the surgeon console 30. The desired pose of the instrument 50 is relative to a fixed frame on the robotic arm 40. The pose of the hand controller 38a is then scaled by a scaling function executed by the controller 21a. In embodiments, the coordinate position may be scaled down and the orientation may be scaled up by the scaling function. In addition, the controller 21a may also execute a clutching function, which disengages the hand controller 38a from the robotic arm 40. In particular, the controller 21a stops transmitting movement commands from the hand controller 38a to the robotic arm 40 if certain movement limits or other thresholds are exceeded and in essence acts like a virtual clutch mechanism, e.g., limits mechanical input from effecting mechanical output.
[0062] The desired pose of the robotic arm 40 is based on the pose of the hand controller 38a and is then passed by an inverse kinematics function executed by the controller 21a. The inverse kinematics function calculates angles for the joints 44a, 44b, 44c of the robotic arm 40 that achieve the scaled and adjusted pose input by the hand controller 38a. The calculated angles are then passed to the robotic arm controller 41c, which includes a joint axis controller having a proportional-derivative (PD) controller, the friction estimator module, the gravity compensator module, and a two-sided saturation block, which is configured to limit the commanded torque of the motors of the joints 44a, 44b, 44c.
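A minimal sketch of the joint axis control step described above: a per-joint PD law followed by a two-sided torque saturation. The names and gain handling are assumptions, and the friction estimator and gravity compensator modules are omitted for brevity.

```python
import numpy as np

def pd_joint_torque(q_des, q, qd_des, qd, kp, kd, tau_limit):
    """Per-joint PD torque with a two-sided saturation, mirroring the joint axis controller.

    q_des, q: desired and measured joint angles; qd_des, qd: desired and measured velocities.
    kp, kd: proportional and derivative gains; tau_limit: symmetric torque limit per joint.
    (Gravity and friction compensation terms would be added by their own modules.)
    """
    tau = (np.asarray(kp) * (np.asarray(q_des) - np.asarray(q))
           + np.asarray(kd) * (np.asarray(qd_des) - np.asarray(qd)))
    return np.clip(tau, -np.asarray(tau_limit), np.asarray(tau_limit))
```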
[0063] With reference to FIG. 5, the surgical robotic system 10 is setup around a surgical table 90. The system 10 includes movable carts 60a-d, which may be numbered “1” through “4.” During setup, each of the carts 60a-d are positioned around the surgical table 90. Position and orientation of the carts 60a-d depends on a plurality of factors, such as placement of a plurality of access ports 55a-d, which in turn, depends on the surgery being performed. Once the port placement is determined, the access ports 55a-d are inserted into the patient, and carts 60a-d are positioned to insert instruments 50 and the laparoscopic camera 51 into corresponding ports 55a-d.
[0064] During use, each of the robotic arms 40a-d is attached to one of the access ports 55a-d that is inserted into the patient by attaching the latch 46c (FIG. 2) to the access port 55 (FIG. 3). The IDU 52 is attached to the holder 46, followed by the SIM 43 being attached to a distal portion of the IDU 52. Thereafter, the instrument 50 is attached to the SIM 43. The instrument 50 is then inserted through the access port 55 by moving the IDU 52 along the holder 46. The SIM 43 includes a plurality of drive shafts configured to transmit rotation of individual motors of the IDU 52 to the instrument 50 thereby actuating the instrument 50. In addition, the SIM 43 provides a sterile barrier between the instrument 50 and the other components of robotic arm 40, including the IDU 52. The SIM 43 is also configured to secure a sterile drape (not shown) to the IDU 52.
[0065] With reference to FIG. 6, a method for intraoperative fusion of different imaging modalities includes combining preoperative imaging and intraoperative imaging to provide a combined 3D image of an organ, tumor, or any other tissue, as well as overlays of the different modalities. Preoperative imaging includes any suitable imaging modality such as computed tomography (CT), magnetic resonance imaging (MRI), or any other imaging modality capable of obtaining 3D images as shown in FIG. 8A. Intraoperative imaging may be ultrasound imaging. CT imaging is well-suited for preoperative use since CT provides high quality images. However, intraoperative CT use is undesirable due to radiation exposure and supine positioning of the patient. Ultrasound imaging is well-suited for intraoperative use since it is safe for frequent imaging regardless of the position of the patient, even though ultrasound provides noisy images with limited perspective. In addition to ultrasound, other intraoperative imaging modalities may be used, such as gamma radiation, Raman spectroscopy, multispectral imaging, time-resolved fluorescence spectroscopy (ms-TRFS), and autofluorescence.
[0066] At step 100, the image processing device 56 receives preoperative images, which may be done by obtaining a plurality of 2D images and reconstructing a 3D volumetric image therefrom. In embodiments, preoperative images may be provided to any other computing device (e.g., outside the operating room) to perform the image processing steps described herein.
[0067] FIG. 7 provides additional sub-steps for each of the main steps of the method of FIG. 6. Step 100 includes multiple segmentation steps 100a-d, namely, segmentation of the organ surface, vasculature, landmarks, and tumor. As used herein, the term “segmentation” denotes obtaining a plurality of 2D slices or segments of an object.
[0068] With reference to FIG. 6, at step 102, the image processing device 56 or another computing device generates a 3D model shown in FIG. 8B, which may be a wire mesh model based on the preoperative image. The image processing device 56 may generate the 3D model including a plurality of points or vertices interconnected by line segments based on the segmentations and include a surface texture over the vertices and segments.
[0069] Steps 100 and 102 are performed preoperatively, with subsequent steps being performed once the surgical procedure has commenced, which includes setting up the robotic system 10 as shown in FIG. 5. In embodiments, the method of the present disclosure may be implemented using a stand-alone imaging system 80 (FIG. 10), the laparoscopic camera 51, and a laparoscopic ultrasound probe 70, which are coupled to the image processing device 56, and one or more screens 32 and 34 (FIG. 1). At step 104, the ultrasonic probe 70 is inserted through one of the access ports 55a-d and may be controlled by one of the robotic arms 40a-d and corresponding IDU 52. With reference to FIGS. 9 and 10, the ultrasound probe 70 includes an ultrasound transducer 72 configured to output ultrasound waves. The laparoscopic camera 51 is positioned such that the surgical site is within its field of view, and the ultrasound probe 70 is then also moved into the field of view of the laparoscopic camera 51. At step 106, the image processing device 56 localizes the ultrasound probe 70. The image processing device 56 may store (in memory or storage) dimensions of the ultrasound probe 70, and positions and distances between key points or virtual fiducial markers 74. With reference to FIG. 8C, the image processing device 56 analyzes all stereo pair frames (left and right channel images) from the video stream to identify a plurality of the key points or virtual fiducial markers 74 (FIG. 9), which then enables the image processing device 56 to determine 3D dimensions in the video stream, e.g., depth mapping.
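One way to realize the depth mapping of the detected key points, assuming a calibrated stereo laparoscope, is to triangulate matched left/right detections with OpenCV; the intrinsics, baseline, and pixel coordinates below are placeholders, not calibration values from the disclosure.

```python
# Minimal sketch: recover 3D key-point positions from a matched stereo pair.
import cv2
import numpy as np

# Assumed rectified stereo calibration (intrinsics and a 5 mm baseline).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
baseline_m = 0.005
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-baseline_m], [0.0], [0.0]])])

# Matched key points (virtual fiducials) in the left/right images, shape (2, N).
pts_left = np.array([[320.0, 410.0], [240.0, 255.0]])
pts_right = np.array([[310.0, 398.0], [240.0, 255.0]])

points_4d = cv2.triangulatePoints(P_left, P_right, pts_left, pts_right)
points_3d = (points_4d[:3] / points_4d[3]).T  # Nx3 positions in the camera frame
```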
[0070] The virtual fiducial markers 74 are generated by a machine learning image processing algorithm configured to identify the geometry of the ultrasound probe 70. The image processing device 56 is configured to execute the image processing algorithm, which may include a deep learning model to estimate the 6 degrees of freedom (DoF) pose of a rigid object (i.e., the ultrasound probe 70) from stereo or monocular laparoscopic camera images. Realistic training data for probe localization for the deep learning model is provided by a custom synthetic data generation pipeline. A synthetic, anatomically accurate 3D surgical site is developed based on actual data from surgical procedures. The ultrasound probe may be rendered on the surgical site using the 3D virtual (e.g., computer aided drafting) model of the probe and the stereo laparoscopic camera geometry from camera calibration. The data may include a plurality of synthetic images, e.g., around 100,000, to develop the deep learning network to estimate the 6 DoF pose of the ultrasound probe directly from images without modifying the probe in any way, i.e., without physical fiducial markers on the probe. This deep learning model can be trained on each image of a pair of stereoscopic images separately, rather than in pairs. In further embodiments, the probe 70 may include physical fiducial markers in addition to the virtual fiducial markers 74, which may be formed from any visually distinctive (e.g., white, fluorescent, etc.) paint, dye, etching, and/or objects (e.g., dots, blocks, instrument components, markings, etc.).
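For context, once 2D key points are detected and the probe's key-point geometry is known from its CAD model, a conventional perspective-n-point (PnP) solve can recover a 6 DoF pose in the camera frame. The sketch below shows that geometric step only; it is not the disclosed deep learning pose regressor, and all coordinates and intrinsics are placeholders.

```python
# Minimal sketch: 6 DoF pose of the probe from detected 2D key points and the
# assumed 3D key-point layout on the probe body (coplanar points, probe frame).
import cv2
import numpy as np

model_points = np.array([[0.00, 0.00, 0.0],
                         [0.03, 0.00, 0.0],
                         [0.03, 0.01, 0.0],
                         [0.00, 0.01, 0.0]], dtype=np.float64)
# Hypothetical 2D detections of the same key points in one image (pixels).
image_points = np.array([[350.0, 240.0], [420.0, 236.0],
                         [418.0, 260.0], [352.0, 268.0]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)  # 6 DoF pose: rotation R and translation tvec
```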
[0071] Localization may be based on image processing by the image processing device 56 as described above, or may additionally include kinematics data of the robotic arm 40 moving the ultrasound probe 70. Kinematics data includes position, velocity, pose, orientation, joint angles, and other data based on the movement commands provided to the robotic arm 40 and the execution of those commands by the robotic arm. In further embodiments, other tracking techniques, such as electromagnetic tracking, may also be used.
[0072] During image analysis to localize the ultrasound probe 70, the image processing device 56 treats the ultrasound probe 70 as a rigid object and uses that assumption to estimate the pose and location in real time based on the images from the laparoscopic camera 51. In certain aspects, pose estimation may be performed using machine learning image processing algorithms. The algorithms may be trained on a plurality (e.g., about 100,000) of synthetic stereoscopic images having a rendered ultrasound probe and a surgical scene with blood, smoke, and other realistic artifacts to improve generalization. Machine learning may be implemented using neural networks in a two-stage process having a key point detector and a pose regressor.
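A minimal sketch of the two-stage arrangement (a key-point detector producing per-key-point heatmaps, followed by a pose regressor operating on the detected 2D key points) is shown below in PyTorch; the layer sizes, key-point count, and translation-plus-quaternion pose parameterization are assumptions, not details from the disclosure.

```python
# Minimal sketch of a two-stage network: key-point detector + pose regressor.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 8  # assumed number of virtual fiducial key points

class KeypointDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.heatmaps = nn.Conv2d(64, NUM_KEYPOINTS, 1)  # one heatmap per key point

    def forward(self, image):
        return self.heatmaps(self.backbone(image))

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(NUM_KEYPOINTS * 2, 128), nn.ReLU(),
            nn.Linear(128, 7))  # assumed output: 3 translation + 4 quaternion

    def forward(self, keypoints_2d):  # (batch, NUM_KEYPOINTS, 2)
        return self.mlp(keypoints_2d.flatten(1))

detector, regressor = KeypointDetector(), PoseRegressor()
heatmaps = detector(torch.randn(1, 3, 256, 256))
# A soft-argmax or peak-finding step would convert heatmaps to 2D key points here.
pose = regressor(torch.randn(1, NUM_KEYPOINTS, 2))
```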
[0073] In embodiments where kinematic data is also used, the final pose of the ultrasound probe 70 is determined by initially using kinematics to estimate a rough localization pose, followed by fine pose estimation through vision, i.e., image processing by the image processing device 56.
[0074] In embodiments, the ultrasound probe 70 may be held by the instrument 50 as shown in FIGS. 8C and 10. At step 108, the image processing device 56 localizes the instrument 50 using the same deep learning algorithm for identifying virtual fiducial markers as described above with respect to step 106.
[0075] The image processing device 56 may be further configured to estimate the articulated pose and orientation of the instrument 50 holding the ultrasound probe 70. In particular, the image processing device 56 may be configured to estimate the pose and orientation of the ultrasound probe 70 by combining the pose and orientation of the probe 70 as well as the pose and orientation of the instrument 50 holding the probe 70.
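Combining the two estimates can be viewed as chaining homogeneous transforms: a camera-to-instrument pose composed with an assumed instrument-to-probe grasp offset yields a camera-to-probe pose, as in the sketch below (all numeric values are placeholders).

```python
# Minimal sketch: compose a camera->instrument pose with an assumed
# instrument->probe grasp offset to obtain the camera->probe pose.
import numpy as np

def make_transform(R, t):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

T_cam_instrument = make_transform(np.eye(3), [0.02, -0.01, 0.10])   # estimated from video
T_instrument_probe = make_transform(np.eye(3), [0.00, 0.00, 0.04])  # assumed grasp offset

T_cam_probe = T_cam_instrument @ T_instrument_probe  # combined probe pose
```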
[0076] At step 110, the image processing device 56 generates a depth map of a surgical site from the camera 51 to estimate the 3D location of the instrument 50, the probe 70, and anatomy in the laparoscopic camera frame of reference. At step 112, the image processing device 56 localizes the camera 51. This may be done by using a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline at every acquired image frame over time with respect to a World Coordinate System (WCS) (FIG. 10). The WCS may be tied to the access port 55 through which the camera 51 is inserted, in which case the location of the access port 55 on the patient anatomy is estimated from one or more external cameras (not shown), which may be mounted on the mobile carts 60a-d and/or system tower 10 or other suitable locations. In particular, localization of the camera includes depth mapping from the stereo pair followed by DV-SLAM on the successive stereo image pairs over time. Depth mapping provides a single-frame snapshot of how far objects are from the camera, whereas DV-SLAM provides for localization of the camera in the WCS.
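A single-frame depth map of the kind described can be computed from a rectified stereo pair with standard block matching, as in the sketch below; the calibration values are assumed, the input frames are synthetic stand-ins, and the DV-SLAM tracking itself is not shown.

```python
# Minimal sketch: per-frame depth map from a rectified stereo pair.
import cv2
import numpy as np

rng = np.random.default_rng(0)
left = rng.integers(0, 255, (480, 640), dtype=np.uint8)   # stand-in rectified frame
right = np.roll(left, -8, axis=1)                          # crude simulated disparity

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_px, baseline_m = 800.0, 0.005   # assumed intrinsics and stereo baseline
with np.errstate(divide="ignore"):
    depth_m = focal_px * baseline_m / disparity  # depth per pixel (metres)
```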
[0077] At step 114, the ultrasound probe 70 and the instrument 50 are localized in the video feed by the image processing device 56. At step 116, the ultrasound probe 70 and the instrument 50 are localized in the WCS. In particular, the WCS may be tied to the access port 55 through which the camera 51 is inserted, as explained above. In another embodiment, the WCS may instead be tied to the access port 55 through which one of the instruments 50, i.e., the grasper instrument 50 manipulating the ultrasound probe 70, is inserted. In this case, the location of the access port 55 in the patient is estimated by segmenting the shaft of the instrument 50 in a plurality of images captured by the camera 51 and computing the intersection between lines fit to the shafts, hence localizing the remote center of motion (RCM) of the access port 55 of the instrument 50. The image processing device 56 is configured to express the location and orientation of the ultrasound probe 70 in each image with respect to the WCS, hence placing all images in a fixed frame of reference.
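The RCM computation described above amounts to finding the point closest, in a least-squares sense, to the set of 3D lines fit to the instrument shaft across frames; the sketch below illustrates that calculation with synthetic shaft lines (all coordinates are placeholders).

```python
# Minimal sketch: estimate the RCM as the least-squares intersection of lines
# fit to the instrument shaft in several frames.
import numpy as np

def closest_point_to_lines(points, directions):
    """Least-squares point minimizing distance to lines (p_i, d_i)."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(points, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line direction
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

# Hypothetical shaft lines (a point on the shaft and its direction) from three frames.
shaft_points = [np.array([0.00, 0.00, 0.10]),
                np.array([0.01, 0.00, 0.10]),
                np.array([0.00, 0.01, 0.10])]
shaft_dirs = [np.array([0.0, 0.1, 1.0]),
              np.array([-0.1, 0.0, 1.0]),
              np.array([0.0, -0.1, 1.0])]

rcm = closest_point_to_lines(shaft_points, shaft_dirs)
```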
[0078] After the ultrasound probe 70 is localized, the ultrasound probe 70 is used to obtain ultrasound images of the tissue at step 118. With reference to FIG. 7, the ultrasound probe 70 is used to perform segmentation of vessels, landmarks, and the tumor at steps 118a, 118b, 118c, respectively. The image processing device 56 may include ultrasound image processors and other components for displaying the ultrasound images alongside the video images from the laparoscopic camera 51 in any suitable fashion, e.g., on separate screens 32 and 34, picture-in-picture, overlays, etc.
[0079] At step 120, the image processing device 56 is also configured to construct a 3D ultrasound volume from the segmented ultrasound images. FIG. 11 shows exemplary segmented ultrasound image slices and volumes reconstructed based on sub-tissue landmarks (e.g., critical structures, tumor, veins, arteries, etc.). Segmented landmarks may be displayed as different colored overlays on the ultrasound images displayed on the screen 32 and/or 34.
[0080] The ultrasound volume may be generated using a method that relies on the 6 DoF probe pose estimation from calibrated stereo endoscope images. The ultrasound probe is localized in 3D space by 6 DoF probe pose estimation from stereo endoscope images in the stereo endoscope frame of reference. The stereo endoscope camera itself may be localized in 3D space in the WCS tied to the access port 55 into which the camera 51 is inserted, or in the WCS tied to the access port 55 into which the grasper instrument 50 is inserted. Stereo reconstruction and DV-SLAM are used to update the position of the camera 51 from the images provided by the camera itself. Once the ultrasound probe is localized in the endoscope images, its 3D position is tracked in the stationary WCS tied to a landmark, e.g., one or more trocars. For 6 DoF ultrasound probe localization, virtual key points on the probe are estimated from both images of the stereo pair at each time step. Pose regression on the detected key points is then run by the deep learning model. At run time, the neural network based on the deep learning model is executed on each stereo channel separately to estimate a 6 DoF pose from each channel. The estimated pose from each channel (i.e., the left or right image of the pair) is combined using the stereo calibration, which localizes the probe in 3D.
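One simple way to combine the per-channel estimates, assuming the stereo extrinsics are known, is to map the right-camera estimate into the left-camera frame and fuse the two poses; the sketch below uses a crude translation average and placeholder values, whereas a full implementation would also fuse the rotations (e.g., by quaternion averaging).

```python
# Minimal sketch: fuse the probe pose estimates from the left and right channels.
import numpy as np

def fuse_poses(T_a, T_b):
    """Crude fusion: average the translations, keep the rotation of T_a."""
    T = T_a.copy()
    T[:3, 3] = 0.5 * (T_a[:3, 3] + T_b[:3, 3])
    return T

# Placeholder pose estimates of the probe from each channel (4x4 transforms).
T_left_probe = np.eye(4)
T_left_probe[:3, 3] = [0.010, 0.000, 0.120]
T_right_probe = np.eye(4)
T_right_probe[:3, 3] = [0.015, 0.000, 0.120]

# Assumed stereo extrinsics: right camera expressed in the left-camera frame.
T_left_right = np.eye(4)
T_left_right[:3, 3] = [0.005, 0.0, 0.0]

T_probe = fuse_poses(T_left_probe, T_left_right @ T_right_probe)
```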
[0081] With reference to FIG. 14, the image processing device may also be configured to generate the ultrasound volume by computing the value of each ultrasound voxel, interpolating between the values of ultrasound image slice pixels that overlap the corresponding voxels after placing each ultrasound image in the world coordinate system. The 2D ultrasound images are represented by their planar equations using three points, and the value of a 3D ultrasound voxel between slices is computed from a distance-weighted orthogonal projection.
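The voxel-fill computation can be sketched as follows: each slice is described by a plane (a point and a unit normal) in the world coordinate system, and a voxel takes an inverse-distance-weighted average of the intensities sampled at its orthogonal projections onto nearby slices. The threshold, weighting scheme, and values below are assumptions.

```python
# Minimal sketch: distance-weighted interpolation of a voxel value from nearby
# ultrasound slice planes placed in the world coordinate system.
import numpy as np

def voxel_value(voxel_xyz, slice_planes, slice_samples, max_dist=0.005):
    """slice_planes: list of (point_on_plane, unit_normal);
    slice_samples: intensity sampled at the voxel's orthogonal projection."""
    weights, values = [], []
    for (p0, n), sample in zip(slice_planes, slice_samples):
        dist = abs(np.dot(np.asarray(voxel_xyz) - p0, n))  # orthogonal distance
        if dist <= max_dist:
            weights.append(1.0 / (dist + 1e-6))            # inverse-distance weight
            values.append(sample)
    if not weights:
        return 0.0
    return float(np.average(values, weights=weights))

planes = [(np.array([0.0, 0.0, 0.050]), np.array([0.0, 0.0, 1.0])),
          (np.array([0.0, 0.0, 0.053]), np.array([0.0, 0.0, 1.0]))]
value = voxel_value([0.001, 0.002, 0.0515], planes, slice_samples=[118.0, 131.0])
```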
[0082] The image processing device 56 is configured to generate the ultrasound volume by registering individual ultrasound slices together based on localization of the ultrasound probe 70 at step 122 (i.e., using kinematics for rough localization and rigid object pose estimation for fine pose estimation). Stitching of ultrasound images may be done by matching key points obtained at different orientations as the ultrasound probe 70 is moved in multiple directions over the tissue being subjected to the ultrasound. In particular, the image registration process may include detecting invariant key points using the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB), or any other suitable image matching technique. Thereafter, the image processing device 56 extracts robust feature descriptors from the detected key points in each successive ultrasound slice image and computes the distances between the descriptors of each of the key points in successive ultrasound images. Further, the image processing device 56 selects the top matches for each descriptor of successive ultrasound images, estimates the homography matrix, and aligns the successive images to create a stitched volumetric ultrasound panorama.
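The key-point matching and homography estimation described above map directly onto standard OpenCV calls, as sketched below with ORB features and synthetic stand-in slices; a real pipeline would operate on actual ultrasound frames and chain the pairwise homographies into the stitched panorama.

```python
# Minimal sketch: ORB key-point matching and homography estimation between two
# successive ultrasound slices (synthetic stand-in images).
import cv2
import numpy as np

rng = np.random.default_rng(1)
slice_a = rng.integers(0, 255, (256, 256), dtype=np.uint8)
slice_b = np.roll(slice_a, (5, 9), axis=(0, 1))          # simulated probe motion

orb = cv2.ORB_create(nfeatures=500)
kp_a, des_a = orb.detectAndCompute(slice_a, None)
kp_b, des_b = orb.detectAndCompute(slice_b, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:50]

src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # aligns slice_a to slice_b
```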
[0083] At step 124, the image processing device 56 enhances or refines the ultrasound volume from ultrasound image segments by tracking across ultrasound images and by matching a plurality of key points in each 2D ultrasound image of the plurality of ultrasound images. The image processing device may be additionally configured to generate a 3D ultrasound model based on the volumetric image of tissue formed from the first modality images and deform the 3D ultrasound model to conform to the ultrasound volume.
[0084] At step 126, the image processing device 56 compensates for breathing motion of the patient since breathing imparts movement on the tissue and connected instruments 50 and camera 51. The image processing device 56 is configured to receive breathing data from the patient, which may be from a pulse oximetry device configured to measure respiration rate, pulse, etc.
[0085] At step 128, the 3D ultrasound model is deformed, which includes: segmentation of 2D/3D anatomical target surface and tracking of the instrument to isolate motion of instrument from anatomy; compensation for motion of the patient, e.g., breathing motion compensation, by tracking target anatomy and estimating movement of anatomy while ignoring the movement of instruments; and biomechanical modeling to estimate the physically-realistic movement of the organ of interest along with the anatomy around the tissue being tracked.
[0086] At step 130, an ultrasound slide projection overlay 92 (FIGS. 10 and 12) is shown over the endoscopic image at steps Ic-e (FIG. 7). At step 132, the image processing device 56 registers the constructed ultrasound volume with the 3D model constructed in step 102. With reference to FIG. 15, the image processing device 56 utilizes cross-modal transfer learning for this purpose. In particular, a generative adversarial network (GAN) architecture may be used to transfer preoperative image or model (e.g., CT) slices to intraoperative image (e.g., ultrasound) slices. The CT volume is resliced at arbitrary slice orientations since the ultrasound probe 70 may be located at any orientation and position. A transition dataset, which is a synthetic ultrasound dataset constructed from CT data, is used to pre-train the segmentation network. Thereafter, transfer learning may be used to fine-tune the neural network on the real ultrasound dataset. The trained neural network may then be used to register the constructed ultrasound volume with the 3D model. Thus, registration is performed between a 3D model based on the pre-operative CT volume and the intraoperative ultrasound volume that is generated by registering multiple ultrasound images at arbitrary orientations and overlaps.
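The fine-tuning step can be sketched as conventional transfer learning: a segmentation network assumed to be pre-trained on the synthetic (CT-derived) transition dataset has its early layers frozen while the head is trained on real ultrasound. The network, tensors, and hyperparameters below are illustrative assumptions, not the disclosed architecture.

```python
# Minimal sketch: fine-tune a (stand-in) pre-trained segmentation network on
# real ultrasound by freezing the feature layers and training only the head.
import torch
import torch.nn as nn

model = nn.Sequential(                       # stand-in segmentation network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1))                     # 2 classes: background / structure

# Assume the model was already pre-trained on the synthetic transition dataset.
for layer in list(model.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False              # freeze pre-trained feature layers

optimizer = torch.optim.Adam(model[-1].parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

real_us = torch.randn(4, 1, 128, 128)        # placeholder real ultrasound batch
labels = torch.randint(0, 2, (4, 128, 128))  # placeholder segmentation labels
optimizer.zero_grad()
loss = loss_fn(model(real_us), labels)
loss.backward()
optimizer.step()
```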
[0087] Prior to registration of the 3D model to the ultrasound images/volume, common landmarks in the preoperative images (e.g., CT/MRI) and the intraoperative images (e.g., ultrasound and the laparoscopic video feed) are identified. This involves ultrasound sub-tissue structure identification (vessels, tumor, etc.) through ultrasound segmentation, followed by matching and aligning the corresponding landmarks in the pre-operative imaging/model (i.e., the same vessel, tumor, etc.). Pre-operative model registration includes the following: identification of anatomical landmarks in the pre-operative imaging/model (organ surface and sub-surface internal critical structures, e.g., vessels, tumor); identification of surface anatomical landmarks in the stereo endoscope images; and segmentation of sub-surface internal critical structures, e.g., vessels, tumor.
[0088] At step 134, the image processing device 56 deforms the 3D model to align with the ultrasound model. For this step, the image processing device 56 may use another neural network trained on the ultrasound and pre-operative imaging/model of the same tissue in order to learn ultrasound to CT/MRI registration, i.e., by learning the appearance mapping between the CT/MRI image of the tissue of interest (e.g., tumor) and the counterpart ultrasound image and by learning the appearance mapping between the CT/MRI critical structure landmarks (vessels, arteries, etc.) and the ultrasound critical structure landmarks. The image processing device 56 is further configured to deform the 3D model after registering vessels, landmarks, and the tumor between the preoperative (e.g., CT/MRI) images and the intraoperative (e.g., ultrasound) images.
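Before any deformation, the matched landmarks can be brought into rough correspondence with a rigid alignment; the sketch below uses the Kabsch algorithm on hypothetical corresponding landmark coordinates as one standard initialization, not the disclosed learned registration.

```python
# Minimal sketch: rigid (Kabsch) alignment of corresponding landmarks, e.g. the
# same vessel branch points found in the CT-based model and the ultrasound volume.
import numpy as np

def kabsch(source, target):
    """Rigid transform (R, t) that best maps source landmarks onto target."""
    src_c, tgt_c = source.mean(0), target.mean(0)
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = tgt_c - R @ src_c
    return R, t

ct_landmarks = np.array([[0.00, 0.00, 0.00], [0.03, 0.00, 0.00],
                         [0.00, 0.02, 0.00], [0.01, 0.01, 0.02]])
us_landmarks = ct_landmarks + np.array([0.002, -0.001, 0.003])  # simulated offset

R, t = kabsch(ct_landmarks, us_landmarks)
```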
[0089] At step 136, the image processing device 56 displays the deformed 3D model as an overlay on the endoscopic video stream of the laparoscopic camera 51, which may be shown on the first screen 32. The image processing device 56 may also display the deformed 3D model as an overlay on the ultrasound feed of the ultrasound probe 70, which may be shown on the second screen 34. In embodiments, the video and ultrasound streams may be displayed side-by-side on the same screen 32 or 34, picture-in-picture, or in any other manner. With reference to FIG. 12, a video stream from the laparoscopic camera 51 is augmented with a 3D model 91 of the tissue of interest (e.g., tumor) along with the ultrasound slide projection overlay 92 extending from the ultrasound probe 70. The ultrasound slide projection overlay 92 is continuously generated along an imaging plane of the ultrasound probe 70.
[0090] With reference to FIGS. 13A and 13B, the overlays may be used to identify margins of the tissue of interest, e.g., tumor “T”, in an organ “O” by moving, pivoting, rotating, etc. the ultrasound probe 70 to the edges of the tumor and commencing the incision of the tissue to resect the tumor. Systems and methods of the present disclosure advantageously allow the margins to be located continuously and contemporaneously during resection.
[0091] It will be understood that various modifications may be made to the embodiments disclosed herein. Therefore, the above description should not be construed as limiting, but merely as exemplifications of various embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the claims appended thereto.

Claims

WHAT IS CLAIMED IS:
1. An imaging system comprising: a laparoscopic ultrasound probe configured to be inserted through an access port and to obtain a plurality of 2D ultrasound images of a tissue; a laparoscopic camera configured to capture a video stream of the tissue; an image processing device configured to: generate a depth map of a surgical site from the video stream to estimate a 3D location of the laparoscopic ultrasound probe in a frame of reference of the laparoscopic camera; localize the laparoscopic camera and the laparoscopic ultrasound probe in a world coordinate system based on the depth map; receive a volumetric image of tissue formed from first modality images; generate an ultrasound volume from the plurality of 2D ultrasound images; register the ultrasound volume with the volumetric image of the tissue; and generate an overlay of the volumetric image of tissue and a 2D ultrasound image of the plurality of 2D ultrasound images; and a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay extending from the laparoscopic ultrasound probe.
2. The imaging system according to claim 1, wherein the image processing device is further configured to generate a plurality of virtual fiducial markers on the laparoscopic ultrasound probe using a deep learning image processing algorithm.
3. The imaging system according to claim 2, wherein the image processing device is further configured to localize the laparoscopic ultrasound probe in the video stream based on the virtual fiducial markers.
4. The imaging system according to claim 3, wherein the image processing device is further configured to estimate a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the virtual fiducial markers.
5. The imaging system according to claim 1, wherein the laparoscopic camera is a stereoscopic camera and the video stream includes a pair of stereo image streams.
6. The imaging system according to claim 5, wherein the image processing device is further configured to localize the laparoscopic camera and the laparoscopic ultrasound probe in the world coordinate system using a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline based on the pair of stereo image streams.
7. The imaging system according to claim 1, wherein the image processing device is further configured to compensate for deformation of the ultrasound volume in response to breathing motion based on respiratory data from a pulse oximeter apparatus.
8. The imaging system according to claim 1, wherein the image processing device is further configured to generate the ultrasound volume by matching a plurality of key points in each 2D ultrasound image of the plurality of 2D ultrasound images; and provide visual guidance encompassing a tumor along with a negative resection margin around the tumor.
9. The imaging system according to claim 1, wherein the image processing device is further configured to generate a 3D model based on the volumetric image of tissue formed from the first modality images; and deform the 3D model to conform to the ultrasound volume.
10. The imaging system according to claim 9, wherein the image processing device is further configured to transfer a slice of the 3D model to a corresponding 2D ultrasound image of the ultrasound volume using a neural network.
11. The imaging system according to claim 1, further comprising a laparoscopic imaging probe selected from the group consisting of a gamma radiation probe, a Raman spectroscopy probe, a multispectral probe, a time-resolved fluorescence spectroscopy (ms-TRFS) probe, and an autofluorescence probe.
12. A surgical robotic system comprising: a first robotic arm including a laparoscopic ultrasound probe configured to be inserted through an access port and to obtain intraoperatively a plurality of 2D ultrasound images of a tissue; a second robotic arm including a laparoscopic camera configured to capture a video stream of the tissue; an image processing device configured to: generate a depth map of a surgical site from the video stream to estimate a 3D location of the laparoscopic ultrasound probe in a frame of reference of the laparoscopic camera; localize the laparoscopic camera and the laparoscopic ultrasound probe in a world coordinate system based on the depth map; receive a 3D model of tissue formed from first modality images; generate an ultrasound volume from the plurality of 2D ultrasound images; register the ultrasound volume with the 3D model of the tissue; deform at least one of the 3D model or the ultrasound volume based on at least one of pressure applied on the tissue by the laparoscopic ultrasound probe or breathing motion; and generate an overlay of the 3D model of tissue and a 2D ultrasound image of the plurality of 2D ultrasound images; and a screen configured to display the video stream showing the laparoscopic ultrasound probe and the overlay including the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasound probe.
13. The surgical robotic system according to claim 12, wherein the image processing device is further configured to localize the laparoscopic ultrasound probe based on kinematic data of the first robotic arm.
14. The surgical robotic system according to claim 12, wherein the image processing device is further configured to generate a plurality of virtual fiducial markers on the laparoscopic ultrasound probe using a deep learning image processing algorithm.
15. The surgical robotic system according to claim 14, wherein the image processing device is further configured to localize the laparoscopic ultrasound probe in the video stream based on the virtual fiducial markers.
16. The surgical robotic system according to claim 15, wherein the image processing device is further configured to estimate a pose and orientation of the laparoscopic ultrasound probe from the video stream based on the virtual fiducial markers.
17. The surgical robotic system according to claim 12, wherein the laparoscopic camera is a stereoscopic camera and the video stream includes a pair of stereo image streams.
18. The surgical robotic system according to claim 17, wherein the image processing device is further configured to localize the laparoscopic camera and the laparoscopic ultrasound probe in the world coordinate system using a Deformable Visual Simultaneous Localization and Mapping (DV-SLAM) pipeline based on the pair of stereo image streams.
19. A method for intraoperative imaging of tissue, the method comprising: obtaining a plurality of 2D ultrasound images from a laparoscopic ultrasound probe; obtaining a video stream from a laparoscopic camera; generating a depth map of a surgical site from the video stream to estimate a 3D location of the laparoscopic ultrasound probe in a frame of reference of the laparoscopic camera; localizing the laparoscopic camera and the laparoscopic ultrasound probe in a world coordinate system based on the depth map; receiving a 3D model of tissue formed from first modality images; generating an ultrasound volume from the plurality of 2D ultrasound images; registering the ultrasound volume with the 3D model of the tissue; generating an overlay of the 3D model of tissue and a 2D ultrasound image of the plurality of 2D ultrasound images; and displaying the video stream, the video stream including the laparoscopic ultrasound probe and the overlay including the 3D model and the 2D ultrasound image extending along an imaging plane of the laparoscopic ultrasound probe.
20. The method according to claim 19, further comprising: deforming at least one of the 3D model or the ultrasound volume based on at least one of pressure applied on the tissue by the laparoscopic ultrasound probe or breathing motion to generate a deformed model.
21. The method according to claim 20, further comprising: displaying a suggested path of resection of a tumor along with a selected negative resection margin from the tumor based on the deformed model registered with the ultrasound volume.
22. An imaging system comprising: a laparoscopic camera configured to capture a video stream of tissue; an intraoperative imaging device configured to be inserted through an access port and to obtain a plurality of signals from a tissue; an image processing device configured to: generate a 3D reconstruction of a surgical site from the laparoscopic camera video stream to estimate a 3D location of the intraoperative imaging device in a frame of reference of the laparoscopic camera; localize the laparoscopic camera and the intraoperative imaging device in a world coordinate system based on the 3D reconstruction of the surgical site; receive a volumetric image of tissue formed from a pre-operative imaging modality; generate a multi-frame representation from the plurality of signals from the intraoperative imaging device; register the multi-frame representation with the volumetric image of the tissue; deform the volumetric image of the tissue according to the multi-frame representation; and generate an overlay of the volumetric image of tissue and the multi-frame representation; and a screen configured to display the video stream showing data based on the plurality of signals from the intraoperative imaging device and the overlay extending from the intraoperative imaging device.
23. The imaging system according to claim 22, wherein the intraoperative imaging device is a laparoscopic ultrasound probe and the pre-operative imaging modality is at least one of MRI or CT.
PCT/IB2023/058368 2022-08-24 2023-08-23 Surgical robotic system and method for intraoperative fusion of different imaging modalities WO2024042468A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263400525P 2022-08-24 2022-08-24
US63/400,525 2022-08-24
US202263428204P 2022-11-28 2022-11-28
US63/428,204 2022-11-28

Publications (1)

Publication Number Publication Date
WO2024042468A1 true WO2024042468A1 (en) 2024-02-29

Family

ID=87929132

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/058368 WO2024042468A1 (en) 2022-08-24 2023-08-23 Surgical robotic system and method for intraoperative fusion of different imaging modalities

Country Status (1)

Country Link
WO (1) WO2024042468A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108778143A (en) * 2016-03-16 2018-11-09 皇家飞利浦有限公司 Computing device for laparoscopic image and ultrasonoscopy to be overlapped
US20190388162A1 (en) * 2017-02-14 2019-12-26 Intuitive Surgical Operations, Inc. Multi-dimensional visualization in computer-assisted tele-operated surgery

Similar Documents

Publication Publication Date Title
US20230218356A1 (en) Systems and methods for projecting an endoscopic image to a three-dimensional volume
US8498691B2 (en) Robotic catheter system and methods
KR102117273B1 (en) Surgical robot system and method for controlling the same
US20150320514A1 (en) Surgical robots and control methods thereof
KR102119534B1 (en) Surgical robot and method for controlling the same
KR102217573B1 (en) Systems and methods for tracking a path using the null-space
CN102170835B (en) Medical robotic system providing computer generated auxiliary views of a camera instrument for controlling the positioning and orienting of its tip
KR101296215B1 (en) Method and system for performing 3-d tool tracking by fusion of sensor and/or camera derived data during minimally invasive robotic surgery
US20100331855A1 (en) Efficient Vision and Kinematic Data Fusion For Robotic Surgical Instruments and Other Applications
KR20140112207A (en) Augmented reality imaging display system and surgical robot system comprising the same
JP2012529970A (en) Virtual measurement tool for minimally invasive surgery
US20240046589A1 (en) Remote surgical mentoring
US20200246084A1 (en) Systems and methods for rendering alerts in a display of a teleoperational system
WO2024042468A1 (en) Surgical robotic system and method for intraoperative fusion of different imaging modalities
WO2023052998A1 (en) Setting remote center of motion in surgical robotic system
US20230248452A1 (en) Predicting stereoscopic video with confidence shading from a monocular endoscope
CN116056653A (en) Systems and methods for enhancing imaging during surgery
US20240137583A1 (en) Surgical robotic system and method with multiple cameras
Sudra et al. MEDIASSIST: medical assistance for intraoperative skill transfer in minimally invasive surgery using augmented reality
WO2024006729A1 (en) Assisted port placement for minimally invasive or robotic assisted surgery
US20220383555A1 (en) Systems and methods for clinical workspace simulation
US20230363834A1 (en) Real-time instrument position identification and tracking
US20240156325A1 (en) Robust surgical scene depth estimation using endoscopy
US20220323157A1 (en) System and method related to registration for a medical procedure
US20240070875A1 (en) Systems and methods for tracking objects crossing body wallfor operations associated with a computer-assisted system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23764720

Country of ref document: EP

Kind code of ref document: A1