US20200281556A1 - X-ray Detector Pose Estimation in Medical Imaging - Google Patents
- Publication number
- US20200281556A1 (U.S. application Ser. No. 16/292,716)
- Authority
- US
- United States
- Prior art keywords
- detector
- image
- ray
- markers
- camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/58—Testing, adjusting or calibrating thereof
- A61B6/587—Alignment of source unit to detector unit
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/08—Auxiliary means for directing the radiation beam to a particular spot, e.g. using light beams
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/54—Control of apparatus or devices for radiation diagnosis
- A61B6/547—Control of apparatus or devices for radiation diagnosis involving tracking of position of the device or parts of the device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/42—Arrangements for detecting radiation specially adapted for radiation diagnosis
- A61B6/4208—Arrangements for detecting radiation specially adapted for radiation diagnosis characterised by using a particular type of detector
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/44—Constructional features of apparatus for radiation diagnosis
- A61B6/4429—Constructional features of apparatus for radiation diagnosis related to the mounting of source units and detector units
- A61B6/4458—Constructional features of apparatus for radiation diagnosis related to the mounting of source units and detector units the source unit or the detector unit being attached to robotic arms
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B6/00—Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
- A61B6/44—Constructional features of apparatus for radiation diagnosis
- A61B6/4429—Constructional features of apparatus for radiation diagnosis related to the mounting of source units and detector units
- A61B6/4464—Constructional features of apparatus for radiation diagnosis related to the mounting of source units and detector units the source unit or the detector unit being mounted to ceiling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
Definitions
- the present embodiments relate to X-ray detector pose estimation for X-ray imaging.
- the X-ray source and detector have several degrees-of-freedom (DOF).
- the detector may be a mobile plate positionable in various locations and/or orientations.
- the x-ray source connects to a robotic arm to allow positioning generally orthogonal to the detector for x-ray imaging.
- the freedom of positioning provides an advantage in terms of flexibility in imaging the patient.
- Such systems enable high quality X-ray examinations to be carried out for more specific regions of interests and for a wider range of patients (e.g. in cases of serious trauma).
- the X-ray source, the region-of-interest, and the X-ray detector should be well aligned.
- This alignment is performed in two steps: positioning the X-ray detector with respect to the region-of-interest of the patient, and then aligning the X-ray source with respect to the X-ray detector.
- an operator performs the detector positioning and X-ray source alignment manually. Because the alignment is performed manually, it is inconsistent in quality, difficult to reproduce, and time-consuming.
- U.S. Published Application No. 2018/0235566 teaches automatic alignment by locating markers on the detector.
- the pose of the X-ray detector is determined from the located markers in a process providing for marker detection, detector plane estimation, and then detector position estimation based on an image from a camera.
- the markers are detected using hand-crafted filters with different sizes and a Hough transform-based approach.
- Each marker is then classified to match with a pre-defined template detector.
- a region growing algorithm is applied on a depth image to obtain a detector plane.
- the pose of the X-ray detector position is estimated based on the point-correspondence to the template.
- the approach may also have difficulty handling large distance variation of the detector relative to the camera.
- a machine-learned model is used to estimate locations of markers, including occluded or other non-visible markers, from an image.
- the locations of the markers, including the non-visible markers are used to determine the pose of the X-ray detector for aligning an X-ray tube with the X-ray detector.
- a method for estimating X-ray detector pose.
- An image of the X-ray detector is captured. Locations of visible and occluded markers on the X-ray detector are identified from the image by a machine-learned detector.
- the X-ray detector pose of the X-ray detector is determined from the locations of the visible and occluded markers.
- An X-ray tube is positioned based on the X-ray detector pose. A patient is x-ray imaged with the X-ray tube as positioned and the X-ray detector.
- the image does not show the occluded markers due to an intervening object, such as a patient.
- the image may not show markers that are outside of the field of view of the camera.
- the locations of the occluded markers and/or out-of-view markers not shown by the image are still identified.
- the image may be captured as an infrared (IR) image.
- a camera may be fixedly connected and moveable with the X-ray tube. Assuming the X-ray detector is partially visible inside of the camera's field of view, the captured image may be padded to virtually enlarge the field of view such that the padded image is more likely to include the entirety of the X-ray detector.
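The padding described above can be sketched as follows. This is a minimal illustration assuming a single-channel (e.g., IR) camera image; the function name `pad_to_virtual_fov` and the `pad_frac` parameter are hypothetical, not taken from the specification.

```python
import numpy as np

def pad_to_virtual_fov(image, pad_frac=0.25, fill=0):
    """Pad a 2-D camera image on all sides to virtually enlarge the field
    of view, so a partially visible detector is more likely to fit entirely
    inside the padded frame."""
    pad_y = int(round(image.shape[0] * pad_frac))
    pad_x = int(round(image.shape[1] * pad_frac))
    padded = np.pad(image, ((pad_y, pad_y), (pad_x, pad_x)),
                    mode="constant", constant_values=fill)
    # Marker coordinates predicted in the padded frame map back to the
    # original frame by subtracting these offsets.
    return padded, (pad_x, pad_y)
```

For example, a 100×80 image padded with `pad_frac=0.25` becomes 150×120, with pad offsets of (20, 25) pixels.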
- the machine-learned detector may be of various types of machine-learned models, such as a deep neural network trained with deep learning.
- the network structure of the machine-learned detector may be a single network or a sequence of two or more networks. For example, a first of the two or more networks is trained to localize the X-ray detector in the image in a coarse level, and a second of the two or more networks is trained to locate the visible, occluded and/or other out-of-view markers in response to the localization output by the first of the two or more networks. The second of the two or more networks may be trained to locate in response to an input of the image cropped based on the localization output of the first network.
- a third of the two or more networks may be trained to refine positions of the visible, occluded and/or other out-of-view markers output by the second of the two or more networks where the refined positions are the identified locations.
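The cropping step between the first (coarse localization) network and the second (marker location) network can be sketched as below. The networks themselves are omitted; the sketch only shows the geometric bookkeeping of cropping with a margin and mapping marker coordinates found in the crop back to the full-image frame. All names and the `margin` parameter are illustrative assumptions.

```python
def crop_to_localization(image, bbox, margin=0.1):
    """Crop the camera image to a coarse detector bounding box.

    bbox: integer (x0, y0, x1, y1) from the first (localization) network.
    Returns the crop plus the offset needed to map marker coordinates
    found in the crop back to the full-image frame.
    """
    x0, y0, x1, y1 = bbox
    mx = int(round((x1 - x0) * margin))  # enlarge the box slightly so the
    my = int(round((y1 - y0) * margin))  # detector edges are not clipped
    x0, y0 = max(x0 - mx, 0), max(y0 - my, 0)
    x1 = min(x1 + mx, image.shape[1])
    y1 = min(y1 + my, image.shape[0])
    return image[y0:y1, x0:x1], (x0, y0)

def to_full_frame(markers_in_crop, offset):
    """Map (x, y) marker locations from crop coordinates to image coordinates."""
    ox, oy = offset
    return [(x + ox, y + oy) for x, y in markers_in_crop]
```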
- the identification of the locations may include generating a heatmap of the locations.
- the heatmap provides probabilistic indication of possible locations for each marker.
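Reading a marker location out of such a heatmap is commonly done by taking the argmax and refining it, for instance with a centroid over a small window for sub-pixel accuracy. The sketch below assumes one heatmap channel per marker; the refinement scheme is a generic choice, not necessarily the one used in the embodiments.

```python
import numpy as np

def heatmap_peak(heatmap):
    """Return the (x, y) sub-pixel peak of a single-marker heatmap."""
    iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Centroid over a 3x3 window around the integer peak (clipped at borders).
    y0, y1 = max(iy - 1, 0), min(iy + 2, heatmap.shape[0])
    x0, x1 = max(ix - 1, 0), min(ix + 2, heatmap.shape[1])
    win = heatmap[y0:y1, x0:x1]
    ys, xs = np.mgrid[y0:y1, x0:x1]
    w = win.sum()
    return (xs * win).sum() / w, (ys * win).sum() / w
```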
- the X-ray detector pose is determined with homography.
- a position and orientation in three dimensions of the X-ray detector pose is determined from the geometrical relationship of the markers to each other as viewed by the camera.
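A standard way to estimate the homography between the known marker layout on the detector and the marker locations in the camera image is the direct linear transform (DLT) over four or more point correspondences. The sketch below is a generic DLT, not the patent's implementation; recovering the full 3-D pose from the homography would additionally use the calibrated camera intrinsics.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst via the DLT algorithm.

    src, dst: (N, 2) arrays of corresponding points, N >= 4, e.g. marker
    positions on the detector plane and their pixel locations in the image.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (smallest singular value).
    _, _, vt = np.linalg.svd(np.asarray(A))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Apply H to (N, 2) points in homogeneous coordinates."""
    pts_h = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]
```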
- an X-ray imaging system connects to a movable arm.
- a camera connects to the X-ray source or moveable arm to be movable with the X-ray source.
- An X-ray detector is moveable independent of the movable arm.
- An image processor is configured to determine a position, orientation, or position and orientation of the X-ray detector relative to the X-ray source based on markers shown by the camera and markers not shown by the camera. The markers not shown by the camera are located by a machine-learned network.
- the moveable arm is a robotic arm.
- the X-ray detector is a flat panel detector.
- the camera is an infrared camera. The markers are reflective for ease of detection by the camera.
- the camera is a color camera.
- the markers are painted with distinguishable color compared to the X-ray detector, such that the markers may be easily detected by the camera.
- a display configured to display an X-ray image of the patient may be provided.
- the X-ray image is acquired based on positioning of the X-ray source relative to the position, orientation, or position and orientation determined by the image processor.
- the image processor is configured to apply the machine-learned network to an image from the camera.
- the machine-learned network outputs the markers not shown by the camera in response to the application.
- a method for locating markers on an X-ray detector.
- a region including a detector with markers and a patient is imaged with a camera.
- the patient intervenes between the camera and at least some of the markers.
- a machine-learned model determines, from the imaging, locations of the markers including the at least some of the markers for which the patient is intervening.
- the imaging is with infrared.
- the markers are infrared-reflective markers.
- a position and orientation of the X-ray detector is determined based on the locations, and an X-ray source is positioned with a robotic arm based on the position and orientation of the X-ray detector.
- the region is a padded image with a processing-created enlarged field of view of the camera.
- the machine-learned model is a machine-learned network including a first image-to-image network trained to locate the detector in the field of view of the camera and a second image-to-image network trained to determine the locations from the imaging cropped based on the located detector from the first image-to-image network.
- FIG. 1A shows an X-ray system utilizing a mobile X-ray detector according to various embodiments
- FIG. 1B is a high-level block diagram of one embodiment of a system controller for determining locations of occluded markers
- FIG. 2 is a detailed view of the mobile X-ray detector in FIG. 1A .
- FIG. 3 is a flow chart diagram of one embodiment of a method for estimating X-ray detector pose with a machine-learned model
- FIG. 4 shows example camera images with occlusion of markers and non-visible markers on an X-ray detector
- FIG. 5 shows example generation of additional training images
- FIG. 6 illustrates an example network architecture for the machine-learned model
- FIG. 7 shows an example graph of accuracy in pose estimation and an expanded view of part of the graph.
- X-ray detector pose estimation is provided by a machine-learned model, such as a deep neural network trained with deep learning methods. A deep or other machine learning-based technique is used to automatically compute the transformation to place the X-ray tube at an ideal imaging location.
- the machine is trained for landmark detection.
- several dedicated convolutional neural networks are trained to estimate visible markers and predict the position of occluded and out-of-view markers. Occluded and out-of-view markers are robustly estimated, making the pose estimation with very few visible markers more accurate.
- This learning-based landmark detection does not require any hand-crafted filters and achieves better detection accuracy and generality as compared to the approach of U.S. Published Application No. 2018/0235566. No explicit marker classification or detection is required. Since the machine learning is data driven, the resulting machine-learned model may adapt to any kind of detector, detector occlusion, detector marker placement, and detector positioning given appropriate training data.
- FIG. 1A shows an X-ray imaging system 100 .
- the X-ray imaging system 100 provides for automatic alignment of the X-ray source 122 and the mobile X-ray detector 110 .
- the automatic alignment utilizes a camera 130 provided on the positionable X-ray source 122 and the mobile X-ray detector 110 having passive markers 200 placed thereon.
- a machine-learned model implemented by the controller 170 determines a position of the X-ray detector 110 from an image of the camera 130 even where some or many of the markers 200 are out-of-view (e.g., occluded or outside the field of view of the camera) in the image.
- Such X-ray system 100 includes an X-ray tube robot system 120 , the positionable X-ray source 122 , an X-ray source base 125 , the camera 130 mounted on or by the X-ray source 122 , the mobile X-ray detector 110 , and the system controller 170 . Additional, different, or fewer components may be provided, such as including a computer network for remote image processing and/or control by computer, server, or workstation forming the system controller 170 .
- the X-ray system 100 is for training, such as using images from a memory 177 and/or the camera 130 and a corresponding ground truth as input samples. Alternatively, the X-ray system 100 is for application of the machine-learned model trained to detect locations of the markers 200 .
- the X-ray tube robot system 120 is configured and adapted to be driven by the system controller 170 for articulating the positionable X-ray source 122 into any desired position to align the X-ray source 122 with the mobile X-ray detector 110 .
- the X-ray tube robot system 120 is a robotic arm on which the X-ray source 122 is arranged. Such a robotic arm allows a particularly variable adjustability of the X-ray source 122 .
- the X-ray tube robot system 120 is a telescoping arm that is hung from overhead tracks. Other robotic arms may be used.
- a manually moveable support structure or arm is provided for the X-ray source 122 . The user is guided to position the X-ray source based on one or more images on the display 175 .
- the X-ray source 122 is an X-ray tube. Multiple X-ray sources 122 may be provided, such as for a dual energy X-ray imager.
- the X-ray source 122 is fixedly connected to the movable arm, such as the robot system 120 .
- the connection may be removable, such as by undoing a latch, removing a housing, and/or removing bolts.
- the X-ray source 122 is connected so that moving the end or other part of the robot system 120 moves the X-ray source 122 .
- the X-ray source 122 is movable by the movable arm (e.g., robotic arm) in one or more degrees of translation freedom with or without one or more degrees of rotation freedom.
- the camera 130 may be an optical camera.
- the camera 130 images in the visible light spectrum.
- the visible light camera 130 produces an RGB (Red, Green, Blue) digital image.
- the camera 130 is a depth sensing camera (e.g., three-dimensional (3D) or RGBD camera).
- the depth sensing may allow the camera 130 to determine a depth from the camera 130 to any of the visible markers 200 .
- the digital image data obtained from such camera 130 is typically referred to as an RGBD (RGB+Depth) image, which includes an RGB image, in which each pixel has an RGB value and a depth (or distance) value.
- the camera 130 is an infrared (IR)-based camera, with or without a depth sensor.
- the infrared camera captures images of passive markers 200 , which are IR reflective markers 200 that have a strong response to the infrared light projected onto them.
- the IR-based camera 130 produces an IR digital image in which each pixel has an IR value.
- the digital image data obtained from such IR-based camera 130 is an IR image. With depth sensing, an IRD image is obtained.
- a camera or cameras that operate in two or more modes may be used.
- a camera with two or more sensors in either visible spectrum or IR may be used.
- Two or more cameras 130 may be provided. The camera 130 position and optics are calibrated relative to the X-ray source 122 and/or the base 125 of the X-ray system 100 .
- the camera 130 attaches to an appropriate location on the X-ray source 122 and/or the robot system 120 .
- the camera 130 is attached to a rear side of the X-ray source 122 .
- the camera 130 is fixedly connected to the movable arm, such as the robot system 120 .
- the connection may be removable, such as by undoing a latch, removing a housing, and/or removing bolts.
- the camera 130 is connected so that moving the end or other part of the robot system 120 moves the camera 130 with the X-ray source 122 .
- the camera 130 is movable by the movable arm (e.g., robotic arm) in one or more degrees of translation freedom with or without one or more degrees of rotation freedom.
- the X-ray detector 110 is a flat-panel detector for recording high-quality X-ray images.
- the X-ray detector 110 is wireless and is configured with a transceiver for wireless communication with the system controller 170 , for example, for transmitting the acquired X-ray images.
- the X-ray detector 110 has a wired connection to the X-ray system 100 , ensuring quick and error-free communication between the X-ray detector 110 and the other components of the X-ray system 100 .
- the detector 110 is movable independent of the moveable arm or robot system 120 .
- the detector 110 is movable independent of the X-ray source 122 .
- the X-ray detector 110 may be manually positioned anywhere about the patient 500 by an X-ray technician in order to record X-ray images of a region of interest on the patient 500 .
- the mobile X-ray detector 110 is positioned for an X-ray examination of a patient 500 's femur region.
- the detector 110 is a movable detector 110 , such as being free of connection or releasably connected to the rest of the X-ray system 100 .
- the mobile detector 110 may be placed in a drawer in or below the patient bed.
- the mobile detector 110 may be placed in a bracket or connected to a shim or other positioning apparatus.
- the mobile detector 110 may be released and moved to other locations or moved to other locations while attached to a positioning apparatus (e.g., a wedge holder).
- the mobile detector 110 may have any number of degrees of freedom, such as a being a plate or flat panel that may be positioned on top of the patient bed in any arbitrary position and/or orientation.
- the detector 110 mounts to or is connected fixedly with a manual armature or an automatic robotic arm, such as being part of a C-arm with the X-ray source 122 on the end of the robot system 120 .
- the connection may allow for changeable translation and/or orientation of the detector 110 relative to the X-ray source 122 .
- the detector 110 has a plurality of passive markers 200 placed at designated points on one or more surfaces of the X-ray detector 110 .
- the markers 200 are on a top surface or a surface that faces the X-ray source 122 in use. The marker placement allows the camera to see or image at least some of the markers 200 .
- the plurality of passive markers 200 are provided in shapes and patterns that may uniquely identify each of the corners of the mobile X-ray detector 110 . Any combination of different shapes, patterns, and/or sizes may be used.
- the passive markers 200 include radial corner markers and smaller point markers as shown in FIG. 2 .
- the larger radial corner markers are shaped for marking each of the four corners of the mobile X-ray detector 110 .
- the smaller point markers are for the identification and validation of the larger radial corner markers such that the corners may be separately identified.
- other shapes, configurations, and placements of the passive markers 200 may be used.
- the markers 200 have uniform shapes, such as being squares or rectangles of the same size with uniform or non-uniform placement on the detector 110 .
- the passive markers 200 are used for detecting the mobile X-ray detector 110 .
- the markers 200 are reflective and/or colored to be visible to the sensing.
- For visible spectrum imaging, the passive markers 200 have a color that is visually distinguishable from the color of the mobile X-ray detector's 110 surface, so that the camera 130 may see the passive markers 200 in the visible light spectrum.
- a marker with two or more colors may also be used. Contrasting colors between the markers 200 and the rest of the detector 110 are provided, such as black markers 200 on a beige or white housing of the detector 110 .
- the markers 200 may be formed as an aesthetic element, such as an inset line or pattern.
- the IR reflective markers 200 may be color markers made to blend in with the color of the detector 110 such that the markers 200 are not so noticeable in the visible spectrum but are noticeable in the IR spectrum.
- FIG. 2 shows an IR image 110 A in gray scale of the detector 110 where the markers 200 are lighter than the rest of the detector 110 .
- the reflective markers 200 appear white because these reflective markers 200 reflect IR from the IR source back to the IR camera 130 , and the background appears dark because the light background surface diffuses and/or absorbs IR.
- the system controller 170 controls the overall operation of the X-ray system 100 .
- the system controller 170 is configured with appropriate hardware, firmware, and/or software to control the operation of the X-ray source 122 , camera 130 , robot system 120 , and reading of the detector 110 .
- the controller 170 controls and interfaces with the camera 130 to capture one or more images, the positionable X-ray source 122 to generate X-rays, and the robot system 120 to move the X-ray source based on a determined position of the detector 110 .
- the system controller 170 uses the acquired RGB and/or IR images to determine the pose in 3D of the mobile X-ray detector 110 to then position the X-ray source 122 and drive the X-ray image acquisition function of the X-ray system 100 .
- the system controller 170 may have a network interface 172 for communicating with other devices in the system 100 via a network and/or wired or wireless communications.
- the network interface 172 is configured to carry out data transfer with the mobile X-ray detector 110 and the camera 130 .
- the X-ray system 100 has a user interfacing unit 175 (e.g., a touch and/or display screen, a keyboard, a mouse, etc.) for the X-ray technician to interface with the X-ray system.
- Such input/output devices 175 can be touch-screen interface, display, keyboard, mouse, speakers, buttons, etc.
- the system controller 170 may contain additional, different, or fewer components.
- the system controller 170 may communicate with the X-ray tube robot system 120 to control the positioning and orientation of the X-ray source 122 and to control X-ray image acquisition by the mobile X-ray detector 110 .
- X-ray images acquired by the X-ray detector 110 may be input to the system controller 170 .
- the system controller 170 includes the image processor 171 .
- the image processor 171 may be separate from or part of the controller, which controls the overall operation of the system controller 170 by executing computer program instructions which define such operation.
- the image processor 171 is a control processor, general processor, digital signal processor, three-dimensional data processor, graphics processing unit, application specific integrated circuit, field programmable gate array, artificial intelligence processor, digital circuit, analog circuit, combinations thereof, or other now known or later developed device for image processing and/or application of a machine-learned model.
- the image processor 171 is a single device, a plurality of devices, or a network of devices. For more than one device, parallel or sequential division of processing may be used.
- Different devices making up the image processor 171 may perform different functions, such as one device applying the machine-learned network to locate markers and determine detector pose from the located markers and a separate device for controlling the robot system 120 .
- the image processor 171 is a control processor or other processor of the X-ray system 100 .
- the image processor 171 operates pursuant to stored instructions, hardware, and/or firmware to perform various acts described herein.
- the instructions, camera image, network definition, machine-learned model, outputs, and/or other information are stored in a non-transitory computer readable memory, such as the memory 177 or storage 176 .
- the memory 177 and/or storage 176 are external storage devices, RAM, ROM, database, and/or local memories (e.g., solid state drive or hard drive).
- the same or different non-transitory computer readable media may be used for the instructions and other data.
- the memory 177 and/or storage 176 may be implemented using a database management system (DBMS) and residing on a memory, such as a hard disk, RAM, or removable media.
- the memory 177 is internal to the processor 171 (e.g. cache).
- The instructions may be stored on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, a hard drive, or other computer-readable storage media (e.g., the memory 177 ).
- Computer readable storage media include various types of volatile and nonvolatile storage media.
- the functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media.
- the functions, acts or tasks are independent of the particular type of instructions set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination.
- the instructions are stored on a removable media device for reading by local or remote systems.
- the instructions are stored in a remote location for transfer through a computer network.
- the instructions are stored within a given computer, CPU, GPU or system. Because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the way the present embodiments are programmed.
- the image processor 171 is configured to train a machine learning architecture. Based on a user provided or other source of the network architecture and training data, the image processor 171 learns features for an encoder and a decoder or other network parts to train the network. The result of the training is a machine-learned model for identifying marker locations from an image of the camera 130 . Alternatively or additionally, the image processor 171 is configured to apply a previously machine-learned model to identify marker locations from a camera image to align the X-ray source 122 for X-ray imaging of a patient 500 based on a pose of the detector 110 positioned for such imaging.
- the associated X-ray examination workflow is as follows: position the patient 500 appropriately; position the mobile X-ray detector 110 with respect to the region-of-interest of the patient 500 ; acquire one or more images of the scene using the camera 130 ; compute the 6 DOF pose of the X-ray detector 110 using the image(s); automatically position the X-ray source 122 to a position that is in alignment with the mobile X-ray detector 110 based on the 6 DOF pose information on the mobile X-ray detector 110 ; and then record one or more X-ray images of the region-of-interest of the patient 500 .
- a display device connected to the I/O 175 is configured to display an X-ray image of the patient 500 based on positioning of the X-ray source 122 relative to the position, orientation, or position and orientation of the detector 110 as determined by the image processor 171 .
- the system controller 170 is configured to perform the portions of the above-mentioned workflow after the mobile X-ray detector is positioned in place with respect to the region-of-interest.
- the system controller 170 is configured to initiate the process when a single command is inputted via the system controller's operator input device 175 .
- the single command can be a voice command, a click of a key, a click of a mouse, a touch on a touch screen, etc.
- the rest of the workflow does not require any manual intervention from the X-ray technician and the resulting alignment between the X-ray detector and the X-ray source is much more accurate than the conventionally achieved alignment.
- the system controller 170 uses this information to calculate the accurate position and orientation of the mobile X-ray detector 110 in the X-ray system 100 and then automatically aligns the positionable X-ray source 122 to the mobile X-ray detector 110 .
- the system controller 170 accomplishes this by performing transformations between the coordinate system of the camera 130 , the coordinate system of the X-ray base 125 , and the coordinate system of the positionable X-ray source 122 .
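- The chain of coordinate transformations described above can be sketched with 4×4 homogeneous matrices. This is a minimal illustration under assumed calibration values, not the patented implementation; the frame names and numbers are hypothetical.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibration: pose of the camera and of the source in the
# X-ray base frame (identity rotations for brevity).
T_base_camera = make_transform(np.eye(3), np.array([0.1, 0.0, 1.5]))
T_base_source = make_transform(np.eye(3), np.array([0.0, 0.0, 1.2]))

# Detector pose estimated in the camera frame (the 6 DOF estimate).
T_camera_detector = make_transform(np.eye(3), np.array([0.0, 0.2, 0.8]))

# Chain the transforms: detector in the base frame, then in the source frame.
T_base_detector = T_base_camera @ T_camera_detector
T_source_detector = np.linalg.inv(T_base_source) @ T_base_detector
print(T_source_detector[:3, 3])  # detector position relative to the source
```

The same composition pattern applies regardless of how many intermediate frames the system defines, as long as each calibration transform is known.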
- the result is more accurate and repeatable alignment of the X-ray source 122 and the mobile X-ray detector 110 compared to the conventional manual alignment.
- the position of the mobile X-ray detector 110 is calculated from the distance and orientation information gathered from the passive markers 200 using the camera and represented as 6 DOF (degrees of freedom) pose information for the mobile X-ray detector 110 . Depth measures may also be used.
- the image processor 171 is configured to determine the pose or part of the pose of the detector 110 .
- the image processor 171 is configured to determine a position, orientation, or position and orientation of the detector 110 in 3D space relative to the X-ray source 122 based on markers 200 shown by the camera 130 and markers 200 not shown by the camera 130 .
- the patient 500 and/or other object may occlude or block some markers 200 from being seen by the camera 130 .
- the legs of the patient 500 obstruct one or more (e.g., over half) of the markers 200 as viewed by the camera 130 .
- the markers not shown by the camera 130 are located by a machine-learned network applied by the image processor 171 .
- the image processor 171 is configured to apply a machine-learned network or model to an image from the camera 130 .
- the machine-learned network or model outputs the locations of markers 200 not shown by the camera 130 .
- the markers 200 visible in the image may be located using a different process or are also located by the application of the machine-learned network or model.
- the image processor 171 performs other acts using a machine-learned network or model to locate the markers 200 .
- FIG. 3 shows one embodiment of estimating X-ray detector pose.
- the method includes acts for locating markers on an X-ray detector using a machine-learned model. The locations of both visible and occluded or other non-visible markers are determined by application of a captured image to the machine-learned model.
- acts 34 , 36 , and 38 are not provided in a method to locate the markers.
- acts 36 and 38 are not provided in a method to determine pose.
- acts for positioning the detector, positioning a patient, configuring the X-ray system, and/or acts using output X-ray images are provided.
- the camera images a region including the detector with markers and the patient.
- the region may be imaged with the detector and without the patient.
- the patient and/or another object may intervene between the camera and one or more (e.g., at least some) of the markers on the detector.
- One or more parts of the detector, including one or more markers, are visible from the camera.
- the imaging captures one or more images of the X-ray detector.
- An IR or visual spectrum image of the detector, as occluded, is captured.
- a sequence of such images or just one image once the detector and patient are positioned is captured.
- the camera is fixedly connected with the X-ray tube.
- the captured image is from a perspective similar to the X-ray tube, such as viewing in a same direction with no or little (e.g., 10 cm or less) offset from the X-ray tube.
- the camera is not connected with the X-ray tube and/or is spaced from the X-ray tube. Calibration, measurement, or input are used to provide the spatial relationship of the camera to the X-ray tube at the time of image capture.
- the field of view for the camera is at least the same size as the area of the X-ray detector, but may be larger or smaller so that at least part of the X-ray detector is captured by the camera.
- the camera may have a minimum distance to the detector and/or may be positioned to more likely capture the detector.
- the field of view of the camera and the resulting area represented by the captured image includes part or all of the detector.
- the captured image may be padded, such as adding zero, gray, or other pixel values to increase the area represented. The padding may be by extrapolation or adding a pattern, copies of the image, or other data.
- zero padding is applied to make the captured image 9 times larger, such as adding zero padding to each side and diagonal in areas equal to the area of the captured image (left, right, top, bottom, and at the corners, i.e., a 3×3 grid with the center area being the captured image and the other areas being processor-added padding).
- the padding makes it more likely that the entire X-ray detector would be within the area represented in the padded image. The entire detector is more likely in the virtual field of view even if the X-ray detector is partially outside the field of view from the captured image (i.e., of the camera).
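- The 3×3 zero-padding scheme can be sketched as follows; this is an illustrative sketch assuming a 480×640 capture, not the exact implementation.

```python
import numpy as np

def pad_to_3x3_grid(image):
    """Zero-pad an H x W image so it sits at the center cell of a 3H x 3W
    canvas, virtually enlarging the field of view (e.g., 480x640 -> 1440x1920)."""
    h, w = image.shape[:2]
    padded = np.zeros((3 * h, 3 * w) + image.shape[2:], dtype=image.dtype)
    padded[h:2 * h, w:2 * w] = image  # center cell of the 3x3 grid
    return padded

img = np.ones((480, 640), dtype=np.uint8)  # stand-in for the captured IR image
padded = pad_to_3x3_grid(img)
print(padded.shape)  # (1440, 1920)
```

Because the detector is assumed to be at least partially inside the original frame and no larger than the frame itself, any part of it extending past the capture falls inside the padded canvas.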
- the camera field of view extends over an entirety of the patient bed of the X-ray system.
- a wide-angle lens and/or the camera being positioned further away from the bed provides this field of view.
- the camera is moved to different locations to stitch together an image from multiple captures to form the image with the larger field of view.
- a smaller field of view is provided.
- FIG. 4 shows ten example optical images with different sources of occlusion.
- the occlusion may be due to the patient, clothing, and/or another object.
- the occlusion may be due to positioning where part of the detector extends beyond the field of view.
- the pose of the camera with respect to the detector may result in more or less occlusion of markers in the image.
- the image processor identifies locations of visible and occluded markers on the X-ray detector from the captured image.
- the marker locations are identified by a machine-learned detector.
- In response to input of the captured image, the machine-learned detector outputs locations of markers, including both visible markers and markers not visible (i.e., not shown) in the captured image.
- the machine-learned model identifies locations of the markers including the markers for which the patient is intervening from the imaging by the camera.
- the machine-learned detector outputs the pose of the detector instead of or in addition to the locations of markers.
- the machine-learned detector is trained to output the locations given an input image with or without depth information.
- the training data includes input sample images and corresponding ground truth outputs. Images, including with non-visible markers, and the locations of the markers, including the non-visible markers, are provided as training data.
- a training dataset has 1117 images with ground truth marker locations and/or camera pose and detector pose. 695 images are used for training and 422 for evaluation. Other numbers of images and/or separations for training and evaluation may be used.
- FIG. 4 shows ten images of the training data, where the images correspond to different detector poses, tube poses, and/or obstructions. To create more samples for training, copies of the 695 images may be altered to form any number of additional images.
- FIG. 5 shows two images with different scale, rotation, and/or cropping as compared to the source images from FIG. 4 . Each or some training images are randomly rotated for 0 to 359 degree followed by a random scaling up (e.g., 1 to 1.15 ratio) and/or random cropping. Other augmentation for training data may be used.
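- When a training image is rotated and scaled for augmentation, the ground-truth marker coordinates must be transformed with the same random parameters so labels stay aligned with pixels. A minimal sketch of that label-side transform, with hypothetical coordinates:

```python
import numpy as np

def augment_markers(markers, center, angle_deg, scale):
    """Rotate and scale ground-truth marker coordinates about the image
    center using the same random parameters applied to the image itself."""
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # (markers - center) @ R.T applies the rotation to each (x, y) row.
    return (markers - center) @ R.T * scale + center

markers = np.array([[100.0, 100.0], [200.0, 100.0]])  # hypothetical labels
center = np.array([160.0, 120.0])
out = augment_markers(markers, center, angle_deg=90.0, scale=1.0)
```

The same matrix would drive the image warp (e.g., via an affine resampler), keeping image and labels consistent.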
- the ground truth is marker location by pixel, which is used to learn to output a heatmap.
- Each individual marker is represented as a heatmap in an individual output channel of the machine-learned architecture so that the classification of each marker is implicitly done during marker detection.
- the output from the machine-learned detector and the ground truth provided for training is an H×W×28 heatmap where H is height and W is width for the image.
- a point, line, area, or other designation is used instead of a heatmap for the output and/or the ground truth.
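- The per-marker ground-truth channels can be rendered as Gaussian blobs centered on each marker's pixel location. A sketch under assumed dimensions and an assumed spread parameter sigma (the patent does not specify one):

```python
import numpy as np

def marker_heatmaps(locations, height, width, sigma=3.0):
    """Render one Gaussian heatmap per marker (H x W x N), so each marker
    owns an output channel and its classification is implicit."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((height, width, len(locations)), dtype=np.float32)
    for i, (x, y) in enumerate(locations):
        maps[:, :, i] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# 28 hypothetical marker positions -> an H x W x 28 ground-truth tensor.
locs = [(10 * k % 64, 7 * k % 48) for k in range(28)]
gt = marker_heatmaps(locs, height=48, width=64)
print(gt.shape)  # (48, 64, 28)
```

Each channel peaks at 1.0 on its marker's pixel and decays smoothly, which is a common way to regress locations with an L2 loss.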
- the architecture to be machine trained is defined.
- the definition is by configuration or programming of the learning.
- the number of layers or units, type of learning, and other characteristics of the network are controlled by the programmer or user.
- one or more aspects (e.g., number of nodes, number of layers or units, or type of learning) are defined and selected by the machine during the learning.
- the network is defined as a plurality of sequential feature units or layers. Sequential is used to indicate the general flow of output feature values from one layer to input to a next layer. The information from the next layer is fed to a next layer, and so on until the final output.
- the layers may only feed forward or may be bi-directional, including some feedback to a previous layer.
- the nodes of each layer or unit may connect with all or only a sub-set of nodes of a previous or subsequent layer or unit.
- Deep learning may be used. Rather than pre-programming the features and trying to relate the features to attributes, the deep architecture is defined to learn the features at different levels of abstraction based on an input image or scan data with or without pre-processing. The features are learned to reconstruct lower level features (i.e., features at a more abstract or compressed level). For example, features for reconstructing an image are learned. For a next unit, features for reconstructing the features of the previous unit are learned, providing more abstraction. Different units are provided for learning different features.
- the architecture is a neural network. Deep learning is applied to train features to be extracted from the images and the detection to be performed from the features.
- the machine-learned network is a fully convolutional network, such as a convolutional-to-transposed-convolutional network.
- the machine-learned network may be a U-net encoder-decoder. Multiple levels of feature compression or abstraction are provided, such as four.
- the encoder segment has a plurality of convolutional layers with increasing feature compression or abstraction, and the decoder segment has a plurality of transposed-convolutional layers with decreasing feature compression or abstraction. Skip connections may or may not be provided.
- Other image-to-image architectures may be used, such as a generative adversarial network where the discriminator is used for training and not for application.
- the architecture may include a hierarchy or sequence of two or more networks or models.
- FIG. 6 shows an example with three networks—attention network 61 , coarse prediction network 64 and refining prediction network 67 .
- the markers on the detector are not always visible due to occlusion and being out of the field of view.
- the architecture implements a padding-and-attention marker detection and prediction pipeline to robustly detect all markers.
- the coarse prediction network 64 and refining prediction network 67 are combined into one network.
- the attention network 61 , coarse prediction network 64 , and refining prediction network 67 are combined into one network working directly on the input at the original resolution without any resizing.
- the attention network 61 is trained to localize the X-ray detector in the image 60 .
- the attention network 61 is an image-to-image network, such as a U-net.
- the attention network 61 is a U-Net with 4 stride-2 convolutions for encoder and decoder.
- the original IR or other image 60 is padded to ensure the whole detector is likely to be represented in a single image plane.
- the padding may be by extrapolation from the capture image 60 and/or by zero padding.
- the field of view may not include the entire detector.
- the padding fills in uniform values of zero. Since the detector is assumed to be partially inside the field of view of the camera and the detector is not larger than the original image size, adding zeros of the same size as the image both horizontally and vertically (640×480 → 1920×1440) virtually enlarges the field of view, and the whole detector would be inside the padded image even if not captured in the padded areas.
- the padded image 60 is resized (e.g., sampled or down-sampled) from 1920×1440 to 640×480, but no resizing and/or other sizes or resolutions may be used.
- the attention network 61 receives the image 60 and estimates the center of the detector or a region including the detector.
- a heatmap 62 of the detector, region designator (e.g., rectangle corresponding to the detector or a part of the image 60 surrounding the detector), or center of the detector (e.g., as a heatmap) is output.
- the attention network 61 is trained to locate the detector in the field of view of the camera.
- a 640×480 or another sized ROI is cropped around the detected detector center or for the detected region including the detector from the original (e.g., 1920×1440) padded image 60 .
- the 640×480 ROI 63 is then resized to 320×240 or another scale to obtain a marker prediction at the coarse level.
- This cropped and resized image 63 is to be input to the coarse prediction network 64 .
- Other inputs may be used, such as an image cropped without resizing.
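- The crop-and-resize step feeding the coarse prediction network can be sketched as below. The (row, column) center convention and the 2×2 block averaging are assumptions for illustration; the actual system may use a different resampling method.

```python
import numpy as np

def crop_roi(padded, center, roi_h=480, roi_w=640):
    """Crop a fixed-size ROI around the detector center predicted by the
    attention network, clamping so the window stays inside the padded image."""
    H, W = padded.shape[:2]
    y0 = int(np.clip(center[0] - roi_h // 2, 0, H - roi_h))
    x0 = int(np.clip(center[1] - roi_w // 2, 0, W - roi_w))
    return padded[y0:y0 + roi_h, x0:x0 + roi_w]

def downsample2(image):
    """Halve resolution by 2x2 block averaging (e.g., 480x640 -> 240x320)."""
    h, w = image.shape[:2]
    return image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

padded = np.random.rand(1440, 1920)        # stand-in for the padded image 60
roi = crop_roi(padded, center=(700, 950))  # hypothetical detected center
coarse_input = downsample2(roi)
print(roi.shape, coarse_input.shape)  # (480, 640) (240, 320)
```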
- the coarse prediction network 64 is trained to identify the locations of the markers, including markers not visible in the image 60 .
- the visible and not visible markers are to be located or output.
- the coarse prediction network 64 is an image-to-image network, such as a U-net.
- the coarse prediction network 64 is a U-Net with 7 stride-2 convolutions for encoder and decoder.
- the coarse prediction network 64 is trained to output marker positions at the coarse resolution.
- the output is a location for each marker.
- An orientation of each marker may be output.
- the output is a heatmap indicating locations for each marker as probabilities.
- the output is up-sampled to a resolution of the original image 60 or other resolution.
- This up-sampled heat map or coarse marker locations 66 are input to the refine prediction network 67 .
- the cropped region of the original image 60 may be input with the up-sampled output.
- the original image 60 is input with an overlay based on the coarse marker locations from the coarse prediction network 64 .
- the refine prediction network 67 is trained to identify the locations of the markers, including markers not visible in the image 60 .
- the visible and not visible markers are located or output at a resolution of the original image 60 or other resolution.
- the refine prediction network 67 is an image-to-image network, such as a U-net.
- the refine prediction network 67 is a U-Net with 5 stride-2 convolutions for encoder and decoder.
- the refine prediction network 67 is trained to output marker positions at the desired resolution.
- the coarse predictions of location are refined.
- the output is a location or position for each marker.
- An orientation of each marker may be output.
- the output is a heatmap 68 indicating locations for each marker as probabilities.
- the output is an identification of the locations (e.g., position and/or orientation) of the markers, including both visible and non-visible markers.
- the convolution layers use a same 3×3 kernel size for all layers of all networks 61 , 64 , 67 .
- the 3×3 region represents a different size or number of pixels relative to the original image in any down-sampled or different resolutions.
- different layers and/or networks have different sized and/or shaped kernels for convolution.
- different network architectures are used, such as one image-to-image or another neural network to receive input of the image 60 and output the locations of the markers.
- another network is provided to use the marker locations to output pose of the detector.
- a machine trains the network arrangement with the training data having ground truth.
- the neural network or networks are trained using the sample images and the ground truth.
- Machine learning is performed to train the various units using the defined architecture.
- the features (e.g., convolution kernels, transposed-convolution kernels, weights, and max pooling connections) providing the desired result or detection of the object are learned.
- the relationship of input features (e.g., Haar wavelets, gradients, centerlines, segmentation, or other derivations from the anatomy or tissue images) or of input feature vectors from the image to output marker locations or detector pose is learned.
- the network is trained end-to-end.
- the results relative to the ground truth and the error are back-projected to learn the features that work best.
- an L2-norm loss is used to optimize the network.
- Other error functions may be used.
- the values of variables of the network are randomly initialized, but another initialization may be used.
- End-to-end training is performed, but one or more features may be set.
- Batch normalization, dropout, and data augmentation are not used, but may be.
- the optimization is with the RMSprop optimizer, but other optimization functions (e.g., Adam, SGD, etc.) may be used. During the optimization, the different distinguishing features are learned.
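- The RMSprop update rule mentioned above can be written out in a few lines. This is a generic sketch on a toy L2 objective, not the networks described here; the learning rate and decay are illustrative defaults.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSprop update: keep a running average of squared gradients and
    scale each parameter's step by its recent gradient magnitude."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize the L2 loss 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([1.0, -2.0])
w = np.zeros(2)
cache = np.zeros(2)
for _ in range(4000):
    w, cache = rmsprop_step(w, w - target, cache)
print(w)  # approaches the target
```

Swapping in Adam or SGD only changes the update rule; the loss and back-propagated gradients stay the same.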
- the features providing an indication of location given the input image are learned.
- one or more of the networks 61 , 64 , 67 are pre-trained.
- the attention network 61 is pretrained using image samples with ground truths for detector region or center.
- the pre-trained attention network 61 is then trained end-to-end with the other networks 64 , 67 using ground truth and corresponding loss for marker locations.
- Multi-task learning may be used where a loss is provided for each of the networks in training.
- the model or detector learns to output the locations of visible and/or non-visible markers given an input image. Once trained, the model may be applied to locate markers in the image and/or determine the pose of the detector from the image.
- the trained network is configured by the machine training to generate locations or a heatmap of locations given a previously unseen input image.
- the image processor determines the X-ray detector pose of the X-ray detector.
- the pose is determined as an orientation, position or orientation and position.
- the pose is in two or three dimensions. For example, the pose is a 3D pose relative to the camera.
- Part of the pose may be based on depths measured by the camera.
- the depths of the visible markers may be used to determine the pose, as an initial or starting pose or as a pose in a sub-set of the degrees of freedom of the detector relative to the camera. Alternatively, the depths are used as input to the machine-learned model for locating markers or are not used.
- the pose is determined using the locations of the visible and non-visible (e.g., occluded or other out-of-view) markers. Where a heatmap is output, the location or locations corresponding to a greatest probability for each marker are selected as the location of the marker.
- the distribution of markers indicates the pose.
- the orientation of one or more markers, orientation of markers relative to each other, the distance between markers, the alignment of markers relative to the camera field of view, and/or other geometrical relationship of or between markers indicates the pose.
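- Selecting the greatest-probability location in each marker's heatmap channel amounts to a per-channel argmax. A minimal sketch, with hypothetical heatmap sizes:

```python
import numpy as np

def heatmap_peaks(heatmaps):
    """Pick the (x, y) with greatest probability in each marker's channel
    of an H x W x N heatmap stack."""
    h, w, n = heatmaps.shape
    flat = heatmaps.reshape(h * w, n).argmax(axis=0)  # best flat index per channel
    ys, xs = np.unravel_index(flat, (h, w))
    return np.stack([xs, ys], axis=1)

maps = np.zeros((48, 64, 2))
maps[7, 10, 0] = 0.9   # peak for marker 0 at x=10, y=7
maps[30, 50, 1] = 0.8  # peak for marker 1 at x=50, y=30
print(heatmap_peaks(maps))  # marker 0 at (10, 7), marker 1 at (50, 30)
```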
- the image processor determines the X-ray detector pose with homography.
- the detected locations of the markers are used to compute a homography to warp a pre-defined template of the markers to a source image (e.g., output heatmap).
- the warping error or matching is used to indicate the pose of the detector.
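- The homography between the pre-defined marker template and the detected marker locations can be estimated with the direct linear transform (DLT). This is a textbook sketch, not the patented implementation; the four correspondences below are synthetic (a pure translation) for illustration.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography mapping src points to dst points via DLT
    (at least 4 correspondences required)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, taken from the last row of Vt.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

template = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
detected = template + np.array([2.0, 3.0])  # synthetic detections
H = fit_homography(template, detected)
print(np.round(H, 3))  # recovers the translation homography
```

With real detections, the residual of this warp gives the matching error used to validate the pose.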
- the pose is output by the machine-learned model based on estimation from the locations provided within the machine-learned model.
- the controller controls the robot system to position the X-ray tube based on the X-ray detector pose.
- a robotic arm positions the X-ray source relative to the detector pose (position and/or orientation).
- the X-ray tube is manually positioned but with guidance based on the detected pose of the detector (e.g., displaying graphics on the direction and magnitude of movement to align the X-ray tube with the detector).
- Once the pose of the mobile X-ray detector is estimated in the form of the 6 degrees of freedom in the camera coordinate system, that location information is transformed into a location in the X-ray base coordinate system.
- inverse kinematics may be applied to derive the optimal X-ray control parameters (by converting the location in the X-ray base coordinate system to a position in the X-ray source coordinate system) so that the positionable X-ray source is moved into a position that is in alignment with the mobile X-ray detector (e.g., orthogonal to the detector at a desired distance from the detector).
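- The target for the inverse kinematics, a source position orthogonal to the detector at a desired distance, can be sketched geometrically. The SID value and frame conventions here are hypothetical.

```python
import numpy as np

def source_target_pose(detector_position, detector_normal, sid=1.0):
    """Place the source on the detector normal at the source-to-image
    distance (SID) so the beam axis is orthogonal to the detector."""
    n = detector_normal / np.linalg.norm(detector_normal)
    position = detector_position + sid * n
    beam_direction = -n  # the source looks back along the normal at the detector
    return position, beam_direction

pos, beam = source_target_pose(np.array([0.0, 0.0, 0.0]),
                               np.array([0.0, 0.0, 2.0]), sid=1.1)
print(pos, beam)
```

Inverse kinematics then solves for the joint angles of the robotic arm that realize this position and beam direction.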
- the robot system moves the X-ray tube into the aligned position.
- the X-ray system images the patient.
- the patient is imaged with the X-ray tube positioned relative to the X-ray detector. Once positioned, the X-ray tube generates X-rays. The X-rays passing through the patient are detected by the detector.
- FIG. 7 compares using the machine-learned model of FIG. 6 to regress both visible and not visible markers and using a baseline machine-learned model (e.g., single U-net without attention and coarse-to-fine) to regress just the visible markers in detector pose determination.
- the machine-learned model of FIG. 6 is trained for detecting visible and not visible markers. Homography is applied to the regressed marker locations in both the baseline and non-visible model to determine detector pose.
- FIG. 7 shows the sorted root mean square error (y-axis) in the detector position with the baseline regression of only visible markers and with regression of visible and non-visible markers. The x-axis is the percentile of cases. At the 90th to 99th percentiles, regression including the non-visible markers provides less error, such as half the error.
Description
- The present embodiments relate to X-ray detector pose estimation for X-ray imaging. In robotic X-ray systems, the X-ray source and detector have several degrees-of-freedom (DOF). For example, the detector may be a mobile plate positionable in various locations and/or orientations. The X-ray source connects to a robotic arm to allow positioning generally orthogonal to the detector for X-ray imaging. The freedom of positioning provides an advantage in terms of flexibility in imaging the patient. Such systems enable high quality X-ray examinations to be carried out for more specific regions of interest and for a wider range of patients (e.g., in cases of serious trauma).
- In order to acquire high quality images, the X-ray source, the region-of-interest, and the X-ray detector should be well aligned. The better the alignment, the higher the quality of the X-ray image will be. This alignment is performed in two steps: positioning the X-ray detector with respect to the region-of-interest of the patient, and then aligning the X-ray source with respect to the X-ray detector. Conventionally, an operator performs the detector positioning and X-ray source alignment manually. Because the alignment is performed manually, the quality of the alignment is not consistent, difficult to reproduce every time, and time consuming.
- U.S. Published Application No. 2018/0235566 teaches automatic alignment by locating markers on the detector. The pose of the X-ray detector is determined from the located markers in a process providing for marker detection, detector plane estimation, and then detector position estimation based on an image from a camera. The markers are detected using hand-crafted filters with different sizes and a Hough transform-based approach. Each marker is then classified to match with a pre-defined template detector. After the markers are detected, a region growing algorithm is applied on a depth image to obtain a detector plane. Then, the pose of the X-ray detector position is estimated based on the point-correspondence to the template. However, in some cases, some or many of the markers are occluded, resulting in less accuracy in pose prediction. The approach may also have difficulty handling large distance variation of the detector relative to the camera.
- Systems, methods, and instructions on computer readable media are provided for x-ray detector pose estimation. A machine-learned model is used to estimate locations of markers, including occluded or other non-visible markers, from an image. The locations of the markers, including the non-visible markers are used to determine the pose of the X-ray detector for aligning an X-ray tube with the X-ray detector.
- In a first aspect, a method is provided for estimating X-ray detector pose. An image of the X-ray detector is captured. Locations of visible and occluded markers on the X-ray detector are identified from the image by a machine-learned detector. The X-ray detector pose of the X-ray detector is determined from the locations of the visible and occluded markers. An X-ray tube is positioned based on the X-ray detector pose. A patient is x-ray imaged with the X-ray tube as positioned and the X-ray detector.
- The image does not show the occluded markers due to an intervening object, such as a patient. The image may not show markers that are outside of the field of view of the camera. The locations of the occluded markers and/or out-of-view markers not shown by the image are still identified.
- In various embodiments, the image may be captured as an infrared (IR) image. A camera may be fixedly connected and moveable with the X-ray tube. Assuming the X-ray detector is partially visible inside of the camera's field of view, the captured image may be padded to virtually enlarge the field of view such that the padded image is more likely to include the entirety of the X-ray detector.
- The machine-learned detector may be of various types of machine-learned models, such as a deep neural network trained with deep learning. The network structure of the machine-learned detector may be a single network or a sequence of two or more networks. For example, a first of the two or more networks is trained to localize the X-ray detector in the image in a coarse level, and a second of the two or more networks is trained to locate the visible, occluded and/or other out-of-view markers in response to the localization output by the first of the two or more networks. The second of the two or more networks may be trained to locate in response to an input of the image cropped based on the localization output of the first network. A third of the two or more networks may be trained to refine positions of the visible, occluded and/or other out-of-view markers output by the second of the two or more networks where the refined positions are the identified locations.
- The identification of the locations may include generating a heatmap of the locations. The heatmap provides probabilistic indication of possible locations for each marker.
- In one embodiment, the X-ray detector pose is determined with homography. A position and orientation in three dimensions of the X-ray detector pose is determined from the geometrical relationship of the markers to each other as viewed by the camera.
- In a second aspect, an X-ray imaging system is provided. An X-ray source connects to a movable arm. A camera connects to the X-ray source or moveable arm to be movable with the X-ray source. An X-ray detector is moveable independent of the movable arm. An image processor is configured to determine a position, orientation, or position and orientation of the X-ray detector relative to the X-ray source based on markers shown by the camera and markers not shown by the camera. The markers not shown by the camera are located by a machine-learned network.
- In one embodiment, the moveable arm is a robotic arm, and the X-ray detector is a flat panel detector. The camera is an infrared camera. The markers are reflective for ease of detection by the camera.
- In one embodiment, the camera is a color camera. The markers are painted with distinguishable color compared to the X-ray detector, such that the markers may be easily detected by the camera.
- A display configured to display an X-ray image of the patient may be provided. The X-ray image is acquired based on positioning of the X-ray source relative to the position, orientation, or position and orientation determined by the image processor.
- In one embodiment, the image processor is configured to apply the machine-learned network to an image from the camera. The machine-learned network outputs the markers not shown by the camera in response to the application.
- In a third aspect, a method is provided for locating markers on an X-ray detector. A region including a detector with markers and a patient is imaged with a camera. The patient intervenes between the camera and at least some of the markers. A machine-learned model determines, from the imaging, locations of the markers including the at least some of the markers for which the patient is intervening.
- In a further embodiment, the imaging is with infrared. The markers are infrared-reflective markers. A position and orientation of the X-ray detector is determined based on the locations, and an X-ray source is positioned with a robotic arm based on the position and orientation of the X-ray detector.
- In another embodiment, the region is a padded image with a processing-created enlarged field of view of the camera. The machine-learned model is a machine-learned network including a first image-to-image network trained to locate the detector in the field of view of the camera and a second image-to-image network trained to determine the locations from the imaging cropped based on the located detector from the first image-to-image network.
- Any one or more of the aspects described above may be used alone or in combination. These and other aspects, features and advantages will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings. The present invention is defined by the following claims, and nothing in this section should be taken as a limitation on those claims. Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments and may be later claimed independently or in combination.
- The components and the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
FIG. 1A shows an X-ray system utilizing a mobile X-ray detector according to various embodiments; -
FIG. 1B is a high-level block diagram of one embodiment of a system controller for determining locations of occluded markers; -
FIG. 2 is a detailed view of the mobile X-ray detector in FIG. 1; -
FIG. 3 is a flow chart diagram of one embodiment of a method for estimating X-ray detector pose with a machine-learned model; -
FIG. 4 shows example camera images with occlusion of markers and non-visible markers on an X-ray detector; -
FIG. 5 shows example generation of additional training images; -
FIG. 6 illustrates an example network architecture for the machine-learned model; and -
FIG. 7 shows an example graph of accuracy in pose estimation and an expanded view of part of the graph.
- X-ray detector pose estimation is provided by a machine-learned model, such as a deep neural network trained with deep learning methods. A deep or other machine learning-based technique is used to automatically compute the transformation to place the X-ray tube at an ideal imaging location. The machine is trained for landmark detection. In one embodiment, several dedicated convolutional neural networks are trained to estimate visible markers and predict the position of occluded and out-of-view markers. Occluded and out-of-view markers are robustly estimated, making pose estimation more accurate even when very few markers are visible.
- This learning-based landmark detection does not require any hand-crafted filters and achieves better detection accuracy and generality as compared to the approach of U.S. Published Application No. 2018/0235566. No explicit marker classification or detection is required. Since the machine learning is data driven, the resulting machine-learned model may adapt to any kind of detector, detector occlusion, detector marker placement, and detector positioning given appropriate training data.
FIG. 1A shows an X-ray imaging system 100. The X-ray imaging system 100 provides for automatic alignment of the X-ray source 122 and the mobile X-ray detector 110. The automatic alignment utilizes a camera 130 provided on the positionable X-ray source 122 and the mobile X-ray detector 110 having passive markers 200 placed thereon. A machine-learned model implemented by the controller 170 determines a position of the X-ray detector 110 from an image of the camera 130 even where some or many of the markers 200 are out-of-view (e.g., occluded or outside the field of view of the camera) in the image. - One embodiment of
such X-ray system 100 includes an X-ray tube robot system 120, the positionable X-ray source 122, an X-ray source base 125, the camera 130 mounted on or by the X-ray source 122, the mobile X-ray detector 110, and the system controller 170. Additional, different, or fewer components may be provided, such as a computer network for remote image processing and/or control by a computer, server, or workstation forming the system controller 170. - The
X-ray system 100 is for training, such as using images from a memory 177 and/or the camera 130 and a corresponding ground truth as input samples. Alternatively, the X-ray system 100 is for application of the machine-learned model trained to detect locations of the markers 200. - The X-ray
tube robot system 120 is configured and adapted to be driven by the system controller 170 for articulating the positionable X-ray source 122 into any desired position to align the X-ray source 122 with the mobile X-ray detector 110. For example, the X-ray tube robot system 120 is a robotic arm on which the X-ray source 122 is arranged. Such a robotic arm allows particularly variable adjustability of the X-ray source 122. Alternatively, the X-ray tube robot system 120 is a telescoping arm that is hung from overhead tracks. Other robotic arms may be used. In other embodiments, a manually moveable support structure or arm is provided for the X-ray source 122. The user is guided to position based on one or more images on the display 175. - The
X-ray source 122 is an X-ray tube. Multiple X-ray sources 122 may be provided, such as for a dual energy X-ray imager. - The
X-ray source 122 is fixedly connected to the movable arm, such as the robot system 120. The connection may be removable, such as by undoing a latch, removing a housing, and/or removing bolts. The X-ray source 122 is connected so that moving the end or other part of the robot system 120 moves the X-ray source 122. The X-ray source 122 is movable by the movable arm (e.g., robotic arm) in one or more degrees of translation freedom with or without one or more degrees of rotation freedom. - The
camera 130 may be an optical camera. The camera 130 images in the visible light spectrum. The visible light camera 130 produces an RGB (Red, Green, Blue) digital image. - In one embodiment, the
camera 130 is a depth sensing camera (e.g., three-dimensional (3D) or RGBD camera). The depth sensing may allow the camera 130 to determine a depth from the camera 130 to any of the visible markers 200. The digital image data obtained from such a camera 130 is typically referred to as an RGBD (RGB+Depth) image, which includes an RGB image, in which each pixel has an RGB value and a depth (or distance) value. - In another embodiment, the
camera 130 is an infrared (IR)-based camera, with or without a depth sensor. The infrared camera captures images of passive markers 200, which are IR reflective markers 200 that have a strong response to the infrared light projected onto the markers. The IR-based camera 130 produces an IR digital image in which each pixel has an IR value. The digital image data obtained from such an IR-based camera 130 is an IR image. With depth sensing, an IRD image is obtained. - Other cameras may be used. A camera or cameras that operate in two or more modes (e.g., visible spectrum and IR) may be used. A camera with two or more sensors in either the visible spectrum or IR may be used. Two or
more cameras 130 may be provided. The camera 130 position and optics are calibrated relative to the X-ray source 122 and/or the base 125 of the X-ray system 100. - The
camera 130 attaches to an appropriate location on the X-ray source 122 and/or the robot system 120. In some embodiments, the camera 130 is attached to a rear side of the X-ray source 122. The camera 130 is fixedly connected to the movable arm, such as the robot system 120. The connection may be removable, such as by undoing a latch, removing a housing, and/or removing bolts. The camera 130 is connected so that moving the end or other part of the robot system 120 moves the camera 130 with the X-ray source 122. The camera 130 is movable by the movable arm (e.g., robotic arm) in one or more degrees of translation freedom with or without one or more degrees of rotation freedom. - The
X-ray detector 110 is a flat-panel detector for recording high-quality X-ray images. In some embodiments, the X-ray detector 110 is wireless and is configured with a transceiver for wireless communication with the system controller 170, for example, for transmitting the acquired X-ray images. In some embodiments, the X-ray detector 110 has a wired connection to the X-ray system 100, ensuring quick and error-free communication between the X-ray detector 110 and the other components of the X-ray system 100. - The
detector 110 is movable independent of the moveable arm or robot system 120. The detector 110 is movable independent of the X-ray source 122. The X-ray detector 110 may be manually positioned anywhere about the patient 500 by an X-ray technician in order to record X-ray images of a region of interest on the patient 500. In the example illustration of FIG. 1A, the mobile X-ray detector 110 is positioned for an X-ray examination of a patient 500's femur region. The detector 110 is a movable detector 110, such as being free of connection or releasably connected to the rest of the X-ray system 100. The mobile detector 110 may be placed in a drawer in or below the patient bed. The mobile detector 110 may be placed in a bracket or connected to a shim or other positioning apparatus. The mobile detector 110 may be released and moved to other locations or moved to other locations while attached to a positioning apparatus (e.g., a wedge holder). The mobile detector 110 may have any number of degrees of freedom, such as being a plate or flat panel that may be positioned on top of the patient bed in any arbitrary position and/or orientation. Alternatively, the detector 110 mounts to or is connected fixedly with a manual armature or an automatic robotic arm, such as being part of a C-arm with the X-ray source 122 on the end of the robot system 120. The connection may allow for changeable translation and/or orientation of the detector 110 relative to the X-ray source 122. - The
detector 110 has a plurality of passive markers 200 placed at designated points on one or more surfaces of the X-ray detector 110. In one embodiment, the markers 200 are on a top surface or a surface that faces the X-ray source 122 in use. The marker placement allows the camera to see or image at least some of the markers 200. - The plurality of
passive markers 200 are provided in shapes and patterns that may uniquely identify each of the corners of the mobile X-ray detector 110. Any combination of different shapes, patterns, and/or sizes is used. In some embodiments, the passive markers 200 include radial corner markers and smaller point markers as shown in FIG. 2. The larger radial corner markers are shaped for marking each of the four corners of the mobile X-ray detector 110. The smaller point markers are for the identification and validation of the larger radial corner markers such that the corners may be separately identified. In other embodiments, other shapes, configurations, and placements of the passive markers 200 may be used. In yet other embodiments, the markers 200 have uniform shapes, such as being squares or rectangles of the same size with uniform or non-uniform placement on the detector 110. - In both the IR sensing and visible spectrum sensing, the
passive markers 200 are used for detecting the mobile X-ray detector 110. The markers 200 are reflective and/or colored to be visible to the sensing. For visible spectrum imaging, the passive markers 200 have a color that is visually distinguishable from the color of the mobile X-ray detector's 110 surface, so that the camera 130 may see the passive markers 200 in the visible light spectrum. A marker with two or more colors may also be used. Contrasting colors between the markers 200 and the rest of the detector 110 are provided, such as black markers 200 on a beige or white housing of the detector 110. The markers 200 may be formed as an aesthetic element, such as an inset line or pattern. - For IR imaging, the IR
reflective markers 200 may be color markers made to blend in with the color of the detector 110 such that the markers 200 are not so noticeable in the visible spectrum but are noticeable in the IR spectrum. FIG. 2 shows an IR image 110A in gray scale of the detector 110 where the markers 200 are lighter than the rest of the detector 110. The reflective markers 200 appear white because these reflective markers 200 reflect IR from the IR source back to the IR camera 130, and the background appears dark because the light background surface diffuses and/or absorbs IR. - The
system controller 170 controls the overall operation of the X-ray system 100. For example, the system controller 170 is configured with appropriate hardware, firmware, and/or software to control the operation of the X-ray source 122, camera 130, robot system 120, and reading of the detector 110. For example, the controller 170 controls and interfaces with the camera 130 to capture one or more images, the positionable X-ray source 122 to generate X-rays, and the robot system 120 to move the X-ray source based on a determined position of the detector 110. The system controller 170 uses the acquired RGB and/or IR images to determine the pose in 3D of the mobile X-ray detector 110 to then position the X-ray source 122 and drive the X-ray image acquisition function of the X-ray system 100. - Referring to
FIG. 1B, the system controller 170 may have a network interface 172 for communicating with other devices in the system 100 via a network and/or wired or wireless communications. For example, the network interface 172 is configured to carry out data transfer with the mobile X-ray detector 110 and the camera 130. Additionally, the X-ray system 100 has a user interfacing unit 175 (e.g., a touch and/or display screen, a keyboard, a mouse, etc.) for the X-ray technician to interface with the X-ray system. Such input/output devices 175 can be a touch-screen interface, display, keyboard, mouse, speakers, buttons, etc. The system controller 170 may contain additional, different, or fewer components. - The
system controller 170 may communicate with the X-ray tube robot system 120 to control the positioning and orientation of the X-ray source 122 and to control X-ray image acquisition by the mobile X-ray detector 110. X-ray images acquired by the X-ray detector 110 may be input to the system controller 170. - The
system controller 170 includes the image processor 171. The image processor 171 may be separate from or part of the controller, which controls the overall operation of the system controller 170 by executing computer program instructions which define such operation. The image processor 171 is a control processor, general processor, digital signal processor, three-dimensional data processor, graphics processing unit, application specific integrated circuit, field programmable gate array, artificial intelligence processor, digital circuit, analog circuit, combinations thereof, or other now known or later developed device for image processing and/or application of a machine-learned model. The image processor 171 is a single device, a plurality of devices, or a network of devices. For more than one device, parallel or sequential division of processing may be used. Different devices making up the image processor 171 may perform different functions, such as one device applying the machine-learned network to locate markers and determine detector pose from the located markers and a separate device for controlling the robot system 120. In one embodiment, the image processor 171 is a control processor or other processor of the X-ray system 100. The image processor 171 operates pursuant to stored instructions, hardware, and/or firmware to perform various acts described herein. - The instructions, camera image, network definition, machine-learned model, outputs, and/or other information are stored in a non-transitory computer readable memory, such as the
memory 177 or storage 176. The memory 177 and/or storage 176 are external storage devices, RAM, ROM, database, and/or local memories (e.g., solid state drive or hard drive). The same or different non-transitory computer readable media may be used for the instructions and other data. The memory 177 and/or storage 176 may be implemented using a database management system (DBMS) and residing on a memory, such as a hard disk, RAM, or removable media. Alternatively, the memory 177 is internal to the processor 171 (e.g., cache). - The instructions for implementing the training or application processes, the methods, and/or the techniques discussed herein are provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive, or other computer readable storage media (e.g., the memory 177). Computer readable storage media include various types of volatile and nonvolatile storage media. The functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination.
- In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the instructions are stored in a remote location for transfer through a computer network. In yet other embodiments, the instructions are stored within a given computer, CPU, GPU or system. Because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the way the present embodiments are programmed.
- The
image processor 171 is configured to train a machine learning architecture. Based on a user provided or other source of the network architecture and training data, the image processor 171 learns features for an encoder and a decoder or other network parts to train the network. The result of the training is a machine-learned model for identifying marker locations from an image of the camera 130. Alternatively or additionally, the image processor 171 is configured to apply a previously machine-learned model to identify marker locations from a camera image to align the X-ray source 122 for X-ray imaging of a patient 500 based on a pose of the detector 110 positioned for such imaging. - The associated X-ray examination workflow is as follows: position the
patient 500 appropriately; position the mobile X-ray detector 110 with respect to the region-of-interest of the patient 500; acquire one or more images of the scene using the camera 130; compute the 6 DOF pose of the X-ray detector 110 using the image(s); automatically position the X-ray source 122 to a position that is in alignment with the mobile X-ray detector 110 based on the 6 DOF pose information on the mobile X-ray detector 110; and then record one or more X-ray images of the region-of-interest of the patient 500. A display device connected to the I/O 175 is configured to display an X-ray image of the patient 500 based on positioning of the X-ray source 122 relative to the position, orientation, or position and orientation of the detector 110 as determined by the image processor 171. - The
system controller 170 is configured to perform the portions of the above-mentioned workflow after the mobile X-ray detector is positioned in place with respect to the region-of-interest. In preferred embodiments, the system controller 170 is configured to initiate the process when a single command is inputted via the system controller's operator input device 175. For example, the single command can be a voice command, a click of a key, a click of a mouse, a touch on a touch screen, etc. The rest of the workflow does not require any manual intervention from the X-ray technician, and the resulting alignment between the X-ray detector and the X-ray source is much more accurate than the conventionally achieved alignment. - By detecting the plurality of
passive markers 200 in the digital image provided by the camera 130, the system controller 170 uses this information to calculate the accurate position and orientation of the mobile X-ray detector 110 in the X-ray system 100 and then automatically aligns the positionable X-ray source 122 to the mobile X-ray detector 110. The system controller 170 accomplishes this by performing transformations between the coordinate system of the camera 130, the coordinate system of the X-ray base 125, and the coordinate system of the positionable X-ray source 122. The result is more accurate and repeatable alignment of the X-ray source 122 and the mobile X-ray detector 110 compared to the conventional manual alignment. - The position of the
mobile X-ray detector 110 is calculated from the distance and orientation information gathered from the passive markers 200 using the camera and represented as 6 DOF (degrees of freedom) pose information on the mobile X-ray detector 110. Depth measures may also be used. - The
image processor 171 is configured to determine the pose or part of the pose of the detector 110. For example, the image processor 171 is configured to determine a position, orientation, or position and orientation of the detector 110 in 3D space relative to the X-ray source 122 based on markers 200 shown by the camera 130 and markers 200 not shown by the camera 130. The patient 500 and/or other object may occlude or block some markers 200 from being seen by the camera 130. In the example shown in FIG. 1A, the legs of the patient 500 obstruct one or more (e.g., over half) of the markers 200 as viewed by the camera 130. - The markers not shown by the
camera 130 are located by a machine-learned network applied by the image processor 171. The image processor 171 is configured to apply a machine-learned network or model to an image from the camera 130. In response, the machine-learned network or model outputs the locations of markers 200 not shown by the camera 130. The markers 200 visible in the image may be located using a different process or are also located by the application of the machine-learned network or model. - The application of the machine-learned network or model is described below as part of the method of
FIG. 3. In other embodiments, the image processor 171 performs other acts using a machine-learned network or model to locate the markers 200. -
FIG. 3 shows one embodiment of estimating X-ray detector pose. The method includes acts for locating markers on an X-ray detector using a machine-learned model. The locations of both visible and occluded or other non-visible markers are determined by application of a captured image to the machine-learned model. - The acts are provided in the orders shown (e.g., top to bottom or numerical), but other orders may be provided. Additional, different or fewer acts may be provided. For example, acts 34, 36, and 38 are not provided in a method to locate the markers. As another example, acts 36 and 38 are not provided in a method to determine pose. In yet other examples, acts for positioning the detector, positioning a patient, configuring the X-ray system, and/or acts using output X-ray images are provided.
- In
act 30, the camera images a region including the detector with markers and the patient. The region may be imaged with the detector and without the patient. The patient and/or another object may intervene between the camera and one or more (e.g., at least some) of the markers on the detector. One or more parts of the detector, including one or more markers, are visible from the camera. - The imaging captures one or more images of the X-ray detector. An IR or visible spectrum image of the detector, as occluded, is captured. A sequence of such images may be captured, or just one image once the detector and patient are positioned.
- The camera is fixedly connected with the X-ray tube. The captured image is from a perspective similar to the X-ray tube, such as viewing in a same direction with no or little (e.g., 10 cm or less) offset from the X-ray tube. In other embodiments, the camera is not connected with the X-ray tube and/or is spaced from the X-ray tube. Calibration, measurement, or input are used to provide the spatial relationship of the camera to the X-ray tube at the time of image capture.
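The calibrated spatial relationship between camera, source, and base can be composed as homogeneous transforms; a minimal sketch with made-up offsets (the 0.1 m, 1.2 m, and 1.5 m values are illustrative, not from the patent):

```python
import numpy as np

def transform(R=np.eye(3), t=(0.0, 0.0, 0.0)):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibration offsets (illustrative values only):
T_base_source = transform(t=(0.0, 0.0, 1.5))       # source 1.5 m above base
T_source_camera = transform(t=(0.0, -0.1, 0.0))    # camera 0.1 m off the source
T_camera_detector = transform(t=(0.0, 0.0, -1.2))  # detector seen 1.2 m away

# Chaining the calibrated transforms expresses the detector pose in the
# base coordinate system, which is the frame the robot arm is driven in.
T_base_detector = T_base_source @ T_source_camera @ T_camera_detector
assert np.allclose(T_base_detector[:3, 3], [0.0, -0.1, 0.3])
```

When the camera is spaced from the tube, the extra measured offset simply becomes one more factor in the chain.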
- The field of view for the camera is at least the same size as the area of the X-ray detector, but may be larger or smaller so that at least part of the X-ray detector is captured by the camera. The camera may have a minimum distance to the detector and/or may be positioned to more likely capture the detector. Once or as currently positioned, the field of view of the camera and the resulting area represented by the captured image includes part or all of the detector. The captured image may be padded, such as adding zero, gray, or other pixel values to increase the area represented. The padding may be by extrapolation or adding a pattern, copies of the image, or other data. In one approach, zero padding is applied to make the captured
image 9 times larger, such as adding zero padding in areas to each side and diagonal equal to the area of the captured image (left, right, top, bottom, and at the corners, i.e., a 3×3 grid with the center area being the captured image and the other areas being processor-added padding). The padding makes it more likely that the entire X-ray detector is within the area represented in the padded image. The entire detector is more likely in the virtual field of view even if the X-ray detector is partially outside the field of view of the captured image (i.e., of the camera). - In one embodiment, the camera field of view extends over an entirety of the patient bed of the X-ray system. A wide-angle lens and/or the camera being positioned further away from the bed provides this field of view. Alternatively, the camera is moved to different locations to stitch together an image from multiple captures to form the image with the larger field of view. In other embodiments, a smaller field of view is provided.
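The 3×3 zero-padding scheme can be sketched with numpy (sizes follow the 640×480 example; the code is illustrative, not the patent's implementation):

```python
import numpy as np

# Captured camera frame, H x W = 480 x 640 (deterministic dummy content).
img = np.arange(480 * 640, dtype=float).reshape(480, 640)

# Add a full image height/width of zeros on every side: a 3x3 grid of
# image-sized tiles with the captured frame in the center. The virtual
# field of view is 9x the original area (640x480 -> 1920x1440), so a
# detector extending past the camera's edge still fits in the image plane.
padded = np.pad(img, ((480, 480), (640, 640)), mode="constant")
assert padded.shape == (1440, 1920)

# A pixel (r, c) of the capture maps to (r + 480, c + 640) after padding.
assert padded[10 + 480, 20 + 640] == img[10, 20]
```

Marker ground-truth coordinates would be shifted by the same (480, 640) offset so that labels stay aligned with the padded image.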
- The captured image does not show one or more markers due to an intervening object or being out of the field of view.
FIG. 4 shows ten example optical images with different sources of occlusion. The occlusion may be due to the patient, clothing, and/or another object. The occlusion may be due to positioning where part of the detector extends beyond the field of view. The pose of the camera with respect to the detector may result in more or less occlusion of markers in the image. - In
act 32 of FIG. 3, the image processor identifies locations of visible and occluded markers on the X-ray detector from the captured image. The marker locations are identified by a machine-learned detector. In response to input of the captured image, the machine-learned detector outputs locations of markers including both visible markers and markers not visible (i.e., not shown) in the captured image. For example, the machine-learned model identifies locations of the markers, including the markers for which the patient is intervening, from the imaging by the camera. In other embodiments, the machine-learned detector outputs the pose of the detector instead of or in addition to the locations of markers.
- In one embodiment, a training dataset has 1117 images with ground truth marker locations and/or camera pose and detector pose. 695 images are used for training and 422 for evaluation. Other numbers of images and/or separations for training and evaluation may be used.
FIG. 4 shows ten images of the training data, where the images correspond to different detector poses, tube poses, and/or obstructions. To create more samples for training, copies of the 695 images may be altered to form any number of additional images. FIG. 5 shows two images with different scale, rotation, and/or cropping as compared to the source images from FIG. 4. Each or some training images are randomly rotated by 0 to 359 degrees followed by a random scaling up (e.g., 1 to 1.15 ratio) and/or random cropping. Other augmentation for training data may be used. - In one embodiment, the ground truth is the marker location by pixel, learned to be output as a heatmap. Each individual marker is represented as a heatmap in an individual output channel of the machine-learned architecture so that the classification of each marker is implicitly done during marker detection. With 28 markers, the output from the machine-learned detector and the ground truth provided for training is an H×W×28 heatmap, where H is height and W is width of the image. In alternative embodiments, a point, line, area, or other designation is used instead of a heatmap for the output and/or the ground truth.
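An H×W×28 heatmap stack of this kind can be reduced to one coordinate and confidence per marker; channels with a weak peak can be flagged as predicted rather than directly observed. A sketch with hypothetical names and threshold:

```python
import numpy as np

def decode_markers(heatmaps, threshold=0.5):
    """Reduce an H x W x N heatmap stack to N (x, y, score, confident) tuples,
    taking each channel's peak pixel as that marker's location."""
    out = []
    for k in range(heatmaps.shape[2]):
        hm = heatmaps[:, :, k]
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        score = float(hm[y, x])
        out.append((int(x), int(y), score, score >= threshold))
    return out

# 28 channels for 28 markers; only marker 0 has a confident peak here.
hms = np.zeros((480, 640, 28))
hms[200, 100, 0] = 0.9
markers = decode_markers(hms)
assert len(markers) == 28
assert markers[0] == (100, 200, 0.9, True)
assert markers[1][3] is np.False_ or markers[1][3] == False
```

Because marker identity is fixed by channel index, no separate classification step is needed to tell the 28 markers apart.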
- The architecture to be machine trained is defined. The definition is by configuration or programming of the learning. The number of layers or units, type of learning, and other characteristics of the network are controlled by the programmer or user. In other embodiments, one or more aspects (e.g., number of nodes, number of layers or units, or type of learning) are defined and selected by the machine during the learning.
- The network is defined as a plurality of sequential feature units or layers. Sequential is used to indicate the general flow of output feature values from one layer to input to a next layer. The information from the next layer is fed to a next layer, and so on until the final output. The layers may only feed forward or may be bi-directional, including some feedback to a previous layer. The nodes of each layer or unit may connect with all or only a sub-set of nodes of a previous or subsequent layer or unit.
- Deep learning may be used. Rather than pre-programming the features and trying to relate the features to attributes, the deep architecture is defined to learn the features at different levels of abstraction based on an input image or scan data with or without pre-processing. The features are learned to reconstruct lower level features (i.e., features at a more abstract or compressed level). For example, features for reconstructing an image are learned. For a next unit, features for reconstructing the features of the previous unit are learned, providing more abstraction. Different units are provided for learning different features.
- In one embodiment, the architecture is a neural network. Deep learning is applied to train features to be extracted from the images and the detection to be performed from the features. In one embodiment, the machine-learned network is a fully convolutional network, such as a convolutional-to-transposed-convolutional network. The machine-learned network may be a U-net encoder-decoder. Multiple levels of feature compression or abstraction are provided, such as four. The encoder segment has a plurality of convolutional layers with increasing feature compression or abstraction, and the decoder segment has a plurality of transposed-convolutional layers with decreasing feature compression or abstraction. Skip connections may or may not be provided. Other image-to-image architectures may be used, such as a generative adversarial network where the discriminator is used for training and not for application.
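As a toy sketch of such a convolutional-to-transposed-convolutional network (not the patent's exact architecture: skip connections are omitted for brevity and the channel widths are arbitrary):

```python
import torch
import torch.nn as nn

class MarkerNet(nn.Module):
    """Toy encoder-decoder sketch: four stride-2 convolutions compress
    features, four transposed convolutions restore resolution, and a 1x1
    head emits one heatmap channel per marker (28 assumed here)."""

    def __init__(self, in_ch=1, n_markers=28, width=16, levels=4):
        super().__init__()
        enc, ch = [], in_ch
        for i in range(levels):  # encoder: halve H and W, widen channels
            out = width * 2 ** i
            enc += [nn.Conv2d(ch, out, 3, stride=2, padding=1), nn.ReLU()]
            ch = out
        dec = []
        for i in reversed(range(levels)):  # decoder: double H and W back
            out = width * 2 ** max(i - 1, 0)
            dec += [nn.ConvTranspose2d(ch, out, 4, stride=2, padding=1),
                    nn.ReLU()]
            ch = out
        self.net = nn.Sequential(*enc, *dec, nn.Conv2d(ch, n_markers, 1))

    def forward(self, x):
        return self.net(x)  # (B, n_markers, H, W) heatmaps

# One IR frame in, 28 per-marker heatmap channels out at input resolution.
net = MarkerNet()
y = net(torch.zeros(1, 1, 64, 64))
assert y.shape == (1, 28, 64, 64)
```

A real U-Net would add skip connections from each encoder level to the matching decoder level so fine spatial detail survives the compression.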
- The architecture may include a hierarchy or sequence of two or more networks or models.
FIG. 6 shows an example with three networks: attention network 61, coarse prediction network 64, and refine prediction network 67. The markers on the detector are not always visible due to occlusion or being out of the field of view. The architecture implements a padding-and-attention marker detection and prediction pipeline to robustly detect all markers. In other embodiments, the coarse prediction network 64 and refine prediction network 67 are combined into one network. In other embodiments, the attention network 61, coarse prediction network 64, and refine prediction network 67 are combined into one network directly working on the input at the original resolution without any resizing. - In the embodiment of
FIG. 6, the attention network 61 is trained to localize the X-ray detector in the image 60. The attention network 61 is an image-to-image network, such as a U-Net. In one embodiment, the attention network 61 is a U-Net with 4 stride-2 convolutions for the encoder and decoder. - The original IR or
other image 60 is padded to ensure the whole detector is likely to be represented in a single image plane. The padding may be by extrapolation from the captured image 60 and/or by zero padding. The field of view may not include the entire detector. In one embodiment, the padding fills in uniform values of zero. Since the detector is assumed to be at least partially inside the field of view of the camera and the detector is not larger than the original image size, padding with zeros of one full image size both horizontally and vertically (640×480 -> 1920×1440) virtually enlarges the field of view, so the whole detector lies inside the padded image even if not captured in the padded areas. - The padded
image 60 is resized (e.g., sampled or down-sampled) from 1920×1440 to 640×480, though in other embodiments no resizing is performed and/or other sizes or resolutions are used. The attention network 61 receives the image 60 and estimates the center of the detector or a region including the detector. A heatmap 62 of the detector, a region designator (e.g., a rectangle corresponding to the detector or to a part of the image 60 surrounding the detector), or the center of the detector (e.g., as a heatmap) is output. The attention network 61 is trained to locate the detector in the field of view of the camera. - A 640×480 or other sized ROI is cropped around the detected detector center or for the detected region including the detector from the original (e.g., 1920×1440) padded
image 60. The 640×480 ROI 63 is then resized to 320×240 or another scale to obtain a marker prediction at a coarse level. This cropped and resized image 63 is input to the coarse prediction network 64. Other inputs may be used, such as an image cropped without resizing. - The
coarse prediction network 64 is trained to identify the locations of the markers, including markers not visible in the image 60. The visible and not visible markers are to be located or output. The coarse prediction network 64 is an image-to-image network, such as a U-Net. In one embodiment, the coarse prediction network 64 is a U-Net with 7 stride-2 convolutions for the encoder and decoder. - The
coarse prediction network 64 is trained to output marker positions at the coarse resolution. The output is a location for each marker. An orientation of each marker may be output. In another embodiment, the output is a heatmap indicating locations for each marker as probabilities. - The output is up-sampled to a resolution of the
original image 60 or another resolution. This up-sampled heatmap or coarse marker locations 66 are input to the refine prediction network 67. The cropped region of the original image 60 may be input with the up-sampled output. Alternatively, the original image 60 is input with an overlay based on the coarse marker locations from the coarse prediction network 64. - The refine
prediction network 67 is trained to identify the locations of the markers, including markers not visible in the image 60. The visible and not visible markers are located or output at a resolution of the original image 60 or another resolution. The refine prediction network 67 is an image-to-image network, such as a U-Net. In one embodiment, the refine prediction network 67 is a U-Net with 5 stride-2 convolutions for the encoder and decoder. - The refine
prediction network 67 is trained to output marker positions at the desired resolution. The coarse predictions of location are refined. The output is a location or position for each marker. An orientation of each marker may be output. In another embodiment, the output is a heatmap 68 indicating locations for each marker as probabilities. The output is an identification of the locations (e.g., position and/or orientation) of the markers, including both visible and non-visible markers. - In the embodiment of
FIG. 6, the convolution layers use a same 3×3 kernel size for all layers of all networks 61, 64, and 67. The networks receive the image 60 and output the locations of the markers. In other embodiments, another network is provided to use the marker locations to output the pose of the detector. - Once the architecture is defined, a machine (e.g., image processor, workstation, computer, or server) trains the network arrangement with the training data having ground truth. The neural network or networks are trained using the sample images and the ground truth. Machine learning is performed to train the various units of the defined architecture. The features (e.g., convolution kernels, transposed-convolution kernels, weights, and max pooling connections) that are determinative or map to the ground truth are learned. The features providing the desired result or detection of the object are learned. Alternatively or additionally, the relationship of input features (e.g., Haar wavelets, gradients, centerlines, segmentation, or other derivations from the anatomy or tissue images) to the ground truth is learned. Rather than or in addition to learning filter kernels for convolution, the relationship of input feature vectors from the image to output marker locations or detector pose is learned.
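The image-geometry steps of this pipeline (zero-padding, cropping an ROI around the estimated center, and down-sampling) can be sketched as follows. This is an illustrative NumPy version under the sizes given above; `detector-center` coordinates are a stand-in for the attention network's output:

```python
import numpy as np

def pad_image(image):
    """Zero-pad by one full image size on each side (640x480 -> 1920x1440),
    so a partially visible detector still fits inside the frame."""
    h, w = image.shape[:2]
    return np.pad(image, ((h, h), (w, w)), mode="constant")

def crop_roi(padded, center_xy, roi_w=640, roi_h=480):
    """Crop a roi_w x roi_h window around the detected detector center,
    clamped to stay inside the padded image."""
    h, w = padded.shape[:2]
    x0 = int(min(max(center_xy[0] - roi_w // 2, 0), w - roi_w))
    y0 = int(min(max(center_xy[1] - roi_h // 2, 0), h - roi_h))
    return padded[y0:y0 + roi_h, x0:x0 + roi_w]

def downsample2(image):
    """Halve resolution by dropping every other pixel (a crude stand-in
    for the resizing step, e.g. 640x480 -> 320x240)."""
    return image[::2, ::2]

camera_image = np.ones((480, 640), dtype=np.float32)
padded = pad_image(camera_image)    # (1440, 1920), zeros around the capture
roi = crop_roi(padded, (960, 720))  # (480, 640) window around a center estimate
coarse_input = downsample2(roi)     # (240, 320) input for coarse prediction
```

A real system would use a proper interpolating resize rather than pixel dropping; the shapes, not the resampling quality, are the point here.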
- The network is trained end-to-end. The error of the results relative to the ground truth is back-propagated to learn the features that work best. In one embodiment, an L2-norm loss is used to optimize the network. Other error functions may be used. In one embodiment, the values of the variables of the network are randomly initialized, but another initialization may be used. End-to-end training is performed, but one or more features may be set. Batch normalization, dropout, and data augmentation are not used, but may be. The optimization is with the RMSprop optimizer, but other optimization functions (e.g., Adam, SGD, etc.) may be used. During the optimization, the different distinguishing features are learned. The features providing an indication of location given an input image are learned.
- In other embodiments, one or more of the networks 61, 64, and 67 are pretrained. For example, the attention network 61 is pretrained using image samples with ground truths for the detector region or center. The pre-trained attention network 61 is then trained end-to-end with the other networks 64 and 67. - The model or detector learns to output the locations of visible and/or non-visible markers given an input image. Once trained, the model may be applied to locate markers in the image and/or determine the pose of the detector from the image. The trained network is configured by the machine training to generate locations or a heatmap of locations given a previously unseen input image.
- In
act 34 of FIG. 3, the image processor determines the X-ray detector pose of the X-ray detector. The pose is determined as an orientation, a position, or both an orientation and a position. The pose is in two or three dimensions. For example, the pose is a 3D pose relative to the camera. - Part of the pose may be based on depths measured by the camera. The depths of the visible markers may be used to determine the pose, as an initial or starting pose, or as a pose in a sub-set of the degrees of freedom of the detector relative to the camera. Alternatively, the depths are used as input to the machine-learned model for locating markers, or are not used.
- The pose is determined using the locations of the visible and non-visible (e.g., occluded or other out-of-view) markers. Where a heatmap is output, the location or locations corresponding to a greatest probability for each marker are selected as the location of the marker.
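For the heatmap case, the per-marker selection can be sketched as an argmax over each marker's probability map (an illustrative NumPy version, not the patent's code):

```python
import numpy as np

def marker_locations(heatmaps):
    """Select the (x, y) of greatest probability in each marker's heatmap.
    heatmaps has shape (num_markers, H, W); returns one (x, y) per marker."""
    locations = []
    for hm in heatmaps:
        iy, ix = np.unravel_index(np.argmax(hm), hm.shape)
        locations.append((int(ix), int(iy)))
    return locations

maps = np.zeros((2, 480, 640))
maps[0, 100, 200] = 0.9   # marker 0 peaks at x=200, y=100
maps[1, 300, 50] = 0.7    # marker 1 peaks at x=50, y=300
print(marker_locations(maps))  # [(200, 100), (50, 300)]
```

Because the networks regress heatmap values for non-visible markers too, this selection yields a location for every marker, visible or not.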
- The distribution of markers indicates the pose. The orientation of one or more markers, the orientation of markers relative to each other, the distance between markers, the alignment of markers relative to the camera field of view, and/or other geometrical relationships of or between markers indicate the pose.
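For instance, the distance and in-image angle between a pair of detected markers are simple geometric cues of this kind (an illustrative sketch, not the patent's method):

```python
import numpy as np

def marker_pair_geometry(p0, p1):
    """Distance and in-image angle (radians) of the segment from marker p0
    to marker p1; a shrinking distance or changing angle between frames
    hints at out-of-plane tilt or in-plane rotation of the detector."""
    d = np.subtract(p1, p0).astype(float)
    return float(np.hypot(d[0], d[1])), float(np.arctan2(d[1], d[0]))

dist, angle = marker_pair_geometry((0, 0), (3, 4))  # dist 5.0
```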
- In one embodiment, the image processor determines the X-ray detector pose with homography. The detected locations of the markers are used to compute a homography to warp a pre-defined template of the markers to a source image (e.g., output heatmap). The warping error or matching is used to indicate the pose of the detector. Alternatively, the pose is output by the machine-learned model based on estimation from the locations provided within the machine-learned model.
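A homography between the pre-defined marker template and the detected image locations can be fit with the standard direct linear transform (DLT); the NumPy sketch below illustrates that general technique, not the patent's implementation:

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares 3x3 homography H mapping src points to dst points
    (each an (N, 2) array, N >= 4) via the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The null vector of the constraint matrix (last right-singular vector)
    # holds the 9 entries of H up to scale.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Template corners warped by a known scale-and-shift are recovered exactly:
template = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
detected = 2.0 * template + np.array([1.0, 1.0])
H = fit_homography(template, detected)
```

With noisy marker detections, the same least-squares fit would be preceded by coordinate normalization or an outlier-robust scheme in practice.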
- In
act 36, the controller controls the robot system to position the X-ray tube based on the X-ray detector pose. A robotic arm positions the X-ray source relative to the detector pose (position and/or orientation). Alternatively, the X-ray tube is manually positioned, but with guidance based on the detected pose of the detector (e.g., displaying graphics indicating the direction and magnitude of movement needed to align the X-ray tube with the detector). - Once the pose of the mobile X-ray detector is estimated in the form of the 6 degrees of freedom in the camera coordinate system, that location information is transformed into a location in the X-ray base coordinate system. Then, inverse kinematics may be applied to derive the optimal X-ray control parameters (by converting the location in the X-ray base coordinate system to a position in the X-ray source coordinate system) so that the positionable X-ray source is moved into a position that is in alignment with the mobile X-ray detector (e.g., orthogonal to the detector at a desired distance from the detector). The robot system moves the X-ray tube into the aligned position.
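The chain of coordinate transforms can be sketched with 4x4 homogeneous matrices. In the sketch below, `T_base_camera` stands in for a calibrated camera-to-base transform, which is an assumption made for illustration:

```python
import numpy as np

def transform(R, t):
    """4x4 homogeneous transform from a 3x3 rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Detector pose estimated in camera coordinates (identity rotation here):
T_camera_detector = transform(np.eye(3), [0.1, 0.0, 1.2])
# Camera pose in the X-ray base coordinate system (illustrative calibration):
T_base_camera = transform(np.eye(3), [0.5, 0.3, 0.0])
# Detector pose expressed in the base system, the input to inverse kinematics:
T_base_detector = T_base_camera @ T_camera_detector
print(T_base_detector[:3, 3])  # [0.6 0.3 1.2]
```

Composing one more transform from base to source coordinates would give the target for the robotic arm.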
- In
act 38, the X-ray system images the patient. The patient is imaged with the X-ray tube positioned relative to the X-ray detector. Once positioned, the X-ray tube generates X-rays. The X-rays passing through the patient are detected by the detector. -
FIG. 7 compares using the machine-learned model of FIG. 6 to regress both visible and not visible markers against using a baseline machine-learned model (e.g., a single U-Net without attention and coarse-to-fine) to regress just the visible markers in detector pose determination. The machine-learned model of FIG. 6 is trained for detecting visible and not visible markers. Homography is applied to the regressed marker locations for both models to determine detector pose. FIG. 7 shows the sorted root mean square error (y-axis) in the detector position with the baseline regression of only visible markers and with regression of visible and non-visible markers. The x-axis is the error at different percentiles of the number of cases. At 90-99% of cases, regression including the non-visible markers provides less error, such as half the error. - Various improvements described herein may be used together or separately. Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/292,716 US10779793B1 (en) | 2019-03-05 | 2019-03-05 | X-ray detector pose estimation in medical imaging |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/292,716 US10779793B1 (en) | 2019-03-05 | 2019-03-05 | X-ray detector pose estimation in medical imaging |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200281556A1 true US20200281556A1 (en) | 2020-09-10 |
US10779793B1 US10779793B1 (en) | 2020-09-22 |
Family
ID=72335042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/292,716 Active 2039-04-04 US10779793B1 (en) | 2019-03-05 | 2019-03-05 | X-ray detector pose estimation in medical imaging |
Country Status (1)
Country | Link |
---|---|
US (1) | US10779793B1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298075A (en) * | 2021-06-04 | 2021-08-24 | 浙江大学 | Heuristic image scale normalization method based on attention mechanism |
CN113892959A (en) * | 2021-09-26 | 2022-01-07 | 有方(合肥)医疗科技有限公司 | X-ray imaging system |
US20220130059A1 (en) * | 2020-10-23 | 2022-04-28 | Surgical Theater, Inc. | System and method for medical image alignment |
WO2022258502A1 (en) * | 2021-06-10 | 2022-12-15 | Brainlab Ag | Orienting an x-ray device based on an ultrasound image |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7350519B2 (en) * | 2019-05-29 | 2023-09-26 | キヤノン株式会社 | Radiography system, radiography control device, control method thereof, and program |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6405072B1 (en) * | 1991-01-28 | 2002-06-11 | Sherwood Services Ag | Apparatus and method for determining a location of an anatomical target with reference to a medical apparatus |
DE10216857A1 (en) * | 2002-04-16 | 2003-11-13 | Siemens Ag | Method for controlling an x-ray device |
DE102008022921A1 (en) * | 2008-05-09 | 2009-11-12 | Siemens Aktiengesellschaft | Arrangement and method for positioning devices |
DE102009013572B4 (en) | 2009-03-17 | 2013-02-21 | Siemens Aktiengesellschaft | Method for automatically positioning an X-ray source of an X-ray system and X-ray system |
KR20170006790A (en) | 2015-07-09 | 2017-01-18 | 현대자동차주식회사 | Device for mesuring modify for Gripper of robot and method thereof |
CN107874768B (en) * | 2016-09-30 | 2021-02-05 | 通用电气公司 | Mobile radiation imaging system and alignment method thereof |
US10478149B2 (en) | 2017-02-21 | 2019-11-19 | Siemens Healthcare Gmbh | Method of automatically positioning an X-ray source of an X-ray system and an X-ray system |
-
2019
- 2019-03-05 US US16/292,716 patent/US10779793B1/en active Active
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220130059A1 (en) * | 2020-10-23 | 2022-04-28 | Surgical Theater, Inc. | System and method for medical image alignment |
CN113298075A (en) * | 2021-06-04 | 2021-08-24 | 浙江大学 | Heuristic image scale normalization method based on attention mechanism |
WO2022258502A1 (en) * | 2021-06-10 | 2022-12-15 | Brainlab Ag | Orienting an x-ray device based on an ultrasound image |
WO2022258183A1 (en) * | 2021-06-10 | 2022-12-15 | Brainlab Ag | Orienting an x-ray device based on an ultrasound image |
CN113892959A (en) * | 2021-09-26 | 2022-01-07 | 有方(合肥)医疗科技有限公司 | X-ray imaging system |
Also Published As
Publication number | Publication date |
---|---|
US10779793B1 (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10779793B1 (en) | X-ray detector pose estimation in medical imaging | |
CN107789001B (en) | Positioning method and system for imaging scanning | |
EP3364214B1 (en) | Method of automatically positioning an x-ray source of an x-ray system and an x-ray system | |
CN110136208B (en) | Joint automatic calibration method and device for robot vision servo system | |
US10507002B2 (en) | X-ray system and method for standing subject | |
CN111557665B (en) | Dense body marker estimation from camera data for patient positioning in medical imaging | |
JP6573354B2 (en) | Image processing apparatus, image processing method, and program | |
US10909755B2 (en) | 3D object scanning method using structured light | |
US7336814B2 (en) | Method and apparatus for machine-vision | |
US11290704B2 (en) | Three dimensional scanning system and framework | |
US11741619B2 (en) | Methods and systems for registering preoperative image data to intraoperative image data of a scene, such as a surgical scene | |
CN109085113A (en) | A kind of Atomatic focusing method and device for cervical exfoliated cell detection device | |
CN109171789B (en) | Calibration method and calibration system for image diagnosis equipment | |
JP2002236909A (en) | Image data processing method and modeling device | |
CN112307912A (en) | Method and system for determining personnel track based on camera | |
US20220392084A1 (en) | Scene perception systems and methods | |
KR20040100963A (en) | Image processing apparatus | |
JP6288770B2 (en) | Face detection method, face detection system, and face detection program | |
KR102322995B1 (en) | Method for artificial intelligence nodule segmentation based on dynamic window and apparatus thereof | |
JP2022094744A (en) | Subject motion measuring device, subject motion measuring method, program, and imaging system | |
CN113269885A (en) | Patient model estimation from camera flow in medicine | |
JP7376201B1 (en) | Information processing system, information processing method and program | |
US20240175677A1 (en) | Measuring system providing shape from shading | |
Kolár et al. | 3d surface reconstruction of optical disc | |
Karjalainen | IMAGE REGISTRATION USING INFRARED AND VIS CAMERAS |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: SIEMENS MEDICAL SOLUTIONS USA, INC., PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, RUI;CHANG, YAO-JEN;SINGH, VIVEK KUMAR;SIGNING DATES FROM 20190311 TO 20190326;REEL/FRAME:048802/0576 |
|
AS | Assignment |
Owner name: SIEMENS HEALTHCARE GMBH, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TAMERSOY, BIRGI;REEL/FRAME:048879/0668 Effective date: 20190313 |
|
AS | Assignment |
Owner name: SIEMENS HEALTHCARE GMBH, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SIEMENS MEDICAL SOLUTIONS USA, INC.;REEL/FRAME:049115/0099 Effective date: 20190501 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: SIEMENS HEALTHINEERS AG, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SIEMENS HEALTHCARE GMBH;REEL/FRAME:066267/0346 Effective date: 20231219 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |