WO2020201183A1 - Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods
- Publication number
- WO2020201183A1 (PCT/EP2020/058898)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- anatomy
- image frames
- image frame
- patient
- Legal status
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
- G06T7/0014—Biomedical image inspection using an image reference approach
- G06T7/0016—Biomedical image inspection using an image reference approach involving temporal comparison
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/20—Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B8/00—Diagnosis using ultrasonic, sonic or infrasonic waves
- A61B8/08—Clinical applications
- A61B8/0833—Clinical applications involving detecting or locating foreign bodies or organic structures
- A61B8/0841—Clinical applications involving detecting or locating foreign bodies or organic structures for locating instruments
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B8/00—Diagnosis using ultrasonic, sonic or infrasonic waves
- A61B8/46—Ultrasonic, sonic or infrasonic diagnostic devices with special arrangements for interfacing with the operator or the patient
- A61B8/461—Displaying means of special interest
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B8/00—Diagnosis using ultrasonic, sonic or infrasonic waves
- A61B8/48—Diagnostic techniques
- A61B8/483—Diagnostic techniques involving the acquisition of a 3D volume of data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B34/00—Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
- A61B34/20—Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
- A61B2034/2046—Tracking techniques
- A61B2034/2063—Acoustic tracking systems, e.g. using ultrasound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10132—Ultrasound image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30048—Heart; Cardiac
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30101—Blood vessel; Artery; Vein; Vascular
Definitions
- the present disclosure relates generally to ultrasound imaging and, in particular, to providing segmentation of moving objects and guidance for locating an optimal imaging view.
- Ultrasound can provide non-radiated, safe, and real-time, dynamic imaging of anatomy and/or medical devices during medical procedures (e.g., diagnostics, interventions, and/or treatments).
- clinicians have relied on two-dimensional (2D) ultrasound imaging to provide guidance in diagnostics and/or navigation of medical devices through a patient’s body during medical procedures.
- medical devices and/or anatomical structures can be thin, non-rigid, and/or moving, making them difficult to identify in 2D ultrasound images.
- anatomical structures may be thin, tortuous, and in some cases, may be in constant motion (e.g. due to breathing, cardiac, and/or arterial pulses).
- 3D ultrasound enables viewing of 3D volumes instead of 2D image slices.
- the ability to visualize 3D volumes can be valuable in medical procedures.
- the tip of a medical device may be uncertain in a 2D image slice due to foreshortening, but may be clear when viewing in a 3D volume.
- 3D and/or 4D imaging can provide valuable visualization and/or guidance to medical procedures.
- the interpretation of 3D and/or 4D imaging data can be complex and challenging due to the high volume, the high dimensionality, the low resolution, and/or the low framerate of the data.
- accurate interpretations of 3D and/or 4D imaging data may require a user or a clinician with extensive training and great expertise. Additionally, the interpretations of the data can be user dependent.
- a clinician may spend a large portion of the time finding an ideal imaging view of the patient’s anatomy and/or the medical device.
- Computers are generally more proficient in interpreting high-volume, high-dimensionality data.
- algorithmic models can be applied to assist interpretations of 3D and/or 4D imaging data and/or locating an optimal imaging view.
- traditional algorithms may not perform well in identifying and/or segmenting thin objects and/or moving objects in ultrasound images, for example, due to low signal-to-noise ratio (SNR), ultrasound artefacts, occlusion of devices lying in confusing poses such as along vessel walls, and/or high-intensity artefacts which may resemble the moving object.
- Embodiments of the present disclosure provide a deep learning network that utilizes temporal continuity information in three-dimensional (3D) ultrasound data and/or four-dimensional (4D) ultrasound data to segment a moving object and/or provide imaging guidance.
- 3D ultrasound data may refer to a time series of 2D images obtained from 2D ultrasound imaging across time.
- 4D ultrasound data may refer to a time series of 3D volumes obtained from 3D ultrasound imaging across time.
- the temporally-aware deep learning network includes a recurrent component (e.g., a recurrent neural network (RNN)) coupled to a plurality of convolutional encoding-decoding layers operating at multiple different spatial resolutions.
- the deep learning network is applied to a time series of 2D or 3D ultrasound imaging frames including a moving object and/or a medical device.
- the recurrent component passes the deep learning network’s prediction for a current image frame as a secondary input to a prediction of a next image frame.
- the deep learning network is trained to differentiate a flexible, elongate, thinly-shaped medical device (e.g., a catheter, a guide wire, a needle, a therapy device, and/or a treatment device) passing through an anatomical structure (e.g., heart, lungs, and/or vessels) from the anatomical structure and predict a position and/or motion of the medical device based on time-continuity information in the ultrasound image frames.
- the deep learning network is trained to identify a moving portion of an anatomical structure caused by cardiac motion, breathing motion, and/or arterial pulses from a static portion of the anatomical structure and predict motion of the moving portion based on time-continuity information in the ultrasound image frames.
- the deep learning network is trained to predict a target imaging plane of an anatomical structure.
- the deep learning network’s prediction can be used to generate a control signal and/or an instruction (e.g., rotation and/or translation) to automatically steer ultrasound beams for imaging the target imaging plane.
- the deep learning network’s prediction can be used to provide a user with instructions for navigating an ultrasound imaging device towards the target imaging plane.
- the deep learning network can be applied in real-time during 3D and/or 4D imaging to provide dynamic segmentations and imaging guidance.
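The feedback of one frame's prediction into the next frame's prediction can be illustrated with a minimal sketch. The function `toy_predict` below is a hypothetical stand-in for the trained network (it simply blends the current frame with the previous prediction), and the names `segment_sequence` and `carry` are assumptions made for illustration; none of them come from the disclosure.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[List[float]]  # a 2D image as rows of pixel intensities in [0, 1]


def toy_predict(frame: Frame, prev_mask: Optional[Frame], carry: float = 0.5) -> Frame:
    """Hypothetical stand-in for the trained network: adds a fraction of the
    previous prediction to the current frame and clips the result to 1.0."""
    out = []
    for r, row in enumerate(frame):
        out_row = []
        for c, pixel in enumerate(row):
            prior = prev_mask[r][c] if prev_mask is not None else 0.0
            out_row.append(min(1.0, pixel + carry * prior))
        out.append(out_row)
    return out


def segment_sequence(
    frames: Sequence[Frame],
    predict: Callable[[Frame, Optional[Frame]], Frame] = toy_predict,
) -> List[Frame]:
    """Run the predictor over a time series, passing each prediction
    forward as a secondary input to the prediction of the next frame."""
    prev: Optional[Frame] = None
    outputs: List[Frame] = []
    for frame in frames:
        prev = predict(frame, prev)
        outputs.append(prev)
    return outputs


# The object is visible at pixel (0, 1) in the first frame only.
frames = [[[0.0, 0.8]], [[0.0, 0.0]]]
outputs = segment_sequence(frames)
```

In this toy example the second output frame still carries a non-zero response at the pixel where the object was seen one frame earlier, which is the kind of temporal continuity the recurrent component is described as exploiting.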
- an ultrasound imaging system comprising a processor circuit in communication with an ultrasound imaging device, the processor circuit configured to receive, from the ultrasound imaging device, a sequence of input image frames of a moving object over a time period, wherein the moving object comprises at least one of an anatomy of a patient or a medical device traversing through the patient’s anatomy, and wherein a portion of the moving object is at least partially invisible in a first input image frame of the sequence of input image frames; apply a recurrent predictive network associated with image segmentation to the sequence of input image frames to generate segmentation data; and output, to a display in communication with the processor circuit, a sequence of output image frames based on the segmentation data, wherein the portion of the moving object is fully visible in a first output image frame of the sequence of output image frames, the first output image frame and the first input image frame associated with a same time instant within the time period.
- the processor circuit configured to apply the recurrent predictive network is further configured to generate previous segmentation data based on a previous input image frame of the sequence of input image frames, the previous input image frame being received before the first input image frame; and generate first segmentation data based on the first input image frame and the previous segmentation data.
- the processor circuit configured to generate the previous segmentation data is configured to apply a convolutional encoder and a recurrent neural network to the previous input image frame;
- the processor circuit configured to generate the first segmentation data is configured to apply the convolutional encoder to the first input image frame to generate encoded data; and apply the recurrent neural network to the encoded data and the previous segmentation data;
- the processor circuit configured to apply the recurrent predictive network is further configured to apply a convolutional decoder to the first segmentation data and the previous segmentation data.
- the convolutional encoder, the recurrent neural network, and the convolutional decoder operate at multiple spatial resolutions.
- the moving object includes the medical device traversing through the patient’s anatomy
- the convolutional encoder, the recurrent neural network, and the convolutional decoder are trained to identify the medical device from the patient’s anatomy and predict a motion associated with the medical device traversing through the patient’s anatomy.
- the moving object includes the patient’s anatomy with at least one of a cardiac motion, a breathing motion, or an arterial pulse
- the convolutional encoder, the recurrent neural network, and the convolutional decoder are trained to identify a moving portion of the patient’s anatomy from a static portion of the patient’s anatomy and predict a motion associated with the moving portion.
- the moving object includes the medical device traversing through the patient’s anatomy
- the system comprises the medical device.
- the medical device comprises at least one of a needle, a guidewire, a catheter, a guided catheter, a therapy device, or an interventional device.
- the input image frames include at least one of two-dimensional image frames or three-dimensional image frames.
- the processor circuit is further configured to apply spline fitting to the sequence of input image frames based on the segmentation data.
- the system further comprises the ultrasound imaging device, and wherein the ultrasound imaging device comprises an ultrasound transducer array configured to obtain the sequence of input image frames.
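The spline fitting step mentioned above is not specified in detail here. As a rough sketch, assuming the segmentation data is a binary mask of a thin device and assuming a uniform Catmull-Rom spline as the curve model (both are illustrative choices, not taken from the disclosure), a smooth centerline could be recovered as follows. The helper names `centerline_points` and `catmull_rom` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centerline_points(mask: List[List[int]]) -> List[Point]:
    """One (x, y) point per column: the mean row index of foreground pixels.
    Columns with no foreground pixel are skipped (a gap in the segmentation)."""
    points: List[Point] = []
    n_cols = len(mask[0]) if mask else 0
    for x in range(n_cols):
        rows = [y for y, row in enumerate(mask) if row[x]]
        if rows:
            points.append((float(x), sum(rows) / len(rows)))
    return points


def catmull_rom(points: List[Point], samples_per_segment: int = 4) -> List[Point]:
    """Uniform Catmull-Rom spline through the points (endpoints duplicated)."""
    if len(points) < 2:
        return list(points)
    padded = [points[0]] + list(points) + [points[-1]]
    curve: List[Point] = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1], padded[i], padded[i + 1], padded[i + 2]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            curve.append(tuple(
                0.5 * ((2 * b) + (-a + c) * t
                       + (2 * a - 5 * b + 4 * c - d) * t ** 2
                       + (-a + 3 * b - 3 * c + d) * t ** 3)
                for a, b, c, d in zip(p0, p1, p2, p3)
            ))
    curve.append(points[-1])
    return curve


# A thin device segmented with a one-column gap (column 2 has no foreground).
mask = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
points = centerline_points(mask)
curve = catmull_rom(points)
```

The interpolated curve passes through every detected centerline point and bridges the missing column, which is one plausible reason to fit a spline to a segmented elongate device.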
- an ultrasound imaging system comprising a processor circuit in communication with an ultrasound imaging device, the processor circuit configured to receive, from the ultrasound imaging device, a sequence of image frames representative of an anatomy of a patient over a time period; apply a recurrent predictive network associated with image acquisition to the sequence of image frames to generate imaging plane data associated with a clinical property of the patient’s anatomy; and output, to a display in communication with the processor circuit based on the imaging plane data, at least one of a target imaging plane of the patient’s anatomy or an instruction for repositioning the ultrasound imaging device towards the target imaging plane.
- the processor circuit configured to apply the recurrent predictive network is further configured to generate first imaging plane data based on a first image frame of the sequence of image frames; and generate second imaging plane data based on a second image frame of the sequence of image frames and the first imaging plane data, the second image frame being received after the first image frame.
- the processor circuit configured to generate the first imaging plane data is configured to apply a convolutional encoder and a recurrent neural network to the first image frame; the processor circuit configured to generate the second imaging plane data is configured to apply the convolutional encoder to the first image frame to generate encoded data; and apply the recurrent neural network to the encoded data and the first imaging plane data; and the processor circuit configured to apply the recurrent predictive network is further configured to apply a
- the convolutional encoder, the recurrent neural network, and the convolutional decoder operate at multiple spatial resolutions, and wherein the convolutional encoder, the recurrent neural network, and the convolutional decoder are trained to predict the target imaging plane for imaging the clinical property of the patient’s anatomy.
- the image frames include at least one of two-dimensional image frames or three-dimensional image frames of the patient’s anatomy.
- the processor circuit is configured to output the target imaging plane including at least one of a cross-sectional image slice, an orthogonal image slice, or a multiplanar reconstruction (MPR) image slice of the patient’s anatomy including the clinical property.
- the system further comprises the ultrasound imaging device, and wherein the ultrasound imaging device comprises an ultrasound transducer array configured to obtain the sequence of image frames.
- the processor circuit is further configured to generate an ultrasound beam steering control signal based on the imaging plane data; and output, to the ultrasound imaging device, the ultrasound beam steering control signal.
- the processor circuit is configured to output the instruction including at least one of a rotation or a translation of the ultrasound imaging device.
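How imaging plane data might be turned into a rotation and/or translation instruction can be sketched as below. The plane parameterization (`angle_deg`, `offset_mm`), the tolerances, and the wording of the instruction are all assumptions made for illustration; the disclosure does not define them.

```python
from dataclasses import dataclass


@dataclass
class PlanePose:
    """Hypothetical plane parameterization: rotation about the probe axis
    (degrees) and offset along the scan direction (millimeters)."""
    angle_deg: float
    offset_mm: float


def guidance_instruction(current: PlanePose, target: PlanePose,
                         angle_tol_deg: float = 2.0,
                         offset_tol_mm: float = 1.0) -> str:
    """Compare the current plane with the predicted target plane and
    describe the rotation and/or translation needed to reach it."""
    # Wrap the angular difference into [-180, 180) so the shorter turn is suggested.
    d_angle = (target.angle_deg - current.angle_deg + 180.0) % 360.0 - 180.0
    d_offset = target.offset_mm - current.offset_mm
    steps = []
    if abs(d_angle) > angle_tol_deg:
        direction = "clockwise" if d_angle > 0 else "counterclockwise"
        steps.append(f"rotate {abs(d_angle):.1f} deg {direction}")
    if abs(d_offset) > offset_tol_mm:
        direction = "forward" if d_offset > 0 else "backward"
        steps.append(f"translate {abs(d_offset):.1f} mm {direction}")
    return "; ".join(steps) if steps else "hold position"


instruction = guidance_instruction(PlanePose(350.0, 0.0), PlanePose(10.0, 5.0))
```

The same differences (`d_angle`, `d_offset`) could equally be passed to a beam steering controller instead of being shown to the user; that mapping is device specific and is not sketched here.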
- FIG. 1 is a schematic diagram of an ultrasound imaging system, according to aspects of the present disclosure.
- FIG. 2 is a schematic diagram of a deep learning-based image segmentation scheme, according to aspects of the present disclosure.
- FIG. 3 is a schematic diagram illustrating a configuration for a temporally-aware deep learning network, according to aspects of the present disclosure.
- FIG. 4 is a schematic diagram illustrating a configuration for a temporally-aware deep learning network, according to aspects of the present disclosure.
- FIG. 5 illustrates a scenario of an ultrasound-guided procedure, according to aspects of the present disclosure.
- FIG. 6 illustrates a scenario of an ultrasound-guided procedure, according to aspects of the present disclosure.
- FIG. 7 illustrates a scenario of an ultrasound-guided procedure, according to aspects of the present disclosure.
- FIG. 8 illustrates a scenario of an ultrasound-guided procedure, according to aspects of the present disclosure.
- FIG. 9 is a schematic diagram of a deep learning-based image segmentation scheme with spline fitting, according to aspects of the present disclosure.
- FIG. 10 is a schematic diagram of a deep learning-based imaging guidance scheme, according to aspects of the present disclosure.
- FIG. 11 illustrates ultrasound images obtained from an ultrasound-guided procedure, according to aspects of the present disclosure.
- FIG. 12 is a schematic diagram of a processor circuit, according to embodiments of the present disclosure.
- FIG. 13 is a flow diagram of a deep learning-based ultrasound imaging method, according to aspects of the present disclosure.
- FIG. 14 is a flow diagram of a deep learning-based ultrasound imaging method, according to aspects of the present disclosure.
- FIG. 1 is a schematic diagram of an ultrasound imaging system 100, according to aspects of the present disclosure.
- the system 100 is used for scanning an area or volume of a patient’s body.
- the system 100 includes an ultrasound imaging probe 110 in communication with a host 130 over a communication interface or link 120.
- the probe 110 includes a transducer array 112, a beamformer 114, a processing component 116, and a communication interface 118.
- the host 130 includes a display 132, a processing component 134, and a communication interface 136.
- the probe 110 is an external ultrasound imaging device including a housing configured for handheld operation by a user.
- the transducer array 112 can be configured to obtain ultrasound data while the user grasps the housing of the probe 110 such that the transducer array 112 is positioned adjacent to and/or in contact with a patient’s skin.
- the probe 110 is configured to obtain ultrasound data of anatomy within the patient’s body while the probe 110 is positioned outside of the patient’s body.
- the probe 110 is a transthoracic (TTE) probe.
- the probe 110 can be a trans-esophageal (TEE) ultrasound probe.
- the transducer array 112 emits ultrasound signals towards an anatomical object 105 of a patient and receives echo signals reflected from the object 105 back to the transducer array 112.
- the ultrasound transducer array 112 can include any suitable number of acoustic elements, including one or more acoustic elements and/or a plurality of acoustic elements.
- the transducer array 112 includes a single acoustic element.
- the transducer array 112 may include an array of acoustic elements with any number of acoustic elements in any suitable configuration.
- the transducer array 112 can include between 1 acoustic element and 10000 acoustic elements, including values such as 2 acoustic elements, 4 acoustic elements, 36 acoustic elements, 64 acoustic elements, 128 acoustic elements, 500 acoustic elements, 812 acoustic elements, 1000 acoustic elements, 3000 acoustic elements, 8000 acoustic elements, and/or other values both larger and smaller.
- the transducer array 112 may include an array of acoustic elements with any number of acoustic elements in any suitable configuration, such as a linear array, a planar array, a curved array, a curvilinear array, a circumferential array, an annular array, a phased array, a matrix array, a one-dimensional (1D) array, a 1.x dimensional array (e.g., a 1.5D array), or a two-dimensional (2D) array.
- the array of acoustic elements (e.g., one or more rows, one or more columns, and/or one or more orientations) can be uniformly or independently controlled and activated.
- the transducer array 112 can be configured to obtain one-dimensional, two-dimensional, and/or three-dimensional images of patient anatomy.
- the transducer array 112 may include a piezoelectric micromachined ultrasound transducer (PMUT), capacitive micromachined ultrasonic transducer (CMUT), single crystal, lead zirconate titanate (PZT), PZT composite, other suitable transducer types, and/or combinations thereof.
- the object 105 may include any anatomy, such as blood vessels, nerve fibers, airways, mitral leaflets, kidney, and/or liver of a patient that is suitable for ultrasound imaging examination.
- the object 105 may include at least a portion of a patient’s heart, lungs, and/or skin.
- the object 105 may be in constant motion, for example, resulting from breathing, cardiac activities, and/or arterial pulses.
- the motion may be regular or periodic, for example, with motion of the heart, associated vessels, and/or lungs in the context of a cardiac cycle or a heartbeat cycle.
- the present disclosure can be implemented in the context of any number of anatomical locations and tissue types, including without limitation, organs including the liver, heart, kidneys, gall bladder, pancreas, lungs; ducts; intestines; nervous system structures including the brain, dural sac, spinal cord and peripheral nerves; the urinary tract; as well as valves within the blood vessels, blood, chambers or other parts of the heart, and/or other systems of the body.
- the anatomy may be a blood vessel, such as an artery or a vein of a patient’s vascular system, including cardiac vasculature, peripheral vasculature, neural vasculature, renal vasculature, and/or any other suitable lumen inside the body.
- the present disclosure can be implemented in the context of man-made structures such as, but without limitation, heart valves, stents, shunts, filters, implants and other devices.
- the system 100 is used to guide a clinician during a medical procedure (e.g., treatment, diagnostic, therapy, and/or interventions).
- the clinician may insert a medical device 108 into the anatomical object 105.
- the medical device 108 may include an elongate flexible member with a thin geometry.
- the medical device 108 may be a guide wire, a catheter, a guide catheter, a needle, an
- the medical device 108 may be any imaging device suitable for imaging a patient’s anatomy and may be of any suitable imaging modality, such as optical coherence tomography (OCT) and/or endoscopy.
- the medical device 108 may include a sheath, an imaging device, and/or an implanted device.
- the medical device 108 may be a treatment/therapy device including a balloon, a stent, and/or an atherectomy device.
- the medical device 108 may have a diameter that is smaller than the diameter of a blood vessel.
- the medical device 108 may have a diameter or thickness that is about 0.5 millimeter (mm) or less.
- the medical device 108 may be a guide wire with a diameter of about 0.035 inches.
- the transducer array 112 can produce ultrasound echoes reflected by the object 105 and the medical device 108.
- the beamformer 114 is coupled to the transducer array 112.
- the beamformer 114 controls the transducer array 112, for example, for transmission of the ultrasound signals and reception of the ultrasound echo signals.
- the beamformer 114 provides image signals to the processing component 116 based on the response or the received ultrasound echo signals.
- the beamformer 114 may include multiple stages of beamforming. The beamforming can reduce the number of signal lines for coupling to the processing component 116.
- the transducer array 112 in combination with the beamformer 114 may be referred to as an ultrasound imaging component.
- the processing component 116 is coupled to the beamformer 114.
- the processing component 116 may include a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a controller, a field programmable gate array (FPGA) device, another hardware device, a firmware device, or any combination thereof configured to perform the operations described herein.
- the processing component 116 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- the processing component 116 is configured to process the beamformed image signals.
- the processing component 116 may perform filtering and/or quadrature demodulation to condition the image signals.
- the processing component 116 and/or 134 can be configured to control the array 112 to obtain ultrasound data associated with the object 105 and/or the medical device 108.
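The filtering and quadrature demodulation mentioned for the processing component 116 can be sketched as follows. This is a minimal illustration, assuming a real-valued RF line sampled at `fs_hz`, a known center frequency `f0_hz`, and a boxcar moving average as the low-pass filter; these parameters and the filter choice are assumptions, not details from the disclosure.

```python
import math
from typing import List


def moving_average(x: List[float], width: int) -> List[float]:
    """Boxcar low-pass filter (a simple stand-in for a designed filter)."""
    out: List[float] = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]
        out.append(acc / min(i + 1, width))
    return out


def quadrature_demodulate(rf: List[float], fs_hz: float, f0_hz: float,
                          lp_width: int) -> List[float]:
    """Mix the RF signal down with cosine and sine references, low-pass
    both branches, and return the envelope sqrt(I^2 + Q^2)."""
    i_mix = [2.0 * s * math.cos(2.0 * math.pi * f0_hz * n / fs_hz)
             for n, s in enumerate(rf)]
    q_mix = [-2.0 * s * math.sin(2.0 * math.pi * f0_hz * n / fs_hz)
             for n, s in enumerate(rf)]
    i_bb = moving_average(i_mix, lp_width)
    q_bb = moving_average(q_mix, lp_width)
    return [math.hypot(i, q) for i, q in zip(i_bb, q_bb)]


# Synthetic echo: constant amplitude 0.5 at 5 MHz, sampled at 40 MHz.
fs_hz, f0_hz = 40e6, 5e6
rf = [0.5 * math.cos(2.0 * math.pi * f0_hz * n / fs_hz) for n in range(64)]
envelope = quadrature_demodulate(rf, fs_hz, f0_hz, lp_width=8)
```

With eight samples per carrier cycle, an eight-sample boxcar averages out the double-frequency mixing product, so the recovered envelope settles at the echo amplitude once the filter window is full.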
- the communication interface 118 is coupled to the processing component 116.
- the communication interface 118 may include one or more transmitters, one or more receivers, one or more transceivers, and/or circuitry for transmitting and/or receiving communication signals.
- the communication interface 118 can include hardware components and/or software components implementing a particular communication protocol suitable for transporting signals over the communication link 120 to the host 130.
- the communication interface 118 can be referred to as a communication device or a communication interface module.
- the communication link 120 may be any suitable communication link.
- the communication link 120 may be a wired link, such as a universal serial bus (USB) link or an Ethernet link.
- the communication link 120 may be a wireless link, such as an ultra-wideband (UWB) link, an Institute of Electrical and Electronics Engineers (IEEE) 802.11 WiFi link, or a Bluetooth link.
- the communication interface 136 may receive the image signals.
- the communication interface 136 may be substantially similar to the communication interface 118.
- the host 130 may be any suitable computing and display device, such as a workstation, a personal computer (PC), a laptop, a tablet, or a mobile phone.
- the processing component 134 is coupled to the communication interface 136.
- the processing component 134 may be implemented as a combination of software components and hardware components.
- the processing component 134 may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a controller, a FPGA device, another hardware device, a firmware device, or any combination thereof configured to perform the operations described herein.
- the processing component 134 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- the processing component 134 can be configured to generate image data from the image signals received from the probe 110.
- the processing component 134 can apply advanced signal processing and/or image processing techniques to the image signals.
- the processing component 134 can form a three-dimensional (3D) volume image from the image data.
- the processing component 134 can perform real-time processing on the image data to provide a streaming video of ultrasound images of the object 105 and/or the medical device 108.
- the display 132 is coupled to the processing component 134.
- the display 132 may be a monitor or any suitable display.
- the display 132 is configured to display the ultrasound images, image videos, and/or any imaging information of the object 105 and/or the medical device 108.
- the system 100 may be used to provide a clinician with guidance in a medical procedure.
- the system 100 can capture a sequence of ultrasound images of the object 105 and the medical device 108 as the medical device 108 traverses through the object 105.
- the sequence of ultrasound images can be in 2D or 3D.
- the system 100 may be configured to perform biplane imaging or multiplane imaging to provide the sequence of ultrasound images as biplane images or multiplane images, respectively.
- the clinician may have difficulty in identifying and/or distinguishing the medical device 108 from the object 105 based on the captured images due to the motion of the medical device 108 and/or the thin geometry of the medical device 108.
- the medical device 108 may appear to jump from one frame to another frame without time-continuity.
- the processing component 134 can apply a temporally-aware deep learning network trained for segmentation to the series of images.
- the deep learning network identifies and/or distinguishes the medical device 108 from the anatomical object 105 and predicts motion and/or positions of the medical device 108 using temporal information carried in the sequence of images captured across time.
- the processing component 134 can incorporate the prediction into the captured 2D and/or 3D image frames to provide a time series of output images with a stable view of the moving medical device 108 from frame-to-frame.
- the sequence of ultrasound images input to the deep learning network may be 3D volumes and the output prediction may be 2D images, biplane images, and/or multiplane images.
- the medical device 108 may be a 2D ultrasound imaging probe and the deep learning network can be configured to predict volumetric 3D segmentation, where the sequence of ultrasound images input to the deep learning network may be 2D images, biplane images, and/or multiplane images and the output prediction may be 3D volumes.
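Incorporating a prediction into a captured image frame could be as simple as highlighting the pixels that the network assigns to the device. The sketch below assumes the prediction is a per-pixel confidence map with the same shape as the frame; the function name `overlay_prediction`, the threshold, and the highlight value are hypothetical choices for illustration.

```python
from typing import List

Image = List[List[float]]


def overlay_prediction(frame: Image, mask: Image,
                       threshold: float = 0.5,
                       highlight: float = 1.0) -> Image:
    """Return a copy of the frame in which pixels whose predicted
    confidence reaches the threshold are set to the highlight value."""
    return [
        [highlight if m >= threshold else p for p, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


frame = [[0.2, 0.1], [0.3, 0.0]]
mask = [[0.0, 0.9], [0.4, 0.6]]
output = overlay_prediction(frame, mask)
```

Because the prediction for each frame depends on earlier frames, an overlay built this way can remain stable even when the device is faint or missing in a single captured frame.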
- anatomical structures can be difficult to identify under 2D and/or 3D imaging due to the geometry and/or motion of the anatomical structures.
- tortuous blood vessels in distal peripheral anatomy and/or small structures close to the heart may be affected by arterial and/or cardiac motion.
- the mitral leaflets and/or other structures may go in and out of an ultrasound imaging view over a time period.
- vessels, airways, and tumors may go in and out of an ultrasound imaging view during endobronchial ultrasound imaging, due to the breathing motion of the patient.
- the processing component 134 can apply a temporally-aware deep learning network trained for segmentation to a series of 2D and/or 3D images of the object 105 captured across time.
- the deep learning network identifies and/or distinguishes the moving portion (e.g., foreground) of the object 105 from the relatively more static portion (e.g., background) of the object 105 and predicts motion and/or positions of the moving portion using temporal information carried in the sequence of images captured across time.
- the moving portions may correspond to mitral leaflets and the static portions may correspond to cardiac chambers, which may include relatively slower motions than valves.
- the moving portions may correspond to pulsatile arteries and the static portions may correspond to surrounding tissues.
- the moving portions may correspond to lung chambers and airways and the static portions may correspond to surrounding cavities and tissues.
- the processing component 134 can incorporate the prediction into the captured image frames to provide a series of output images with a stable view of the moving anatomical structure from frame-to-frame.
- Mechanisms for providing a stable view of a moving object (e.g., the medical device 108 and/or the object 105) using a temporally-aware deep learning model are described in greater detail herein.
- the system 100 may be used to assist a clinician in finding an optimal imaging view of a patient for a certain clinical property or clinical examination.
- the processing component 134 can utilize a temporally-aware deep learning network trained for image acquisition to predict an optimal imaging view or image slice of the object 105 for a certain clinical property from the captured 2D and/or 3D images.
- the system 100 may be configured for cardiac imaging to assist a clinician in measuring a ventricular volume, determining the presence of cardiac arrhythmia, performing a trans-septal puncture, and/or providing mitral valve visualization for repair and/or replacement.
- the cardiac imaging can be configured to provide a four-chamber view, a three-chamber view, and/or a two-chamber view.
- the cardiac imaging can be used for visualizing the left ventricular outflow tract (LVOT), which may be critical for MitraClip and valve in mitral valve
- the cardiac imaging can be used for visualizing the mitral annulus for any procedure involving annuloplasty.
- the cardiac imaging can be used for visualizing the left atrial appendage during a trans-septal puncture (TSP) to prevent perforation.
- the clinical property may be the presence and location of a suspected tumor and may be obtained from lateral or sagittal ultrasound views in which the ultrasound transducer is aligned with the tumor and adjacent airway tracts.
- the processing component 134 can provide the clinician with instructions (e.g., rotations and/or translations) to maneuver the probe 110 from one location to another location or from one imaging plane to another imaging plane to obtain an optimal imaging view of the clinical property based on the prediction output by the deep learning network.
- the processing component 134 can automate the process of reaching the optimal imaging view.
- the processing component 134 is configured to automatically steer 2D or X-plane beams produced by the transducer array 112 to an optimal imaging location based on the prediction output by the deep learning network.
- An X-plane may include a cross-sectional plane and a longitudinal plane. Mechanisms for reaching an optimal imaging view using a deep learning model are described in greater detail herein.
- the system 100 can be used for collecting ultrasound images to form a training data set for deep learning network training.
- the host 130 may include a memory 138, which may be any suitable storage device, such as a cache memory (e.g., a cache memory of the processing component 134), random access memory (RAM), magnetoresistive RAM (MRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), flash memory, solid state memory device, hard disk drives, solid state drives, other forms of volatile and non-volatile memory, or a combination of different types of memory.
- the memory 138 can be configured to store an image data set 140 for deep learning network training.
- FIG. 2 is a schematic diagram of a deep learning-based image segmentation scheme 200, according to aspects of the present disclosure.
- the scheme 200 is implemented by the system 100.
- the scheme 200 utilizes a temporally-aware multi-layered deep learning network 210 to provide segmentations of a moving object in ultrasound images.
- the moving object may be a medical device (e.g., a guide wire, a catheter, a guided catheter, a needle, or a therapy device similar to the devices 108 and/or 212) moving within a patient’s anatomy (e.g., heart, lung, vessels, and/or skin similar to the object 105).
- the moving object may be an anatomical structure (e.g., the object 105) with a cardiac motion, a breathing motion, and/or arterial pulses.
- the multi-layered deep learning network 210 receives a sequence of ultrasound image frames 202 of the device and/or the anatomical structure. Each image frame 202 is passed through the temporally-aware multi-layered deep learning network 210.
- the deep learning network 210's prediction for a current image frame 202 is passed as an input for prediction of a next image frame 202.
- the deep learning network 210 includes a recurrent component that utilizes the temporal continuity in the sequence of ultrasound image frames 202 for prediction.
- the deep learning network 210 is also referred to as a recurrent predictive network.
- the sequence of image frames 202 is captured across a time period (e.g., from time T0 to time Tn).
- the image frames 202 may be captured using the system 100.
- the sequence of image frames 202 is generated from ultrasound echoes collected by the transducer array 112, beamformed by the beamformer 114, filtered and/or conditioned by the processing components 116 and/or 134, and reconstructed by the processing component 134.
- the sequence of image frames 202 is input into the deep learning network 210. While FIG. 2 illustrates the image frames 202 as 3D volumes, the scheme 200 may be similarly applied to a sequence of 2D input image frames captured across time to provide segmentation. In some examples, the sequence of 3D image frames 202 across time can be referred to as a continuous 4D (e.g., 3D volumes and time) ultrasound sequence.
- the deep learning network 210 includes a convolutional encoder 220, a temporally-aware RNN 230, and a convolutional decoder 240.
- the convolutional encoder 220 includes a plurality of convolutional encoding layers 222.
- the convolutional decoder 240 includes a plurality of convolutional decoding layers 242.
- the number of convolutional encoding layers 222 and the number of convolutional decoding layers 242 may be the same.
- FIG. 2 illustrates four convolutional encoding layers 222_K0, 222_K1, 222_K2, and 222_K3 in the convolutional encoder 220 and four convolutional decoding layers 242_L0, 242_L1, 242_L2, and 242_L3 in the convolutional decoder 240 for simplicity of illustration and discussion, though it will be recognized that embodiments of the present disclosure may scale to include any suitable number of convolutional encoding layers 222 (e.g., about 2, 3, 5, 6, or more) and any suitable number of convolutional decoding layers 242 (e.g., about 2, 3, 5, 6, or more).
- the subscripts K0, K1, K2, and K3 represent layer indexing for the convolutional encoding layers 222.
- the subscripts L0, L1, L2, and L3 represent layer indexing for the convolutional decoding layers 242.
- Each of the convolutional encoding layers 222 and each of the convolutional decoding layers 242 may include a convolutional filter or kernel.
- the convolutional kernel can be a 2D kernel or a 3D kernel depending on whether the deep learning network 210 is configured to operate on 2D images or 3D volumes. For example, when the image frames 202 are 2D images, the convolutional kernels are 2D filter kernels. Alternatively, when the image frames 202 are 3D volumes, the convolutional kernels are 3D filter kernels.
- the filter coefficients for the convolutional kernels are trained to learn segmentation of moving objects as described in greater detail herein.
- the convolutional encoding layers 222 and the convolutional decoding layers 242 may operate at multiple different spatial resolutions.
- each convolutional encoding layer 222 may be followed by a down-sampling layer.
- Each convolutional decoding layer 242 can be preceded by an up-sampling layer.
- the down-sampling and up-sampling can be at any suitable factor.
- the down-sampling factor at each down-sampling layer and the up-sampling factor at each up-sampling layer can be about 2.
- the convolutional encoding layers 222 and the convolutional decoding layers 242 can be trained to extract features from the sequence of image frames 202 at different spatial resolutions.
- the RNN 230 is positioned between the convolutional encoding layers 222 and the convolutional decoding layers 242.
- the RNN 230 is configured to capture temporal information (e.g., temporal continuity) from the sequence of input image frames 202 for segmentation of moving objects.
- the RNN 230 may include multiple temporally-aware recurrent components (e.g., the recurrent component 232 of FIGS. 3 and 4).
- the RNN 230 passes a prediction for a current image frame 202 (captured at time T0) back to the RNN 230 as a secondary input for a prediction for a next image frame 202 (captured at time T1) as shown by the arrow 204.
- the use of temporal information at different spatial resolutions for segmentation of moving objects is described in greater detail below with respect to FIGS. 3 and 4.
- FIG. 3 is a schematic diagram illustrating a configuration 300 for the temporally-aware deep learning network 210, according to aspects of the present disclosure.
- FIG. 3 provides a more detailed view of the use of temporal information at the deep learning network 210.
- FIG. 3 illustrates operations of the network 210 at two time instants, T0 and T1. However, similar operations may be propagated to subsequent times T2, T3, ..., and Tn.
- the convolutional encoding layers 222 are shown without the layer indexing subscripts K0, K1, K2, and K3 and the convolutional decoding layers 242 are shown without the layer indexing subscripts L0, L1, L2, and L3 for simplicity.
- FIG. 3 uses subscripts T0 and T1 to represent time indexing.
- the system 100 captures an image frame 202_T0.
- the image frame 202_T0 is input into the deep learning network 210.
- the image frame 202_T0 is processed by each of the convolutional encoding layers 222.
- the convolutional encoding layers 222 produce encoded features 304_T0.
- the encoded features 304_T0 may include features at different spatial resolutions as described in greater detail herein below.
- the RNN 230 may include multiple recurrent components 232, each operating at one of the spatial resolutions.
- the recurrent components 232 may be long short-term memory (LSTM) units.
- the recurrent components 232 may be gated recurrent units (GRUs).
- Each recurrent component 232 is applied to the encoded features 304_T0 of a corresponding spatial resolution to produce an output 306_T0.
- the output 306_T0 is stored in a memory (e.g., the memory 138).
- the recurrent component 232 can include a single convolutional operation per feature channel.
- the output 306_T0 is subsequently processed by each of the convolutional decoding layers 242 to produce a confidence map 308_T0.
- the confidence map 308_T0 predicts whether a pixel of the image includes the moving object.
- the confidence map 308_T0 may include a value between about 0 and about 1 representing the likelihood of a pixel including a moving object, where a value closer to 1 represents a pixel that is likely to include the moving object and a value closer to 0 represents a pixel that is less likely to include the moving object.
- alternatively, a value closer to 1 may represent a pixel that is less likely to include the moving object and a value closer to 0 may represent a pixel that is likely to include the moving object.
- the confidence map 308_T0 may indicate a probability or confidence level of the pixel including the moving object.
- the confidence map 308_T0 can provide prediction of the moving object’s position and/or motion in each image frame 202 in the sequence.
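As an illustration of how a per-pixel confidence map could be turned into a segmentation mask, the following minimal sketch thresholds the map. The function name `to_mask`, the 0.5 threshold, and the `high_means_object` flag are assumptions made for this example and are not taken from the disclosure; the flag only covers the two value conventions described above.

```python
import numpy as np

def to_mask(confidence_map, threshold=0.5, high_means_object=True):
    """Convert a confidence map with values in [0, 1] into a boolean mask.

    threshold and high_means_object are illustrative assumptions:
    the disclosure does not specify how the map is binarized.
    """
    conf = np.asarray(confidence_map, dtype=float)
    if not high_means_object:
        # Alternative convention: values near 0 indicate the moving object.
        conf = 1.0 - conf
    return conf >= threshold

# Toy 2x2 confidence map for one image frame.
confidence = np.array([[0.1, 0.9],
                       [0.6, 0.4]])
mask = to_mask(confidence)
inverted_mask = to_mask(confidence, high_means_object=False)
```

With the first convention the two pixels at 0.9 and 0.6 are marked as containing the moving object; with the alternative convention the other two pixels are marked instead.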
- the system 100 captures the image frame 202_T1.
- the deep learning network 210 may apply the same operations to the image frame 202_T1 as to the image frame 202_T0.
- the encoded features 304_T1 produced by each convolutional encoding layer 222 are concatenated with the output 306_T0 from the previous time T0 (as shown by the arrow 301) before being passed to the convolutional decoding layers 242.
- the concatenation of the past output 306_T0 and the current encoded features 304_T1 is performed at each spatial resolution layer.
- the concatenation of the previous output 306_T0 at time T0 and the current encoded features 304_T1 at each spatial resolution layer allows the recurrent part of the network 210 to have full exposure to features at every past time point and every spatial resolution level (e.g., from coarse to fine) before making a prediction on the input image frame 202_T1 at the current time T1.
- the capturing of temporal information at each spatial resolution layer is described in greater detail below with respect to FIG. 4.
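The per-resolution concatenation described for FIG. 3 can be sketched as follows. This is a minimal illustration only: the channel-first array layout, the 2D toy sizes, the use of zeros when no previous output exists (e.g., at the first frame), and the name `concat_with_previous` are all assumptions, not details from the disclosure.

```python
import numpy as np

def concat_with_previous(encoded, previous=None):
    """Concatenate current encoded features with the previous recurrent
    output along the channel axis, once per spatial resolution.

    encoded:  list of (channels, height, width) arrays, fine to coarse.
    previous: list of arrays with the same shapes, or None for the first
              frame (zeros are substituted; this is an assumption).
    """
    if previous is None:
        previous = [np.zeros_like(f) for f in encoded]
    return [np.concatenate([f, p], axis=0) for f, p in zip(encoded, previous)]

# Toy features at four resolutions with 1, 2, 4, and 8 channels.
sizes = [(1, 16), (2, 8), (4, 4), (8, 2)]
features_t0 = [np.ones((c, n, n)) for c, n in sizes]
features_t1 = [2 * np.ones((c, n, n)) for c, n in sizes]

# At time T1 the stored outputs from T0 are joined with the new features.
# Here the T0 features stand in for the stored recurrent outputs.
joined = concat_with_previous(features_t1, previous=features_t0)
joined_shapes = [j.shape for j in joined]
```

Each level keeps its own spatial size while its channel count doubles, so the recurrent component at that level sees both the current and the past information.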
- FIG. 4 is a schematic diagram illustrating a configuration 400 for the temporally-aware deep learning network 210, according to aspects of the present disclosure.
- FIG. 4 provides a more detailed view of the internal operations at the deep learning network 210.
- FIG. 4 illustrates the operations of the deep learning network 210 on a single input image frame 202 (e.g., at time T1). However, similar operations may be applied to each image frame 202 in the sequence. Additionally, the operations are shown for four different spatial resolutions 410, 412, 414, and 416. However, similar operations may be applied for any suitable number of spatial resolutions (e.g., about 2, 3, 5, 6, or more).
- FIG. 4 provides an expanded view of the RNN 230.
- the RNN 230 includes a recurrent component 232 at each spatial resolution 410, 412, 414, and 416 to capture temporal information at each spatial resolution 410, 412, 414, and 416.
- the recurrent components 232 are shown as 232_R0, 232_R1, 232_R2, and 232_R3 for the spatial resolutions 410, 412, 414, and 416, respectively.
- each of the convolutional encoding layers 222 is followed by a down-sampling layer 422 and each of the convolutional decoding layers 242 is preceded by an up-sampling layer 442.
- the image frame 202_T1 is captured and input into the deep learning network 210.
- the image frame 202_T1 is passed through each of the convolutional encoding layers 222_K0, 222_K1, 222_K2, and 222_K3.
- the image frame 202_T1 may have a spatial resolution 410.
- the image frame 202_T1 is convolved with the convolutional encoding layer 222_K0 to output encoded features 304_T1,K0 (e.g., in the form of a tensor) at the spatial resolution 410.
- the output of the convolutional encoding layer 222_K0 is down-sampled by the down-sampling layer 422_D0 to produce a tensor 402_D0 at the spatial resolution 412.
- the tensor 402_D0 is convolved with the convolutional encoding layer 222_K1 to output encoded features 304_T1,K1 at the spatial resolution 412.
- the output of the convolutional encoding layer 222_K1 is down-sampled by the down-sampling layer 422_D1 to produce a tensor 402_D1 at the spatial resolution 414.
- the tensor 402_D1 is convolved with the convolutional encoding layer 222_K2 to output encoded features 304_T1,K2 at the spatial resolution 414.
- the output of the convolutional encoding layer 222_K2 is down-sampled by the down-sampling layer 422_D2 to produce a tensor 402_D2 at the spatial resolution 416.
- the tensor 402_D2 is convolved with the convolutional encoding layer 222_K3 to output encoded features 304_T1,K3 at the spatial resolution 416.
- Temporal continuity information is captured at each of the spatial resolutions 410, 412, 414, and 416.
- the encoded features 304_T1,K0 are concatenated with an output 306_T0,K0 of the recurrent component 232_R0 obtained at a previous time T0 for the convolutional encoding layer 222_K0.
- the previous output 306_T0,K0 is stored in a memory (e.g., the memory 138) at time T0 and retrieved from the memory for processing at time T1.
- the retrieval of the previous recurrent component output 306_T0,K0 from the memory is shown by the empty-filled arrow.
- the recurrent component 232_R0 is applied to the concatenation of the encoded features 304_T1,K0 and the output 306_T0,K0 to produce an output 306_T1,K0.
- the output 306_T1,K0 can be down-sampled so that the output 306_T1,K0 may have the same dimensions as the encoded features 304_T1,K0.
- the output 306_T1,K0 is stored in the memory (shown by the pattern-filled arrow) and can be retrieved for a similar concatenation at a next time T2.
- the encoded features 304_T1,K1 are concatenated with an output 306_T0,K1 of the recurrent component 232_R1 obtained at the previous time T0.
- the recurrent component 232_R1 is applied to the concatenation of the encoded features 304_T1,K1 and the output 306_T0,K1 to produce an output 306_T1,K1.
- the output 306_T1,K1 is stored in the memory (shown by the pattern-filled arrow) for a similar concatenation at the next time T2.
- the encoded features 304_T1,K2 are concatenated with an output 306_T0,K2 of the recurrent component 232_R2 obtained at the previous time T0.
- the recurrent component 232_R2 is applied to the concatenation of the encoded features 304_T1,K2 and the output 306_T0,K2 to produce an output 306_T1,K2.
- the output 306_T1,K2 is stored in the memory (shown by the pattern-filled arrow) for a similar concatenation at the next time T2.
- the encoded features 304_T1,K3 are concatenated with an output 306_T0,K3 of the recurrent component 232_R3 obtained at the previous time T0.
- the recurrent component 232_R3 is applied to the concatenation of the encoded features 304_T1,K3 and the output 306_T0,K3 to produce an output 306_T1,K3.
- the output 306_T1,K3 is stored in the memory (shown by the pattern-filled arrow) for a similar concatenation at the next time T2.
- the outputs 306_T1,K3, 306_T1,K2, 306_T1,K1, and 306_T1,K0 are passed to the convolutional decoder, where the outputs 306_T1,K2, 306_T1,K1, and 306_T1,K0 feed the convolutional decoding layers 242_L0, 242_L1, and 242_L2, respectively.
- the output 306_T1,K3 is up-sampled by the up-sampling layer 442_U0 to produce a tensor 408_U0 (e.g., including extracted features).
- the tensor 408_U0 and the output 306_T1,K2 are convolved with the convolutional decoding layer 242_L0 and up-sampled by the up-sampling layer 442_U1 to produce a tensor 408_U1.
- the tensor 408_U1 and the output 306_T1,K1 are convolved with the convolutional decoding layer 242_L1 and up-sampled by the up-sampling layer 442_U2 to produce a tensor 408_U2.
- the tensor 408_U2 and the output 306_T1,K0 are convolved with the convolutional decoding layer 242_L2 to produce the confidence map 308_T1.
- While FIG. 4 illustrates four encoding layers 222 and three decoding layers 242, the network 210 can be alternatively configured to include four decoding layers 242 to provide similar predictions.
- the encoder (shown on the left side of the network 210 in FIG. 4) is where the learning process occurs.
- the number of encoding layers 222 can be determined based on the size of the input volume and the receptive field of the network 210.
- the depth of the network 210 can be varied based on how large the input image is and its influence on learning the features (i.e., by controlling the receptive field of the network 210). As such, the network 210 may not have a corresponding decoder/up-sampling layer to the innermost layer.
- the decoder (shown on the right side of the network 210 in FIG. 4) takes the features from lower resolution feature maps and assembles them, while up-sampling towards the original output size.
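The coarse-to-fine data flow of the decoder in FIG. 4 can be traced with a small sketch. Several choices here are assumptions for illustration and are not stated in the disclosure: nearest-neighbour up-sampling, channel concatenation as the way the up-sampled tensor and the recurrent output are combined, a fixed channel-averaging 1x1 convolution (`mix_channels`) in place of trained decoding kernels, and a sigmoid to map the result into the 0-to-1 range of a confidence map.

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour up-sampling by 2 of a (channels, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def mix_channels(x, out_channels):
    """Stand-in for a trained decoding layer: a fixed 1x1 convolution
    that averages the input channels (an illustrative assumption)."""
    w = np.full((out_channels, x.shape[0]), 1.0 / x.shape[0])
    return np.tensordot(w, x, axes=([1], [0]))

def decode(outputs):
    """outputs: recurrent outputs ordered fine to coarse (K0, K1, K2, K3)."""
    x = upsample2(outputs[-1])                    # coarsest output, up-sampled
    skips = list(reversed(outputs[:-1]))          # K2, K1, K0
    for i, skip in enumerate(skips):
        x = np.concatenate([x, skip], axis=0)     # combine with same-size output
        x = mix_channels(x, skip.shape[0])        # decoding layer L0, L1, L2
        if i < len(skips) - 1:
            x = upsample2(x)                      # no up-sampling after the last layer
    logits = x.sum(axis=0)
    return 1.0 / (1.0 + np.exp(-logits))          # confidence values in (0, 1)

rng = np.random.default_rng(0)
sizes = [(1, 16), (2, 8), (4, 4), (8, 2)]
recurrent_outputs = [rng.standard_normal((c, n, n)) for c, n in sizes]
confidence_map = decode(recurrent_outputs)
```

The sketch uses three decoding steps for four resolutions, matching the arrangement in which the innermost level has no decoding layer of its own.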
- the deep learning network 210 performs prediction for a current image frame 202 (at time Tn) based on features extracted from the current image frame 202 and the previous image frame 202 (at time Tn-1) instead of based on a single image frame captured at a single point of time.
- the deep learning network 210 can infer motion and/or positional information associated with a moving object based on information in the past, using the time-continuity information (e.g., provided by the temporal concatenation).
- the use of temporal information can be particularly useful in segmenting a thin object since a thin object may typically be represented by relatively fewer pixels in an imaging frame than a thicker object. Accordingly, the present disclosure can improve visualization and/or stability in ultrasound images and/or videos of a moving medical device and/or an anatomical structure including a moving portion.
- the down-sampling layers 422 can perform down-sampling at any suitable down-sampling factor.
- each down-sampling layer 422 may perform down-sampling by a factor of 2.
- the input image frame 202_T1 has a resolution of 200x200x200 voxels (e.g., the spatial resolution 410).
- the input image frame 202_T1 is down-sampled by 2 to produce the tensor 402_D0 at a resolution of 100x100x100 voxels (e.g., the spatial resolution 412).
- the tensor 402_D0 is down-sampled by 2 to produce the tensor 402_D1 at a resolution of 50x50x50 voxels (e.g., the spatial resolution 414).
- the tensor 402_D1 is down-sampled by 2 to produce the tensor 402_D2 at a resolution of 25x25x25 voxels (e.g., the spatial resolution 416).
- the up-sampling layers 442 may reverse the down-sampling. For example, each of the up-sampling layers 442 may perform up-sampling by a factor of 2. In some other examples, the down-sampling layers 422 may perform down-sampling at different down-sampling factors and the up-sampling layers 442 may perform up-sampling using factors matching the down-sampling factors.
- the down-sampling layers 422_D0, 422_D1, and 422_D2 may perform down-sampling by 2, 4, and 8, respectively, and the up-sampling layers 442_U0, 442_U1, and 442_U2 may perform up-sampling by 8, 4, and 2, respectively.
- the convolutional encoding layers 222 and the convolutional decoding layers 242 may include convolutional kernels of any sizes.
- the kernel sizes may be dependent on the size of the input image frames 202 and can be selected to limit the network 210 to a certain complexity.
- each of the convolutional encoding layers 222 and each of the convolutional decoding layers 242 may include a 5x5x5 convolutional kernel.
- the convolutional encoding layer 222_K0 may provide about one feature (e.g., the feature 304_T1,K0 has a size of 1) at the spatial resolution 410.
- the convolutional encoding layer 222_K1 may provide about two features (e.g., the feature 304_T1,K1 has a size of 2) at the spatial resolution 412.
- the convolutional encoding layer 222_K2 may provide about four features (e.g., the feature 304_T1,K2 has a size of 4) at the spatial resolution 414.
- the convolutional encoding layer 222_K3 may provide about eight features (e.g., the feature 304_T1,K3 has a size of 8) at the spatial resolution 416. In general, the number of features may increase as the spatial resolution decreases.
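The example dimensions above (a 200x200x200 input, down-sampling by 2 at each level, and 1, 2, 4, and 8 features) can be tabulated with a short sketch. The helper name `level_shapes` and the rule that the feature count doubles at every level are assumptions chosen to reproduce the example numbers; the disclosure only states that the number of features may increase as the resolution decreases.

```python
def level_shapes(input_shape, factors, base_features=1):
    """Return (features, spatial_shape) for each encoder level.

    Assumes the feature count doubles per level (illustrative only) and
    that every dimension divides evenly by its down-sampling factor.
    """
    dims = tuple(input_shape)
    features = base_features
    levels = [(features, dims)]
    for factor in factors:
        if any(d % factor for d in dims):
            raise ValueError(f"{dims} is not divisible by {factor}")
        dims = tuple(d // factor for d in dims)
        features *= 2
        levels.append((features, dims))
    return levels

levels = level_shapes((200, 200, 200), factors=[2, 2, 2])
for features, dims in levels:
    print(features, "feature(s) at", "x".join(str(d) for d in dims), "voxels")
```

A fourth halving of the 25x25x25 level would not divide evenly, which the helper reports as an error rather than rounding silently.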
- the convolutions at the convolutional encoding layers 222 and/or the convolutional decoding layers 242 can be repeated.
- the convolution at the convolutional encoding layer 222_K0 can be repeated twice, the convolution at the convolutional encoding layer 222_K1 can be performed once, the convolution at the convolutional encoding layer 222_K2 can be repeated twice, and the convolution at the convolutional encoding layer 222_K3 can be repeated twice.
- each of the convolutional encoding layers 222 and/or each of the convolutional decoding layers 242 can include a non-linear function (e.g., a parametric rectified linear unit (PReLU)).
- each of the recurrent components 232 may include a convolutional gated recurrent component (convGRU). In some examples, each of the recurrent components 232 may include a convolutional long short-term memory (convLSTM).
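For readers unfamiliar with convolutional GRUs, the following is a minimal single-channel sketch of one update step using the standard GRU gate equations with convolutions in place of matrix products. It is not the disclosed implementation: the 3x3 kernel size, the random weights, the omission of bias terms, and the names `conv_same` and `conv_gru_step` are assumptions. Applying separate kernels to the current features and to the previous output and summing the results is equivalent to one convolution over their channel-wise concatenation.

```python
import numpy as np

def conv_same(x, k):
    """2D cross-correlation of x with an odd-sized kernel k, zero-padded
    so that the output has the same shape as x."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def conv_gru_step(x, h_prev, p):
    """One convGRU update: x is the current feature map, h_prev the
    previous recurrent output, p a dict of convolution kernels."""
    z = sigmoid(conv_same(x, p["Wz"]) + conv_same(h_prev, p["Uz"]))  # update gate
    r = sigmoid(conv_same(x, p["Wr"]) + conv_same(h_prev, p["Ur"]))  # reset gate
    h_cand = np.tanh(conv_same(x, p["Wh"]) + conv_same(r * h_prev, p["Uh"]))
    return (1.0 - z) * h_prev + z * h_cand  # blend past state and candidate

rng = np.random.default_rng(0)
names = ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
params = {n: 0.1 * rng.standard_normal((3, 3)) for n in names}
frames = rng.random((5, 8, 8))      # five toy feature maps over time
h = np.zeros((8, 8))                # no history before the first frame
for frame in frames:
    h = conv_gru_step(frame, h, params)
```

The state `h` carried from one frame to the next plays the role of the stored recurrent output that is retrieved at the following time step.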
- the deep learning network 210 may output a confidence map 308 for each image frame 202.
- for each pixel of an image frame 202, the corresponding confidence map 308 can include a probability or a confidence level of the pixel including the moving object.
- a sequence of output image frames 206 can be generated based on the sequence of input image frames 202 and corresponding confidence maps 308.
- temporally-aware inferencing can interpolate or otherwise predict missing image information of the moving object based on the confidence map 308.
- the inference, interpolation, and/or prediction can be implemented outside of the deep learning network 210.
- the interpolation and/or the reconstruction can be implemented as part of the deep learning network 210.
- the learning and training of the deep learning network 210 may include the inference, interpolation, and/or prediction of missing imaging information.
- the deep learning network 210 can be trained to differentiate an elongate flexible thin moving medical device (e.g., a guide wire, a guided catheter, a catheter, a needle, a therapy device, and/or a treatment device) from an anatomy using a training data set (e.g., the image data set 140).
- the training data set can include input-output pairs.
- the input may include a sequence of image frames (e.g., 2D or 3D) of a medical device (e.g., the device 108) traversing across an anatomy (e.g., the object 105) across time and the output may include ground truths or annotations of the positions of the medical device within each image frame in the sequence.
- the ground truth position of the medical device can be obtained by attaching an ultrasound sensor to the medical device (e.g., at the tip of the medical device) during imaging and subsequently fitting a curve or spline to the captured images using at least the tip as an end point constraint for the spline. After fitting the curve to the ultrasound images, the images can be annotated or labelled with the ground truths for training.
- the deep learning network 210 can be applied to the sequence of image frames using forward propagation to produce an output.
- the coefficients of the convolutional kernels at the convolutional encoding layers 222, the recurrent components 232, and/or the convolutional decoding layers 242 can be adjusted using backward propagation to minimize an error between the output and the ground truth positions of the device.
- the training process can be repeated for each input-output pair in the training data set.
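The forward-propagation and backward-propagation loop described above can be illustrated on a deliberately tiny model. Everything specific in this sketch is an assumption: a two-parameter per-pixel logistic model stands in for the network 210, binary cross-entropy is used as the error (the disclosure does not name a loss function), and the single toy frame with a one-pixel-wide "wire" is synthetic.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def bce(conf, truth, eps=1e-7):
    """Mean binary cross-entropy between a confidence map and a 0/1 mask."""
    conf = np.clip(conf, eps, 1.0 - eps)
    return float(-np.mean(truth * np.log(conf) + (1 - truth) * np.log(1 - conf)))

def train_step(w, b, frame, truth, lr=1.0):
    conf = sigmoid(w * frame + b)              # forward propagation
    loss = bce(conf, truth)
    grad_logit = (conf - truth) / truth.size   # d(loss) / d(logit)
    w = w - lr * float(np.sum(grad_logit * frame))  # backward propagation
    b = b - lr * float(np.sum(grad_logit))
    return w, b, loss

# Synthetic input-output pair: a bright horizontal "wire" and its annotation.
frame = np.zeros((8, 8))
frame[3, :] = 1.0
truth = frame.copy()

w, b = 0.0, 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step(w, b, frame, truth)
    losses.append(loss)
```

In a full training run the same step would be repeated over every input-output pair, with the gradients flowing through the encoding layers, the recurrent components, and the decoding layers instead of two scalars.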
- the deep learning network 210 can be trained to differentiate a moving portion of an anatomy from a static portion of the anatomy using a training data set (e.g., the image data set 140).
- the training data set can include input-output pairs.
- the input may include a sequence of image frames (e.g., 2D or 3D) of an anatomy with motion (e.g., associated with cardiac, breathing, and/or arterial pulses) and the output may include ground truths or annotations of the various moving and/or static portions of the anatomy.
- ground truths and/or annotations can be obtained from various annotated data sets that are available to the medical community.
- the sequence of image frames can be annotated manually with the ground truths.
- similar mechanisms as described above (e.g., for the moving object) may be used to train the deep learning network 210 with these input-output pairs.
- FIGS. 5-8 illustrate various clinical use case scenarios where the temporally-aware deep learning network 210 can be used to provide improved segmentation based on a series of observations over time.
- FIG. 5 illustrates a scenario 500 of an ultrasound-guided procedure, according to aspects of the present disclosure.
- the scenario 500 may correspond to a scenario when the system 100 is used to capture ultrasound images of a thin guide wire 510 (e.g., the medical device 108) passing through a vessel lumen 504 with a vessel wall 502 including an occluded region 520 (e.g., plaque and/or calcification).
- a sequence of ultrasound images is captured at time T0, T1, T2, T3, and T4.
- the columns on the right side of FIG. 5 include checkmarks and crosses.
- the checkmarks indicate that the guide wire 510 is fully visible in a corresponding image frame.
- the crosses indicate that the guide wire 510 is not fully visible in a corresponding image frame.
- the guide wire 510 enters the lumen 504.
- a beginning portion 512a of the guide wire 510 enters the occluded region 520.
- the guide wire 510 continues to pass through the lumen 504, where a middle portion 512b of the guide wire 510 (shown by the dashed line) is within the occluded region 520.
- the guide wire 510 continues to pass through the lumen 504, where an end portion 512c of the guide wire 510 (shown by the dashed line) is within the occluded region 520.
- the guide wire 510 exits the occluded region 520.
- General 3D segmentation without utilizing temporal information may fail to segment the portions 512a, 512b, and 512c within the occluded region 520 at time T1, T2, and T3, respectively.
- the image frames obtained at time T1, T2, and T3 without temporal information may each include a missing segment, section, or portion of the guide wire 510 corresponding to the portions 512a, 512b, and 512c within the occluded region 520, respectively.
- crosses are shown for time T1, T2, and T3 under the column for segmentation without temporal information.
- the temporally-aware deep learning network 210 is designed to interpolate the missing information based on previous image frames, and thus the system 100 can apply the deep learning network 210 to infer the missing portions 512a, 512b, and 512c in the images. As such, checkmarks are shown for time T1, T2, and T3 under the column for segmentation with temporal information.
- the scenario 500 may be similar to a peripheral vascular intervention procedure, where the occluded region 520 may correspond to a chronic total occlusion (CTO) crossing in peripheral vascular structure.
- the scenario 500 may be similar to a clinical procedure where a tracking device passes through air gaps, calcifications, or regions of shadowing (e.g., the occluded region 520).
- FIG. 6 illustrates a scenario 600 of an ultrasound-guided procedure, according to aspects of the present disclosure.
- the scenario 600 may correspond to a scenario when the system 100 is used to capture ultrasound images of a guide wire 610 (e.g., the medical device 108) passing through a vessel lumen 604 with a vessel wall 602, where the guide wire 610 may glide along the vessel wall 602 for a period of time.
- a sequence of ultrasound images is captured at time T0, T1, T2, T3, and T4.
- the columns on the right side of FIG. 6 include checkmarks and crosses.
- the checkmarks indicate that the guide wire 610 is fully visible in corresponding image frames.
- the crosses indicate that the guide wire 610 is not fully visible in corresponding image frames.
- the guide wire 610 initially enters the lumen 604 at about a center of the lumen 604.
- a portion 612a of the guide wire 610 slides against the vessel wall 602.
- the guide wire 610 continues to slide against the vessel wall 602.
- a portion 612b of the guide wire 610 is adjacent to the vessel wall 602.
- a portion 612c of the guide wire 610 is adjacent to the vessel wall 602.
- a portion 612d of the guide wire 610 is adjacent to the vessel wall 602.
- the guide wire 610 may be as reflective as the vessel wall 602, and thus general 3D segmentation without utilizing temporal information may fail to segment the portions 612a, 612b, 612c, and 612d that are close to the vessel wall 602 at time T1, T2, T3, and T4, respectively.
- the image frames obtained at time T1, T2, T3, and T4 without temporal information may each include a missing section, segment, or portion of the guide wire 610 corresponding to the portions 612a, 612b, 612c, and 612d, respectively.
- crosses are shown for time T1, T2, T3, and T4 under the column for segmentation without temporal information.
- the temporally-aware deep learning network 210 is exposed to the entire sequence of ultrasound image frames or video frames across time, and thus may be applied to the sequence of images to predict the positions and/or motion of the portions 612a, 612b, 612c, and 612d close to the vessel wall 602 at time T1, T2, T3, and T4, respectively.
- checkmarks are shown for time T1, T2, T3, and T4 under the column for segmentation with temporal information.
- the scenario 600 may be similar to a cardiac imaging procedure where a medical device or a guide wire glides along the wall of a cardiac chamber. In some examples, the scenario 600 may be similar to a peripheral vascular intervention procedure where a subintimal device is purposefully directed into the adventitia of a vessel wall in order to bypass occlusions.
- FIG. 7 illustrates a scenario 700 of an ultrasound-guided procedure, according to aspects of the present disclosure.
- the scenario 700 may correspond to a scenario when the system 100 is used to capture ultrasound images of a guide wire 710 (e.g., the medical device 108) passing through a vessel lumen 704 with a vessel wall 702, where acoustic coupling is lost for a period of time.
- a sequence of ultrasound images is captured at time T0, T1, T2, T3, and T4.
- the columns in the right side of FIG. 7 include checkmarks and crosses.
- the checkmarks indicate that the guide wire 710 is fully visible in a corresponding image frame.
- the crosses indicate that the guide wire 710 is not fully visible in a corresponding image frame.
- the guide wire 710 enters the lumen 704.
- the acoustic coupling is lost at time T1 and T2.
- the acoustic coupling is regained at time T3.
- General 3D imaging without utilizing temporal information may lose all knowledge of the positions of the guide wire 710 when acoustic coupling is lost.
- the guide wire 710 may not be visible in image frames obtained at time T1 and T2 without temporal information. As such, crosses are shown for time T1 and T2 under the column for segmentation without temporal information.
- the temporally-aware deep learning network 210 has the capacity to remember the location of the guide wire 710 for at least a few frames, and thus can be applied to the sequence of images to predict the locations of the guide wire 710 at time T1 and time T2. Thus, checkmarks are shown for time T1 and T2 under the column for segmentation with temporal information. If the acoustic coupling is lost for an extended period of time, the temporally-aware deep learning network 210 is less likely to produce incorrect segmentation results.
- the scenario 700 may occur whenever acoustic coupling is lost. It may be difficult to maintain acoustic coupling at all times during imaging.
- the temporally-aware deep learning-based segmentation can improve visualization of various devices and/or anatomical structures in ultrasound images, especially when automation is involved, for example, during automatic beam steering, sensor tracking with image-based constraints, and/or robotic control of an ultrasound imaging device.
- acoustic coupling may be lost for a short period of time during cardiac imaging due to the motion of the heart.
- the temporally-aware deep learning-based segmentation can improve visualization in cardiac imaging.
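The idea of carrying a device location across a few frames of lost acoustic coupling can be illustrated with a much simpler stand-in than the deep learning network 210. The sketch below is not the disclosed network. It assumes a hypothetical per-frame tip position (or `None` when the device is not visible), extrapolates at constant velocity from the last two observed frames, and stops predicting after a hypothetical `max_gap` frames. The function name and parameters are illustrative only.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_lost_frames(track: List[Optional[Point]],
                     max_gap: int = 3) -> List[Optional[Point]]:
    """Fill frames where the device was not detected (None).

    Hypothetical stand-in for temporal memory: extrapolate at constant
    velocity from the last two observed frames, for at most max_gap frames.
    """
    filled: List[Optional[Point]] = []
    seen = []  # (frame index, position) of the last two observed frames
    for i, pos in enumerate(track):
        if pos is not None:
            seen = (seen + [(i, pos)])[-2:]
            filled.append(pos)
            continue
        # Nothing observed yet, or coupling has been lost for too long.
        if not seen or i - seen[-1][0] > max_gap:
            filled.append(None)
            continue
        i1, (x1, y1) = seen[-1]
        if len(seen) == 1:
            # Only one observation: hold the last known position.
            filled.append((x1, y1))
            continue
        i0, (x0, y0) = seen[0]
        step = (i - i1) / (i1 - i0)  # elapsed frames in units of the last interval
        filled.append((x1 + step * (x1 - x0), y1 + step * (y1 - y0)))
    return filled


# Coupling lost at T2 and T3, regained at T4.
track = [(0.0, 0.0), (1.0, 0.5), None, None, (4.0, 2.0)]
print(fill_lost_frames(track))
```

Bounding the gap reflects the point made above that a remembered location is only trustworthy for a limited number of frames.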
- FIG. 8 illustrates a scenario 800 of an ultrasound-guided procedure, according to aspects of the present disclosure.
- the scenario 800 may correspond to a scenario when the system 100 is used to capture ultrasound images of a guide wire 810 (e.g., the medical device 108) passing through a vessel lumen 804 with a vessel wall 802, where the guide wire 810 may go in and out of plane during imaging.
- a sequence of ultrasound images is captured at time T0, T1, T2, T3, and T4.
- the columns in the right side of FIG. 8 include checkmarks and crosses.
- the checkmarks indicate that the guide wire 810 is fully visible in a corresponding image frame.
- the crosses indicate that the guide wire 810 is not fully visible in a corresponding image frame.
- the guide wire 810 enters the lumen 804 and is in plane under the imaging.
- the guide wire 810 starts to drift out of plane (e.g., partially out-of-plane).
- the guide wire 810 is fully out of plane.
- the guide wire 810 continues to drift and is partially out of plane.
- the guide wire 810 moves back in plane.
- General 3D imaging without utilizing temporal information may not detect any structure that is out of plane.
- the guide wire 810 may not be fully visible in the image frames obtained at time T1, T2, and T3 without temporal information. As such, crosses are shown for time T1, T2, and T3, under the column for segmentation without temporal information.
- the temporally-aware deep learning network 210 is able to predict out-of-plane device position to provide full visibility of the device, and thus can be applied to the sequence of images to predict the locations of the guide wire 810. Thus, checkmarks are shown for time T1, T2, and T3 under the column for segmentation with temporal information.
- the scenario 800 may occur in an ultrasound-guided procedure where a non-volumetric imaging mode (e.g., 2D imaging) is used.
- the scenario 800 may occur in real-time 3D imaging where relatively small-sized 3D volumes are acquired in a transverse direction in order to maintain a sufficiently high frame rate.
- the scenario 800 may occur in cardiac imaging where the motion of a heart may cause certain portions of the heart to enter and exit an imaging plane.
- scenarios 500-800 illustrate the use of the temporally-aware deep learning network 210 for providing segmentation of a moving guide wire (e.g., the guide wires 510, 610, 710, and/or 810).
- similar temporally-aware deep learning-based segmentation mechanisms can be applied to any elongate, flexible, thinly shaped moving devices (e.g., catheters, guided catheters, needles, IVUS devices, and/or therapy devices) and/or anatomical structures with moving portions.
- temporally-aware deep learning-based segmentation can be used to improve visualization and/or stability of moving devices and/or anatomy with motion under imaging.
- FIG. 9 is a schematic diagram of a deep learning-based image segmentation scheme 900 with spline fitting, according to aspects of the present disclosure.
- the scheme 900 is implemented by the system 100.
- the scheme 900 is substantially similar to the scheme 200.
- the scheme 900 utilizes a temporally-aware multi-layered deep learning network 210 to provide segmentations of a moving object in ultrasound images.
- the scheme 900 includes a spline fitting component 910 coupled to the output of the deep learning network 210.
- the spline fitting component 910 can be implemented by the processing component 134 at the system 100.
- the spline fitting component 910 is configured to apply a spline fitting function to the confidence maps 308 output by the deep learning network 210.
- An expanded view of a confidence map 308 for an image frame 202 in the sequence is shown as a heat map 902.
- the deep learning network 210 predicted the moving object as shown by the curve 930.
- the curve 930 is discontinuous and includes a gap 932.
- the spline fitting component 910 is configured to fit a spline 934 to smooth out the discontinuity of the curve 930 at the gap 932.
- the spline fitting component 910 may perform the spline fitting by taking into account device parameters 904 associated with the moving object under imaging.
- the device parameters 904 may include the shape of the device, the tip position of the device, and/or other dimensional and/or geometric information of the device.
- a spline fitting as postprocessing refinement to the temporal deep learning-based prediction can further improve visualization and/or stability of a moving object under imaging.
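As a rough illustration of bridging a gap such as the gap 932, the sketch below fits a cubic Hermite spline (finite-difference tangents) through detected centerline points and evaluates it at the missing positions. Several things here are assumptions rather than details from the disclosure: the spline type, the representation of the curve as depth `ys` sampled at lateral positions `xs`, and the function name. Extraction of the centerline from the confidence map 308 and the use of the device parameters 904 are not modeled.

```python
import bisect
from typing import List, Sequence


def hermite_spline(xs: Sequence[float], ys: Sequence[float],
                   query_xs: Sequence[float]) -> List[float]:
    """Evaluate a cubic Hermite spline through (xs, ys) at query_xs.

    xs must be strictly increasing. Tangents are one-sided secant slopes at
    the ends and the average of neighbouring secant slopes in the interior.
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two points with matching lengths")
    secants = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1)]
    tangents = ([secants[0]]
                + [(secants[i - 1] + secants[i]) / 2 for i in range(1, n - 1)]
                + [secants[-1]])
    out = []
    for x in query_xs:
        # Index k of the interval [xs[k], xs[k+1]] containing x (clamped).
        k = max(0, min(n - 2, bisect.bisect_right(xs, x) - 1))
        h = xs[k + 1] - xs[k]
        t = (x - xs[k]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        out.append(h00 * ys[k] + h10 * h * tangents[k]
                   + h01 * ys[k + 1] + h11 * h * tangents[k + 1])
    return out


# Detected points on either side of a gap at x = 3..5 (hypothetical values).
xs = [0, 1, 2, 6, 7, 8]
ys = [10.0, 10.5, 11.2, 13.9, 14.3, 14.5]
print(hermite_spline(xs, ys, [3, 4, 5]))
```

A production refinement would likely constrain the fit with device geometry such as tip position or stiffness, which this sketch leaves out.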
- FIG. 10 is a schematic diagram of a deep learning-based imaging guidance scheme 1000, according to aspects of the present disclosure.
- the scheme 1000 is implemented by the system 100.
- the scheme 1000 is substantially similar to the scheme 200.
- the scheme 1000 utilizes a temporally-aware multi-layered deep learning network 1010 to provide imaging guidance for ultrasound imaging.
- the deep learning network 1010 may have an architecture substantially similar to that of the deep learning network 210.
- the deep learning network 1010 includes a convolutional encoder 1020, a temporally-aware RNN 1030, and a convolutional decoder 1040.
- the convolutional encoder 1020 includes a plurality of convolutional encoding layers 1022.
- the convolutional decoder 1040 includes a plurality of convolutional decoding layers 1042.
- the convolutional encoding layers 1022, the convolutional decoding layers 1042, and the RNN 1030 are substantially similar to the convolutional encoding layers 222, the convolutional decoding layers 242, and the RNN 230, respectively, and may operate at multiple different spatial resolutions (e.g., the spatial resolutions 410, 412, 414, and 416) as shown in the configuration 400.
- the convolutional encoding layers 1022, the convolutional decoding layers 1042, and the RNN 1030 are trained to predict an optimal imaging plane for imaging a target anatomy (e.g., including a particular clinical property of interest).
- the optimal imaging plane can be a 2D plane, an X-plane (e.g., including a cross-sectional plane and an orthogonal imaging plane), an MPR, or any suitable imaging plane.
- a sequence of image frames 1002 is captured across a time period (e.g., from time T0 to time Tn).
- the image frames 1002 may be captured using the system 100.
- the deep learning network 1010 can be applied to the sequence of image frames 1002 to predict an optimal imaging plane.
- the sequence of input image frames 1002 is captured while a medical device 1050 (e.g., the medical device 108) passes through a vessel lumen 1052 with a vessel wall 1054 (e.g., the object 105).
- the output of the deep learning network 1010 provides an optimal long axis slice 1006 and a short axis slice 1008.
- each of the image frames 1002 is processed by each of the convolutional encoding layers 1022 and each of the convolutional decoding layers 1042.
- the RNN 1030 passes a prediction for a current image frame 1002 (captured at time T0) back to the RNN 1030 as a secondary input for a prediction for a next image frame 1002 (captured at time T1) as shown by the arrow 1004.
- the prediction output by the deep learning network 1010 can be used by the system 100 to automatically steer ultrasound beams to the optimal location.
- the processing component 116 and/or 134 can be configured to control or steer ultrasound beams generated by the transducer array 112 based on the prediction.
- the deep learning network 1010 may predict that an optimal imaging plane is an oblique plane.
- the deep learning network 1010 may provide navigation instructions to a user to maneuver (e.g., rotate and/or translate) the ultrasound probe 110 to align the axis of the probe 110 to the predicted optimal plane.
- the navigation instructions can be displayed on a display similar to the display 132.
- the navigation instructions can be displayed using graphical representations (e.g., a rotational symbol or a translational symbol).
- the imaging plane may be in a non-oblique plane.
- the deep learning network 1010 can transition to provide prediction as described in the first example and may communicate with the processing component 116 and/or 134 to steer beams generated by the transducer array 112.
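The choice between automatic beam steering and user navigation instructions can be sketched as a small decision function. Everything specific in this sketch is an assumption for illustration: the predicted plane is reduced to a single tilt angle, "oblique" is taken to mean a tilt beyond a hypothetical electronic steering limit `max_steer_deg`, and the action names and message wording are invented. The disclosure does not define these quantities.

```python
from typing import Tuple


def plan_guidance(predicted_tilt_deg: float,
                  max_steer_deg: float = 30.0) -> Tuple[str, str]:
    """Return (action, detail) for a predicted imaging-plane tilt.

    Assumption: planes within +/- max_steer_deg can be reached by steering
    the beam; anything beyond that requires the user to move the probe.
    """
    if abs(predicted_tilt_deg) <= max_steer_deg:
        # Reachable electronically: hand the angle to the beam steering path.
        return "steer_beam", f"steer to {predicted_tilt_deg:.0f} degrees"
    # Out of reach: ask the user to rotate the probe by the remaining amount.
    direction = "clockwise" if predicted_tilt_deg > 0 else "counterclockwise"
    excess = abs(predicted_tilt_deg) - max_steer_deg
    return "instruct_user", f"rotate probe {direction} by about {excess:.0f} degrees"


for tilt in (10.0, -50.0):
    print(tilt, plan_guidance(tilt))
```

Once the user has rotated the probe far enough, the same function returns `steer_beam` again, which mirrors the transition from navigation instructions back to beam steering described above.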
- FIG. 11 illustrates ultrasound images 1110, 1120, and 1130 obtained from an ultrasound-guided procedure, according to aspects of the present disclosure.
- the image 1110 is a 3D image captured using a system similar to the system 100 during a PVD examination.
- the image 1110 shows a thin guide wire 1112 (e.g., the medical device 108 and/or 1050) traversing through a vessel lumen 1114 surrounded by a vessel wall 1116 (e.g., the object 105).
- the device 1112 traverses through the vessel along the x-axis.
- the system may capture a series of 3D images similar to the image 1110 as the device 1112 traverses through the vessel.
- the motion of the device 1112 can cause the device 1112 to go in and out of the imaging view. Additionally, the thin geometry of the device 1112 can cause challenges in distinguishing the device 1112 from the anatomy (e.g., the vessel lumen 1114 and/or the vessel walls 1116).
- a temporally-aware deep learning network trained for segmentation and/or imaging guidance can be applied to the series of 3D images (including the image 1110).
- the prediction results produced by the deep learning network 1010 are used to automatically set MPRs passing through the tip of the device 1112 and aligned with the major axes (e.g., the x-axis and the y-axis) of the device 1112.
- the images 1120 and 1130 are generated based on the deep learning segmentation.
- the image 1120 shows a longitudinal MPR (along the z-x plane) constructed from the image 1110 based on prediction results output by the deep learning network.
- the image 1130 shows a transverse MPR (along the y-z plane) constructed from the image 1110 based on the prediction results.
- the orthogonal MPR planes e.g., the images 1120 and 1130
- the images 1120 and 1130 correspond to the longitudinal and sagittal planes that pass through the tip of the segmented device 1112, respectively, but other MPR planes can be generated as well using similar mechanisms.
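Setting axis-aligned MPRs through a segmented device tip amounts to slicing the 3D volume at the tip's voxel indices. The sketch below assumes a hypothetical volume stored as nested lists indexed `volume[z][y][x]` and a tip index `(z, y, x)` already obtained from the segmentation. The function name and data layout are assumptions, and oblique planes would additionally need resampling, which is not shown.

```python
from typing import List, Sequence, Tuple

Image2D = List[List[float]]


def mpr_through_tip(volume: Sequence[Sequence[Sequence[float]]],
                    tip: Tuple[int, int, int]) -> Tuple[Image2D, Image2D]:
    """Return (longitudinal, transverse) slices passing through the tip.

    volume is indexed volume[z][y][x]; tip is a (z, y, x) voxel index.
    longitudinal: z-x plane at y = tip y (rows are z, columns are x).
    transverse:   y-z plane at x = tip x (rows are z, columns are y).
    """
    _, tip_y, tip_x = tip
    longitudinal = [list(plane[tip_y]) for plane in volume]
    transverse = [[row[tip_x] for row in plane] for plane in volume]
    return longitudinal, transverse


# Tiny 2 x 3 x 4 volume whose voxel value encodes its own index.
volume = [[[100 * z + 10 * y + x for x in range(4)] for y in range(3)]
          for z in range(2)]
longitudinal, transverse = mpr_through_tip(volume, tip=(1, 2, 3))
print(longitudinal)
print(transverse)
```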
- the device 1112 can be located in close proximity to the anatomy (e.g., the vessel wall) and can be equally reflective as the anatomy. Thus, a clinician may have difficulty in visualizing the device 1112 from the captured images.
- the images 1120 and 1130 can be color coded. For example, anatomical structures can be shown in gray-scale and the device 1112 can be shown in red or any other suitable color.
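Color coding of this kind can be sketched as an overlay of a binary device mask on a gray-scale slice. The representation below (8-bit gray values in nested lists, RGB tuples as output) and the function name are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def overlay_device(gray: Sequence[Sequence[int]],
                   mask: Sequence[Sequence[int]],
                   color: RGB = (255, 0, 0)) -> List[List[RGB]]:
    """Render anatomy in gray-scale and masked device pixels in `color`."""
    return [
        [color if m else (g, g, g) for g, m in zip(gray_row, mask_row)]
        for gray_row, mask_row in zip(gray, mask)
    ]


gray = [[10, 200], [90, 40]]
mask = [[0, 1], [0, 0]]  # 1 marks a pixel segmented as device
print(overlay_device(gray, mask))
```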
- FIG. 12 is a schematic diagram of a processor circuit 1200, according to
- the processor circuit 1200 may be implemented in the probe 110 and/or the host 130 of FIG. 1. As shown, the processor circuit 1200 may include a processor 1260, a memory 1264, and a communication module 1268. These elements may be in direct or indirect communication with each other, for example via one or more buses.
- the processor 1260 may include a CPU, a DSP, an application-specific integrated circuit (ASIC), a controller, an FPGA, another hardware device, a firmware device, or any combination thereof configured to perform the operations described herein, for example, aspects of FIGS. 1-11 and 13-15.
- the processor 1260 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- the memory 1264 may include a cache memory (e.g., a cache memory of the processor 1260), random access memory (RAM), magnetoresistive RAM (MRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), flash memory, solid state memory device, hard disk drives, other forms of volatile and non-volatile memory, or a combination of different types of memory.
- the memory 1264 includes a non-transitory computer-readable medium.
- the memory 1264 may store instructions 1266.
- the instructions 1266 may include instructions that, when executed by the processor 1260, cause the processor 1260 to perform the operations described herein, for example, aspects of FIGS. 1-11 and 13-15 and with reference to the probe 110 and/or the host 130 (FIG. 1).
- Instructions 1266 may also be referred to as code.
- the terms “instructions” and “code” should be interpreted broadly to include any type of computer-readable statement(s).
- the terms “instructions” and “code” may refer to one or more programs, routines, sub-routines, functions, procedures, etc. “Instructions” and “code” may include a single computer-readable statement or many computer-readable statements.
- the communication module 1268 can include any electronic circuitry and/or logic circuitry to facilitate direct or indirect communication of data between the processor circuit 1200, the probe 110, and/or the display 132.
- the communication module 1268 can be an input/output (I/O) device.
- the communication module 1268 facilitates direct or indirect communication between various elements of the processor circuit 1200 and/or the probe 110 (FIG. 1) and/or the host 130 (FIG. 1).
- FIG. 13 is a flow diagram of a deep learning-based ultrasound imaging method 1300, according to aspects of the present disclosure.
- the method 1300 is implemented by the system 100, for example, by a processor circuit such as the processor circuit 1200, and/or other suitable component such as the probe 110, the processing component 116, the host 130, and/or the processing component 134.
- the system 100 can include a computer-readable medium having program code recorded thereon, the program code comprising code for causing the system 100 to execute the steps of the method 1300.
- the method 1300 may employ similar mechanisms as in the schemes 200, 900, and/or 1000 described with respect to FIGS. 2, 9, 10, respectively, the configurations 300 and/or 400 described with respect to FIGS.
- the method 1300 includes a number of enumerated steps, but embodiments of the method 1300 may include additional steps before, after, and in between the enumerated steps. In some embodiments, one or more of the enumerated steps may be omitted or performed in a different order.
- the method 1300 includes receiving, by a processor circuit (e.g., the processing component 116 and/or 134 and/or the processor circuit 1200) from an ultrasound imaging device (e.g., the probe 110), a sequence of input image frames (e.g., the image frames 202) of a moving object over a time period (e.g., spanning time T0, T1, T2, ..., Tn).
- the moving object includes at least one of an anatomy of a patient or a medical device traversing through the patient’s anatomy and a portion of the moving object is at least partially invisible in a first input image frame of the sequence of input image frames.
- the first input image frame may be any image frame in the sequence of input image frames.
- the anatomy may be similar to the object 105 and may include the patient’s heart, lung, vessels (e.g., the vessel lumens 504, 604, 704, and/or 804 and the vessel walls 502, 602, 702, and/or 802), nerve fibers, and/or any suitable anatomical structure of the patient.
- the medical device is similar to the medical device 108 and/or the guide wires 510, 610, 710, and/or 810.
- the method 1300 includes applying, by the processor circuit, a recurrent predictive network (e.g., the deep learning network 210) associated with image segmentation to the sequence of input image frames to generate segmentation data.
- the method 1300 includes outputting, to a display (e.g., the display 132) in communication with the processor circuit, a sequence of output image frames (e.g., the image frames 206 and/or 906) based on the segmentation data.
- the portion of the moving object is fully visible in a first output image frame of the sequence of output image frames, where the first output image frame and the first input image frame are associated with a same time instant within the time period.
- the portion of the moving object may be within an occluded region (e.g., the occluded region 520), for example, as shown in the scenario 500 described above with respect to FIG. 5.
- the portion of the moving object may lie against an anatomical structure (e.g., the vessel walls 502, 602, 702, and/or 802) of the patient, for example, as shown in the scenario 600 described above with respect to FIG. 6.
- the portion of the moving object may be captured while acoustic coupling is low or lost, for example, as shown in the scenario 700 described above with respect to FIG. 7.
- the portion of the moving object may be out-of-plane while the first input image frame is captured, for example, as shown in the scenario 800 described above with respect to FIG. 8.
- the applying the recurrent predictive network includes generating previous segmentation data based on a previous input image frame of the sequence of input image frames, where the previous input image frame is received before the first input image frame, and generating first segmentation data based on the first input image frame and previous segmentation data.
- the previous input image frame can be any image frame in the sequence received before the first input image frame or an image frame immediately before the first input image frame in the sequence.
- the first input image frame corresponds to the input image frame 202_T1 received at a current time T1.
- the first segmentation data corresponds to the output 306_T1.
- the previous input image frame corresponds to the input image frame 202_T0 received at a previous time T0.
- the previous segmentation data corresponds to the output 306_T0 as shown in the configuration 300 described with respect to FIG. 3.
- the generating the previous segmentation data includes applying a convolutional encoder (e.g., the convolutional encoder 220) and a recurrent neural network (e.g., the RNN 230) to the previous input image frame.
- the generating the first segmentation data includes applying the convolutional encoder to the first input image frame to generate encoded data and applying the recurrent neural network to the encoded data and the previous segmentation data.
- the applying the recurrent predictive network further includes applying a convolutional decoder (e.g., the convolutional decoder 240) to the first segmentation data and the previous segmentation data.
- the convolutional encoder, the recurrent neural network, and the convolutional decoder operate at multiple spatial resolutions (e.g., the spatial resolutions 410, 412, 414, and 416).
- the moving object includes the medical device traversing through the patient’s anatomy.
- the convolutional encoder, the recurrent neural network, and the convolutional decoder are trained to identify the medical device from the patient’s anatomy and predict a motion associated with the medical device traversing through the patient’s anatomy.
- the moving object includes the patient’s anatomy with at least one of a cardiac motion, a breathing motion, or an arterial pulse.
- the convolutional encoder, the recurrent network, and the convolutional decoder are trained to identify a moving portion of the patient’s anatomy from a static portion of the patient’s anatomy and predict a motion associated with the moving portion.
- the moving object includes the medical device traversing through the patient’s anatomy and the system includes the medical device.
- the medical device comprises at least one of a needle, a guidewire, a catheter, a guided catheter, a therapy device, or an interventional device.
- the input image frames include 3D image frames and the recurrent predictive network is trained for 4D image segmentation based on temporal information.
- the sequence of input image frames includes 2D image frames and the recurrent predictive network is trained for 3D image segmentation based on temporal information.
- the method 1300 further includes applying spline fitting (e.g., the spline fitting component 910) to the sequence of input image frames based on the segmentation data.
- the spline fitting may utilize spatial information and temporal information in the sequence of input image frames and predictions by the recurrent predictive network.
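The data flow of the method 1300 (encode the current frame, combine it with the previous segmentation data in a recurrent step, then decode) can be sketched with toy stand-ins. None of the functions below is the trained network: `encode`, `recurrent_step`, and `decode` are hypothetical placeholders operating on 1D "frames", and the decaying-memory rule with a `memory` factor of 0.6 is an assumption chosen only to make the recurrence visible.

```python
from typing import List, Optional, Sequence


def encode(frame: Sequence[float]) -> List[float]:
    """Placeholder for the convolutional encoder: pass intensities through."""
    return [float(v) for v in frame]


def recurrent_step(encoded: List[float], prev_state: Optional[List[float]],
                   memory: float = 0.6) -> List[float]:
    """Placeholder for the RNN: keep a decayed copy of the previous state."""
    if prev_state is None:
        return list(encoded)
    return [max(e, memory * p) for e, p in zip(encoded, prev_state)]


def decode(state: List[float], threshold: float = 0.5) -> List[int]:
    """Placeholder for the convolutional decoder: threshold to a mask."""
    return [1 if s >= threshold else 0 for s in state]


def segment_sequence(frames: Sequence[Sequence[float]]) -> List[List[int]]:
    """Run the encoder, recurrent step, and decoder over frames in time order."""
    state: Optional[List[float]] = None
    outputs = []
    for frame in frames:
        # The state from the previous frame is fed back as a secondary input.
        state = recurrent_step(encode(frame), state)
        outputs.append(decode(state))
    return outputs


frames = [
    [0, 1, 1, 1, 0],  # T0: device fully visible
    [0, 1, 0, 1, 0],  # T1: middle portion not visible
    [0, 1, 0, 1, 0],  # T2: still not visible
]
for t, mask in enumerate(segment_sequence(frames)):
    print(f"T{t}", mask)
```

With these toy settings the missing portion is recovered at T1 and dropped again at T2, because the decayed memory falls below the threshold. A trained recurrent network would learn how long to retain such information rather than use a fixed decay.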
- FIG. 14 is a flow diagram of a deep learning-based ultrasound imaging method 1400, according to aspects of the present disclosure.
- the method 1400 is implemented by the system 100, for example, by a processor circuit such as the processor circuit 1200, and/or other suitable component such as the probe 110, the processing component 116, the host 130, and/or the processing component 134.
- the system 100 can include a computer-readable medium having program code recorded thereon, the program code comprising code for causing the system 100 to execute the steps of the method 1400.
- the method 1400 may employ similar mechanisms as in the scheme 1000 described with respect to FIG. 10 and the configurations 300 and 400 described with respect to FIGS. 3 and 4, respectively.
- the method 1400 includes a number of enumerated steps, but embodiments of the method 1400 may include additional steps before, after, and in between the enumerated steps. In some embodiments, one or more of the enumerated steps may be omitted or performed in a different order.
- the method 1400 includes receiving, by a processor circuit (e.g., the processing component 116 and/or 134 and/or the processor circuit 1200) from an ultrasound imaging device (e.g., the probe 110), a sequence of image frames (e.g., the image frames 1002 and/or 1110) representative of an anatomy of a patient over a time period (e.g., spanning time T0, T1, T2, ..., Tn).
- the anatomy may be similar to the object 105 and may include a heart, lungs, and/or any anatomical structure of the patient.
- the method 1400 includes applying a recurrent predictive network (e.g., the deep learning network 1010) associated with image acquisition to the sequence of image frames to generate imaging plane data associated with a clinical property of the patient’s anatomy.
- the clinical property may be associated with a heart condition, a lung condition, and/or any other clinical condition.
- the method 1400 includes outputting, to a display (e.g., the display 132) in communication with the processor circuit based on the imaging plane data, at least one of a target imaging plane (e.g., a cross-sectional plane, a longitudinal plane, or an MPR plane) of the patient’s anatomy or an instruction for repositioning the ultrasound imaging device towards the target imaging plane.
- the applying the recurrent predictive network includes generating first imaging plane data based on a first image frame of the sequence of image frames and generating second imaging plane data based on a second image frame of the sequence of image frames and the first imaging plane data, the second image frame being received after the first image frame.
- the first image frame corresponds to the input image frame 1002 received at a previous time T0.
- the first imaging plane data corresponds to the output of the RNN 1030 at time T0.
- the second image frame corresponds to the input image frame 1002_T1 received at a current time T1.
- the second imaging plane data corresponds to the output of the RNN 1030 at time T1, as shown in the scheme 1000 described with respect to FIG. 10.
- the generating the first imaging plane data includes applying a convolutional encoder (e.g., the convolutional encoder 1020) and a recurrent neural network (e.g., the RNN 1030) to the first image frame.
- the generating the second imaging plane data includes applying the convolutional encoder to the second image frame to generate encoded data and applying the recurrent neural network to the encoded data and the first imaging plane data.
- the applying the recurrent predictive network further includes applying a convolutional decoder (e.g., the convolutional decoder 1040) to the first imaging plane data and the second imaging plane data.
- the convolutional encoder, the recurrent neural network, and the convolutional decoder operate at multiple spatial resolutions (e.g., the spatial resolutions 410, 412, 414, and 416).
- the convolutional encoder, the recurrent network, and the convolutional decoder are trained to predict the target imaging plane for imaging the clinical property of the patient’s anatomy.
- the input image frames include 3D image frames and the recurrent predictive network is trained for 3D image acquisition based on temporal information.
- the sequence of input image frames includes 2D image frames and the recurrent predictive network is trained for 2D image acquisition based on temporal information.
- the method 1400 outputs the target imaging plane including at least one of a cross-sectional image slice (e.g., the image slice 1006 and/or 1120), an orthogonal image slice (e.g., the image slice 1008 and/or 1130), or an MPR image slice of the patient’s anatomy including the clinical property.
- the method 1400 includes generating an ultrasound beam steering control signal based on the imaging plane data and outputting, to the ultrasound imaging device, the ultrasound beam steering control signal.
- the ultrasound beam steering control signal may steer ultrasound beams generated by a transducer array (e.g., the transducer array 112) of the ultrasound imaging device.
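One common way to steer the beam of a linear phased array is to apply per-element transmit delays of n·d·sin(θ)/c, where d is the element pitch, θ the steering angle from the array normal, and c the speed of sound. The sketch below computes such delays as one possible form a steering control signal could take. The disclosure does not specify the control signal, so the function name, the parameters, and the nominal 1540 m/s soft-tissue speed of sound used as a default are assumptions.

```python
import math
from typing import List


def steering_delays(num_elements: int, pitch_m: float, angle_deg: float,
                    speed_of_sound: float = 1540.0) -> List[float]:
    """Per-element transmit delays (seconds) that steer a linear array.

    Element n is delayed by n * pitch * sin(angle) / c, then all delays are
    shifted so the earliest-firing element has zero delay.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    raw = [n * pitch_m * sin_theta / speed_of_sound for n in range(num_elements)]
    earliest = min(raw)
    return [t - earliest for t in raw]


# Hypothetical 8-element array with 0.3 mm pitch, steered 20 degrees off axis.
for n, delay in enumerate(steering_delays(8, 0.3e-3, 20.0)):
    print(n, f"{delay * 1e9:.1f} ns")
```

Shifting the delays so that the smallest is zero keeps them non-negative for both positive and negative steering angles.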
- the processor circuit outputs the instruction including at least one of a rotation or a translation of the ultrasound imaging device.
- the instruction can provide a user with guidance in maneuvering the ultrasound imaging device to an optimal imaging location (e.g., the target imaging plane) for obtaining a target image view of the patient’s anatomy.
- temporal continuity information in the deep learning network (e.g., the deep learning networks 210 and 1010) allows the deep learning network to learn and predict based on a series of observations in time rather than over a single point in time.
- the temporal continuity information provides additional dimensionality information that can improve segmentation of elongate, flexible, thinly shaped moving objects that may otherwise be difficult to segment.
- the disclosed embodiments can provide a stable view of motions of a moving object under 2D and/or 3D imaging.
- the use of spline fitting as a refinement to the deep learning network output can further provide a smooth transition of motions associated with the moving object under imaging.
- the use of temporal continuity information can also provide automatic view-finding, for example, including beam steering controls and/or imaging guidance.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Surgery (AREA)
- Radiology & Medical Imaging (AREA)
- Animal Behavior & Ethology (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Molecular Biology (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Quality & Reliability (AREA)
- Robotics (AREA)
- Multimedia (AREA)
- Ultra Sonic Diagnosis Equipment (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021558735A JP7462672B2 (ja) | 2019-04-02 | 2020-03-30 | Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods |
| US17/599,590 US12020434B2 (en) | 2019-04-02 | 2020-03-30 | Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods |
| CN202080026799.5A CN113678167B (zh) | 2019-04-02 | 2020-03-30 | Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962828185P | 2019-04-02 | 2019-04-02 | |
| US62/828,185 | 2019-04-02 | | |
| US202062964715P | 2020-01-23 | 2020-01-23 | |
| US62/964,715 | 2020-01-23 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020201183A1 (en) | 2020-10-08 |
Family
ID=70058368
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2020/058898 Ceased WO2020201183A1 (en) | 2019-04-02 | 2020-03-30 | Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12020434B2 |
| JP (1) | JP7462672B2 |
| CN (1) | CN113678167B |
| WO (1) | WO2020201183A1 |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
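| CN112116608A (zh) * | 2020-10-22 | 2020-12-22 | 上海联影医疗科技股份有限公司 | Guidewire segmentation method and apparatus, electronic device, and storage medium |
| CN113920131A (zh) * | 2021-09-23 | 2022-01-11 | 珠海横乐医学科技有限公司 | Guidewire segmentation method and apparatus based on video time series, and readable medium |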
| EP4111982A1 (en) * | 2021-06-29 | 2023-01-04 | Koninklijke Philips N.V. | Systems and apparatuses for navigation and procedural guidance of laser leaflet resection under intracardiac echocardiography |
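| JP2023049951A (ja) * | 2021-09-29 | 2023-04-10 | テルモ株式会社 | Computer program, information processing method, and information processing device |
| JP2024513400A (ja) * | 2021-04-02 | 2024-03-25 | アノード アイピー エルエルシー | Systems and methods for processing electronic medical images for diagnostic or interventional use |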
| US12016724B2 (en) | 2019-09-26 | 2024-06-25 | Koninklijke Philips N.V. | Automatic closed-loop ultrasound plane steering for target localization in ultrasound imaging and associated devices, systems, and methods |
| US12524875B2 (en) | 2020-09-04 | 2026-01-13 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for image processing |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021061257A1 (en) * | 2019-09-27 | 2021-04-01 | Google Llc | Automated maternal and prenatal health diagnostics from ultrasound blind sweep video sequences |
| JP7561833B2 (ja) * | 2020-03-30 | 2024-10-04 | テルモ株式会社 | Computer program, information processing method, and information processing device |
| CN116783509A (zh) * | 2020-12-18 | 2023-09-19 | 皇家飞利浦有限公司 | Ultrasound imaging with anatomy-based acoustic settings |
| US12361557B2 (en) * | 2020-12-21 | 2025-07-15 | Medtronic Navigation, Inc. | Systems and methods for monitoring one or more anatomical elements |
| US20220405957A1 (en) * | 2021-06-18 | 2022-12-22 | Rutgers, The State University Of New Jersey | Computer Vision Systems and Methods for Time-Aware Needle Tip Localization in 2D Ultrasound Images |
| US20230326596A1 (en) * | 2022-04-12 | 2023-10-12 | Canon Medical Systems Corporation | Information processing method, medical image diagnostic apparatus, and information processing system |
| CN119895458A (zh) | 2022-08-30 | 2025-04-25 | 皇家飞利浦有限公司 | Ultrasound video feature detection using learning from unlabeled data |
| WO2024104857A1 (en) | 2022-11-15 | 2024-05-23 | Koninklijke Philips N.V. | Automatic measurement point detection for anatomy measurement in anatomical images |
| EP4467080A1 (en) | 2023-05-22 | 2024-11-27 | Koninklijke Philips N.V. | Fusion of extraluminal and intraluminal images using anatomical feature in intraluminal image and associated systems, devices, and methods |
| KR102704221B1 (ko) * | 2023-12-22 | 2024-09-06 | 전남대학교산학협력단 | Real-time needle guidance method and apparatus for detecting the position of a biopsy needle from artificial-intelligence-based medical images |
Family Cites Families (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6708055B2 (en) * | 1998-08-25 | 2004-03-16 | University Of Florida | Method for automated analysis of apical four-chamber images of the heart |
| US20040225221A1 (en) * | 2003-05-06 | 2004-11-11 | Olsson Lars Jonas | Diagnostic ultrasound imaging system with adaptive persistence |
| US20060030777A1 (en) * | 2004-07-30 | 2006-02-09 | Liang David H | T-statistic method for suppressing artifacts in blood vessel ultrasonic imaging |
| CN101231755B (zh) * | 2007-01-25 | 2013-03-06 | 上海遥薇(集团)有限公司 | Moving-target tracking and counting method |
| JP5996870B2 (ja) * | 2009-01-23 | 2016-09-21 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | Processing and analysis of cardiac images |
| US8396268B2 (en) * | 2010-03-31 | 2013-03-12 | Isis Innovation Limited | System and method for image sequence processing |
| US20150253428A1 (en) * | 2013-03-15 | 2015-09-10 | Leap Motion, Inc. | Determining positional information for an object in space |
| US10588605B2 (en) * | 2015-10-27 | 2020-03-17 | General Electric Company | Methods and systems for segmenting a structure in medical images |
| US10667776B2 (en) * | 2016-08-11 | 2020-06-02 | Siemens Healthcare Gmbh | Classifying views of an angiographic medical imaging system |
| RU2017142603A (ru) * | 2016-12-02 | 2019-08-07 | Авент, Инк. | System and method for navigation to a target anatomical object in medical imaging-based procedures |
| JP7083143B2 (ja) * | 2016-12-07 | 2022-06-10 | キャプション ヘルス インコーポレイテッド | Guided navigation of an ultrasound probe |
| US20180247199A1 (en) * | 2017-02-24 | 2018-08-30 | Qualcomm Incorporated | Method and apparatus for multi-dimensional sequence prediction |
| US10032281B1 (en) * | 2017-05-03 | 2018-07-24 | Siemens Healthcare Gmbh | Multi-scale deep reinforcement machine learning for N-dimensional segmentation in medical imaging |
| CN107622485B (zh) * | 2017-08-15 | 2020-07-24 | 中国科学院深圳先进技术研究院 | Medical image data analysis method and system incorporating a deep tensor neural network |
| AU2018323621A1 (en) * | 2017-08-31 | 2020-02-06 | Butterfly Network, Inc. | Methods and apparatus for collection of ultrasound data |
| US10860859B2 (en) * | 2017-11-30 | 2020-12-08 | Nvidia Corporation | Budget-aware method for detecting activity in video |
| CN108053410B (zh) * | 2017-12-11 | 2020-10-20 | 厦门美图之家科技有限公司 | Moving-target segmentation method and apparatus |
| US10664979B2 (en) * | 2018-09-14 | 2020-05-26 | Siemens Healthcare Gmbh | Method and system for deep motion model learning in medical images |
| US10426442B1 (en) * | 2019-06-14 | 2019-10-01 | Cycle Clarity, LLC | Adaptive image processing in assisted reproductive imaging modalities |
2020
- 2020-03-30 US US17/599,590 patent/US12020434B2/en active Active
- 2020-03-30 WO PCT/EP2020/058898 patent/WO2020201183A1/en not_active Ceased
- 2020-03-30 CN CN202080026799.5A patent/CN113678167B/zh active Active
- 2020-03-30 JP JP2021558735A patent/JP7462672B2/ja active Active
Non-Patent Citations (2)
| Title |
|---|
| KHANAL BISHESH ET AL: "EchoFusion: Tracking and Reconstruction of Objects in 4D Freehand Ultrasound Imaging Without External Trackers", 15 September 2018, PERVASIVE: INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER, BERLIN, HEIDELBERG, PAGE(S) 117 - 127, ISBN: 978-3-642-17318-9, XP047485706 * |
| XIN YANG ET AL: "Fine-grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 6 December 2016 (2016-12-06), XP080737041 * |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12016724B2 (en) | 2019-09-26 | 2024-06-25 | Koninklijke Philips N.V. | Automatic closed-loop ultrasound plane steering for target localization in ultrasound imaging and associated devices, systems, and methods |
| US12524875B2 (en) | 2020-09-04 | 2026-01-13 | Shanghai United Imaging Healthcare Co., Ltd. | Systems and methods for image processing |
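| CN112116608A (zh) * | 2020-10-22 | 2020-12-22 | 上海联影医疗科技股份有限公司 | Guidewire segmentation method and apparatus, electronic device, and storage medium |
| JP2024513400A (ja) * | 2021-04-02 | 2024-03-25 | アノード アイピー エルエルシー | Systems and methods for processing electronic medical images for diagnostic or interventional use |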
| EP4111982A1 (en) * | 2021-06-29 | 2023-01-04 | Koninklijke Philips N.V. | Systems and apparatuses for navigation and procedural guidance of laser leaflet resection under intracardiac echocardiography |
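| CN113920131A (zh) * | 2021-09-23 | 2022-01-11 | 珠海横乐医学科技有限公司 | Guidewire segmentation method and apparatus based on video time series, and readable medium |
| JP2023049951A (ja) * | 2021-09-29 | 2023-04-10 | テルモ株式会社 | Computer program, information processing method, and information processing device |
| JP7686523B2 (ja) | 2021-09-29 | 2025-06-02 | テルモ株式会社 | Computer program, information processing method, and information processing device |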
Also Published As
| Publication number | Publication date |
|---|---|
| US20220198669A1 (en) | 2022-06-23 |
| CN113678167A (zh) | 2021-11-19 |
| JP2022526575A (ja) | 2022-05-25 |
| JP7462672B2 (ja) | 2024-04-05 |
| CN113678167B (zh) | 2025-08-12 |
| US12020434B2 (en) | 2024-06-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12020434B2 (en) | | Segmentation and view guidance in ultrasound imaging and associated devices, systems, and methods |
| EP4061231B1 (en) | | Intelligent measurement assistance for ultrasound imaging and associated devices, systems, and methods |
| CN114073548B (zh) | | Systems and methods for generating vascular representations in mixed reality/virtual reality |
| US12232907B2 (en) | | Intraluminal ultrasound navigation guidance and associated devices, systems, and methods |
| US20230301624A1 (en) | | Image-Based Probe Positioning |
| JP5161118B2 (ja) | | Arterial imaging system |
| US20150011886A1 (en) | | Automatic imaging plane selection for echocardiography |
| KR20190038448A (ko) | | Measurement point determination in medical diagnostic imaging |
| WO2022069208A1 (en) | | Ultrasound image-based patient-specific region of interest identification, and associated devices, systems, and methods |
| JP2022509391A (ja) | | Intraluminal ultrasound directional guidance and associated devices, systems, and methods |
| EP4033987B1 (en) | | Automatic closed-loop ultrasound plane steering for target localization in ultrasound imaging and associated devices and systems |
| JP2023521466A (ja) | | Biplane and three-dimensional ultrasound image acquisition for generating roadmap images, and associated systems and devices |
| US20240273822A1 (en) | | System and Method for Generating Three Dimensional Geometric Models of Anatomical Regions |
| WO2022128838A1 (en) | | Ultrasound image-based identification of anatomical scan window, probe orientation, and/or patient position |
| JP5468759B2 (ja) | | Method and system for acquiring a volume of interest based on positional information |
| CN119789816A (zh) | | Guided ultrasound imaging for point-of-care staging of medical conditions |
| Martens et al. | | The EchoPAC-3D software for 3D image analysis |
| WO2025209959A1 (en) | | Intravascular imaging and therapeutic treatment of chronic total occlusions and associated systems, devices, and methods |
| Crestan | | Automatic segmentation framework for left atrial appendage orifice identification from 3D echocardiography |
| CN119895458A (zh) | | Ultrasound video feature detection using learning from unlabeled data |
| WO2025067903A1 (en) | | Ultrasound imaging with follow up sweep guidance after blind sweep protocol |
| CN120227062A (zh) | | Systems and methods for extracting a two-dimensional short-axis view of the left atrial appendage |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20715838; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021558735; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20715838; Country of ref document: EP; Kind code of ref document: A1 |
| | WWG | Wipo information: grant in national office | Ref document number: 202080026799.5; Country of ref document: CN |