CN112734798B - On-line self-adaptive system and method for neural network - Google Patents

On-line self-adaptive system and method for neural network

Info

Publication number
CN112734798B
CN112734798B (application number CN202011347044.3A)
Authority
CN
China
Prior art keywords
training
online
images
parameters
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011347044.3A
Other languages
Chinese (zh)
Other versions
CN112734798A (en)
Inventor
孙善辉
余瀚超
陈潇
陈章
陈德仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/039,355 (external priority, US11393090B2)
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd
Publication of CN112734798A publication Critical patent/CN112734798A/en
Application granted granted Critical
Publication of CN112734798B publication Critical patent/CN112734798B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Neural network-based systems, methods, and apparatuses associated with image data processing are described herein. The neural network may be pre-trained to learn parameters or a model for processing image data, and upon deployment, the neural network may automatically perform further optimization of the learned parameters or model based on a small set of online data samples. The online optimization may be facilitated via offline meta-learning, such that the optimization can be accomplished quickly, in only a few optimization steps.

Description

On-line self-adaptive system and method for neural network
Cross Reference to Related Applications
The present application claims the benefit of provisional U.S. patent application Ser. No. 62/941,198, filed November 27, 2019, and U.S. patent application Ser. No. 17/039,355, filed September 30, 2020, the disclosures of which are incorporated herein by reference in their entireties.
Technical Field
The application relates to the technical field of image processing based on deep learning.
Background
In recent years, image processing techniques based on deep learning have been increasingly used to improve the quality of various services, including health services. For example, an artificial neural network with machine learning capabilities may be trained to learn a predictive model for detecting differences between adjacent cardiac magnetic resonance (CMR) images and to estimate the motion of the heart based on the detected differences. The estimate may then be used to assess the anatomy and/or function of the heart, for example, by calculating the myocardial strain of a particular subject based on the estimate. While these learning-based image processing techniques have shown tremendous promise in improving the accuracy and efficiency of image or video processing, they often suffer from significant performance degradation when deployed. One of the reasons for the degradation is that it is extremely difficult, if not impossible, to collect data that truly represents the distribution of a subject (e.g., heart motion) in the general population. As a result, the data used to train a neural network often does not match the data to be processed at prediction or test time (e.g., after deployment).
Accordingly, it is highly desirable for a pre-trained neural network system to have the ability to perform adaptive online learning so that the neural network system can adjust model parameters acquired via pre-training based on data received at a prediction or test time to increase the robustness of the predictive model. Since adaptive learning will be performed while the neural network system is online, it is also desirable for such a system to have the ability to complete online learning in a quick and efficient manner (e.g., using only a small number of samples or via a small number of steps).
Disclosure of Invention
Neural network-based systems, methods, and apparatuses associated with image data processing, such as motion tracking and/or image registration, are described herein. A system described herein may include at least one processor configured to implement one or more artificial neural networks (e.g., encoder networks and/or decoder networks) having predetermined parameters for processing images or video of an anatomical structure (e.g., the myocardium). Upon bringing the one or more artificial neural networks online to process the images or video, the at least one processor may perform an online adjustment of the predetermined parameters of the one or more artificial neural networks based on a first set of online images of the anatomical structure (e.g., while the one or more artificial neural networks are online). The online adjustment may be performed, for example, by determining a loss associated with processing the first set of online images using the predetermined parameters, and adjusting the predetermined parameters based on a gradient descent associated with the loss (e.g., by back-propagating the gradient descent through the one or more artificial neural networks). Once the predetermined parameters are adjusted (e.g., optimized based on the first set of online images), the at least one processor may process a second set of online images of the anatomical structure using the adjusted parameters of the one or more artificial neural networks. Here, "online" refers not to the learning phase but to the application phase; the term may be used interchangeably with "deployment". The second set of online images may be identical to, different from, or partially overlapping with the first set. In short, the learned parameters are applied to the entire video, and for image registration the images themselves are the data being registered.
The predetermined parameters of the one or more artificial neural networks may be obtained via offline meta-learning that facilitates the online adjustment of the parameters. The meta-learning may be performed using respective instances of the one or more artificial neural networks configured with baseline parameters and a training set comprising a plurality of training videos. For each of the plurality of training videos, a respective copy of the baseline parameters may be derived. A first set of training images may be selected from each training video (e.g., K pairs of images, where K may be equal to one for an image registration task and greater than one for a motion tracking task), and a respective first loss associated with processing the first set of training images using the respective copy of the baseline parameters may be determined. The respective copy of the baseline parameters may then be optimized based on a gradient descent associated with the first loss. In response to optimizing the respective copy of the baseline parameters associated with a given training video, a second set of training images may be selected from the training video, and a second loss associated with processing the second set of training images using the optimized copy of the baseline parameters may be determined. An average of the respective second losses associated with processing the respective second sets of training images of the plurality of training videos may be determined, and the baseline parameters may be updated based on a gradient descent associated with the average loss. Alternatively, in response to optimizing the respective copy of the baseline parameters associated with a given training video and determining the second loss associated with processing the second set of training images using the optimized copy, a gradient descent associated with the second loss may be determined, and the baseline parameters may be updated based on an average of the respective gradient descents associated with processing the respective second sets of training images. In either case, the first and second losses may be determined based on a loss function, and the baseline parameters may be updated based on a first-order approximation of the loss function. The second set of training images may be identical to, different from, or partially overlapping with the first set.
The baseline parameters used during meta-learning may be derived based on a first training set featuring a first distribution, and the plurality of training videos for meta-learning may be derived from a second training set featuring a second distribution that does not match the first distribution.
Drawings
A more detailed understanding of the examples disclosed herein may be obtained from the following description, which is given by way of example in connection with the accompanying drawings.
Fig. 1A and 1B are block diagrams illustrating an example application domain in which an image processing system described herein may be deployed.
Fig. 2 is a block diagram illustrating an example of estimating motion fields using the image processing system described herein.
FIG. 3 is a flowchart illustrating an example of adaptive online learning in accordance with one or more embodiments described herein.
FIG. 4 is a flowchart illustrating an example of meta learning in accordance with one or more embodiments described herein.
FIG. 5 is another flow diagram illustrating an example of meta learning in accordance with one or more embodiments described herein.
FIG. 6 is a block diagram illustrating example components of an image processing system described herein.
FIG. 7 is a flowchart illustrating an example neural network training process in accordance with one or more embodiments described herein.
Detailed Description
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
Fig. 1A and 1B are block diagrams illustrating examples of motion tracking and image registration that may be performed by the systems or devices described herein. These examples will be described in the context of cardiac Magnetic Resonance Imaging (MRI), but it should be noted that the use of the systems, methods, and apparatus disclosed herein is not limited to medical images or to certain types of imaging modalities. Rather, the disclosed systems, methods, and apparatus may be applicable to a variety of image processing tasks, including, for example, estimating optical flow based on video captured by a camera.
Fig. 1A illustrates an example of heart motion tracking (e.g., estimation) based on an MRI video of a human heart, such as a cine MRI, which may include multiple images of the heart recorded at different points in time (e.g., successive points along a time axis t). The video may depict one or more cycles of systole and diastole. For example, starting from the first image frame 102 of the video, the medical image processing system or device described herein may estimate the motion of the heart (e.g., the myocardium) between the first image frame 102 and the second image frame 104 by comparing the visual features (e.g., closed boundary regions, edges, contours, line intersections, corners, etc.) of the two image frames and identifying the changes that occurred between the time the first image frame 102 was captured and the time the second image frame 104 was captured. The image processing system may then use the second image frame 104 as a new reference frame and repeat the estimation process for the image frame 104, the third image frame 106, and the remaining frames to obtain motion information over one or more complete cardiac cycles.
Fig. 1B illustrates an example of cardiac image registration, which may involve aligning and/or superimposing two or more images of the same region (e.g., reference image 112 and sensed image 114) taken at different times, from different viewpoints, and/or by different image sensors. Image registration is an important part of many medical applications in which images from various sources are combined to obtain a complete assessment of the patient's condition (e.g., for monitoring tumor growth, treatment verification, comparison of patient data with anatomical atlases, etc.). The image processing system described herein may be trained to perform the registration of images 112 and 114 by: detecting salient and distinct objects in the images (e.g., closed boundary regions, edges, contours, line intersections, corners, etc.), establishing a correspondence between the objects detected in the reference image 112 and those detected in the sensed image 114 (e.g., via a displacement field or transformation matrix), estimating a transformation model (e.g., one or more mapping functions) for aligning the sensed image 114 with the reference image 112 based on the established correspondence, and then resampling and/or transforming the sensed image 114 (e.g., via linear interpolation) using the transformation model so that it aligns with the reference image 112 (or vice versa).
In the motion tracking and image registration examples shown in figs. 1A and 1B, the image processing system may be configured to determine a motion field (e.g., in the form of a vector field, a vector grid, a vector-valued function, or a combination thereof) that represents the displacement of visual features from a first image of the anatomical structure to a second image of the anatomical structure and indicates the motion (e.g., translation, rotation, scaling, etc.) of the anatomical structure from the first image to the second image. When the respective motion fields between multiple pairs of such images are obtained (e.g., as shown in fig. 1A), the image processing system is able to track the motion of the anatomical structure through the multiple images. Similarly, based on the motion field between a pair of images (e.g., as shown in fig. 1B), the image processing system can perform image registration on the pair of images.
Fig. 2 is a block diagram illustrating an example of estimating motion fields using an image processing system 200 as described herein. This example will be described with reference to certain neural network structures or components, but it should be noted that the techniques disclosed herein may also be implemented using other types of neural network structures or components. Accordingly, the structures or components shown or described herein are merely illustrative and not limiting.
As shown in fig. 2, the image processing system 200 may include a feature tracking neural network 202 configured to receive a pair of input images 204a and 204b of an anatomical structure, such as the myocardium, and extract visual features from the images. The images may be part of a cardiac cine, e.g., similar to the cine described in connection with fig. 1. The feature tracking network 202 may include twin subnetworks 202a and 202b, which may be arranged in a Siamese configuration to process the input images 204a and 204b in tandem. Subnetworks 202a and 202b may be twin neural networks having similar operating parameters or weights (e.g., the weights of subnetworks 202a and 202b may be the same). Each of the subnetworks 202a and 202b may include an artificial neural network, such as a convolutional neural network (CNN) or a fully convolutional network (FCN), which in turn may include multiple layers, such as one or more convolutional layers, one or more pooling layers, and/or one or more fully connected layers. A convolutional layer of the neural network may include a plurality of convolutional kernels or filters configured to extract particular features from the input image 204a or 204b via one or more convolution operations. A convolution operation may be followed by batch normalization and/or nonlinear activation, and the features extracted by the convolutional layers (e.g., in the form of one or more feature maps) may be downsampled by one or more pooling layers (e.g., using a 2 x 2 window and a stride of 2) to reduce the redundancy and/or size of the features (e.g., by a factor of 2). As a result of the convolution and/or downsampling operations, respective feature representations (e.g., latent space representations) of the input images 204a and 204b may be obtained, for example, in the form of twinned feature maps or feature vectors and/or at multiple levels of scaling and abstraction.
The feature maps or feature vectors associated with the input images 204a and 204b may be compared or matched (e.g., at the block level and/or via a correlation layer) to determine the differences or changes (e.g., displacements) between the two input images, and the motion (e.g., flow) of the anatomical structure may be further estimated based on the determined differences or changes (e.g., as indicated by a similarity metric or score map). The image processing system 200 may include a motion tracking neural network 206 (e.g., a multi-scale decoder network) configured to perform this estimation task. The motion tracking network 206 may include one or more CNNs or FCNs, and each CNN or FCN may include multiple layers, such as multiple convolutional layers (e.g., transpose convolutional layers) and/or unpooling layers. Through these layers, the motion tracking network 206 may perform a series of upsampling and/or transpose convolution operations (e.g., at multiple levels of scaling and abstraction) on the feature maps or feature vectors produced by the feature extraction network 202 to decode the features and restore them to the original image size or resolution. For example, the motion tracking network 206 may upsample the feature representations generated by the feature extraction network 202 based on the pooling indices stored by the feature extraction network 202. The motion tracking network 206 may then process the upsampled representations with one or more transpose convolution operations (e.g., using a 3 x 3 transpose convolution kernel with a stride of 2) and/or one or more batch normalization operations to obtain one or more dense feature maps (e.g., upscaled by a factor of 2). Based on the dense feature maps, the motion tracking network 206 may predict the motion field 208ab, which represents the displacement of visual features from the input image 204a to the input image 204b (e.g., in the form of a vector field, a vector grid, a vector-valued function, or a combination thereof), thereby indicating the movement of the anatomical structure from image 204a to image 204b.
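For illustration, a minimal PyTorch sketch of the Siamese encoder and decoder arrangement described above is given below. The module names (TwinEncoder, MotionDecoder), layer counts, and channel sizes are assumptions made for this example rather than the exact architecture of the disclosed embodiment, and feature matching is approximated by simple concatenation where a correlation layer could be used instead.

```python
import torch
import torch.nn as nn

class TwinEncoder(nn.Module):
    """Siamese feature extractor: one set of weights processes both images."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # downsample by a factor of 2 (2 x 2 window, stride 2)
            nn.Conv2d(feat_ch, feat_ch * 2, 3, padding=1),
            nn.BatchNorm2d(feat_ch * 2), nn.ReLU(inplace=True),
        )

    def forward(self, img_a, img_b):
        # Weight sharing: the same module is applied to both images in tandem.
        return self.conv(img_a), self.conv(img_b)

class MotionDecoder(nn.Module):
    """Decodes matched features back into a dense two-channel motion field."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat_ch * 2, feat_ch, 3, stride=2,
                               padding=1, output_padding=1),  # restore resolution
            nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 2, 3, padding=1),  # (dx, dy) displacement per pixel
        )

    def forward(self, feat_a, feat_b):
        # Crude feature "matching" by concatenation; a correlation layer could
        # be used here instead, as the description suggests.
        return self.up(torch.cat([feat_a, feat_b], dim=1))
```

With 192 x 192 inputs, TwinEncoder yields 96 x 96 feature maps and MotionDecoder restores a 192 x 192 motion field.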
The image processing system 200 (e.g., the feature extraction network 202 and/or the motion tracking network 206) shown in fig. 2 may learn the parameters (e.g., weights) associated with predicting the motion field 208ab through individual and/or end-to-end training. The training may be performed, for example, using a training set that includes multiple images or videos of the target anatomy. Since annotating the motion field of an anatomical structure is a very difficult task, the training of the image processing system 200 may be performed in an unsupervised or self-supervised manner. For example, the image processing system 200 may include a spatial transformation network 210 (e.g., a differentiable spatial transformer network) configured to generate a warped image 212 based on the input image 204a and the predicted motion field 208ab. The training of the image processing system 200 may then be performed so as to minimize the difference between the warped image 212 and the input image 204b, which serves as a reference image during the training process.
The spatial transformation network 210 may include an input layer, one or more hidden layers (e.g., convolutional layers), and/or an output layer. In operation, the spatial transformation network 210 may obtain the input image 204a (e.g., one or more feature maps of the input image 204a generated by the feature extraction network 202) and/or the motion field 208ab, derive a plurality of transformation parameters based on the motion field 208ab, and create a sampling grid comprising a set of points from which the input image 204a may be sampled to generate the transformed or warped image 212. The input image 204a and the sampling grid may then be provided to a sampler of the spatial transformation network 210, which produces the output image (e.g., the warped image 212) by sampling the input image 204a at the grid points.
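A minimal sketch of the differentiable warping step is shown below, using PyTorch's grid_sample as the sampler. The pixel-coordinate grid construction and the bilinear sampling mode are assumptions made for this example, not a verbatim specification of the spatial transformation network 210.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp `image` (N,C,H,W) by a dense `flow` (N,2,H,W) via bilinear sampling.

    Builds the sampling grid as the identity grid plus the predicted
    displacements, normalizes it to [-1, 1], and samples the input image at
    the grid points -- a minimal differentiable spatial-transformer step.
    """
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=image.device),
                            torch.arange(w, device=image.device), indexing="ij")
    # Displaced sampling positions, normalized to grid_sample's [-1, 1] range.
    vx = 2.0 * (xs + flow[:, 0]) / max(w - 1, 1) - 1.0
    vy = 2.0 * (ys + flow[:, 1]) / max(h - 1, 1) - 1.0
    grid = torch.stack((vx, vy), dim=-1)  # (N,H,W,2), in (x, y) order
    return F.grid_sample(image, grid, align_corners=True)
```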
The difference between the warped image 212 and the reference image 204b may be measured by a reconstruction loss function L_recon, which may be based on, for example, the mean squared error (MSE) between the warped image 212 and the reference image 204b. In addition to the reconstruction loss L_recon, the training of the image processing system 200 may also take into account a motion-field smoothness loss L_smooth (e.g., to prevent predictions that lead to unrealistic, abrupt motion changes between neighboring image frames) and/or a bidirectional flow consistency loss L_con, which ensures that the motion fields predicted in the forward direction (e.g., using input image 204a as the source image and input image 204b as the target image) and in the backward direction (e.g., using input image 204b as the source image and input image 204a as the target image) are consistent with each other (e.g., differ by less than a predetermined threshold). A total loss L_total may then be derived (e.g., as shown in equation (1) below) and used to guide the training of the image processing system 200:
L_total = L_recon + α_s · L_smooth + β_c · L_con    (1)
where α_s and β_c are balancing parameters that can be adjusted during training to improve training quality.
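Below is a sketch of how the total loss of equation (1) might be computed, reusing the warp helper from the previous sketch. The specific forms of L_smooth (a first-order flow-gradient penalty) and L_con (a forward/backward cancellation check) are common choices assumed here for illustration; the description above does not fix their exact forms.

```python
import torch.nn.functional as F

def total_loss(img_a, img_b, flow_ab, flow_ba, alpha_s=0.1, beta_c=0.1):
    """L_total = L_recon + alpha_s * L_smooth + beta_c * L_con  (equation (1))."""
    # Reconstruction loss: MSE between the warped source and the reference.
    l_recon = F.mse_loss(warp(img_a, flow_ab), img_b)
    # Smoothness loss: penalize abrupt spatial changes in the motion field.
    l_smooth = ((flow_ab[:, :, 1:, :] - flow_ab[:, :, :-1, :]).abs().mean()
                + (flow_ab[:, :, :, 1:] - flow_ab[:, :, :, :-1]).abs().mean())
    # Bidirectional consistency loss: the backward flow, warped into the
    # source frame, should cancel the forward flow.
    l_con = (flow_ab + warp(flow_ba, flow_ab)).abs().mean()
    return l_recon + alpha_s * l_smooth + beta_c * l_con
```

The balancing parameters alpha_s and beta_c correspond to α_s and β_c in equation (1) and can be tuned during training.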
While the image processing system 200 may learn a baseline model for predicting motion fields associated with an anatomical structure via the training process described above, the performance of the model may be compromised at test or inference time, for example, when the image processing system 200 and the neural networks it comprises are brought online (e.g., after training and/or after deployment) to process medical images in real time. Many factors may contribute to the performance degradation, including, for example, a mismatch between the distribution of the data used to train the image processing system 200 and the distribution of the data to be processed after deployment (e.g., due to the long-tail problem often present in medical imaging data). Accordingly, the image processing system 200 described herein may be configured with online learning capabilities such that, when deployed to process real medical image data, the system can further optimize its parameters (e.g., adapt the predictive model learned via pre-training) according to the characteristics of the data to be processed.
FIG. 3 illustrates an example online optimization process 300 that may be implemented by an image processing system described herein (e.g., the image processing system 200 of FIG. 2) to adjust one or more predetermined parameters of the system. The online optimization process 300 may begin at 302, for example, executed by an online optimizer of the image processing system (e.g., a gradient-descent-based optimizer) in response to one or more neural networks of the image processing system being brought online to process medical image data associated with an anatomical structure (e.g., cardiac cine videos depicting the left ventricle and/or the myocardium). At 304 (e.g., before beginning to process the medical image data), the online optimizer may select (e.g., randomly) a video from the online data for evaluating and/or adjusting the predetermined parameters (e.g., weights) of the neural networks. At 306, the online optimizer may derive an initial version of the parameters to be adjusted (denoted θ'_t), e.g., as a copy of the predetermined parameters θ (e.g., θ'_1 = θ). At 306, the online optimizer may also initialize other parameters associated with the optimization process, including, for example, a learning rate α (which may be predetermined as a hyper-parameter or meta-learned), the number m of online optimization steps to be completed, an iterator t over the optimization steps (initially set to 1), and so on.
At 308, the online optimizer may sample K pairs of images from the selected video to form an optimization data set D_t = {a_t^(j), b_t^(j)}, where j = 1...K (e.g., K may be greater than one for a motion tracking task and equal to one for an image registration task), and each pair of sampled images may include a source image and a reference image (e.g., similar to images 204a and 204b in fig. 2) from which a corresponding motion field may be predicted. At 310, the online optimizer may evaluate (e.g., determine) the loss associated with processing the optimization data set D_t using the current network parameters θ'_t. The online optimizer may determine the loss, for example, based on the loss function defined in equation (1) above. In response to determining the loss, the online optimizer may further determine the adjustment that should be made to the current network parameters θ'_t so as to reduce or minimize the loss. For example, the online optimizer may determine the adjustment based on a gradient descent (e.g., stochastic gradient descent) of the loss function of equation (1), ∇L(θ'_t, D_t), where L(θ'_t, D_t) represents the loss associated with processing the optimization data set D_t using the current network parameters θ'_t. Once the adjustment is determined, the online optimizer may apply it at 312, for example, by back-propagating the adjustment through the neural network based on the learning rate α, as illustrated below:

θ'_t ← θ'_t − α · ∇L(θ'_t, D_t)    (2)
At 314, the online optimizer may determine whether additional optimization steps need to be performed, e.g., by comparing the value of t to m. If t is determined to be equal to or less than m, then at 316 the online optimizer may increment the value of t (e.g., by one) and repeat the operations of 310-314. If the determination at 314 is that t is greater than m, the online optimizer may output and/or store the adjusted parameters θ'_t at 318 and exit the online optimization process 300 at 320.
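Process 300 can be summarized in the short sketch below. sample_pairs is a hypothetical helper that draws K source/reference pairs from the selected video, the model is assumed to map an image pair to a motion field, and total_loss is the sketch given after equation (1).

```python
import copy
import torch

def online_adapt(model, video, k=5, m=3, alpha=1e-4):
    """Online optimization (process 300): copy the pre-trained parameters
    theta into theta', take m gradient steps on K image pairs sampled from
    the online video, and return the adapted network."""
    adapted = copy.deepcopy(model)                 # theta' <- theta (306)
    opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
    pairs = sample_pairs(video, k)                 # D_t (308), hypothetical helper
    for _ in range(m):                             # optimization steps (310-316)
        loss = sum(total_loss(a, b, adapted(a, b), adapted(b, a))
                   for a, b in pairs) / len(pairs) # loss over D_t (310)
        opt.zero_grad()
        loss.backward()                            # back-propagate (312)
        opt.step()                                 # theta' update, equation (2)
    return adapted                                 # adjusted parameters (318)
```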
Since the optimization operations described above are performed online, it is desirable to complete them with only a small number of steps and/or a small number of samples (e.g., even a single online sample), so that the optimization of the system can finish quickly. The image processing system described herein may be equipped, via meta-learning, with the ability to perform online optimization in this desired manner. FIG. 4 illustrates an example meta-learning process 400 that may be performed by the image processing system to train the online optimizer to quickly adapt (e.g., quickly adjust) a pre-learned model based on a small set of online data samples. This example will be described with reference to multiple training videos. It should be noted, however, that the meta-learning may also be performed using training sets that include images (e.g., paired images) and/or other forms of data, e.g., for an image registration task.
The meta-learning process 400 may begin at 402, for example, during offline training of the image processing system (e.g., after the image processing system has learned a baseline model f_θ for motion tracking or image registration). At 404, the image processing system may obtain the pre-learned model f_θ and initialize one or more other parameters associated with the meta-learning process 400, including, for example, the learning rates α and β to be applied in the meta-learning process (e.g., predetermined as hyper-parameters or meta-learned; α and β may be the same or different), the number m of optimization steps to be performed during the meta-learning process, etc. At 406, the image processing system may select a plurality (e.g., N) of videos of the anatomical structure from a training set. The distribution of the N videos may match the distribution of the videos used to pre-train the image processing system (e.g., to obtain the predetermined parameters θ), or it may not (e.g., the N videos may come from a different training set than the pre-training set).
At 408, the image processing system may begin processing the N videos. For each video i, the image processing system may sample K pairs of images from the video at 410 to form a data set D_i = {a_i^(j), b_i^(j)}, where j = 1...K (e.g., K may be greater than one for a motion tracking task and equal to one for an image registration task), and each pair of sampled images may include a source image and a reference image (e.g., similar to images 204a and 204b in fig. 2) from which a corresponding motion field may be predicted. At 410, the image processing system may also initialize an optimization step iterator t (e.g., set it to 1) and derive an initial version θ'_i of the parameters to be adjusted by copying the predetermined baseline parameters θ. At 412, the image processing system may evaluate (e.g., determine) the loss associated with processing the data set D_i using the current copy of the network parameters θ'_i. The image processing system may determine the loss, for example, based on the loss function defined in equation (1) above. In response to determining the loss, the image processing system may further determine the adjustment that should be made to the current network parameters θ'_i so as to reduce or minimize the loss. For example, the image processing system may determine the adjustment based on a gradient descent (e.g., stochastic gradient descent) of the loss function of equation (1), ∇L(θ'_i, D_i), where L(θ'_i, D_i) represents the loss associated with processing the data set D_i using the current network parameters θ'_i. Once the adjustment is determined, the image processing system may apply it at 414, for example, by back-propagating the adjustment through the neural networks of the image processing system based on the learning rate α, as illustrated below:

θ'_i ← θ'_i − α · ∇L(θ'_i, D_i)    (3)
at 416, the image processing system may determine whether additional optimization steps need to be performed, such as by comparing the value of t to m. If t is determined to be equal to or less than m, then at 418 the image processing system may increment the value of t (e.g., by one) and repeat the operations of 412-416. If the determination at 416 is that t is greater thanm, the image processing system may go to 420, where the image processing system may resample (and/or store) the K pair image D 'from video i in 420' i And/or determine (e.g., recalculate) and use the optimization parameter θ i ' handle the penalty associated with resampling the image.
From 420, the image processing system may return to 408 and repeat the operations at 410-420 until all N videos have been processed. The image processing system may then proceed to 422, where, before exiting the meta-learning process 400 at 424, it adjusts the predetermined parameters θ based on the learning rate β and the respective data sets D'_i (e.g., the K pairs of images resampled from each video i). For example, at 422 the image processing system may adjust the predetermined parameters θ by: determining the respective losses L(θ'_i, D'_i) associated with processing the resampled K pairs of images from the individual videos using the optimized parameters θ'_i, determining the average of those losses across the N videos, determining the adjustment to be made to the parameters θ based on the stochastic gradient descent (e.g., involving second derivatives) associated with the average loss, and back-propagating the adjustment through the image processing system based on the learning rate β, as illustrated below:

θ ← θ − β · ∇_θ (1/N) Σ_{i=1..N} L(θ'_i, D'_i)    (4)
through the meta-learning process described herein, the image processing system may obtain high quality initial model parameters that allow for a fast and flexible adaptation of the model parameters based on the real data once the image processing system is brought online and provided with samples of the real medical imaging data for optimization. It should be noted that the online optimization and meta-learning techniques described herein may be generally applicable to many types of applications and may not need to be well-known with the examplesSpecific neural network structures, processes or algorithms are disclosed. For example, the meta-learning process illustrated in FIG. 4 (e.g., as shown in FIG. 5) may be modified such that after the inner for loop, the image processing system may determine a random gradient descent G for each of the N videos based on the loss i This loss is related to the corresponding optimized copy θ using the network parameters i 'processing resampled K-pair image D' i In association, then (e.g., after the outer for loop), the image processing system adjusts the predetermined parameter θ by: computing respective gradient drops G associated with N videos i The adjustment to be made to the parameter θ is determined based on the average gradient descent and the adjustment is back-propagated through the image processing system based on the learning rate β, as illustrated below and as illustrated by 520-522 of fig. 5.
Such modifications may improve computational efficiency and/or graphics processing unit (GPU) memory usage. For example, the image processing tasks described herein may involve storing a large number of feature maps (e.g., given a relatively large image size of 192 x 192), so a large amount of GPU memory may be required. By swapping the gradient operator and the averaging operator, as shown in equations (4) and (5), the gradients can be computed on one or more GPUs before being transferred to the CPU. As another modification, instead of using second derivatives, which may involve computing a second-order Hessian matrix during back-propagation (e.g., as shown in equations (4) and (5)), a first-order approximation may be applied to reduce the computational cost of the meta-learning.
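As a sketch of the trade-off just mentioned: in PyTorch (an assumption of this example; the disclosure does not name a framework), whether the meta-gradient differentiates through the inner update is controlled by create_graph.

```python
import torch

def inner_gradient(loss, params, first_order=True):
    """Gradient of an inner-loop loss with respect to `params`.

    create_graph=True keeps the computation graph so the outer update can
    differentiate through the inner step (second-order terms, i.e., Hessian
    products during back-propagation); first_order=True drops the graph and
    treats the inner gradient as a constant -- the cheaper approximation.
    """
    return torch.autograd.grad(loss, list(params), create_graph=not first_order)
```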
The image processing systems described herein (e.g., system 200 in fig. 2) may be implemented using one or more processors, one or more storage devices, and/or other suitable auxiliary devices (such as display devices, communication devices, input/output devices, etc.). Fig. 6 is a block diagram illustrating an example image processing system 600 described herein. As shown, the image processing system 600 may include a processor (e.g., one or more processors) 602, which may be a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a reduced instruction set computer (RISC) processor, an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or any other circuit or processor capable of performing the functions described herein. The image processing system 600 may also include communication circuitry 604, memory 606, mass storage 608, input devices 610, and/or a communication link 612 (e.g., a communication bus) over which one or more of the components shown in fig. 6 may exchange information. The communication circuitry 604 may be configured to transmit and receive information using one or more communication protocols (e.g., TCP/IP) and one or more communication networks, including a local area network (LAN), a wide area network (WAN), the internet, or a wireless data network (e.g., a Wi-Fi, 3G, 4G/LTE, or 5G network). The memory 606 may include a storage medium configured to store machine-readable instructions that, when executed, cause the processor 602 to perform one or more of the functions described herein. Examples of machine-readable media may include volatile or non-volatile memory, including but not limited to semiconductor memory (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)), flash memory, and the like. The mass storage device 608 may include one or more magnetic disks, such as one or more internal hard disks, one or more removable disks, one or more magneto-optical disks, one or more CD-ROM or DVD-ROM disks, etc., on which instructions and/or data may be stored to facilitate the operation of the processor 602. The input devices 610 may include a keyboard, a mouse, a voice-controlled input device, a touch-sensitive input device (e.g., a touch screen), and the like, for receiving user input for the image processing system 600.
It should be noted that the image processing system 600 may operate as a standalone device or may be connected (e.g., networked or clustered) with other computing devices to perform the functions described herein. Even though only one instance of each component is shown in fig. 6, those skilled in the art will appreciate that the image processing system 600 may include multiple instances of one or more of the components shown in the figure. Further, while the examples herein refer to various types of neural networks, various types of layers, and/or various tasks performed by certain types of neural networks or layers, these references are made for illustrative purposes only and are not intended to limit the scope of the present disclosure. Additionally, the operations of the example image processing system are depicted and described herein in a particular order; however, these operations may occur in various orders, concurrently, and/or with other operations not presented or described herein. Not all operations that the image processing system is capable of performing are depicted and described herein, and not all illustrated operations need be performed by the system.
Fig. 7 is a flow chart of an example process 700 for training a neural network (e.g., the feature tracking network 202 or the motion tracking network 206 of fig. 2). The process 700 may begin at 702, and at 704 the neural network may initialize its operating parameters, such as the weights associated with one or more filters or kernels of the neural network. The parameters may be initialized, for example, based on samples from one or more probability distributions or on the parameter values of another neural network having a similar architecture. At 706, the neural network may receive one or more training images, process the images through the various layers of the neural network, and predict a target result (e.g., a motion field, a classification map, etc.) using the currently assigned parameters. At 708, the neural network may determine the adjustments to be made to the currently assigned parameters based on a loss function and a gradient descent (e.g., stochastic gradient descent) associated with the loss function. For example, the loss function may be implemented based on the mean squared error (MSE) or the L1-norm distance between the prediction and a gold standard associated with the prediction. At 710, the neural network may adjust the currently assigned parameters, for example, via a back-propagation process. At 712, the neural network may determine whether one or more training termination criteria are met. For example, the neural network may determine that the training termination criteria are met if it has completed a predetermined number of training iterations, if the difference between the predicted values and the gold standard values is below a predetermined threshold, or if the change in the value of the loss function between two training iterations is below a predetermined threshold. If it is determined at 712 that the termination criteria are not met, the neural network may return to 706. If it is determined at 712 that the termination criteria are met, the neural network may end the training process 700 at 714.
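A generic sketch of process 700 is given below; the MSE criterion, the SGD optimizer, and the termination thresholds are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train(network, loader, max_iters=10000, tol=1e-6, lr=1e-3):
    """Train until an iteration budget or a loss-change threshold is met."""
    opt = torch.optim.SGD(network.parameters(), lr=lr)  # stochastic gradient descent
    prev_loss = float("inf")
    for it, (inputs, target) in enumerate(loader):      # training images (706)
        pred = network(inputs)                          # predict the target result
        loss = F.mse_loss(pred, target)                 # loss function (708)
        opt.zero_grad()
        loss.backward()                                 # back-propagation (710)
        opt.step()
        # Termination criteria (712): iteration budget or converged loss.
        if it + 1 >= max_iters or abs(prev_loss - loss.item()) < tol:
            break
        prev_loss = loss.item()
```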
Although the present disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of the embodiments and methods will be apparent to those skilled in the art. Thus, the above description of example embodiments does not limit the present disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure. In addition, unless specifically stated otherwise, discussions utilizing terms such as "analyzing," "determining," "enabling," "identifying," "modifying," or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulate and transform data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data represented as physical quantities within the computer system memories or other such information storage, transmission or display devices.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (10)

1. A neural network online adaptation system, the system comprising at least one processor, wherein the at least one processor is configured to:
implement one or more artificial neural networks, wherein the one or more artificial neural networks are configured with a plurality of predetermined parameters for processing images of an anatomical structure;
bring the one or more artificial neural networks online to process the images; and
while the one or more artificial neural networks are online:
perform an online adjustment of the predetermined parameters of the one or more artificial neural networks based on a first set of online images of the anatomical structure, wherein the online adjustment is performed at least by: determining a loss associated with processing the first set of online images using the predetermined parameters, and adjusting the predetermined parameters based on a gradient descent associated with the loss, and wherein the predetermined parameters are obtained via offline meta-learning that facilitates the online adjustment; and
process a second set of online images of the anatomical structure using the adjusted parameters of the one or more artificial neural networks.
2. The system of claim 1, wherein the online adjustment of the predetermined parameters is performed by back-propagating the gradient descent through the one or more artificial neural networks.
3. The system of claim 1, wherein the meta-learning is performed using a plurality of training videos of the anatomical structure, the meta-learning comprising:
obtaining baseline parameters for processing the image of the anatomical structure;
optimizing the respective copies of the baseline parameters based on respective first losses associated with processing each of the plurality of training videos using the respective copies of the baseline parameters;
determining a respective second penalty associated with processing each of the plurality of training videos using the respective optimized copy of the baseline parameter; and
the baseline parameters are updated based on the respective second losses associated with the plurality of training videos.
4. The system of claim 3, wherein the meta-learning comprises:
for each of the plurality of training videos:
deriving the respective copies of the baseline parameters of the training video;
selecting a first set of training images from the training video;
determining a first penalty associated with processing the first set of training images using the respective copies of the baseline parameters;
optimizing the respective copies of the baseline parameters based on a gradient descent associated with the first loss;
selecting a second set of training images from the training video; and
determining a second penalty associated with processing the second set of training images using the respective optimized copies of the baseline parameters;
determining an average of the respective second losses associated with processing the respective second set of training images of the plurality of training videos; and
the baseline parameter is updated based on a gradient descent associated with the determined average of the respective second losses.
5. The system of claim 4, wherein the respective first and second losses are determined based on a loss function, and wherein the baseline parameter is updated based on a first order approximation of the loss function.
6. The system of claim 3, wherein the meta-learning comprises:
for each of the plurality of training videos:
deriving the respective copies of the baseline parameters of the training video;
selecting a first set of training images from the training video;
determining a first penalty associated with processing the first set of training images using the respective copies of the baseline parameters;
optimizing the respective copies of the baseline parameters based on a gradient descent associated with the first loss;
selecting a second set of training images from the training video; and
determining a second loss associated with processing the second set of training images using the respective optimized copies of the baseline parameters and a gradient descent associated with the second loss;
determining an average of the respective gradient descents associated with processing the respective second sets of training images of the plurality of training videos; and
updating the baseline parameters based on the determined average of the respective gradient descents.
7. The system of claim 3, wherein the baseline parameter is derived based on a first training set characterized by a first distribution, and wherein the plurality of training videos are derived from a second training set characterized by a second distribution, the second distribution not matching the first distribution.
8. The system of claim 1, wherein the at least one processor being configured to process the images of the anatomical structure comprises the at least one processor being configured to: track motion of the anatomical structure based on the images, or perform image registration based on the images.
9. The system of claim 8, wherein the anatomical structure comprises heart muscle and the image is derived from a Cardiac Magnetic Resonance (CMR) video.
10. An on-line adaptation method of a neural network implemented via at least one processor, the method comprising:
bringing one or more artificial neural networks online to process an image of an anatomical structure, wherein the one or more artificial neural networks are configured with a plurality of predetermined parameters for processing the image of the anatomical structure;
while the one or more artificial neural networks are online:
performing an online adjustment of the predetermined parameters of the one or more artificial neural networks based on a first set of online images of the anatomical structure, wherein the online adjustment is performed at least by: determining a loss associated with processing the first set of online images using the predetermined parameters and adjusting the predetermined parameters based on gradient descent associated with the loss, and wherein the predetermined parameters are obtained via offline meta-learning that facilitates the online adjustment; and
processing a second set of online images of the anatomical structure using the adjusted parameters of the one or more artificial neural networks.
CN202011347044.3A 2019-11-27 2020-11-26 On-line self-adaptive system and method for neural network Active CN112734798B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962941198P 2019-11-27 2019-11-27
US62/941,198 2019-11-27
US17/039,355 2020-09-30
US17/039,355 US11393090B2 (en) 2019-11-27 2020-09-30 Online adaptation of neural networks

Publications (2)

Publication Number Publication Date
CN112734798A CN112734798A (en) 2021-04-30
CN112734798B true CN112734798B (en) 2023-11-07

Family

Family ID: 75597795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011347044.3A Active CN112734798B (en) 2019-11-27 2020-11-26 On-line self-adaptive system and method for neural network

Country Status (1)

Country Link
CN (1) CN112734798B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269815B (en) * 2021-05-14 2022-10-25 中山大学肿瘤防治中心 Deep learning-based medical image registration method and terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3078530A1 (en) * 2017-10-26 2019-05-02 Magic Leap, Inc. Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks
AU2018368279A1 (en) * 2017-11-14 2020-05-14 Magic Leap, Inc. Meta-learning for multi-task learning for neural networks

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Image Feature Recognition Based on Convolutional Neural Networks; Yang Niancong et al.; Information & Computer (Theoretical Edition), No. 14; full text *

Also Published As

Publication number Publication date
CN112734798A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
US11393090B2 (en) Online adaptation of neural networks
US11488021B2 (en) Systems and methods for image segmentation
US11449759B2 (en) Medical imaging diffeomorphic registration based on machine learning
Li et al. Non-rigid image registration using fully convolutional networks with deep self-supervision
Shen et al. Networks for joint affine and non-parametric image registration
Krebs et al. Unsupervised probabilistic deformation modeling for robust diffeomorphic registration
WO2020215676A1 (en) Residual network-based image identification method, device, apparatus, and storage medium
Eppenhof et al. Progressively trained convolutional neural networks for deformable image registration
US11734837B2 (en) Systems and methods for motion estimation
Benou et al. De-noising of contrast-enhanced MRI sequences by an ensemble of expert deep neural networks
CN113822792A (en) Image registration method, device, equipment and storage medium
CN113269754B (en) Neural network system and method for motion estimation
CN112734798B (en) On-line self-adaptive system and method for neural network
Fang et al. A FCN-based unsupervised learning model for deformable chest CT image registration
Anas et al. Ct scan registration with 3d dense motion field estimation using lsgan
CN112419283A (en) Neural network for estimating thickness and method thereof
KR20220135349A (en) Tomography image processing method using single neural network based on unsupervised learning for image standardization and apparatus therefor
US20230169659A1 (en) Image segmentation and tracking based on statistical shape model
CN117974693B (en) Image segmentation method, device, computer equipment and storage medium
US20230079164A1 (en) Image registration
US20240153089A1 (en) Systems and methods for processing real-time cardiac mri images
US20230206428A1 (en) Tubular structure segmentation
US20240233212A1 (en) Systems and methods for processing medical images with multi-layer perceptron neural networks
CN112419365A (en) System and method for strain force determination
US20240078690A1 (en) Unsupervised multitask ensemble for spatial transfer of multi-structures via partial point cloud matching

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant