WO2019090023A1 - System and method for interactive representation learning transfer through deep learning of feature ontologies - Google Patents
- Publication number
- WO2019090023A1 (PCT/US2018/058855)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input image
- learning
- image dataset
- cnn
- imaging modality
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10104—Positron emission tomography [PET]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
- G06V2201/031—Recognition of patterns in medical or anatomical images of internal organs
Definitions
- Embodiments of the present specification relate generally to a system and method for generating transferable representation learnings of medical image data of varied anatomies obtained from varied imaging modalities for use in learning networks. Specifically, the system and method are directed to determining a representation learning of the medical image data as a set of feature primitives based on the physics of a first and/or second imaging modality and the biology of the anatomy to configure new convolutional networks for learning problems such as classification and segmentation of the medical image data from other imaging modalities.
- machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed."
- Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
- feature learning or representation learning is a set of techniques that transform raw data input into a representation that can be effectively exploited in machine learning tasks.
- Representation learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process.
- real-world data such as images, video, and sensor measurement is usually complex, redundant, and highly variable.
- manual feature identification methods require expensive human labor and rely on expert knowledge.
- manually generated representations normally do not lend themselves well to generalization, thereby motivating the design of efficient representation learning techniques to automate and generalize feature or representation learning.
- CNN convolutional neural network
- ConvNet convolutional neural network
- a deep convolutional neural network is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex.
- Convolutional neural networks are biologically inspired variants of multilayer perceptrons, designed to emulate the behavior of a visual cortex with minimal amounts of preprocessing.
- MLP multilayer perceptron network
- An MLP includes multiple layers of nodes in a directed graph, with each layer fully connected to the next one.
- CNNs have wide applications in image and video recognition, recommender systems, and natural language processing.
- CNNs are also known as shift invariant or space invariant artificial neural networks (SIANNs) based on their shared weights architecture and translation invariance characteristics.
- SIANNs shift invariant or space invariant artificial neural networks
- a convolutional layer is the core building block.
- Parameters associated with a convolutional layer include a set of learnable filters or kernels. During a forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a two-dimensional (2D) activation map of that filter.
- 2D two-dimensional
- the network learns filters that are activated when the network detects a specific type of feature at a given spatial position in the input.
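The forward pass described above can be illustrated with a minimal NumPy sketch (the image and the vertical-edge filter are hypothetical examples, not taken from the specification): sliding a filter across the input and computing the dot product at each spatial position yields a 2D activation map that is large wherever the filter's feature is present.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a filter across the input, computing the dot product at each
    spatial position to produce a 2D activation map ("valid" padding)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter activates where intensity changes left to right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)
activation_map = conv2d_valid(image, kernel)  # shape (2, 2)
```

In a trained CNN the kernel entries are learned rather than hand-set, but the mechanics of producing the activation map are the same.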
- Deep CNN architectures are typically purpose-built for deep learning problems. Deep learning is known to be effective in many tasks that involve human-like perception and decision making. Typical applications of deep learning are handwriting recognition, image recognition, and speech recognition.
- Deep learning techniques construct a network model by using training data and outcomes that correspond to the data. Once the network model is constructed, it can be used on new data for determining outcomes. Moreover, it will be appreciated that a deep learning network once learnt for a specific outcome may be reused advantageously for related outcomes.
- CNNs may be used for supervised learning, where an input dataset is labelled, as well as for unsupervised learning, where the input dataset is not labelled.
- a labelled input dataset is one where the elements of the dataset are pre-associated with a classification scheme, represented by labels.
- the CNN is trained with a labelled subset of the dataset, and may be tested with another subset to verify an accurate classification result.
- a deep CNN architecture is multi-layered, where the layers are hierarchically connected. As the number of input to output mappings and filters for each layer increases, a multi-layer deep CNN may result in a huge number of parameters that need to be configured for its operation. If training data for such a CNN is scarce, the learning problem is under-determined. In this situation, it is advantageous to transfer certain parameters from a pre-learned CNN model. Transfer learning reduces the number of parameters to be optimized by freezing the pre-learned parameters in a subset of layers and provides a good initialization for tuning the remaining layers.
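Parameter freezing in transfer learning can be sketched as follows. This is a pure-Python illustration; the layer names and parameter counts are invented for the example and each layer is represented by a plain dict rather than a real network layer.

```python
# Each layer records its parameter count and whether it is trainable.
pretrained = [
    {"name": "conv1", "params": 1_728,  "trainable": True},
    {"name": "conv2", "params": 36_864, "trainable": True},
    {"name": "fc",    "params": 4_096,  "trainable": True},
]

def freeze(layers, names):
    """Freeze the named pre-learned layers; their weights act as a fixed
    initialization and are excluded from optimization."""
    for layer in layers:
        if layer["name"] in names:
            layer["trainable"] = False
    return layers

def trainable_params(layers):
    return sum(l["params"] for l in layers if l["trainable"])

freeze(pretrained, {"conv1", "conv2"})
# Only the final layer's parameters remain to be tuned on the new dataset,
# shrinking the under-determined learning problem described above.
remaining = trainable_params(pretrained)
```

Freezing the early layers reduces the parameters to be optimized from 42,688 to 4,096 in this toy setting, which is the mechanism by which transfer learning copes with scarce training data.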
- a method for interactive representation learning transfer to a convolutional neural network includes obtaining at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality.
- the method includes performing at least one of: jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions; and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions. Additionally, the method includes storing at least the one or more common feature primitives and the corresponding mapping functions in a feature primitive repository.
- an interactive representation learning transfer (IRLT) unit for interactive representation learning transfer to a convolutional neural network (CNN) is presented.
- the IRLT unit includes an interactive learning network configurator configured to obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality and perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions.
- the IRLT unit includes a feature primitive repository configured to store at least the one or more common feature primitives and the corresponding mapping functions.
- a multimodality transfer learning system includes a processor unit and a memory unit communicatively operatively coupled to the processor unit.
- the multimodality transfer learning system includes an interactive representation learning transfer (IRLT) unit operatively coupled to the processor unit and including an interactive learning network configurator configured to obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality and perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions.
- the IRLT unit includes a feature primitive repository configured to store the at least one or more common feature primitives and the corresponding mapping functions.
- FIG. 1 is a schematic diagram of an exemplary system for interactive representation learning transfer through deep learning of feature ontologies, in accordance with aspects of the present specification.
- FIG. 2 is a flowchart illustrating a method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository, in accordance with aspects of the present specification.
- FIG. 3 is a flowchart illustrating a method for building a set of feature primitives from labeled image data to augment a feature primitive repository, in accordance with aspects of the present specification.
- FIG. 4 is a flowchart illustrating a method for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a selection of feature primitives, in accordance with aspects of the present specification.
- FIG. 5 is a schematic diagram of one embodiment of an interactive learning network configurator, in accordance with aspects of the present specification.
- transfer learning or “inductive transfer” as used in the present specification is intended to mean an approach of machine learning that focuses on applying the knowledge or learning gained while solving one problem to a different, related problem.
- This knowledge or learning may be characterized and/or represented in various ways, typically by combinations and variations of transfer functions, mapping functions, graphs, matrices, and other primitives.
- transfer learning primitive as used in the present specification is intended to mean the characterization and/or representation of knowledge or learning gained by solving a machine learning problem as described hereinabove.
- feature primitive as used in the present specification is intended to mean a characterization of an aspect of an input dataset, typically, an appearance, shape geometry, or morphology of a region of interest (ROI) corresponding to an image of an anatomy, where the image may be obtained from an imaging modality such as an ultrasound imaging system, a computed tomography (CT) imaging system, a positron emission tomography-CT (PET-CT) imaging system, a magnetic resonance (MR) imaging system, and the like.
- CT computed tomography
- PET-CT positron emission tomography-CT
- MR magnetic resonance
- the anatomy may include an internal organ of the human body such as, but not limited to, the lung, liver, kidney, stomach, heart, brain, and the like.
- An exemplary system 100 for interactive representation learning transfer through deep learning of feature ontologies, in accordance with aspects of the present specification, is illustrated in FIG. 1.
- the system 100 includes a multimodality transfer learning (MTL) subsystem 102.
- the MTL subsystem 102 includes an interactive representation learning transfer (IRLT) unit 104, a processor unit 108, a memory unit 110, and a user interface 106.
- the processor unit 108 is communicatively coupled to the memory unit 110.
- the user interface 106 is operatively coupled to the IRLT unit 104.
- the IRLT unit 104 is operatively coupled to the processor unit 108 and the memory unit 110.
- the system 100 and/or the MTL subsystem 102 may include a display unit 134. It may be noted that the MTL subsystem 102 may include other components or hardware, and is not limited to the components shown in FIG. 1.
- the user interface 106 is configured to receive user input 130 corresponding to characteristics of an input image 128.
- the user input 130 may include aspects or characteristics of the input image 128, such as, but not limited to, the imaging modality, the anatomy generally represented by the input image 128, the appearance of the ROI corresponding to the input image 128, and the like.
- the IRLT unit 104 may be implemented as a software system or computer instructions executable via one or more processor units 108 and stored in the memory unit 110.
- the IRLT unit 104 may be implemented as a hardware system, for example, via FPGAs, custom chips, integrated circuits (ICs), Application Specific ICs (ASICs), and the like.
- the IRLT unit 104 may include an interactive learning network configurator (ILNC) 112, one or more CNNs configured for unsupervised learning (unsupervised learning CNN) 114, one or more CNNs configured for supervised learning (supervised learning CNN) 116, and a feature primitive repository 118.
- the ILNC 112 is operatively coupled to the unsupervised learning CNNs 114, the supervised learning CNNs 116, and the feature primitive repository 118.
- the ILNC 112 may be a graphical user interface subsystem configured to enable a user to configure one or more supervised learning CNNs 116 or unsupervised learning CNNs 114.
- the feature primitive repository 118 is configured to store one or more feature primitives corresponding to an ROI in the input image 128 and one or more corresponding mapping functions.
- mapping function as used in the present specification is intended to represent a transfer function or a CNN filter that maps the ROI to a compressed representation such that an output of the CNN is a feature primitive that characterizes the ROI of the input image 128 based on an aspect of the ROI.
- the aspect of the ROI may include a shape geometry, an appearance, a morphology, and the like.
- the feature primitive repository 118 is configured to store one or more mapping functions. These mapping functions, in conjunction with the corresponding feature primitives, may be used to pre-configure a CNN to learn a new training set.
- the feature primitive repository 118 is configured to store feature primitives and mapping functions which are transfer learnings obtained from other CNNs to pre-configure a new CNN to learn an unseen dataset.
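One way to picture the feature primitive repository is as a store keyed by modality, anatomy, and ROI aspect. The sketch below is an illustration of that idea only, not the patented implementation; the primitive name and the identity mapping function are hypothetical stand-ins for learned CNN outputs and filters.

```python
class FeaturePrimitiveRepository:
    """Illustrative store of feature primitives and their mapping functions,
    keyed by (imaging modality, anatomy, ROI aspect)."""

    def __init__(self):
        self._store = {}

    def add(self, modality, anatomy, aspect, primitive, mapping_fn):
        self._store[(modality, anatomy, aspect)] = (primitive, mapping_fn)

    def lookup(self, modality, anatomy, aspect):
        return self._store.get((modality, anatomy, aspect))

repo = FeaturePrimitiveRepository()
# A hypothetical primitive: lesion appearance learned from CT liver images,
# with an identity function standing in for a learned mapping function.
repo.add("CT", "liver", "appearance", "lesion_texture_v1", lambda roi: roi)
primitive, mapping_fn = repo.lookup("CT", "liver", "appearance")
```

Keying on modality and anatomy mirrors how the specification later retrieves primitives from the repository when pre-configuring a new CNN.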
- some non-limiting examples of feature primitives that the feature primitive repository 118 is configured to store include feature primitives characterizing an appearance 120 corresponding to the ROI of the image 128, a shape geometry 124 corresponding to the ROI of the image 128, and an anatomy 126 corresponding to the image 128.
- the ILNC 112 is configured to present a user with various tools and options to interactively characterize one or more aspects of the ROI corresponding to the input image 128.
- An exemplary embodiment of the ILNC 112 is shown in FIG. 5 and the working of the ILNC 112 will be described in greater detail with reference to FIGs. 4 and 5.
- the system 100 is configured to develop a portfolio of feature primitives and mapping functions across one or more anatomies and imaging modalities to be stored in the feature primitive repository 118. Subsequently, the system 100 may provide the user with an ILNC 112 to allow the user to pre-configure a CNN for learning a new, unseen image dataset based on a selection of modality, anatomy, shape geometry, morphology, and the like corresponding to one or more ROIs of the unseen image dataset.
- one learning outcome of the CNN may be a classification scheme categorizing the image dataset. Other non-limiting examples of learning outcomes may include pixel-level segmentation, regression, and the like. The working of the system 100 will be described in greater detail with reference to FIGs. 2-5.
- system 100 and/or the MTL subsystem 102 may be configured to visualize one or more of the feature primitives, the mapping functions, the image datasets, and the like on the display unit 134.
- Referring now to FIG. 2, a flowchart 200 generally representative of a method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository, in accordance with aspects of the present specification, is presented. The method 200 is described with reference to the components of FIG. 1.
- the flowchart 200 illustrates the main steps of the method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository such as the feature primitive repository 118.
- steps 202-208 may be performed by the processor unit 108 in conjunction with memory unit 110 and the ILNC 112 of the IRLT unit 104.
- step 210 may be performed by the processor unit 108 in conjunction with the memory unit 110 and one or more supervised learning CNNs 116.
- steps 214-216 may be performed by the processor unit 108 in conjunction with the memory unit 110 and one or more unsupervised learning CNNs 114.
- the method 200 starts at step 202 where at least a first input image dataset 220 and a second input image dataset 222 corresponding to a first imaging modality and a second imaging modality are obtained.
- the first and second input image datasets 220, 222 may correspond to an anatomy of the human body, such as the liver, lung, kidney, heart, brain, stomach, and the like.
- the first and second imaging modalities may include an ultrasound imaging system, an MR imaging system, a CT imaging system, a PET-CT imaging system, or combinations thereof.
- At step 204, a check is carried out to determine whether the input image datasets 220, 222 are labelled.
- labels referencing the input image datasets may be generally representative of a classification scheme or score that characterizes an aspect of the input images, such as the shape geometry, appearance, morphology, and the like. If, at step 204, it is determined that the input image datasets are labelled, control passes to step 210, where a first CNN and a second CNN are configured for supervised learning of the input image datasets 220, 222. Step 210 will be described in greater detail with reference to FIG. 3. Subsequent to step 210, the method 200 is terminated, as indicated by step 212.
- the second input image dataset 222 corresponding to the second imaging modality is augmented with additional data.
- the additional data for augmenting the second input image dataset 222 may be obtained by processing the first input image dataset 220 corresponding to the first imaging modality via an intensity mapping function.
- the intensity mapping function may include a regression framework that takes multi-modality acquisitions via use of a first imaging modality and a second imaging modality corresponding to one or more subjects, for example, CT and MR, and learns a patch level mapping from the first modality to the second modality, using deep learning, hand crafted intensity features, or a combination thereof to map the intensities of the first modality to the second modality. Subsequently, control passes to step 214.
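A minimal sketch of such a patch-level intensity mapping follows, assuming a plain least-squares regression from first-modality patch intensities to second-modality intensities; the actual framework may instead use deep learning, hand-crafted intensity features, or a combination thereof, and the paired-patch data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Paired patches from co-registered acquisitions of the same subjects:
# rows are flattened 3x3 patches from modality A (e.g., CT); targets are
# the corresponding modality-B (e.g., MR) intensities (synthetic setup).
patches_a = rng.uniform(0, 1, size=(200, 9))
true_w = rng.uniform(-1, 1, size=9)        # hidden ground-truth mapping
targets_b = patches_a @ true_w

# Learn the patch-level mapping by least squares.
w, *_ = np.linalg.lstsq(patches_a, targets_b, rcond=None)

# Map a new modality-A patch to a modality-B intensity, producing the
# additional data used to augment the scarce second-modality dataset.
new_patch = rng.uniform(0, 1, size=9)
predicted_b = new_patch @ w
```

The learned weights play the role of the intensity mapping function: once fitted, every first-modality patch can be pushed through it to synthesize second-modality training data.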
- a first unsupervised learning CNN and a second unsupervised learning CNN are jointly trained with the first input image dataset 220 and the second input image dataset 222 to learn compressed representations of the input image datasets 220, 222, where the compressed representations include one or more common feature primitives and corresponding mapping functions.
- the one or more feature primitives characterize aspects of the images of the first input image dataset 220. It may be noted that the mapping functions map the input image dataset to the corresponding feature primitive. In one embodiment, the mapping function may be defined in accordance with equation (1).
- HCT = f(PCT, w) (1)
- where HCT is a set of feature primitives obtained when a region of interest of an image PCT is mapped using a mapping function f and weights w.
- the image PCT corresponds to an image obtained via use of a CT imaging system.
- one or more mapping functions corresponding to the first imaging modality and the second imaging modality are generated. It may be noted that these mapping functions map the first and second input image datasets 220, 222 to the same feature primitives.
- a second mapping function may be defined in accordance with equation (2).
- HMR = f(PMR, w) (2)
- where HMR is a set of feature primitives obtained when a region of interest of an image PMR is mapped using a mapping function f and weights w.
- the image PMR is obtained using an MR imaging system.
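The relationship expressed by equations (1) and (2), two modality-specific mappings producing the same feature primitives, can be sketched numerically. In this illustration, linear mappings stand in for CNN filters and a shared synthetic latent array stands in for the common feature primitives; none of the shapes or values come from the specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Modality-specific ROIs of the same underlying anatomy (synthetic):
# a shared latent "anatomy" is rendered differently per modality.
latent = rng.uniform(size=(50, 4))            # common feature primitives H
roi_ct = latent @ rng.uniform(size=(4, 6))    # PCT: CT rendering
roi_mr = latent @ rng.uniform(size=(4, 6))    # PMR: MR rendering

# Learn modality-specific weights so that f(PCT, w_ct) and f(PMR, w_mr)
# both recover the same primitives H (here via least squares).
w_ct, *_ = np.linalg.lstsq(roi_ct, latent, rcond=None)
w_mr, *_ = np.linalg.lstsq(roi_mr, latent, rcond=None)

h_ct = roi_ct @ w_ct  # HCT = f(PCT, w_ct)
h_mr = roi_mr @ w_mr  # HMR = f(PMR, w_mr)
```

Although the CT and MR renderings differ, both learned mappings land on the same primitive representation, which is what allows the primitives to be reused across modalities.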
- At step 218, at least the one or more feature primitives and the corresponding mapping functions are stored in the feature primitive repository 118. Control is then passed to step 212 to terminate the method 200.
- Referring now to FIG. 3, a flowchart 300 generally representative of a method for building a set of feature primitives from labeled image data to augment a feature primitive repository, in accordance with aspects of the present specification, is presented. More particularly, the method 300 describes step 210 of FIG. 2 in greater detail. Also, the method 300 is described with reference to the components of FIG. 1.
- the flowchart 300 illustrates the main steps of the method for building a set of feature primitives from labeled image data to augment a feature primitive repository.
- steps 302-308 may be performed by the processor unit 108 in conjunction with memory unit 110 and the ILNC 112 of the IRLT unit 104.
- steps 310-314 may be performed by processor unit 108 in conjunction with memory unit 110 and the one or more supervised learning CNNs 116.
- the method 300 starts at step 302 where at least a first input image dataset 316 and a second input image dataset 318 corresponding to at least a first imaging modality and a second imaging modality are obtained.
- the first and second input image datasets 316, 318 may correspond to an anatomy of the human body, for example, liver, lung, kidney, heart, brain, stomach, and the like.
- the first and second imaging modalities may include an ultrasound imaging system, an MR imaging system, a CT imaging system, a PET-CT imaging system, or combinations thereof.
- a learning outcome of the one or more supervised CNNs may be a classification of images in the first and second input image datasets 316, 318.
- At step 304, a check is carried out to determine whether the first input image dataset 316 and the second input image dataset 318 include sufficient data to adequately train one or more CNNs. If, at step 304, it is determined that the first and second input image datasets 316, 318 include sufficient data, control is passed to step 308. However, if it is determined that the first and second input image datasets 316, 318 do not include sufficient data, control is passed to step 306. In one example, it may be determined at step 304 that the first input image dataset 316 includes sufficient data while the second input image dataset 318 does not. Accordingly, in this example, at step 306, the second input image dataset 318 corresponding to the second imaging modality is augmented with additional data.
- the additional data is obtained by processing the first input image dataset 316 corresponding to the first imaging modality via an intensity mapping function.
- the intensity mapping function may include a regression framework that takes multi-modality acquisitions via use of a first imaging modality and a second imaging modality corresponding to one or more subjects, for example, CT and MR, and learns a patch level mapping from the first modality to the second modality, using deep learning, hand crafted intensity features, or a combination thereof to map the intensities of the first modality to the second modality. Subsequently, control is passed to step 308.
- a first supervised learning CNN and a second supervised learning CNN are jointly trained based on labels associated with the first input image dataset 316 and labels associated with the second input image dataset 318 to generate one or more feature primitives and corresponding mapping functions.
- the learning outcome may include one or more feature primitives that characterize aspects of the images of the first input image dataset 316 and aspects of the images of the second input image dataset 318 and corresponding mapping functions where the mapping functions map the corresponding first input image dataset 316 and the second input image dataset 318 to the one or more feature primitives.
- the feature primitives are independent of the imaging modality used to acquire the first input image dataset 316 and the second input image dataset 318.
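Joint supervised training can be sketched with a deliberately simplified stand-in: two labelled modality datasets training one shared weight vector via logistic regression, where the shared weights play the role of the common, modality-independent feature primitives. The data is synthetic and the model is far simpler than a CNN; only the joint-training structure (one combined update driven by both labelled sets) reflects the step described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Labelled datasets from two modalities; both label the same anatomy aspect.
x_ct = rng.normal(size=(100, 3))
x_mr = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_ct = (x_ct @ w_true > 0).astype(float)
y_mr = (x_mr @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Jointly train a single shared weight vector on both labelled sets:
# each pass applies a gradient step from each modality's supervised loss.
w = np.zeros(3)
lr = 0.5
for _ in range(500):
    for x, y in ((x_ct, y_ct), (x_mr, y_mr)):
        grad = x.T @ (sigmoid(x @ w) - y) / len(y)
        w -= lr * grad

acc_ct = np.mean((sigmoid(x_ct @ w) > 0.5) == (y_ct == 1))
acc_mr = np.mean((sigmoid(x_mr @ w) > 0.5) == (y_mr == 1))
```

Because the same parameters must explain labels from both modalities, the training pressure favors a representation that neither modality can claim exclusively, which is the sense in which the resulting primitives are modality-independent.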
- the one or more feature primitives and the corresponding mapping functions are stored in a feature primitive repository.
- the methods 200 and 300 described hereinabove enable the creation of a portfolio of feature primitives and mapping functions corresponding to images generated for a plurality of anatomies across a plurality of modalities.
- the feature primitives and the mapping functions are stored in the feature primitive repository 118.
- this portfolio of feature primitives and mapping functions characterizes the learning gained in the training of the CNNs with the input image datasets.
- the learning may be transferred to pre-configure new CNNs to obtain learning outcomes for different, unseen datasets.
- FIG. 4 illustrates a flowchart 400 depicting a method for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a selection of feature primitives, in accordance with aspects of the present specification.
- the method 400 is described with reference to FIGs. 1, 2 and 3. It may be noted that the flowchart 400 illustrates the main steps of the method 400 for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a selection of feature primitives.
- steps 402-406 may be performed by the processor unit 108 in conjunction with the memory unit 110 and the ILNC 112 of the IRLT unit 104.
- steps 408-410 may be performed by the processor unit 108 in conjunction with the memory unit 110 and the one or more supervised learning CNNs 116.
- the method 400 starts at step 402, where at least one input image dataset 404 may be obtained.
- the at least one input image dataset 404 is representative of an unseen input image dataset.
- at least one learning parameter 406 and a learning outcome 408 corresponding to the input image dataset 404 may be obtained.
- the input image dataset 404, the learning parameter 406, and the learning outcome 408 may be obtained as user input 410.
- the learning parameter 406 may include an imaging modality, an image anatomy, or a combination thereof.
- the learning outcome 408 may include a classification scheme, a regression scheme, or a pixel-level output such as segmentation.
- At step 412, at least one feature primitive and a corresponding mapping function, each corresponding to the learning parameter 406 and the learning outcome 408, are obtained from the feature primitive repository 118.
- a CNN is configured for learning the input image dataset 404 using the at least one feature primitive and the at least one mapping function, as indicated by step 414.
- the configuration of the CNN may entail setting one or more filters of the CNN to the mapping functions obtained from the feature primitive repository 118.
- a pre-configured CNN is generated. Further, at step 416, the pre-configured CNN is optimized with at least a training subset of the input image dataset 404.
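Steps 414 and 416 can be sketched with a toy two-stage model: a frozen projection standing in for the filters pulled from the repository, and a small trainable head fitted on the training subset. The layer sizes, the synthetic data, and the least-squares fit are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pre-learned mapping functions (stand-in for repository filters): they
# project raw inputs onto feature primitives and stay frozen during tuning.
W_frozen = rng.normal(size=(16, 4))

# Training subset of the unseen dataset: inputs and target outcomes that
# happen to be expressible in the frozen feature space.
X = rng.normal(size=(100, 16))
true_head = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ W_frozen @ true_head

# Only the small output head is optimized (step 416); the frozen layer
# supplies the representation, so few parameters remain to fit.
features = X @ W_frozen                     # frozen forward pass
head, *_ = np.linalg.lstsq(features, y, rcond=None)
```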
- a trained convolutional autoencoder (CAE) for supervised learning that uses labelled data corresponding to a first imaging modality is adapted for the input image dataset 404, with a few parameters, in accordance with equation (3): H_1 = f(P_1, w(α, w))
- in equation (3), H_1 is a set of feature primitives obtained when a region of interest of an image P_1 corresponding to the first imaging modality is mapped using a mapping function f and weights w(α, w), where α is a sparse set of the CAE parameters.
- the CAE parameters may be further optimized with at least the training subset of the input image dataset 404.
- the framework for learning may be defined in accordance with the following formulation: the mapping function f obtained in equation (3) is applied over the mapping function f_2 corresponding to the region of interest of an image corresponding to the second imaging modality.
- the input image dataset 404 is processed via the optimized CNN to obtain a learning outcome 420, corresponding to the requested learning outcome 408.
- FIG. 5 is a schematic diagram 500 of one embodiment of the interactive learning network configurator 112 of the interactive representation learning transfer unit 104 of FIG. 1, in accordance with aspects of the present specification.
- the block diagram 500 is generally representative of the ILNC 112 as shown in FIG. 1.
- Reference numerals 502-508 are generally representative of visualizations of feature primitives respectively corresponding to an imaging modality, anatomy, appearance, and shape geometry.
- the data for the visualizations 502-508 may be obtained from the feature primitive repository 118 of FIG. 1.
- the ILNC 112 provides a user with a selection of interactive menus.
- the user may select one or more aspects of an unseen image dataset to be learned by a CNN.
- Reference numerals 510-516 are generally representative of interactive menu options that may be available to a user to aid in the characterization of the unseen image dataset.
- reference numeral 510 may pertain to the imaging modality of the unseen image dataset.
- the menu options of block 510 may include CT, MR, PET, ultrasound, and the like.
- reference numerals 512-516 may show menu options pertaining to the appearance, shape geometry, and anatomy respectively of the unseen image dataset.
- Reference numeral 518 is generally representative of a visualization of the pre-configured CNN.
- reference numeral 518 may correspond to one or more supervised learning CNNs 116 of the IRLT unit 104 of FIG. 1.
- the user may graphically browse, visualize, and combine feature primitives of blocks 502-508 to create the pre-configured CNN 518.
- the systems and methods for interactive representation learning transfer through deep learning of feature ontologies presented hereinabove provide a transfer learning paradigm where a portfolio of learned feature primitives and mapping functions may be combined to configure CNNs to solve new medical image analytics problems.
- the CNNs may be trained for learning appearance and morphology of images.
- tumors may be classified into blobs, cylinders, disks, bright/dark or banded, and the like.
- networks may be trained for various combinations of anatomy, modality, appearance, and morphology (shape geometry) to generate a rich portfolio configured to immediately provide a transference of pre-learnt features to a new problem at hand.
Abstract
A method for interactive representation learning transfer to a convolutional neural network (CNN) is presented. The method includes obtaining at least first and second input image datasets from first and second imaging modalities. Furthermore, the method includes performing at least one of: jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions; and jointly training a first unsupervised learning CNN and a second unsupervised learning CNN with the first and second input image datasets, respectively, to learn compressed representations of the input image datasets, including common feature primitives and corresponding mapping functions. The method further includes storing the common feature primitives and the corresponding mapping functions in a feature primitive repository.
Description
SYSTEM AND METHOD FOR INTERACTIVE REPRESENTATION LEARNING TRANSFER THROUGH DEEP LEARNING OF FEATURE
ONTOLOGIES
BACKGROUND
[0001] Embodiments of the present specification relate generally to a system and method for generating transferable representation learnings of medical image data of varied anatomies obtained from varied imaging modalities for use in learning networks. Specifically, the system and method are directed to determining a representation learning of the medical image data as a set of feature primitives based on the physics of a first and/or second imaging modality and the biology of the anatomy to configure new convolutional networks for learning problems such as classification and segmentation of the medical image data from other imaging modalities.
[0002] As will be appreciated, machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed." Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. In machine learning, feature learning or representation learning is a set of techniques that transform raw data input into a representation that can be effectively exploited in machine learning tasks. Representation learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor measurement is usually complex, redundant, and highly variable. Thus, it is desirable to identify useful features or representations from raw data. Currently, manual feature identification methods require expensive human labor and rely on expert knowledge. Also, manually generated representations normally do not lend themselves well to generalization,
thereby motivating the design of efficient representation learning techniques to automate and generalize feature or representation learning.
[0003] Moreover, in machine learning, a convolutional neural network (CNN or ConvNet) or a deep convolutional neural network is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. Convolutional neural networks (CNNs) are biologically inspired variants of multilayer perceptrons, designed to emulate the behavior of a visual cortex with minimal amounts of preprocessing. It may be noted that a multilayer perceptron network (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP includes multiple layers of nodes in a directed graph, with each layer fully connected to the next one. In current deployments, CNNs have wide applications in image and video recognition, recommender systems, and natural language processing. CNNs are also known as shift invariant or space invariant artificial neural networks (SIANNs) based on their shared weights architecture and translation invariance characteristics.
[0004] It may be noted that in deep CNN architectures, a convolutional layer is the core building block. Parameters associated with a convolutional layer include a set of learnable filters or kernels. During a forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a two-dimensional (2D) activation map of that filter. Thus, the network learns filters that are activated when the network detects a specific type of feature at a given spatial position in the input. Deep CNN architectures are typically purpose-built for deep learning problems. Deep learning is known to be effective in many tasks that involve human-like perception and decision making. Typical applications of deep learning are handwriting recognition, image recognition, and speech recognition. Deep learning techniques construct a network model by using training data and outcomes that correspond to the data. Once the network model is
constructed, it can be used on new data for determining outcomes. Moreover, it will be appreciated that a deep learning network once learnt for a specific outcome may be reused advantageously for related outcomes.
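The convolutional forward pass described in paragraph [0004], sliding a filter over the input and taking a dot product at each spatial position to build an activation map, can be sketched directly; the 5x5 input and the vertical-edge filter are illustrative choices:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Convolve kernel over image with 'valid' padding and stride 1,
    taking the dot product at each spatial position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical edge in the input strongly activates a vertical-edge filter,
# illustrating a filter firing at a specific spatial feature.
image = np.zeros((5, 5))
image[:, 2:] = 1.0                          # dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # simple vertical-edge detector
activation_map = conv2d_valid(image, kernel)
```

Positions whose window straddles the edge respond with the maximal value, while windows over uniform regions give zero response.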
[0005] Furthermore, CNNs may be used for supervised learning, where an input dataset is labelled, as well as for unsupervised learning, where the input dataset is not labelled. A labelled input dataset is one where the elements of the dataset are pre-associated with a classification scheme, represented by labels. Thus, the CNN is trained with a labelled subset of the dataset, and may be tested with another subset to verify an accurate classification result.
[0006] A deep CNN architecture is multi-layered, where the layers are hierarchically connected. As the number of input to output mappings and filters for each layer increases, a multi-layer deep CNN may result in a huge number of parameters that need to be configured for its operation. If training data for such a CNN is scarce, the learning problem is under-determined. In this situation, it is advantageous to transfer certain parameters from a pre-learned CNN model. Transfer learning reduces the number of parameters to be optimized by freezing the pre-learned parameters in a subset of layers and provides a good initialization for tuning the remaining layers. In the medical image problem domain, using transfer learning to pre-configure CNNs for various problems of classification and identification is advantageous in ameliorating the situations where data is scarce, the challenges of heterogenous data types due to the plurality of imaging modalities and anatomies, and other clinical challenges.
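The parameter-count argument in paragraph [0006] can be made concrete with a back-of-the-envelope calculation; the layer sizes below are arbitrary illustrations, not taken from the specification:

```python
# Each (n_in, n_out) pair is a fully connected layer; parameters are
# weights plus biases.
layers = [(4096, 512), (512, 128), (128, 10)]
params = [n_in * n_out + n_out for n_in, n_out in layers]
total = sum(params)

# Transfer learning: freeze the two pre-learned lower layers and tune only
# the final layer, leaving a far smaller optimization problem for the
# scarce training data.
trainable_after_freeze = params[-1]
reduction = 1.0 - trainable_after_freeze / total
```

Over 99.9% of the parameters are supplied by the pre-learned model, which is why freezing them turns an under-determined learning problem into a tractable one.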
BRIEF DESCRIPTION
[0007] In accordance with one aspect of the present specification, a method for interactive representation learning transfer to a convolutional neural network (CNN) is presented. The method includes obtaining at least a first input image
dataset from a first imaging modality and a second input image dataset from a second imaging modality. Furthermore, the method includes performing at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions. Additionally, the method includes storing at least the one or more common feature primitives and the corresponding mapping functions in a feature primitive repository.
[0008] In accordance with another aspect of the present specification, an interactive representation learning transfer (IRLT) unit for interactive representation learning transfer to a convolutional neural network (CNN) is presented. The IRLT unit includes an interactive learning network configurator configured to obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality and perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions. Moreover, the IRLT unit includes a feature primitive repository configured to store the at least one or more common feature primitives and the corresponding mapping functions in a feature primitive repository.
[0009] In accordance with yet another aspect of the present specification, a multimodality transfer learning system, is presented. The multimodality transfer learning system includes a processor unit and a memory unit communicatively operatively coupled to the processor unit. Furthermore, the multimodality transfer learning system includes an interactive representation learning transfer (IRLT) unit operatively coupled to the processor unit and including an interactive learning network configurator configured to obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality and perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, where the compressed representation includes one or more common feature primitives and corresponding mapping functions. In addition, the IRLT unit includes a feature primitive repository configured to store the at least one or more common feature primitives and the corresponding mapping functions.
DRAWINGS
[0010] These and other features and aspects of embodiments of the present specification will become better understood when the following detailed description is read with references to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
[0011] FIG. 1 is a schematic diagram of an exemplary system for interactive representation learning transfer through deep learning of feature ontologies, in accordance with aspects of the present specification;
[0012] FIG. 2 is a flowchart illustrating a method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository, in accordance with aspects of the present specification;
[0013] FIG. 3 is a flowchart illustrating a method for building a set of feature primitives from labeled image data to augment a feature primitive repository, in accordance with aspects of the present specification;
[0014] FIG. 4 is a flowchart illustrating a method for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a selection of feature primitives, in accordance with aspects of the present specification; and
[0015] FIG. 5 is a schematic diagram of one embodiment of an interactive learning network configurator, in accordance with aspects of the present specification.
DETAILED DESCRIPTION
[0016] As will be described in detail hereinafter, various embodiments of an exemplary system and method for interactive representation learning transfer through deep learning of feature ontologies are presented. In an effort to provide a concise description of these embodiments, all features of an actual implementation may not be described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation- specific decisions must be made to achieve the developer's specific goals such as compliance with system-related and business-related constraints.
[0017] When describing elements of the various embodiments of the present specification, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including" and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements. Furthermore, the terms "build" and "construct" and their variations
are intended to mean a mathematical determination or computation of mathematical constructs. The terms "data drawn on arbitrary domains" or "data on arbitrary domains" are intended to mean data corresponding to a domain for example, social media data, sensor data, enterprise data and the like.
[0018] The term "transfer learning" or "inductive transfer" as used in the present specification is intended to mean an approach of machine learning that focuses on applying the knowledge or learning gained while solving one problem to a different, related problem. This knowledge or learning may be characterized and/or represented in various ways, typically by combinations and variations of transfer functions, mapping functions, graphs, matrices, and other primitives. Also, the term "transfer learning primitive" as used in the present specification is intended to mean the characterization and/or representation of knowledge or learning gained by solving a machine learning problem as described hereinabove.
[0019] Moreover, the term "feature primitive" as used in the present specification is intended to mean a characterization of an aspect of an input dataset, typically, an appearance, shape geometry, or morphology of a region of interest (ROI) corresponding to an image of an anatomy, where the image may be obtained from an imaging modality such as an ultrasound imaging system, a computed tomography (CT) imaging system, a positron emission tomography-CT (PET- CT) imaging system, a magnetic resonance (MR) imaging system, and the like. Also, the anatomy may include an internal organ of the human body such as, but not limited to, the lung, liver, kidney, stomach, heart, brain, and the like.
[0020] An exemplary system 100 for interactive representation learning transfer through deep learning of feature ontologies, in accordance with aspects of the present specification, is illustrated in FIG. 1. In a presently contemplated configuration of FIG. 1, the system 100 includes a multimodality transfer learning (MTL) subsystem 102. The MTL subsystem 102 includes an interactive representation learning transfer
(IRLT) unit 104, a processor unit 108, a memory unit 110, and a user interface 106. The processor unit 108 is communicatively coupled to the memory unit 110. The user interface 106 is operatively coupled to the IRLT unit 104. Also, the IRLT unit 104 is operatively coupled to the processor unit 108 and the memory unit 110. The system 100 and/or the MTL subsystem 102 may include a display unit 134. It may be noted that the MTL subsystem 102 may include other components or hardware, and is not limited to the components shown in FIG. 1.
[0021] The user interface 106 is configured to receive user input 130 corresponding to characteristics of an input image 128. The user input 130 may include aspects or characteristics of the input image 128, such as, but not limited to, the imaging modality, the anatomy generally represented by the input image 128, the appearance of the ROI corresponding to the input image 128, and the like.
[0022] In certain embodiments, the IRLT unit 104 may be implemented as software systems or computer instructions executable via one or more processor units 108 and stored in the memory unit 110. In other embodiments, the IRLT unit 104 may be implemented as a hardware system, for example, via FPGAs, custom chips, integrated circuits (ICs), Application Specific ICs (ASICs), and the like.
[0023] As illustrated in FIG. 1, the IRLT unit 104 may include an interactive learning network configurator (ILNC) 112, one or more CNNs configured for unsupervised learning (unsupervised learning CNN) 114, one or more CNNs configured for supervised learning (supervised learning CNN) 116, and a feature primitive repository 118. The ILNC 112 is operatively coupled to the unsupervised learning CNNs 114, the supervised learning CNNs 116, and the feature primitive repository 118. In one embodiment, the ILNC 112 may be a graphical user interface subsystem configured to enable a user to configure one or more supervised learning CNNs 116 or unsupervised learning CNNs 114.
[0024] The feature primitive repository 118 is configured to store one or more feature primitives corresponding to an ROI in the input image 128 and one or more corresponding mapping functions. The term "mapping function" as used in the present specification is intended to represent a transfer function or a CNN filter that maps the ROI to a compressed representation such that an output of the CNN is a feature primitive that characterizes the ROI of the input image 128 based on an aspect of the ROI. In one example, the aspect of the ROI may include a shape geometry, an appearance, a morphology, and the like. Accordingly, the feature primitive repository 118 is configured to store one or more mapping functions. These mapping functions, in conjunction with the corresponding feature primitives, may be used to pre-configure a CNN to learn a new training set. Thus, the feature primitive repository 118 is configured to store feature primitives and mapping functions which are transfer learnings obtained from other CNNs to pre-configure a new CNN to learn an unseen dataset. As shown in FIG. 1, some non-limiting examples of feature primitives that the feature primitive repository 118 is configured to store include feature primitives characterizing an appearance 120 corresponding to the ROI of the image 128, a shape geometry 124 corresponding to the ROI of the image 128, and an anatomy 126 corresponding to the image 128.
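One way to picture a mapping function is as a small filter bank followed by global pooling, compressing an ROI into a short feature vector; the two toy filters below (overall brightness and left/right contrast) are invented for illustration:

```python
import numpy as np

def map_roi_to_primitive(roi, filters):
    """Apply each filter to the ROI and globally average the response,
    compressing the patch into one scalar per filter."""
    return np.array([float(np.mean(roi * f)) for f in filters])

# Two toy 4x4 filters: overall brightness, and a left/right contrast
# pattern that responds to lateral appearance changes.
filters = [
    np.ones((4, 4)),
    np.hstack([-np.ones((4, 2)), np.ones((4, 2))]),
]
bright_roi = np.ones((4, 4))
primitive = map_roi_to_primitive(bright_roi, filters)
```

A uniformly bright ROI yields a high brightness response and zero contrast response, so the two-element vector is a compressed characterization of the ROI's appearance.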
[0025] In the presently contemplated configuration of FIG. 1, the ILNC 112 is configured to present a user with various tools and options to interactively characterize one or more aspects of the ROI corresponding to the input image 128. An exemplary embodiment of the ILNC 112 is shown in FIG. 5 and the working of the ILNC 112 will be described in greater detail with reference to FIGs. 4 and 5.
[0026] The system 100 is configured to develop a portfolio of feature primitives and mapping functions across one or more anatomies and imaging modalities to be stored in the feature primitive repository 118. Subsequently, the system 100 may provide the user with an ILNC 112 to allow the user to pre-configure a CNN for
learning a new, unseen image dataset based on a selection of modality, anatomy, shape geometry, morphology, and the like corresponding to one or more ROIs of the unseen image dataset. By way of example, one learning outcome of the CNN may be a classification scheme categorizing the image dataset. Other non-limiting examples of learning outcomes may include pixel-level segmentation, regression, and the like. The working of the system 100 will be described in greater detail with reference to FIGs. 2-5.
[0027] Additionally, the system 100 and/or the MTL subsystem 102 may be configured to visualize one or more of the feature primitives, the mapping functions, the image datasets, and the like on the display unit 134.
[0028] Turning now to FIG. 2, a flowchart 200 generally representative of a method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository, in accordance with aspects of the present specification, is presented. The method 200 is described with reference to the components of FIG. 1.
[0029] It may be noted that the flowchart 200 illustrates the main steps of the method for building a set of feature primitives from unlabeled image data to augment a feature primitive repository such as the feature primitive repository 118. In some embodiments, various steps of the method 200 of FIG. 2, more particularly, steps 202-208 may be performed by the processor unit 108 in conjunction with the memory unit 110 and the ILNC 112 of the IRLT unit 104. Moreover, step 210 may be performed by the processor unit 108 in conjunction with the memory unit 110 and one or more supervised learning CNNs 116. Also, steps 214-216 may be performed by the processor unit 108 in conjunction with the memory unit 110 and one or more unsupervised learning CNNs 114.
[0030] The method 200 starts at step 202 where at least a first input image dataset 220 and a second input image dataset 222 corresponding to a first imaging modality
and a second imaging modality are obtained. In one embodiment, the first and second input image datasets 220, 222 may correspond to an anatomy of the human body, such as the liver, lung, kidney, heart, brain, stomach, and the like. Also, the first and second imaging modalities may include an ultrasound imaging system, an MR imaging system, a CT imaging system, a PET-CT imaging system, or combinations thereof.
[0031] At step 204, a check is carried out to determine whether the input image datasets 220, 222 are labelled. As previously noted, labels referencing the input image datasets may be generally representative of a classification scheme or score that characterizes an aspect of the input images, such as the shape geometry, appearance, morphology, and the like. If, at step 204, it is determined that the input image datasets are labelled, control passes to step 210, where a first CNN and a second CNN are configured for supervised learning of the input image datasets 220, 222. Step 210 will be described in greater detail with reference to FIG. 3. Subsequent to step 210, the method 200 is terminated, as indicated by step 212.
[0032] Referring again to step 204, if it is determined that the input image datasets are unlabeled, control passes to step 206, where a second check is carried out to determine if the first input image dataset 220 and the second input image dataset 222 include sufficient data to adequately train one or more CNNs. If, at step 206, it is determined that the first and second input image datasets 220, 222 include sufficient data, control passes to step 214. However, if, at step 206, it is determined that the first and second input image datasets 220, 222 do not include sufficient data, control passes to step 208.
[0033] In one example, at step 208, it may be determined that the first input image dataset 220 includes sufficient data and the second input image dataset 222 does not include sufficient data. Accordingly, in this example, at step 208, the second input dataset 222 corresponding to the second imaging modality is augmented with additional data. The additional data for augmenting the second input image dataset
222 may be obtained by processing the first input image dataset 220 corresponding to the first imaging modality via an intensity mapping function. One non-limiting example of the intensity mapping function may include a regression framework that takes multi-modality acquisitions, via use of a first imaging modality and a second imaging modality, corresponding to one or more subjects, for example, CT and MR, and learns a patch-level mapping from the first modality to the second modality, using deep learning, hand-crafted intensity features, or a combination thereof to map the intensities of the first modality to the second modality. Subsequently, control passes to step 214.
[0034] At step 214, a first unsupervised learning CNN and a second unsupervised learning CNN are jointly trained with the first input image dataset 220 and the second input image dataset 222 to learn compressed representations of the input image datasets 220, 222, where the compressed representations include one or more common feature primitives and corresponding mapping functions.
[0035] The one or more feature primitives characterize aspects of the images of the first input image dataset 220. It may be noted that the mapping functions map the input image dataset to the corresponding feature primitive. In one embodiment, the mapping function may be defined in accordance with equation (1):

H_CT = f(P_CT, w)   (1)

[0036] In equation (1), H_CT is a set of feature primitives obtained when a region of interest of an image P_CT is mapped using a mapping function f and weights w. In this example, the image P_CT corresponds to an image obtained via use of a CT imaging system.
[0037] It may be noted that, consequent to step 214, one or more mapping functions corresponding to the first imaging modality and the second imaging modality are generated. These mapping functions map the first and second input image datasets 220, 222 to the same feature primitives. In one embodiment, a second mapping function may be defined in accordance with equation (2):

H_MR = f(P_MR, w)   (2)

[0038] In equation (2), H_MR is a set of feature primitives obtained when a region of interest of an image P_MR is mapped using a mapping function f and weights w. In this example, the image P_MR is obtained using an MR imaging system.
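Equations (1) and (2) can be illustrated with a toy pair of mapping functions that send patches from two modalities to the same feature primitive; the affine intensity relation between the modalities and the mean-intensity primitive are invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# The same anatomy imaged twice: assume, for illustration only, that the
# second modality's intensities relate to the first by a known affine map.
p_ct = rng.uniform(0.0, 1.0, size=(8, 8))
p_mr = 2.0 * p_ct + 1.0

# Modality-specific mapping functions landing in a shared primitive space;
# here the primitive is simply the normalised mean intensity of the ROI.
def f_ct(p):
    return float(np.mean(p))

def f_mr(p):
    return float(np.mean((p - 1.0) / 2.0))

h_ct = f_ct(p_ct)   # plays the role of H_CT in equation (1)
h_mr = f_mr(p_mr)   # plays the role of H_MR in equation (2)
```

Both modalities land on the same primitive value, which is the point of jointly learning the mapping functions: the feature primitives are modality-independent.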
[0039] Furthermore, at step 218, at least the one or more feature primitives and the corresponding mapping functions are stored in the feature primitive repository 118. Control is then passed to step 212 to terminate the method 200.
[0040] Turning now to FIG. 3, a flowchart 300 generally representative of a method for building a set of feature primitives from labeled image data to augment a feature primitive repository, in accordance with aspects of the present specification, is presented. More particularly, the method 300 describes step 210 of FIG. 2 in greater detail. Also, the method 300 is described with reference to the components of FIG. 1.
[0041] It may be noted that the flowchart 300 illustrates the main steps of the method for building a set of feature primitives from labeled image data to augment a feature primitive repository. In some embodiments, various steps of the method 300 of FIG. 3, more particularly, steps 302-308, may be performed by the processor unit 108 in conjunction with the memory unit 110 and the ILNC 112 of the IRLT unit 104. Moreover, steps 310-314 may be performed by the processor unit 108 in conjunction with the memory unit 110 and the one or more supervised learning CNNs 116.
[0042] The method 300 starts at step 302 where at least a first input image dataset 316 and a second input image dataset 318 corresponding to at least a first imaging modality and a second imaging modality are obtained. In one embodiment, the first and second input image datasets 316, 318 may correspond to an anatomy of the human body, for example, liver, lung, kidney, heart, brain, stomach, and the like. Also, the first and second imaging modalities may include an ultrasound imaging system, an MR imaging system, a CT imaging system, a PET-CT imaging system, or combinations thereof. It may be noted that a learning outcome of the one or more supervised CNNs may be a classification of images in the first and second input image datasets 316, 318.
[0043] At step 304, a check is carried out to determine if the first input image dataset 316 and the second input image dataset 318 include sufficient data to adequately train one or more CNNs. If, at step 304, it is determined that the first and second input image datasets 316, 318 include sufficient data, control is passed to step 308. However, if, at step 304, it is determined that the first and second input image datasets 316, 318 do not include sufficient data, control is passed to step 306. In one example, it may be determined that the first input image dataset 316 includes sufficient data while the second input image dataset 318 does not. Accordingly, in this example, at step 306, the second input image dataset 318 corresponding to the second imaging modality is augmented with additional data. It may be noted that the additional data is obtained by processing the first input image dataset 316 corresponding to the first imaging modality via an intensity mapping function. As previously noted, one non-limiting example of the intensity mapping function may include a regression framework that takes multi-modality acquisitions of one or more subjects via a first imaging modality and a second imaging modality, for example, CT and MR, and learns a patch level mapping from the first modality to the second modality, using deep learning, hand crafted intensity features, or a combination thereof, to map the intensities of the first modality to the second modality. Subsequently, control is passed to step 308.
[0044] Furthermore, at step 308, a first supervised learning CNN and a second supervised learning CNN are jointly trained based on labels associated with the first input image dataset 316 and labels associated with the second input image dataset 318 to generate one or more feature primitives and corresponding mapping functions.
[0045] In one embodiment, the learning outcome may include one or more feature primitives that characterize aspects of the images of the first input image dataset 316 and aspects of the images of the second input image dataset 318 and corresponding mapping functions where the mapping functions map the corresponding first input image dataset 316 and the second input image dataset 318 to the one or more feature primitives. Thus, the feature primitives are independent of the imaging modality used to acquire the first input image dataset 316 and the second input image dataset 318. Referring now to step 312, the one or more feature primitives and the corresponding mapping functions are stored in a feature primitive repository.
[0046] The methods 200 and 300 described hereinabove enable the creation of a portfolio of feature primitives and mapping functions corresponding to images generated for a plurality of anatomies across a plurality of modalities. The feature primitives and the mapping functions are stored in the feature primitive repository 118. Also, this portfolio of feature primitives and mapping functions characterizes the learning gained in the training of the CNNs with the input image datasets. Moreover, the learning may be transferred to pre-configure new CNNs to obtain learning outcomes for different, unseen datasets.
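The feature primitive repository 118 can be pictured as a keyed store of primitives and their mapping functions. The sketch below is a toy stand-in; the key structure and names are assumptions, not the patent's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class FeaturePrimitiveRepository:
    """Toy stand-in for the feature primitive repository: primitives and
    their mapping functions, keyed by learning parameters such as the
    imaging modality and anatomy."""
    entries: dict = field(default_factory=dict)

    def store(self, modality, anatomy, primitives, mapping_fn):
        # Persist the learning gained in training for later transfer.
        self.entries[(modality, anatomy)] = (primitives, mapping_fn)

    def lookup(self, modality, anatomy):
        # Retrieve stored learning to pre-configure a new network.
        return self.entries[(modality, anatomy)]
```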
[0047] With the foregoing in mind, FIG. 4 illustrates a flowchart 400 depicting a method for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a selection of feature primitives, in accordance with aspects of the present specification. The method 400 is described with reference to FIGs. 1, 2 and 3. It may be noted that the flowchart 400 illustrates the main steps of the method 400 for pre-configuring a CNN with mapping functions to learn an unseen dataset based on a
selection of feature primitives. In some embodiments, various steps of the method 400 of FIG. 4, more particularly, steps 402-406, may be performed by the processor unit 108 in conjunction with the memory unit 110 and the ILNC 112 of the IRLT unit 104. Moreover, steps 408-410 may be performed by the processor unit 108 in conjunction with the memory unit 110 and the one or more supervised learning CNNs 116.
[0048] The method 400 starts at step 402, where at least one input image dataset 404 may be obtained. The at least one input image dataset 404 is representative of an unseen input image dataset. Additionally, at least one learning parameter 406 and a learning outcome 408 corresponding to the input image dataset 404 may be obtained. In certain embodiments, the input image dataset 404, the learning parameter 406, and the learning outcome 408 may be obtained as user input 410. In one embodiment, the learning parameter 406 may include an imaging modality, an image anatomy, or a combination thereof. Also, the learning outcome 408 may include a classification scheme, a regression scheme, or a pixel-level output such as segmentation.
[0049] At step 412, at least one feature primitive and a corresponding mapping function corresponding to the learning parameter 406 and the learning outcome 408 are obtained from the feature primitive repository 118. Subsequently, a CNN is configured for learning the input image dataset 404 using the at least one feature primitive and the at least one mapping function, as indicated by step 414. In certain embodiments, the configuration of the CNN may entail setting one or more filters of the CNN to the mapping functions obtained from the feature primitive repository 118. Consequent to the processing of step 414, a pre-configured CNN is generated. Further, at step 416, the pre-configured CNN is optimized with at least a training subset of the input image dataset 404.
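Steps 414 and 416, pre-configuring a network with stored filters and then optimizing it on a training subset, can be sketched in linear form with gradient descent on a squared-error loss (a hypothetical simplification; the patent's CNNs are nonlinear):

```python
import numpy as np

def preconfigure_and_finetune(w_init, x_train, y_train, lr=0.1, steps=500):
    """Start from filters retrieved from the repository (w_init) and
    optimize them on a training subset of the unseen dataset - a linear
    stand-in for the pre-configuration and optimization of steps 414-416."""
    w = w_init.astype(float).copy()
    for _ in range(steps):
        # Gradient of the mean squared error of the linear model.
        grad = x_train.T @ (x_train @ w - y_train) / len(x_train)
        w -= lr * grad
    return w
```

Starting from repository weights rather than a random initialization is what makes this a transfer step: far fewer optimization iterations are needed when the initial filters already encode the relevant primitives.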
[0050] In one embodiment, a trained convolutional autoencoder (CAE) for supervised learning that uses labelled data corresponding to a first imaging modality
is adapted for the input image dataset 404, with a few parameters, in accordance with equation (3).
[0051] In equation (3), H1 is a set of feature primitives obtained when a region of interest of an image P1 corresponding to the first imaging modality is mapped using a mapping function f and weights w(α, w), where α is a sparse set of the CAE parameters. In this way, the number of filters may be reduced. The CAE parameters may be further optimized with at least the training subset of the input image dataset 404. In this way, for a supervised learning problem corresponding to a second imaging modality, the framework for learning may be defined in accordance with the following formulation.
[0052] In formulation (4), the mapping function f obtained in equation (3) is applied over the mapping function fα corresponding to the region of interest of an image corresponding to the second imaging modality. Subsequently, at step 418, the input image dataset 404 is processed via the optimized CNN to obtain a learning outcome 420 corresponding to the requested learning outcome 408.
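A plausible reconstruction of equation (3) and formulation (4), consistent with the descriptions in paragraphs [0051] and [0052]:

```latex
H_{1} = f\big(P_{1}, w(\alpha, w)\big) \tag{3}
H_{2} = f\big(f_{\alpha}(P_{2}), w\big) \tag{4}
```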
[0053] The workflow embodied in the method 400 is described in greater detail with reference to FIG. 5. FIG. 5 is a schematic diagram 500 of one embodiment of the interactive learning network configurator 112 of the interactive representation learning transfer unit 104 of FIG. 1, in accordance with aspects of the present specification. As illustrated in FIG. 5, the block diagram 500 is generally representative of the ILNC 112 as shown in FIG. 1. Reference numerals 502-508 are generally representative of visualizations of feature primitives respectively corresponding to an imaging modality,
anatomy, appearance, and shape geometry. The data for the visualizations 502-508 may be obtained from the feature primitive repository 118 of FIG. 1.
[0054] In FIG. 5, the ILNC 500 provides a user a selection of interactive menus. In particular, using the ILNC 500, the user may select one or more aspects of an unseen image dataset to be learned by a CNN. Reference numerals 510-516 are generally representative of interactive menu options that may be available to a user to aid in the characterization of the unseen image dataset. By way of example, reference numeral 510 may pertain to the imaging modality of the unseen image dataset. The menu options of block 510 may include CT, MR, PET, ultrasound, and the like. In a similar fashion, reference numerals 512-516 may show menu options pertaining to the appearance, shape geometry, and anatomy respectively of the unseen image dataset.
[0055] The selections made by the user from the interactive menus enable the pre- configuration of a CNN with feature primitives and mapping functions corresponding to the menu selections. Reference numeral 518 is generally representative of a visualization of the pre-configured CNN. In certain embodiments, reference numeral 518 may correspond to one or more supervised learning CNNs 116 of the IRLT unit 104 of FIG. 1. Additionally, the user may graphically browse, visualize, and combine feature primitives of blocks 502-508 to create the pre-configured CNN 518.
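The menu-driven selection of FIG. 5 amounts to filtering the stored portfolio by the chosen modality, anatomy, appearance, and shape geometry. A minimal sketch (the keys and values are hypothetical):

```python
def select_primitives(portfolio, **criteria):
    """Filter a portfolio of stored primitives by interactive menu
    selections such as modality, anatomy, appearance, and shape."""
    return [entry for entry in portfolio
            if all(entry.get(key) == value
                   for key, value in criteria.items())]
```

The primitives returned by such a query would then be combined to seed the pre-configured CNN 518.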
[0056] The systems and methods for interactive representation learning transfer through deep learning of feature ontologies presented hereinabove provide a transfer learning paradigm where a portfolio of learned feature primitives and mapping functions may be combined to configure CNNs to solve new medical image analytics problems. Advantageously, the CNNs may be trained for learning appearance and morphology of images. By way of example, tumors may be classified into blobs, cylinders, disks, bright/dark or banded, and the like. Overall, networks may be trained for various combinations of anatomy, modality, appearance, and morphology (shape
geometry) to generate a rich portfolio configured to immediately provide a transference of pre-learnt features to a new problem at hand.
[0057] It is to be understood that not necessarily all such objects or advantages described above may be achieved in accordance with any embodiment. Thus, for example, those skilled in the art will recognize that the systems and techniques described herein may be embodied or carried out in a manner that achieves or improves one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.
[0058] While the technology has been described in detail in connection with only a limited number of embodiments, it should be readily understood that the specification is not limited to such disclosed embodiments. Rather, the technology can be modified to incorporate any number of variations, alterations, substitutions or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the claims. Additionally, while various embodiments of the technology have been described, it is to be understood that aspects of the specification may include only some of the described embodiments. Accordingly, the specification is not to be limited by the foregoing description, but is only limited by the scope of the appended claims.
Claims
1. A method for interactive representation learning transfer to a convolutional neural network (CNN), comprising: obtaining at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality;
performing at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, wherein the compressed representation comprises one or more common feature primitives and corresponding mapping functions; and
storing at least the one or more common feature primitives and the corresponding mapping functions in a feature primitive repository.
2. The method of claim 1, further comprising:
obtaining at least one unseen input image dataset;
obtaining at least one learning parameter and at least one learning outcome corresponding to the at least one unseen input image dataset;
obtaining, via the feature primitive repository, at least one feature primitive and a corresponding mapping function corresponding to the at least one learning parameter and the at least one learning outcome;
configuring a CNN for learning the at least one unseen input image dataset based on the at least one feature primitive and the at least one mapping function;
optimizing the configured CNN with at least a training subset of the unseen input image dataset; and
processing the at least one unseen input image dataset via the optimized CNN to obtain the at least one learning outcome.
3. The method of claim 1, wherein jointly training the first supervised learning CNN and the second supervised learning CNN to generate the one or more common feature primitives and the corresponding mapping functions comprises jointly training the first supervised learning CNN with the first input image dataset and the second supervised learning CNN with the second input image dataset to obtain at least the one or more common feature primitives characterizing the first input image dataset and the second input image dataset, and the corresponding mapping functions of the first supervised learning CNN and the second supervised learning CNN.
4. The method of claim 1, further comprising augmenting the second input image dataset corresponding to the second imaging modality with additional data, wherein the additional data is obtained by processing the first input image dataset corresponding to the first imaging modality via an intensity mapping function.
5. The method of claim 4, wherein processing the first input image dataset corresponding to the first imaging modality via the intensity mapping function comprises:
performing, via a regression framework in the intensity mapping function, multi-modality acquisitions from a first imaging modality and a second imaging modality corresponding to one or more subjects; and
learning, via the regression framework in the intensity mapping function, a patch level mapping from the first imaging modality to the second imaging modality to map intensities of the first imaging modality to the second imaging modality.
6. An interactive representation learning transfer unit for interactive representation learning transfer to a convolutional neural network (CNN), comprising: an interactive learning network configurator configured to:
obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality;
perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, wherein the compressed representation comprises one or more common feature primitives and corresponding mapping functions; and
a feature primitive repository configured to store the at least one or more common feature primitives and the corresponding mapping functions.
7. The interactive representation learning transfer unit of claim 6, wherein the interactive learning network configurator is further configured to:
obtain at least one unseen input image dataset;
obtain at least one learning parameter and at least one learning outcome corresponding to the at least one unseen input image dataset;
obtain, via the feature primitive repository, at least one feature primitive and corresponding mapping function corresponding to the at least one learning parameter and the at least one learning outcome;
configure a CNN for learning the at least one unseen input image dataset based on the at least one feature primitive and the at least one mapping function;
optimize the configured CNN with at least a training subset of the unseen input image dataset; and
process the at least one unseen input image dataset via the optimized CNN to obtain the at least one learning outcome.
8. The interactive representation learning transfer unit of claim 6, wherein the interactive learning network configurator is further configured to jointly train the first supervised learning CNN with the first input image dataset and the second supervised learning CNN with the second input image dataset to obtain at least the one or more common feature primitives characterizing the first input image dataset and the second input image dataset, and the corresponding mapping functions of the first supervised learning CNN and the second supervised learning CNN.
9. The interactive representation learning transfer unit of claim 6, wherein the interactive learning network configurator is further configured to augment the second input image dataset corresponding to the second imaging modality with additional data, and wherein the additional data is obtained by processing the first input image dataset corresponding to the first imaging modality via an intensity mapping function.
10. The interactive representation learning transfer unit of claim 9, wherein the intensity mapping function comprises a regression framework configured to: perform multi-modality acquisitions from a first imaging modality and a second imaging modality corresponding to one or more subjects; and
learn a patch level mapping from the first imaging modality to the second imaging modality to map intensities of the first imaging modality to the second imaging modality.
11. A multimodality transfer learning system, comprising:
a processor unit;
a memory unit operatively coupled to the processor unit;
an interactive representation learning transfer unit operatively coupled to the processor unit and comprising:
an interactive learning network configurator configured to:
obtain at least a first input image dataset from a first imaging modality and a second input image dataset from a second imaging modality;
perform at least one of jointly training a first supervised learning CNN based on labels associated with the first input image dataset and a second supervised learning CNN based on labels associated with the second input image dataset to generate one or more common feature primitives and corresponding mapping functions and jointly training a first unsupervised learning CNN with the first input image dataset and a second unsupervised learning CNN with the second input image dataset to learn compressed representations of the input image datasets, wherein the compressed representation comprises one or more common feature primitives and corresponding mapping functions; and a feature primitive repository configured to store the at least one or more common feature primitives and the corresponding mapping functions.
12. The multimodality transfer learning system of claim 11, wherein the interactive learning network configurator is further configured to:
obtain at least one unseen input image dataset;
obtain at least one learning parameter and at least one learning outcome corresponding to the at least one unseen input image dataset;
obtain, via the feature primitive repository, at least one feature primitive and corresponding mapping function corresponding to the at least one learning parameter and the at least one learning outcome;
configure a CNN for learning the at least one unseen input image dataset, based on the at least one feature primitive and the at least one mapping function;
optimize the configured CNN with at least a training subset of the unseen input image dataset; and
process the at least one unseen input image dataset via the optimized CNN to obtain the at least one learning outcome.
13. The multimodality transfer learning system of claim 11, wherein the interactive learning network configurator is further configured to:
jointly train the first supervised learning CNN with the first input image dataset and the second supervised learning CNN with the second input image dataset to obtain at least the one or more common feature primitives characterizing the first input image dataset and the second input image dataset, and the corresponding mapping functions of the first supervised learning CNN and the second supervised learning CNN.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18804491.1A EP3704636A1 (en) | 2017-11-03 | 2018-11-02 | System and method for interactive representation learning transfer through deep learning of feature ontologies |
JP2020524235A JP7467336B2 (en) | 2017-11-03 | 2018-11-02 | METHOD, PROCESSING UNIT AND SYSTEM FOR STUDYING MEDICAL IMAGE DATA OF ANATOMICAL STRUCTURES OBTAINED FROM MULTIPLE IMAGING MODALITIES - Patent application |
CN201880071649.9A CN111316290B (en) | 2017-11-03 | 2018-11-02 | System and method for interactive representation learning migration through deep learning of feature ontologies |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN201741039221 | 2017-11-03 | ||
IN201741039221 | 2017-11-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019090023A1 true WO2019090023A1 (en) | 2019-05-09 |
Family
ID=64332416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2018/058855 WO2019090023A1 (en) | 2017-11-03 | 2018-11-02 | System and method for interactive representation learning transfer through deep learning of feature ontologies |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3704636A1 (en) |
JP (1) | JP7467336B2 (en) |
CN (1) | CN111316290B (en) |
WO (1) | WO2019090023A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110186375A (en) * | 2019-06-06 | 2019-08-30 | 西南交通大学 | Intelligent high-speed rail white body assemble welding feature detection device and detection method |
CN110210486A (en) * | 2019-05-15 | 2019-09-06 | 西安电子科技大学 | A kind of generation confrontation transfer learning method based on sketch markup information |
CN112434602A (en) * | 2020-11-23 | 2021-03-02 | 西安交通大学 | Fault diagnosis method based on migratable common feature space mining |
WO2022072150A1 (en) * | 2020-09-30 | 2022-04-07 | Alteryx, Inc. | System and method of operationalizing automated feature engineering |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113707312A (en) * | 2021-09-16 | 2021-11-26 | 人工智能与数字经济广东省实验室(广州) | Blood vessel quantitative identification method and device based on deep learning |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2548172A1 (en) * | 2010-03-18 | 2013-01-23 | Koninklijke Philips Electronics N.V. | Functional image data enhancement and/or enhancer |
JP6235610B2 (en) * | 2012-12-26 | 2017-11-22 | ボルケーノ コーポレイション | Measurement and enhancement in multi-modality medical imaging systems |
US9922272B2 (en) * | 2014-09-25 | 2018-03-20 | Siemens Healthcare Gmbh | Deep similarity learning for multimodal medical images |
CN105930877B (en) * | 2016-05-31 | 2020-07-10 | 上海海洋大学 | Remote sensing image classification method based on multi-mode deep learning |
US10127659B2 (en) | 2016-11-23 | 2018-11-13 | General Electric Company | Deep learning medical systems and methods for image acquisition |
US10242443B2 (en) | 2016-11-23 | 2019-03-26 | General Electric Company | Deep learning medical systems and methods for medical procedures |
CN106909905B (en) * | 2017-03-02 | 2020-02-14 | 中科视拓(北京)科技有限公司 | Multi-mode face recognition method based on deep learning |
CN106971174B (en) * | 2017-04-24 | 2020-05-22 | 华南理工大学 | CNN model, CNN training method and CNN-based vein identification method |
CN107220337B (en) * | 2017-05-25 | 2020-12-22 | 北京大学 | Cross-media retrieval method based on hybrid migration network |
2018
- 2018-11-02 WO PCT/US2018/058855 patent/WO2019090023A1/en unknown
- 2018-11-02 EP EP18804491.1A patent/EP3704636A1/en not_active Withdrawn
- 2018-11-02 CN CN201880071649.9A patent/CN111316290B/en active Active
- 2018-11-02 JP JP2020524235A patent/JP7467336B2/en active Active
Non-Patent Citations (3)
Title |
---|
LLUIS CASTREJON ET AL: "Learning Aligned Cross-Modal Representations from Weakly Aligned Data", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 July 2016 (2016-07-25), XP080714696 * |
TADAS BALTRUSAITIS ET AL: "Multimodal Machine Learning: A Survey and Taxonomy", 25 May 2017 (2017-05-25), XP055414490, Retrieved from the Internet <URL:https://arxiv.org/pdf/1705.09406.pdf> [retrieved on 20190128] * |
YUSUF AYTAR ET AL: "Cross-Modal Scene Networks", 27 October 2016 (2016-10-27), XP055549670, Retrieved from the Internet <URL:https://arxiv.org/pdf/1610.09003.pdf> [retrieved on 20190128] * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210486A (en) * | 2019-05-15 | 2019-09-06 | 西安电子科技大学 | A kind of generation confrontation transfer learning method based on sketch markup information |
CN110210486B (en) * | 2019-05-15 | 2021-01-01 | 西安电子科技大学 | Sketch annotation information-based generation countermeasure transfer learning method |
CN110186375A (en) * | 2019-06-06 | 2019-08-30 | 西南交通大学 | Intelligent high-speed rail white body assemble welding feature detection device and detection method |
WO2022072150A1 (en) * | 2020-09-30 | 2022-04-07 | Alteryx, Inc. | System and method of operationalizing automated feature engineering |
US11941497B2 (en) | 2020-09-30 | 2024-03-26 | Alteryx, Inc. | System and method of operationalizing automated feature engineering |
CN112434602A (en) * | 2020-11-23 | 2021-03-02 | 西安交通大学 | Fault diagnosis method based on migratable common feature space mining |
CN112434602B (en) * | 2020-11-23 | 2023-08-29 | 西安交通大学 | Fault diagnosis method based on movable common feature space mining |
Also Published As
Publication number | Publication date |
---|---|
EP3704636A1 (en) | 2020-09-09 |
JP7467336B2 (en) | 2024-04-15 |
CN111316290A (en) | 2020-06-19 |
JP2021507327A (en) | 2021-02-22 |
CN111316290B (en) | 2024-01-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18804491 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2020524235 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2018804491 Country of ref document: EP Effective date: 20200603 |