WO2024116137A1 - Method and system for training a machine learning system for image processing


Info

Publication number
WO2024116137A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
training
initial
dataset
data
Application number
PCT/IB2023/062119
Other languages
French (fr)
Inventor
Jaishree NAIDOO
Andrei Victorovich MIGATCHEV
Original Assignee
Envisionit Deep Ai Ltd
Application filed by Envisionit Deep Ai Ltd filed Critical Envisionit Deep Ai Ltd
Publication of WO2024116137A1 publication Critical patent/WO2024116137A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/091Active learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/235Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V10/7788Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors the supervisor being a human, e.g. interactive learning with a human teacher
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/40ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Definitions

  • This technology relates to a system and method for training a machine learning system for image processing.
  • the technology may find particular, but not exclusive, application in training a machine deep learning model for object detection of inference classes in image processing.
  • Machine learning and related technologies have a wide range of applications and, as a result, have witnessed an exponential rise in use and popularity in recent years. These technologies are often used in applications to determine/identify trends, patterns, conditions, or the like.
  • machine learning algorithms and models are often used on data, such as images, numbers, text, or the like, to determine whether the data is indicative of a certain predefined condition. If the data is indicative of such a condition, the machine learning model may be configured to make/suggest a diagnosis relating to the condition. For example, the machine learning model may be used to determine whether x-rays of the lungs of a patient show any indication of a condition, such as cancer, and detect such pathology accordingly.
  • Such machine learning models are trained using large sets of training data.
  • the training data needs to be of a high quality and care needs to be taken when acquiring the training data.
  • Training a model using low quality training data may result in an inaccurate model returning false/incorrect results.
  • a substantial amount of data needs to be acquired to enable a machine learning model to return accurate results. The process to acquire such data is time consuming and often comes at a considerable cost.
  • the training data may be collected in a variety of ways and from a variety of sources.
  • a common problem associated with data collection is that the data often includes large volumes of duplicate, spurious and/or incorrect data. This may result in an imbalance in the data which eventually leads to poor accuracy of models.
  • One method to ensure the validity and correctness of the data is to manually review the collected data and eliminate/remove all unwanted data. Manually reviewing the data is, however, not always practical as the volume of the collected data, for machine learning purposes, is substantial.
  • Data deduplication is the process of filtering duplicate data from a set of data to streamline data processing and mitigate the effect of duplicate data on the accuracy of a model in which the data is used for training purposes. Not only does data deduplication limit the effect of duplicate data on a machine learning model, it also reduces processing time, as the data volume is reduced.
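  • As a non-authoritative illustration of the deduplication step, the following Python sketch removes exact byte-level duplicates by hashing; the specification does not prescribe a particular technique, the directory path and file pattern are assumptions, and production pipelines would typically add perceptual hashing to also catch near-duplicates.

```python
import hashlib
from pathlib import Path

def deduplicate_images(image_dir: str) -> list[Path]:
    """Keep one representative file per unique image, dropping exact duplicates."""
    seen: dict[str, Path] = {}
    for path in sorted(Path(image_dir).glob("*.png")):  # file pattern is illustrative
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        seen.setdefault(digest, path)  # first occurrence of each hash wins
    return list(seen.values())
```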
  • a computer-implemented method of training a machine learning system for image processing using a prediction model for object detection of inference classes comprising: providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; collecting moderated data associated with the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
  • the method may include validating the performance metrics of the updated model against the performance metrics of the initial model; and replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
  • Validating the performance metrics of the updated model may include a technical validation.
  • the technical validation may include using a test dataset including moderated processed images for validating the updated model.
  • Validating the performance metrics of the updated model may include a staged mode of test use of the updated model and collecting feedback via the user interface.
  • the method may include repeating the method with the updated model as the initial model to provide continuous model training using moderated data and reducing the focused dataset to be used as the new representative sample training dataset.
  • the burst of additional model training may be of reduced duration in terms of training epochs, wherein the reduced epoch burst executes a limited number of training epochs.
  • the method may include obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
  • the step of collecting moderated data may include: receiving feedback data from multiple users; transmitting the feedback data to a moderator user interface; using secondary moderation if feedback disagreements exist in the original feedback; and receiving the moderated data from the moderator user interface that provides a ground truth for the model prediction.
  • the step of collecting moderated data may include collecting one or more of the group of: moderated data of a single moderator user; moderated data as a consensus of multiple moderator users; and moderated data as an outcome of a hierarchy of moderator users.
  • the moderated data may include adjusted processed images with annotations relating to the initial model prediction outcomes. The annotations of the processed images may replace and/or add to the model prediction outcome of the processed image.
  • the initial model prediction outcomes may include superimposed frames or bounding boxes on the processed images to provide an indication of an area of the image to which the model prediction relates.
  • the feedback data may be received in one or more of the following manners: deselecting false positive results included in the one or more prediction outcomes; adding false negative results that are not included in the one or more prediction outcomes; moving an area of a model prediction to a more accurate location in the processed image; leaving additional notes associated with one or more of the model predictions.
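  • The feedback actions listed above could be captured in a simple record per image; the following is a hedged sketch, and all field names are illustrative rather than taken from the specification.

```python
from dataclasses import dataclass, field

@dataclass
class PredictionBox:
    label: str   # inference class, e.g. "consolidation"
    x: float     # bounding box in image coordinates
    y: float
    w: float
    h: float

@dataclass
class FeedbackRecord:
    image_id: str
    deselected: list[PredictionBox] = field(default_factory=list)  # false positives removed
    added: list[PredictionBox] = field(default_factory=list)       # false negatives added
    moved: list[tuple[PredictionBox, PredictionBox]] = field(default_factory=list)  # (old, new) locations
    notes: list[str] = field(default_factory=list)                 # free-text notes for moderators
```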
  • the method may include generating the initial model prediction outcomes.
  • Each of the model prediction outcomes may be the result of a machine learning algorithm being executed on a set of training data including a full set of training epochs.
  • a system for training a machine learning system for image processing using a prediction model for object detection of inference classes, including a memory for storing computer-readable program code and a processor for executing the computer-readable program code, the system comprising: a processed image providing component for providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; a moderated data collecting component for collecting, via user input to a moderator user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; a data merging component for merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and an additional burst training component for using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
  • the system may include a validating component for validating the performance metrics of the updated model against the performance metrics of the initial model.
  • the system may include a model updating component for replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
  • the validating component may include a technical validation component for using a test dataset including moderated processed images for validating the updated model.
  • the validating component may further include a user validation component for providing a staged mode of test use of the updated model and collecting feedback via the user interface.
  • the system may include a continuous training component for repeating the method with the updated model as the initial model to provide continuous model training using moderated data.
  • the additional burst training component may provide a reduced epoch burst of training.
  • the reduced epoch burst may execute a limited number of training epochs.
  • the system may include a replacement training dataset component for obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
  • the moderated data collecting component may be configured to receive secondary moderation from the moderator user interface for resolving disagreements in initial moderator feedback.
  • the system may further include a moderator user interface for receiving feedback data relating to the processed images with initial prediction outcomes and providing moderated data relating to the processed images with initial prediction outcomes.
  • the system may include an imaging platform including a user interface for display of processed images with initial model prediction outcomes and for collection of user input.
  • the collection of user input may include receiving input from a user device having access to the user interface of the imaging platform.
  • the imaging platform may be configured to enable a user to provide feedback in one or more of the following manners: deselecting false positive results included in the one or more prediction outcomes; adding results that are not included in the one or more prediction outcomes; moving the superimposed frame associated with each model prediction to a more accurate location; leaving additional notes associated with one or more of the model predictions for another user, such as a moderation team.
  • a computer program product for training a machine deep learning system for image processing using a prediction model for object detection of inference classes
  • the computer program product comprising a computer-readable medium having stored computer-readable program code for performing the steps of: providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; collecting, via user input to the user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
  • the computer-readable medium may be a non-transitory computer-readable medium, and the computer-readable program code may be executable by a processing circuit.
  • Figure 1 is a schematic diagram which illustrates an exemplary embodiment of a system of training a machine learning system for image processing according to aspects of the present disclosure
  • Figure 2 is a flow diagram which illustrates an example embodiment of a method of training a machine learning system for image processing
  • Figures 3A-3B are schematic diagrams illustrating an example embodiment of a graphical user interface of a user feedback application for the described system
  • Figure 4 is a schematic diagram illustrating an example embodiment of a graphical user interface of a user moderation application for the described system
  • Figure 5 is a high-level component diagram of a computing device according to aspects of the present disclosure.
  • Figure 6 is a flow diagram which illustrates an example embodiment of a method of producing a machine learning model according to aspects of the present disclosure
  • Figure 7 is a schematic diagram which illustrates an example representation of providing a representative sample of a dataset according to aspects of the present disclosure
  • Figure 8 shows an example graphical representation of an image that may be used for calculating a centre of each of locations of a given class according to aspects of the present disclosure
  • Figure 9 is a flow diagram showing an example method of producing a machine learning model to obtain initial model predictions and using the initial model predictions in the ongoing training of a machine learning model according to aspects of the present disclosure
  • Figure 10 illustrates an example of a computing device in which various aspects of the disclosure may be implemented.
  • aspects of the present disclosure relate to a system and method for training a machine learning system using deep learning for image processing using a learned prediction model to provide predictions relating to areas of detected objects in the images.
  • the system and method relate to ongoing training of a machine learning system using moderated user feedback for focused training of the prediction model.
  • the disclosed system and method may find particular application and are described below in the radiology field for pathology detection, although other applications are also anticipated.
  • the method may include creating an initial dataset for training a machine deep learning system to perform image processing and applying the trained machine learning model to input images for processing to predict areas of detected objects in the images for output to a user interface.
  • the input images may be medical diagnosis images of a part of a body in order to identify locations and types of pathologies.
  • the system may be configured to provide results displayed on the processed images to a user of the system via an imaging platform.
  • the results may be provided via a user interface configured to display processed images with initial model predictions.
  • the initial model predictions may be superimposed on the processed images to provide an indication of an area of the image to which the model prediction relates.
  • the user interface may further be configured to enable multiple users to provide feedback on the initial model prediction outcomes.
  • This may include the user checking the initial model prediction outcomes and deselecting any false positives that were detected via the initial model.
  • this may include a user, such as a radiologist, inspecting the results and deselecting any pathologies provided in a list or marked on a processed image that were incorrectly detected.
  • Further feedback that may be provided by the user may include adding results, such as pathologies, that were not detected on the processed images (false negatives), but that are present. Such further results may be added by selecting relevant results from a drop-down list and marking the relevant area on a processed image, or the like. It should be appreciated that in some cases no false negatives or false positives may be detected (or not detected) by the initial model. In such cases, no user feedback may be required.
  • the system may include a moderator user interface for receiving the feedback from multiple users on the initial predictions of processed images and collecting moderated data relating to the initial predictions.
  • the moderated data may provide an overall output from the feedback of the multiple users. This is referred to as initial moderation feedback.
  • the provided user feedback may be collected and transmitted to a further group consisting of one or more users acting as a secondary moderator, such as radiologists with a required sub-specialization or super-specialization, via the moderator user interface for moderation, including analysis and ground truthing (in other words, checking the initial model predictions for accuracy against the feedback from the users).
  • moderated data is defined as data annotated by multiple users for a same image, with disagreements resolved by using a secondary moderation process.
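  • A minimal sketch of this resolution logic follows, assuming labels are represented as simple sets per image and that the secondary moderation step can be modelled as a callable; none of these names come from the specification.

```python
def resolve_annotations(labels_a: set, labels_b: set, escalate) -> set:
    """Merge two reviewers' labels for one image, escalating disagreements.

    `escalate` stands in for the secondary moderation step (e.g. a
    sub-specialist's decision) and returns True if a disputed label
    should be kept as ground truth.
    """
    agreed = labels_a & labels_b      # consensus labels become ground truth directly
    disputed = labels_a ^ labels_b    # labels only one reviewer applied
    resolved = {label for label in disputed if escalate(label)}
    return agreed | resolved
```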
  • Ground truthing is a well-known concept and may be read to mean collecting consensus- or expert-based interpretation of images, based on the same set of labels and classes as used by the artificial intelligence model, in order to validate the results of the model for accuracy against these pre-defined real-world results.
  • a machine learning model may predict a label, which can be compared with the ground truth label, if it is available.
  • the ground truth label is a label defined by a user of the system.
  • Ground truthing is normally performed by using a ground truth dataset. Developing a ground truth dataset may require important tasks to be performed under user supervision, such as model design, data labelling, algorithm design and training/testing.
  • Ground truth labels for datasets are mostly moderated by a group of moderators and then later compared using different techniques to set target labels for the dataset.
  • the moderated, ground-truthed data may be collected.
  • the collected dataset may, for example, include a set of images that have been reviewed and ground-truthed by the users of the system.
  • the collected dataset may then be merged with a representative sample training dataset of the initial prediction model to produce a focussed dataset.
  • the collected dataset including the reviewed data may be merged with a representative sample of the dataset that was used to train the initial machine learning model.
  • a representative sample training dataset is constructed following a method depicted in Figure 7 and described in detail further in this document.
  • a representative sample training dataset is constructed in a way to include images from all the inference classes in a balanced or equal proportion across the entirety of the dataset that was used to train the initial machine learning model.
  • the representative sample dataset may not include every single image from this initial dataset, but enough images to have all the inference classes, in all the annotated locations and all the different sizes of the inference classes, represented in a balanced or equal proportion to ensure uniformity of the resultant dataset.
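  • One plausible way to construct such a balanced sample is stratified sampling over (class, permutation) buckets, sketched below; the record attributes and sample size are assumptions, not the specification's exact procedure (which is described with reference to Figure 7).

```python
import random
from collections import defaultdict

def representative_sample(dataset, per_stratum: int = 10, seed: int = 0) -> list:
    """Draw a balanced sample across (inference class, permutation) strata.

    Each record is assumed to expose `inference_class` and `permutation_key`
    attributes, the latter bucketing factors such as size and location.
    """
    strata = defaultdict(list)
    for record in dataset:
        strata[(record.inference_class, record.permutation_key)].append(record)
    rng = random.Random(seed)
    sample = []
    for records in strata.values():
        k = min(per_stratum, len(records))  # keep every image in small strata
        sample.extend(rng.sample(records, k))
    return sample
```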
  • the focussed dataset may be used for a short burst of additional model training of the initial model in order to obtain an updated model.
  • the short burst of additional model training may, for example, include performing GPU (graphics processing unit) accelerated training over fewer epochs than would typically be required for normal training or than were used for the initial training.
  • the accelerated training may include graphics processing unit (GPU) or tensor processing unit (TPU) accelerated training.
  • Training epochs are used to measure the duration of training, and an epoch typically refers to a single cycle of training through the entire training dataset. Training a prediction model may take hundreds of epochs to achieve acceptable levels of accuracy and precision. Some complex or unbalanced datasets may require even more than that; however, care needs to be taken to ensure the resultant model does not “overfit”, in other words learn the features and noise of the training dataset too well and become unable to generalise to new data.
  • a reduced number of epochs will typically be a portion of the usual number of epochs, typically calculated using a formula that takes into consideration the number of inference classes, the number of images in the training dataset, the learning rate of the deep learning algorithm and several other characteristics of the previous training cycle.
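  • A hedged PyTorch sketch of such a reduced-epoch burst is shown below; the optimiser, loss and epoch count are placeholders, since the specification does not fix a training configuration (an object detection model would use a detection-specific loss).

```python
import torch

def burst_train(model, focused_loader, epochs: int = 15, lr: float = 1e-4, device: str = "cuda"):
    """Run a short GPU-accelerated fine-tuning burst on the focused dataset."""
    model.to(device).train()
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()  # placeholder; detection losses differ
    for _ in range(epochs):                  # far fewer epochs than initial training
        for images, targets in focused_loader:
            images, targets = images.to(device), targets.to(device)
            optimiser.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimiser.step()
    return model
```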
  • the updated model may be evaluated. This may include validating the current performance metrics and accuracy metrics of the updated model against the initial model.
  • the updated model may replace the initial model if the results of the performance and accuracy metrics are an improvement over those of the initial model. It should be appreciated that an updated model having a worse performance than the original model may be discarded.
  • the updated model may be submitted for manual validation.
  • the method may also include de-duplicating the focussed dataset used for training of the updated model to form a training dataset that replaces the training dataset of the initial model. If the updated model is rejected, the process of collecting feedback, moderating, and creating a collected feedback dataset will continue without the deployment of the updated model.
  • this process is cyclic in nature and will iterate at intervals, for example, once one hundred or more images have been assessed by the users of the system and feedback recorded. Training of the models, although short in nature, is usually scheduled to run overnight and will immediately be suspended if new images are received for processing by the current model. When iterating, the focused training dataset becomes the initial training dataset. In this way, the continuous learning of the deep learning model is periodically updated using a focused dataset that is representative across the learning field.
  • the present method of continuous training addresses the problem of “forgetting” that frequently occurs with other methods of continuous or additional training of deep learning models that do not include the entire initial training dataset and the additional datasets.
  • the present method significantly reduces the “forgetting” of any of the learned object predictions for the inference classes of the model whilst not using the entire dataset with each learning cycle. Training is carried out for shorter periods of time and without the entire dataset, but the method ensures the model remains consistent with what it could identify previously.
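  • Putting the pieces together, one iteration of the cycle might look like the following sketch, reusing `burst_train` and `representative_sample` from the earlier examples; `validate` is an assumed callable returning a comparable performance score.

```python
import copy

def continuous_training_cycle(model, representative_set, moderated_batch, validate):
    """One iteration of the cyclic moderate-merge-train-validate process (sketch)."""
    focused = list(representative_set) + list(moderated_batch)  # merge step
    # In practice `focused` would be wrapped in a DataLoader before training.
    candidate = burst_train(copy.deepcopy(model), focused)      # short additional training
    if validate(candidate) > validate(model):                   # promote only on improvement
        model = candidate
        representative_set = representative_sample(focused)     # focused set becomes the new base
    return model, representative_set
```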
  • the representative sample training set, which the moderated data is merged with, is balanced and representative as it includes a representation of each inference class and its permutations.
  • the method selects and splits a dataset into different segments to select different images to reconstruct a smaller dataset that is balanced and representative of different image objects in different conditions and scenarios. Newly annotated candidates are added to previously learned data to keep refreshing the model. Previously selected data is representative, so the model keeps on learning across the full range.
  • Figure 1 is a schematic diagram showing an example system (100) for training a machine learning-based system including a deep learning model (140) for image processing according to aspects of the present disclosure.
  • Deep learning models that have been used and tested with this method are 2D deep convolutional neural networks with set input and output dimensions for the purposes of computer vision inference. However, other deep learning algorithms may benefit from this method as well.
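  • For concreteness, a minimal 2D convolutional backbone with fixed input and output dimensions is sketched below in PyTorch; the patent does not disclose a specific architecture, so this is purely illustrative of the model family named above.

```python
import torch.nn as nn

class SmallBackbone(nn.Module):
    """Minimal 2D CNN with a fixed-size output, illustrative only."""

    def __init__(self, num_classes: int, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # fixes the spatial output size regardless of input
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```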
  • the system (100) may include a computing device (102) and one or more user devices (104, 106) in data communication with each other via an appropriate communication network (108), such as the Internet or any other suitable communication network.
  • the computing device (102) and one or more user devices may include processors capable of processing data, memory units capable of storing data and communication components capable of sending data to other devices and locations.
  • the computing device (102) may for example be a server computer, which may be in the form of a cluster of server computers, a distributed server computer, cloud-based server computer or the like. The physical location of the server computer may be unknown and irrelevant to users of the system and method described herein.
  • the user devices may be any suitable computing devices such as a mobile phone, laptop computer, tablet, or the like.
  • one of the user devices is a feedback device (104) and the other user device is a moderator device (106).
  • the computing device (102) may include a processor (110) for executing the functions of components described below, which may be provided by hardware or by software units executing on the computing device (102).
  • the software units may be stored in a memory component (112) and instructions may be provided to the processor (110) to carry out the functionality of the described components.
  • software units arranged to manage and/or process data on behalf of the computing device may be provided remotely or in a distributed fashion.
  • the computing device (102) supports a training system (150) for training a machine deep learning system for image processing using a prediction model for object detection of inference classes.
  • the training system (150) may have access to or may maintain one or more training database(s) (114) in which training data may be stored and from which training data may be accessed.
  • the training databases (114) may include an initial training dataset (116) and a representative sample training dataset (118).
  • the initial training dataset (116) may include data acquired from one or more sources.
  • the data in the dataset may be acquired from public sources, internal databases, or third parties (such as third parties with which an agreement for sharing and acquisition of data is established).
  • the acquired data may include data, such as images, relating to proposed inference classes.
  • inference classes are determined using an inference engine that applies logical rules to a knowledge base for evaluating and analysing new information.
  • determining the inference classes may include developing intelligence by recording, storing, and labelling data. If, for example, the system is being trained for pathology detection, the machine-learning algorithm may be fed with different images, such as x-rays, used for medical diagnosis.
  • the computing device may then use the intelligence gathered and stored to understand new data acquired by the system. This may include the system using inference to identify and categorise new images based on features detected in the images.
  • the initial training dataset may, for example, be required to include at least 500 images per inference class to establish a baseline and allow for eventually determining viability of a selected machine learning model, as will be explained in more detail below.
  • In order to enable the training system (150) to process an image to detect pathology in the image, the computing device requires the initial training dataset (116) to train the various models, algorithms, etc. to evaluate data elements of the training data against, for example, predefined thresholds associated with pathology detection.
  • the initial training dataset may be analysed and prepared for use in the system as described in more detail below.
  • the training dataset (116) may be continuously re-evaluated and/or re-calibrated in order to achieve greater consistency and accuracy of outputs. This may for example include updating the data in the dataset in response to new methods of detecting pathology.
  • the training system (150) may further be configured to execute one or more algorithms, such as deep learning algorithms, on a set of data, including the training dataset (116) stored in the training database (114), for outputting models (140).
  • the one or more models may be configured for machine learning based evaluation and processing of images for pathology detection.
  • the training system (150) may include or have access to a model repository in and from which one or more models may be stored and accessed.
  • the computing device (102) may include one or more data sources (120) in which input data elements, such as newly acquired real life images, may be stored and from which input data elements may be retrieved for processing.
  • different data sources may be under the control of different entities, and may for example be physically and/or logically separated.
  • the computing device may include or have access to one or more third party data repositories from which supplemental data relating to the input data elements may be retrieved.
  • the third party data repositories may be web-addressable repositories, such as third party websites or the like.
  • the computing device (102) may form part of a network of medical devices, such as ultrasound and magnetic resonance imaging (MRI) machines, positron emission tomography (PET) and computed tomography (CT) scanners, x-ray machines, or the like, and that the one or more data sources may be in network communication with the medical devices, so as to store data captured from these devices for processing by the computing unit.
  • the computing device (102) may be configured to communicate with one or more of the user devices, the training database, third party data repositories, model repository and data sources (120) via a suitable communication network, such as the Internet.
  • Each of the user devices (104, 106) may include a processor for executing the functions of an application, such as a feedback application (122) and/or a moderator application (124), which may be provided by hardware or by software units executing on the respective user device (104, 106).
  • the software units may be stored in a memory component and instructions may be provided to the processor to carry out the functionality of the described components in cooperation with an operating system of the device.
  • software units arranged to manage and/or process data on behalf of the mobile electronic device may be provided remotely.
  • Some or all of the components may be provided by a software application downloadable onto and executable on the respective user device.
  • the feedback device (104) may include a user interface (123) and display (126) for displaying and interacting with the relevant application (122).
  • the moderator device (106) may include a moderator interface (125) and display (128) for displaying and interacting with the relevant application (124).
  • the moderator user interface (125) may be configured to receive feedback data relating to the processed images with initial prediction outcomes (as discussed in more detail below) and providing moderated data relating to the processed images with initial prediction outcomes.
  • initial model predictions may be displayed to a user of the feedback device (104) via the display (126) of the feedback device.
  • the feedback application (122) may be configured to allow the user of the feedback device (104) to provide feedback on the initial model predictions via the user interface (123). This may include, for example, the user, such as a radiologist in a medical implementation, correcting false positives or false negatives by providing an input via the user interface.
  • the provided user feedback and initial model predictions may at least be temporarily stored and transmitted to the moderator device (106), operated by another user, such as a more qualified radiologist, or team of experts, or the like. In some embodiments, this may include the user feedback and initial model predictions first being sent to the computing device (102) and then transferred to the moderator device (106).
  • the moderator device (106) may be configured to receive the initial model predictions together with the feedback from the computing device (102) or the feedback device (104), whichever the case may be. The user of the moderator device (106) may then analyse and ground truth the initial model predictions, including the user feedback, displayed to the user on the display (128) of the moderator device, via a moderator interface (125) provided by the moderator application (124). This process will be explained in more detail below.
  • the feedback device (104) may be provided for input from a specialised user during their workflow, such as a radiologist, and the moderator device (106) may be provided for input from a moderator or team of moderators, for example, an internal team of experts in the relevant field who review the feedback.
  • the user interface (123) and/or moderator interface (125) may be part of, or provided by, an existing imaging platform (130) that provides the necessary tools for trained specialists that use the platform to provide feedback on the models’ performance.
  • the existing imaging platform (130) may be maintained by the computing device (102).
  • an existing imaging platform (130) may be a medical imaging platform for use by radiologists or other medical personnel for receiving and reviewing prediction results of medical images.
  • Other uses of an existing imaging platform may include other imaging modalities and medical professions, such as use by a dermatologist or a general practitioner for receiving and reviewing prediction results of dermatology related images.
  • an imaging platform can be used by engineers and construction workers to receive and review artificial intelligence model analysis of a continuous or intermittent stream of images and/or video of a structure or a machine, such as windmill blades, for any damage.
  • Other uses of such an imaging platform can also include a platform used by farmers for receiving and reviewing prediction results of analysis of crop images.
  • each of the one or more user devices may be in communication with a storage component.
  • the storage component may be an on-board storage component, or it may be a remote storage component, such as a cloud storage network, or a database, which is maintained at the computing device, or an alternative server computer, for example, a medical server.
  • the computing device (102) and the one or more user devices (104, 106) may be a single device capable of being used by one or more users.
  • the computing device may be a centrally located computing device in a medical facility.
  • Figure 2 is a flow diagram which illustrates an example embodiment of a method (200) for training a machine deep learning system for image processing to provide predictions relating to areas of the images.
  • the predictions provide object detection of inference classes.
  • Inference classes are a representation of a single type of object or “thing” that the process wants to detect on an image.
  • the image processing is medical image processing for detection of objects and the classes are the different pathologies or abnormalities, such as consolidation, mass, pleural effusion, etc.
  • the method may be executed by a training system (150) operating on a computing device (102).
  • the method may include generating (202) one or more initial model prediction outcomes of areas of detected objects of a set of current inference classes from a prediction model having initial training and outputting the processed images.
  • the one or more initial prediction outcomes of processed images may be the result of a machine learning model executed on a set of data, such as input data elements.
  • the input data elements may be unprocessed images that are used as input to a machine learning model to detect the occurrence of an area of interest in the image.
  • the input data elements may be retrieved from the one or more data sources (120) of the computing device (102) for processing.
  • the method may include providing (204) the processed images with the one or more initial model prediction outcomes to the user via display in a user interface.
  • the user interface may be part of an existing imaging platform (for example, a medical imaging platform) that provides the necessary tools for users such as trained specialists that use the platform to provide feedback on the models’ performance.
  • the feedback is collected in a simplified, streamlined process that does not add a significant overhead to the radiologist as part of their daily workflow.
  • the one or more prediction model outcomes may be displayed to multiple users of the system via a feedback application (122) executed on the feedback device (104).
  • the users may review the one or more model prediction outcomes and provide feedback (206) on the model prediction outcomes via the user interface.
  • the user feedback may replace the model prediction outcomes, or may be shown as corrections made to the model prediction outcomes.
  • the model prediction outcomes may contain some inaccuracies.
  • the model prediction outcomes may indicate detection of one or more pathologies that are incorrectly detected. The user may then, via the user interface, deselect the incorrectly detected predictions and thereby update/replace the model prediction outcomes with the feedback of the user.
  • the computing device (102) may be configured to send prompts and/or notifications to the one or more user devices (104, 106) via the imaging platform (130) and the relevant application (122, 124) installed and/or resident on the respective user devices.
  • the one or more users of the user devices may provide feedback to the prompts and/or notifications via the user interface of the relevant application.
  • the computing device (102) may receive the users' feedback and transmit the model prediction outcomes including the users' feedback (206) to a moderator user interface of a moderator device (106) for review of the updated model prediction outcomes for collecting moderated data (208). This may also include the computing device (102) providing a moderator user interface for display of the processed images with the model prediction outcomes.
  • the moderated data is moderated feedback from at least two sources to avoid user bias.
  • a moderating user such as a single moderator or a moderation team of experts in a field, may review the updated model prediction outcomes and ground truth the model prediction outcomes and feedback.
  • Collecting moderated data may include moderated data of a single moderator user; moderated data as a consensus of multiple moderator users, or moderated data as an outcome of a hierarchy of moderator users.
  • the user devices being the feedback device and the moderator device, may be in network communication with one another and the feedback obtained via the feedback device may be transmitted to the moderator device directly.
  • the one or more model prediction outcomes including the user feedback may be transmitted to the moderator device from the feedback device instead of from the computing device.
  • the reviewed, ground-truthed model prediction outcomes and feedback may be grouped into a single data packet herein referred to as moderated data.
  • the moderated data may be associated with the initial model prediction outcomes.
  • the moderated data may include a plurality, such as 100 or more, images that have been reviewed and ground-truthed by one or more users. It should be appreciated that, in practice, the data may be reviewed and moderated numerous times and a plurality of user devices may form part of the network.
  • the moderated data may take the form of annotated processed images with one or more areas of interest in the images highlighted and/or annotated (for example, with labels).
  • the training system (150) may collect (208), via the user input to the user interface, the moderated data associated with the initial model prediction outcomes.
  • the training system (150) may merge (210) the collected moderated data with a representative sample training dataset (118) of the initial prediction model to produce a focussed dataset.
  • the initial model prediction outcomes including the feedback thereon may be merged with a reduced set of the training data that was previously used to train the initial machine learning model.
  • the representative sample training dataset is representative across the set of current inference classes and class permutations. Class permutations may include one or more of: size, location, interrelationship with other objects of the same or different class, and other distribution factors. This may include a subset of images of each of the current inference classes.
  • the aim is to reduce a large dataset to a smaller more manageable dataset that includes a representation of each of the instances of what the model should look for.
  • where many images of a given inference class and permutation exist in the initial dataset, the method may reduce these to 10 images (or 50 images, or whichever sample size is configured for the specific requirement) to provide a representative sample.
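  • In terms of the earlier sketches, assembling the focused dataset could then be a single concatenation; `initial_dataset` and `load_moderated_batch` are assumed names, not part of the specification.

```python
# Hypothetical assembly of the focused dataset using the earlier sketches.
base = representative_sample(initial_dataset, per_stratum=10)
focused_dataset = base + load_moderated_batch("latest")  # moderated, ground-truthed records
```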
  • the method may include using (212) the focused dataset for a short burst of additional model training of the initial model to obtain an updated model.
  • the burst of additional model training is of shorter duration than the initial training.
  • the short burst of additional model training may provide an accelerated training for a reduced number of training epochs.
  • the computing unit may perform a short burst of 10 to 20 epochs of GPU or TPU accelerated training.
  • Using a focussed dataset, including the updated model prediction outcomes (i.e. the model prediction outcomes that are reviewed and ground-truthed) together with the representative sample training dataset enables the system to be trained with a focus on the collected moderated data without disregarding the initial training data.
  • the number of epochs is determined based on a number of inference classes and a number of permutations of a single class, in combination with a learning rate between iterations of training.
  • the required number of epochs for the training burst is expressed in terms of the following variables (the formula itself is presented as a figure in the specification and is not reproduced in this text):
  • e is the number of epochs of the short burst of training
  • n is the number of inference classes
  • p is the number of permutations of a single inference class
  • l is the learning rate
  • d is the number of images in the dataset
  • the subscript c denotes the current training session
  • the subscript prev denotes the previous training session
  • Permutations of an inference class may be defined by variability of various factors including: size, location, object shape, greyscale intensity, etc. These factors are described further in the discussion of Figure 7 below.
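  • Because the exact formula is not reproduced in this text, the following is only one plausible, explicitly assumed form of such a calculation, scaling the previous burst length by ratios of the listed quantities.

```python
def burst_epochs(n_c, p_c, d_c, l_c, n_prev, p_prev, d_prev, l_prev, e_prev) -> int:
    """Illustrative only: a ratio-based guess at the burst length.

    Scales the previous session's epoch count by growth in classes,
    permutations and dataset size, and inversely by the learning rate.
    """
    scale = (n_c * p_c) / (n_prev * p_prev) * (d_c / d_prev) * (l_prev / l_c)
    return max(1, round(e_prev * scale))
```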
  • the training system (150) may validate (214) the updated model.
  • Validating the updated model may include validating (216) performance metrics of the updated model against the initial model. If the updated model provides an improved performance, the training system (150) may replace (218) the initial model with the updated model. Similarly, if the updated model performs worse than the initial model, the updated model may be rejected (220) and discarded.
  • the performance metrics of each model may be associated with an accuracy score.
  • the accuracy score may be determined by a mathematical calculation for determining the accuracy of the model. This may include the computing device (102) determining the amount of feedback required from the user of the system. The more feedback required, the lower the accuracy score, and vice versa. Accordingly, the higher the accuracy score, the higher the performance metrics.
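  • A minimal sketch of the promotion decision follows, assuming metrics are reported as dictionaries; the metric names are illustrative (the text mentions accuracy and precision), and the improvement rule is an assumption.

```python
def should_promote(initial: dict, updated: dict, keys=("accuracy", "precision")) -> bool:
    """Promote the updated model only if no metric regresses and at least one improves."""
    no_regression = all(updated[k] >= initial[k] for k in keys)
    improvement = any(updated[k] > initial[k] for k in keys)
    return no_regression and improvement
```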
  • the validation may include a technical validation and a user validation.
  • for a technical validation, a pre-selected test dataset may be used that includes images with ground truth classes and their locations.
  • the new model is used to run detection on this set and provide the results, which may then automatically be analysed in terms of True Positives and Negatives, and False Positives and Negatives.
  • the results in terms of an AI bounding box may be compared to the ground truth bounding box to evaluate against True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) measurements, as well as to calculate intersection over union (IoU), which indicates not only whether the detection was in the correct location, but also whether it was of the correct size.
  • This may be used to automatically calculate the accuracy, precision and other model performance related metrics and compare these to the current model metrics to evaluate improvement. This is all automated and performed immediately post training, but it can also be performed “ad hoc” when evaluating a model that performed “slightly worse”, however with a replaced or augmented testing dataset.
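  • The IoU computation referred to above is standard; a short sketch follows, with boxes assumed to be (x1, y1, x2, y2) tuples and the matching threshold left to configuration.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```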
  • User validation may involve technical field users and experts using the new model “in staged mode” to validate its performance via the same feedback user interface.
  • the feedback is used to calculate model performance metrics and then compare these to the current model.
  • User validation may be particularly useful in complex cases, where there are borderline predictions made by the model. By having technical field users and experts validate the performance metrics, borderline cases may be more accurately validated. If the updated model provides an improved performance compared to the initial model, the focused dataset (used for training the updated model) may be reduced (222) again to a representative sample to replace (224) the initial representative sample training dataset. If the updated model is rejected (220), the method of continuously training the model as set out above may be repeated and re-evaluated.
  • the method may repeat with the updated model as the initial model to provide continuous model training using moderated data and reducing the focused dataset to be used as the new representative sample training dataset.
  • Figures 3A and 3B are schematic diagrams illustrating an example embodiment of a graphical user interface of a user feedback application executing on a feedback device.
  • the user interface may be configured to display the results of one or more model prediction outcomes and to collect user feedback on the one or more model prediction outcomes.
  • the user interface may present a representation of the image being processed to the user together with one or more indications of the detected pathologies.
  • the area as well as the type of detected pathology may be indicated to the user.
  • in the illustrated example, the detected pathologies are consolidation and infiltrates in the lungs.
  • the overlaid frames and/or indicators may be coloured to easily distinguish between the detection of different pathologies.
  • the user interface may include an information pane/window listing all of the pathologies detected, and a confidence score associated with the detected pathology.
  • the listed pathologies may also be colour coded (written in colour or have a colour indicator, for example), to easily associate the listed pathology with an area marked on the processed image.
  • the user may provide feedback on the one or more predictions by, for example, selecting and/or deselecting detected or undetected pathologies, moving the overlaid frame associated with each detected pathology to a more accurate location, leaving additional notes for the moderation team, or the like.
  • Figure 3B provides open text boxes for a user to enter notes and/or comments. Additional functionalities may include attaching voice files to a particular comment for a more detailed explanation thereof, or the like.
  • Figures 3A and 3B are simply examples of a user interface and should not be considered as limiting. There are various other functionalities that may be included that are not discussed in detail herein.
  • the information displayed on the user interface may also be unique to a particular user, as a user may, for example, adjust various settings and display options according to a user’s preference.
  • An example embodiment of a user interface of a moderator application executing on a moderator device is illustrated in Figure 4.
  • the user may be provided with a large representation of the processed image together with the detected pathologies, similar to Figures 3A and 3B.
  • the user interface may include the feedback provided by the user of the feedback device.
  • additional overlaid comments are provided by the therapist that conducted the review of the initial model predicted outcomes.
  • the pathologies detected by the model and the feedback provided during the feedback process may be labelled differently.
  • the colour may differ, or a user name may be provided in brackets or the like.
  • the user interface may provide the user with a plurality of categories to select which, upon selection, navigates the user to additional information associated with the particular category.
  • the information associated with the particular category may be unique to the user and based on information provided by the user during setup of the application.
  • the categories to be selected may, for example, include one or more of: details, including detailed information of the patient such as medical history, or the like; images, including alternative images of the same patient or similar images of other patients for reference; report, including the model prediction outcomes and feedback; messages, providing access to a message functionality to discuss matters with colleagues. It should be appreciated that alternative embodiments with different categories may be implemented.
  • the computing device (102) includes a processor (110) for executing the functions of components described below, which may be provided by hardware or software units executing on the computing unit.
  • the software units are stored in a memory (112) which provides instructions to the processor (110) to carry out the functionality of the described components.
  • the training system (310) may include a processed image providing component (306) arranged to provide processed images with initial model prediction outcomes to a user interface for display by the one or more user devices (104, 106).
  • the processed image providing component (306) may be for providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display.
  • the processed image providing component (306) may include a moderated data collection component (308) configured to collect moderated data associated with the initial model prediction outcomes.
  • the moderated data may be collected via a user input to the moderator interface.
  • the moderated data may include moderated feedback data from at least two sources to avoid user bias.
  • the moderated feedback data may be ground-truthed feedback.
  • the moderated data collection component (308) may be arranged to receive secondary moderation from the moderator user interface for resolving disagreements in initial moderator feedback.
  • the training system (310) may include a data merging component (312) arranged to merge the collected moderated data with a representative sample training dataset obtained by a representative sample component (314), to produce a focused dataset.
  • the representative sample training dataset may be representative across the set of current inference classes and class permutations.
  • the training system (310) may also include a replacement training dataset component (316) arranged to replace the training dataset with a representative sample focussed dataset.
  • the training system (310) may further include an additional burst training component (315) configured for using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, where the burst of additional model training is of shorter duration than the initial training.
  • the training data may be updated regularly in accordance with aspects of the present disclosure.
  • the additional burst training component (315) may be configured to provide a reduced epoch burst of training.
  • the reduced epoch burst may execute a limited epoch iteration of training.
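A reduced epoch burst of this kind might be implemented as follows. This is a minimal sketch assuming a PyTorch-style model, data loader and loss function (the disclosure does not prescribe a framework); the epoch count and learning rate shown are illustrative parameters only.

```python
import torch

def burst_train(model, loader, loss_fn, burst_epochs=5, lr=1e-4):
    # A reduced-epoch burst: far fewer epochs than the hundreds typically
    # needed for initial training. `burst_epochs` and `lr` are illustrative.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    model.train()
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(burst_epochs):
        for images, targets in loader:
            optimiser.zero_grad()
            loss = loss_fn(model(images.to(device)), targets.to(device))
            loss.backward()
            optimiser.step()
    return model
```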
  • the training system (310) may include a continuous training component (317) to continuously update the model in response to the additional bursts of model training using iterations of focussed datasets.
  • the continuous training component (317) may repeat the training method with the updated model as the initial model to provide continuous model training using moderated data.
  • the training system (310) may include a validating component (320) arranged to validate the performance metrics of the updated model against the initial model.
  • Validating the performance metrics of the updated model against the initial model may include replacing the initial model with the updated model if the updated model provides improved performance, or rejecting the updated model if the updated model provides worse performance metrics.
  • the validating component (320) may include a technical validation component (322) for using a test dataset including moderated processed images for validating the updated model.
  • the validating component (320) may further include a user validation component (324) for providing a staged mode of test use of the updated model and collecting feedback via the user interface.
  • the model updating component (318) may replace the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
  • Figure 6 is a flow diagram showing an example method of producing a machine learning model. It should be appreciated that, in some embodiments, the dataset that is used for the initial training of machine learning models has to be of high quality in order to provide the desired results. Training the machine learning models using substandard datasets, or only partially training the models, may not yield adequate levels of performance for the intended use of the models.
  • the method may include the following steps:
  • Step 1 Initial data acquisition (402) - During this step, data related and appropriate to the proposed inference classes is collected from public open sources, internal datasets, as well as any third parties with which the machine learning system has data sharing and/or acquisition agreements. For the initial dataset, approximately 500 images per class are collected to establish a baseline and the viability of proceeding with the proposed model(s).
  • Step 2 Data analysis and preparation (404) - All acquired data is analysed for the presence of the required classes and classified per inference class. Further data analysis is performed on the quality, format, encoding and size of images (in the acquired data) to ensure uniformity of data for the initial training of the machine learning models. If the dataset to be used for training is not uniform or cannot be made uniform, more data needs to be acquired. When the training data is uniform, the data is inspected for duplicated images, which are then discarded. Should the data then contain fewer than the required minimum of images per inference class, more data needs to be acquired (a sketch of this gating logic follows the annotation sub-steps below).
  • This step further includes annotation validation and labelling (405).
  • Annotation validation and labelling is only performed when existing data annotations cannot be confirmed to have come from legitimate and respected sources and/or if the data has no annotations whatsoever.
  • the data is loaded into a pre-selected labeller platform with prepopulated annotations. Once the data is loaded, an internal medical team may work through the data, correcting, augmenting, and providing new annotations for the images in the data.
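The uniformity, de-duplication and minimum-count gating described for this step could be sketched as follows for a single inference class. The helpers is_uniform and make_uniform, and the numpy-array image representation, are assumptions for illustration; the 500-image minimum comes from the Step 1 baseline.

```python
import hashlib
import numpy as np

MIN_IMAGES_PER_CLASS = 500  # per-class baseline from Step 1; configurable

def prepare_class_data(images, is_uniform, make_uniform):
    # Sketch of the Step 2 gating logic for one inference class. `images` is a
    # list of numpy arrays; `is_uniform` and `make_uniform` stand in for the
    # quality/format/encoding/size checks described above.
    images = [img if is_uniform(img) else make_uniform(img) for img in images]
    # Discard exact duplicates via content hashing.
    seen, unique = set(), []
    for img in images:
        digest = hashlib.sha256(np.ascontiguousarray(img).tobytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(img)
    if len(unique) < MIN_IMAGES_PER_CLASS:
        raise ValueError("insufficient images for this class: acquire more data")
    return unique
```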
  • Step 3 Initial training of best suited algorithms, based on a base hypothesis (406) - This step includes training selected algorithms on the initial dataset, being the data acquired and discussed above, for a predefined time period, such as 48 hours, with precision metrics collected at set intervals, such as every 2 hours.
  • the initial selection of algorithms is performed using the latest artificial intelligence (AI) research and literature as well as existing internal experience for a proposed usage hypothesis.
  • Initial training is generally performed on GPU- or TPU-enabled hardware for best performance; however, if the selected algorithms do not support GPU acceleration, or the proposed usage of the models excludes the ability to use accelerated hardware, models are trained purely using CPUs with adjusted training duration, performance metrics collection and evaluation.
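A minimal device-selection sketch, assuming PyTorch for illustration, is shown below; the adjusted CPU training duration is an assumption, as the text specifies only that the duration is adjusted.

```python
import torch

def select_training_device(supports_gpu=True):
    # Prefer accelerated hardware when the algorithm and deployment allow it;
    # otherwise fall back to CPU with an adjusted (longer) training budget.
    # The 48h figure matches the example above; the 96h figure is an assumption.
    if supports_gpu and torch.cuda.is_available():
        return torch.device("cuda"), 48   # hours of training
    return torch.device("cpu"), 96        # adjusted CPU duration (assumption)
```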
  • Step 4 Initial model validation and algorithm selection (408) - Performance metrics of the trained models from the previous step are collected and compared with each other as well as against pre-defined base acceptance criteria. Models not meeting the required acceptance criteria are discarded and the best performing model or models are selected for further development. Should (409) no models meet the acceptance criteria, the process moves back to the previous step and the previous step is repeated.
  • Step 5 API-enablement of the selected models (410) - The selected models are exposed via pipeline application programming interfaces (APIs). This activity produces a testable set of services that are available and deployed via container images in a selected deep AI platform.
  • a new record, such as a GitLab Project, is created to ensure tracking and monitoring of the development effort.
  • Step 6 Internal model validation by the medical team (412) -
  • the internal medical team or medical domain experts may validate model performance during this stage using a subset of the initial dataset acquired in Step 1.
  • This dataset is purposefully excluded from the training done in Step 3 and is representative of as many scenarios of the inference classes as possible. It is possible to use a different dataset for validation purposes during this activity, if the dataset has not been used during Step 3 and ground truth for the inference classes can be established via consensus between two or more medical experts. If (413) one or more models are rejected, these models are moved back to Step 3, either for additional training or algorithm selection.
  • Step 7 Further model training on additional datasets (414) - Approved models are trained on additional datasets, which have undergone the same data preparation as the initial dataset in Step 2.
  • the training time varies and is dependent on the size of the additional dataset, learning rate of the models, hardware used for training and other parameters.
  • precision metrics are taken at set intervals, such as every 2 hours, and remediation actions may be taken should the models fail to perform well. If (415) the models are altogether not responsive to further training, the process is moved back to Step 3 for algorithm selection.
  • Step 8 Final model validation by the medical team (416) - Trained models are again validated by the internal medical team or domain experts; however, during this activity the accuracy requirements for the models are higher than during Step 6. For example, models may now be required to achieve a score over 90% to be moved to clinical trials.
  • the dataset used during this stage also includes images that were not pre-processed, and often includes poor quality images, images differing in size and format, as well as very complex scenarios that test the models’ ability to correctly detect inference classes.
  • Rejected models are moved (417) back to Step 7 in the process.
  • Step 9 Model deployment for clinical trials (418) - During this step, approved models are deployed to new or existing sites for clinical trials. Cloud deployments are preferred for the trials, as hardware can be managed and replaced more easily than with on-premises deployments; however, if cloud deployment is not possible, hardware is procured, configured, and deployed at a trial site.
  • Step 10 Performance monitoring and metrics collection (420) -
  • the machine learning models, as well as underlying API services, operating systems, integration components and other hardware, are closely monitored, and performance metrics are collected.
  • Samples of models’ predictions are validated as they are completed by the internal medical team as well as medical professionals involved in clinical trials.
  • Post completion of clinical trials, all images and model predictions are analysed for inference accuracy and performance; all other software, integration and hardware metrics are analysed by a technical team to ensure the performance, usage and other characteristics expected from the platform. Should (421) the models not perform as expected during or post clinical trials, the process is moved back to Step 8 for image and prediction analysis as well as further remediation activities.
  • Step 11 Models are stabilized and moved to continuous learning mode (422) - Models which complete clinical trials and are approved for production deployments are now available for deployment at new and existing production sites. They are also moved to production container repositories and machine learning model weight buckets. Continuous learning mode, discussed with reference to Figure 2 above, is also enabled on these models.
  • Figure 7 is a schematic diagram which illustrates an example representation of providing a representative sample of a dataset. It should be appreciated that only the high-level steps are discussed below.
  • the data is split into separate datasets per class that is present on an image. Images that contain multiple classes are duplicated for each class-specific dataset. This is a fairly simple process and does not require any intricate calculations or data manipulations. In practice, for single-class models, this step is generally omitted.
  • the (0, 0) coordinate of the four quadrants would most likely be offset from the true centre of the image. The set is then divided into five subsets - upper left, lower left, upper right, lower right and cross segment. The last subset includes all the images where the class location extends beyond a single quadrant, as sketched below.
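A sketch of the quadrant assignment, assuming axis-aligned bounding boxes in image coordinates, might look as follows; the function and argument names are illustrative only.

```python
def quadrant_subset(box, origin):
    # `box` is (x_min, y_min, x_max, y_max) for a class location; `origin` is
    # the (0, 0) point of the quadrants, i.e. the calculated centre of the
    # class locations (image coordinates, y increasing downwards).
    ox, oy = origin
    x_min, y_min, x_max, y_max = box
    left, right = x_max <= ox, x_min >= ox
    upper, lower = y_max <= oy, y_min >= oy
    if upper and left:
        return "upper_left"
    if upper and right:
        return "upper_right"
    if lower and left:
        return "lower_left"
    if lower and right:
        return "lower_right"
    return "cross_segment"  # the location extends beyond a single quadrant
```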
  • the next step in the process is to take each of the images in the resultant quadrant datasets and subdivide them into further sets based on the class appearance. This may be the most intensive part of the entire de-duplication process.
  • Several edge tracing techniques may be used to define the outline/edges of the object that the class represents within its location. Then, a set of metrics may be used to categorise the objects into a finite number of classes based on the shape, size, appearance of the object.
  • edge detection may reveal several objects rather than a single object within the specified area. Especially if the objects are small, the user would typically mark multiple objects near each other in a single class location. This segmentation would typically result in two subsets of data - single or multiple objects.
  • Object area is calculated by estimating the centre of the shape and the average radius from this centre. This is a simpler and more computationally efficient method of calculating an approximation of the object’s area, as some of the objects may have very peculiar outlines with hundreds of edges.
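A minimal sketch of this approximation, assuming the traced outline is available as an array of edge points:

```python
import numpy as np

def approximate_area(outline):
    # `outline` is an (N, 2) array of traced edge points. The area is taken as
    # that of a circle with the outline's average radius about its estimated
    # centre, avoiding exact polygon integration over hundreds of edges.
    points = np.asarray(outline, dtype=float)
    centre = points.mean(axis=0)
    radii = np.linalg.norm(points - centre, axis=1)
    return float(np.pi * radii.mean() ** 2)
```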
  • The regularity quality of the edges approximates how regular (smooth) or irregular (jagged) an object’s edges are. This is typically calculated using the number of path points, and the distances and angles of path points from one another. To keep the number of subsets generated by this step in the process to a manageable minimum, the number of sets generated by this metric is limited to at most five. It should be appreciated, though, that this threshold is a parameter and can be adjusted should the need arise.
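The exact regularity formula is an implementation choice; the sketch below scores regularity from the spread of turning angles along the traced outline and bins the score into at most five subsets, purely as an illustration.

```python
import numpy as np

def regularity_bin(outline, n_bins=5):
    # Score edge regularity from the spread of turning angles along the path;
    # the exact use of path-point counts, distances and angles is not specified
    # above, so this scoring is illustrative only.
    points = np.asarray(outline, dtype=float)
    segments = np.diff(points, axis=0, append=points[:1])  # close the path
    angles = np.arctan2(segments[:, 1], segments[:, 0])
    turning = np.abs(np.diff(angles))
    score = float(turning.std())  # smooth outlines turn gradually and evenly
    # Map the score onto at most n_bins subsets (thresholds are configurable).
    thresholds = np.linspace(0.0, np.pi, n_bins + 1)[1:-1]
    return int(np.digitize(score, thresholds))
```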
  • the datasets of the previous step may be divided based on the “greyscale intensity” of the objects whose edges were defined in the previous step.
  • the aim of this segregation is to divide the set based on how dense or solid the object of interest is; less dense objects are represented by a darker shade of grey. This segregation is achieved by calculating the average of each pixel’s greyscale value (between 0 and 255) within the object’s edges.
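A minimal sketch of this calculation, assuming the image and the object mask are available as numpy arrays:

```python
import numpy as np

def mean_greyscale_intensity(image, mask):
    # `image` is a 2-D greyscale array (values 0-255) and `mask` a boolean
    # array of the same shape marking pixels inside the object's edges;
    # less dense objects yield lower (darker) averages.
    return float(np.asarray(image)[np.asarray(mask, dtype=bool)].mean())
```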
  • At least 3 images are selected - 2 for the training set and 1 for the testing set. If the set contains more than 30 images, 10% of the images would be selected; while for large subsets of 1,000 or more images, a configured maximum of 100 images would be selected - 80 for training and 20 for testing. It should be appreciated that these thresholds are configurable parameters (a sizing sketch, under stated assumptions, follows this list). Candidate images are chosen based on the following broad criteria, which can be adjusted from time to time to reduce any potential bias of the resultant dataset:
  • Source of data to ensure that at least a single image is chosen from each of the available data sources - these would include geographical and other location specific identifiers for the sources.
  • Demographics, especially the patient’s age group, to ensure that at least the following broad age groups are equally represented, if available in the dataset - 0 - 18 months, 18 months - 16 years, and 16 years and above.
  • Image quality to ensure that both high and low image qualities are included in the resultant dataset. If the set includes analogue images that were subsequently digitized, these are included as well.
  • the selection of the images may differ from the above; for example, for mammography data, demographics may be excluded from the qualifying criteria for image selection.
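The sizing rules described before the criteria list could be sketched as follows; the rounding behaviour and the handling of sets between 3 and 30 images are assumptions, as the text does not spell them out.

```python
def selection_sizes(subset_size, max_images=100, train_fraction=0.8):
    # At least 3 images (2 train / 1 test); 10% of sets larger than 30 images;
    # capped at `max_images` (80/20 by default). Rounding is an assumption.
    if subset_size > 30:
        selected = min(max(subset_size // 10, 3), max_images)
    else:
        selected = 3
    train = max(int(selected * train_fraction), 2)
    return train, selected - train
```

For example, a subset of 1,000 images yields (80, 20), while a subset of 12 images yields (2, 1), matching the thresholds above.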
  • Figure 9 is a flow diagram showing an example method of producing a machine learning model to obtain initial model predictions and using the initial model predictions in the ongoing training of a machine learning model.
  • the method may include identifying (500) an initial training dataset for training a machine learning model. This may include the computing device selecting the dataset from a database maintained by the computing device, or to which the computing device has access.
  • the method may include determining whether there is sufficient data available to ground truth the training data. If (502) there is no ground truth data available, the initial training dataset may be ground-truthed by a consortium. In other words, the training data may be ground-truthed by one or more specialists.
  • the dataset may be pre-processed (504).
  • Pre-processing (504) the dataset may, for example, include annotation validation and labelling of the dataset.
  • the pre-processed dataset may be stored as an initial training dataset (D1).
  • the initial training dataset may be processed (505) to form a reduced representative sample dataset (RSD1), using the steps described with reference to Figure 7.
  • the method may include selecting (506) a suitable machine learning algorithm to train a machine learning model for image processing. Once an algorithm has been selected, initial training (508) of the model may commence.
  • the best suited algorithms may be based on a base hypothesis associated with the purpose of the system.
  • the model may undergo (510) technical and/or clinical validation. This may include determining whether the model is ready for deployment for clinical trials.
  • the model is released (512) for clinical trials and production. If, however, the model is not ready for deployment, the model is investigated (514) and the dataset as well as training metrics associated with the model are identified. After identifying the dataset and training metrics, it can be determined if more training is required. If more training is required, the model may be trained (508) further. However, if more training is not required, the method may include determining if more data is required.
  • the method may move back to the initial step of identifying (500) the training dataset. At this point, a new/improved dataset may be used for training of the model, and all the steps may be repeated. If it is determined that new data is not required, the method may include determining if another machine learning algorithm is required. When a new algorithm is required, the method may return to the step of selecting a suitable algorithm candidate. If a new algorithm is not required, the method flow may be abandoned (516), and the model discarded.
  • the disclosure provides a method and system enabling deep learning algorithms, which are used to perform pathology detection, to enhance and improve at a much more rapid rate than is possible with traditional methodologies for training and improving deep learning algorithms.
  • When the model is determined to be ready for clinical deployment, it may be released (512) for clinical trials and production.
  • the method may include collecting (518) feedback data on the model predictions from a user, such as a radiologist.
  • the feedback may be received from a user device under the control of the radiologist.
  • the method may include determining if enough feedback data is provided by the radiologist. If enough data is provided, the method may include transmitting the model predictions and feedback data for moderation and ground truthing, alternatively, the previous step to collect feedback data may be repeated.
  • the method may include moderating and ground truthing (520) the collected data.
  • the data may be moderated and ground-truthed by one or more moderators, such as a group of one or more radiologists.
  • the method may include determining if further feedback is required. This may, for example, include determining if a disagreement between moderators exists and, if so, the method may include providing feedback and additional training (522). This may include sending the model predictions to secondary moderation for further feedback and ground truthing. If no further feedback is needed, the method may include preparing a moderated dataset (CD1), including the moderated and ground-truthed feedback data and model predictions.
  • the moderated dataset (CD1) may be merged (524) with a representative sample dataset of the initial training dataset to create a focussed dataset (MD1).
  • the initial training dataset may be processed to obtain a representative sample dataset once the training dataset has been pre-processed, or at any time before the focussed dataset (MD1) is created.
  • the focussed dataset (MD1) may be used to train (526) the model for short bursts of time so as to train the model using the new focussed dataset.
  • the model may undergo further technical (528) and/or clinical (530) validation.
  • the performance and accuracy metrics of the model may automatically be validated against the current model’s performance and accuracy metrics.
  • a model that is performing worse than the original or previous model should not be allowed to replace a better performing current model.
  • the model will be submitted for manual clinical validation.
  • the model may be considered ready for deployment, and it may replace the original model.
  • the model may be deployed (532).
  • the dataset used for training this model, being the focussed dataset (MD1), may be processed (534) to provide a representative sample dataset again and replace RSD1.
  • the model may be rejected and the steps of collecting feedback, moderating, and adding to the new dataset will continue without the deployment of the new model.
  • the method and system described use a combination of user interface and user experience elements to collect feedback from users, such as radiologists, via a medical imaging platform, ground truthing of the collected feedback, and a unique training data preparation process. These features are configured to convert newly inferred images into new data sources to train existing machine learning models on an ongoing basis.
  • the system and method described herein may improve the efficiency and speed of training new deep learning models.
  • the efficiency and accuracy of the deep learning models may be directly related to the accuracy of data that was used to train the initial models.
  • the initial models used as input into the system and method must be trained to a certain predefined level of accuracy, i.e., already have pre-trained weights and a substantial initial training and testing dataset.
  • FIG. 10 illustrates an example of a computing device (600) in which various aspects of the disclosure may be implemented.
  • the computing device (600) may be embodied as any form of data processing device including a personal computing device (e.g. laptop or desktop computer), a server computer (which may be self-contained or physically distributed over a number of locations), a client computer, or a communication device, such as a mobile phone (e.g. cellular telephone), satellite phone, tablet computer, personal digital assistant or the like.
  • the computing device (600) may be suitable for storing and executing computer program code.
  • the various participants and elements in the previously described system diagrams may use any suitable number of subsystems or components of the computing device (600) to facilitate the functions described herein.
  • the computing device (600) may include subsystems or components interconnected via a communication infrastructure (605) (for example, a communications bus, a network, etc.).
  • the computing device (600) may include one or more processors (610) and at least one memory component in the form of computer-readable media.
  • the one or more processors (610) may include one or more of: CPUs, graphical processing units (GPUs), microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs) and the like.
  • a number of processors may be provided and may be arranged to carry out calculations simultaneously.
  • various subsystems or components of the computing device (600) may be distributed over a number of physical locations (e.g. in a distributed, cluster or cloud-based computing configuration) and appropriate software units may be arranged to manage and/or process data on behalf of remote devices.
  • the memory components may include system memory (615), which may include read only memory (ROM) and random access memory (RAM).
  • System software may be stored in the system memory (615) including operating system software.
  • the memory components may also include secondary memory (620).
  • the secondary memory (620) may include a fixed disk (621), such as a hard disk drive, and, optionally, one or more storage interfaces (622) for interfacing with storage components (623), such as removable storage components (e.g. magnetic tape, optical disk, flash memory drive, external hard drive, removable memory chip, etc.), network attached storage components (e.g. NAS drives), remote storage components (e.g. cloud-based storage) or the like.
  • the computing device (600) may include an external communications interface (630) for operation of the computing device (600) in a networked environment enabling transfer of data between multiple computing devices (600) and/or the Internet.
  • Data transferred via the external communications interface (630) may be in the form of signals, which may be electronic, electromagnetic, optical, radio, or other types of signal.
  • the external communications interface (630) may enable communication of data between the computing device (600) and other computing devices including servers and external storage facilities. Web services may be accessible by and/or from the computing device (600) via the communications interface (630).
  • the external communications interface (630) may be configured for connection to wireless communication channels (e.g., a cellular telephone network, wireless local area network (e.g. using Wi-Fi™), satellite-phone network, satellite Internet network, etc.) and may include an associated wireless transfer element, such as an antenna and associated circuitry.
  • the computer-readable media in the form of the various memory components may provide storage of computer-executable instructions, data structures, program modules, software units and other data.
  • a computer program product may be provided by a computer-readable medium having stored computer-readable program code executable by the central processor (610).
  • a computer program product may be provided by a non-transient or non-transitory computer-readable medium, or may be provided via a signal or other transient or transitory means via the communications interface (630).
  • Interconnection via the communication infrastructure (605) allows the one or more processors (610) to communicate with each subsystem or component and to control the execution of instructions from the memory components, as well as the exchange of information between subsystems or components.
  • Peripherals (such as printers, scanners, cameras, or the like) and input/output (I/O) devices (such as a mouse, touchpad, keyboard, microphone, touch-sensitive display, input buttons, speakers and the like) may couple to or be integrally formed with the computing device (600) either directly or via an I/O controller (635).
  • One or more displays (645) (which may be touch-sensitive displays) may be coupled to or integrally formed with the computing device (600) via a display or video adapter (640).
  • any of the steps, operations, components or processes described herein may be performed or implemented with one or more hardware or software units, alone or in combination with other devices.
  • Components or devices configured or arranged to perform described functions or operations may be so arranged or configured through computer-implemented instructions which implement or carry out the described functions, algorithms, or methods.
  • the computer- implemented instructions may be provided by hardware or software units.
  • a software unit is implemented with a computer program product comprising a non-transient or non-transitory computer-readable medium containing computer program code, which can be executed by a processor for performing any or all of the steps, operations, or processes described.
  • Software units or functions described in this application may be implemented as computer program code using any suitable computer language such as, for example, Java™, C++, or Perl™ using, for example, conventional or object-oriented techniques.
  • the computer program code may be stored as a series of instructions, or commands on a non-transitory computer-readable medium, such as a random access memory (RAM), a read-only memory (ROM), a magnetic medium such as a hard-drive, or an optical medium such as a CD-ROM. Any such computer-readable medium may also reside on or within a single computational apparatus and may be present on or within different computational apparatuses within a system or network.

Abstract

A system and method for training a machine deep learning system for image processing using a prediction model for object detection of inference classes is disclosed. The method may provide processed images including initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images. The method may include collecting, via a user input to the user interface, moderated data associated with the initial model prediction outcomes in the form of moderated feedback data from at least two sources to avoid user bias. The moderated data may be merged with a representative sample training dataset of the initial prediction model to produce a focused dataset. The focused dataset may be used for a burst of additional model training of the initial prediction model to obtain an updated model.

Description

METHOD AND SYSTEM FOR TRAINING A MACHINE LEARNING SYSTEM FOR IMAGE PROCESSING
FIELD OF THE INVENTION
This technology relates to a system and method for training a machine learning system for image processing. The technology may find particular, but not exclusive, application in training a machine deep learning model for object detection in inference classes in image processing.
BACKGROUND TO THE INVENTION
Machine learning and related technologies have a wide range of applications and, as a result, have witnessed an exponential rise in use and popularity in the last couple of years. These technologies are often used in applications to determine/identify trends, patterns, conditions, or the like.
For example, in the medical industry, machine learning algorithms and models are often used on data, such as images, numbers, text, or the like, to determine whether the data is indicative of a certain predefined condition. If the data is indicative of such a condition, the machine learning model may be configured to make/suggest a diagnosis relating to the condition. For example, the machine learning model may be used to determine whether x-rays of the lungs of a patient show any indication of a condition, such as cancer, and detect such pathology accordingly.
Such machine learning models are trained using large sets of training data. In order to ensure that the model provides accurate results, the training data needs to be of a high quality and care needs to be taken when acquiring the training data. Training a model using low quality training data may result in an inaccurate model returning false/incorrect results. In addition to requiring accurate data, a substantial amount of data needs to be acquired to enable a machine learning model to return accurate results. The process to acquire such data is time consuming and often comes at a considerable cost.
The training data may be collected in a variety of ways and from a variety of sources. However, a common problem associated with data collection, even if due care is taken, is that the data often includes large volumes of duplicate, spurious and/or incorrect data. This may result in an imbalance in the data which eventually leads to poor accuracy of models.
One method to ensure the validity and correctness of the data is to manually review the collected data and eliminate/remove all unwanted data. Manually reviewing the data is, however, not always practical as the volume of the collected data, for machine learning purposes, is substantial.
A more time-efficient method with which some of the above problems may be alleviated is the deduplication of data. Data deduplication (deduping) is the process of filtering duplicate data from a set of data to streamline data processing and mitigate the effect of duplicate data on the accuracy of a model in which the data is used for training purposes. Not only does data deduplication limit the effect of duplicate data on a machine learning model, it also reduces processing time, as the data volume is reduced.
That said, even if data deduplication is applied, machine learning models are still known to be error prone. In light of the error prone nature of machine learning models, user intervention is often required to approve or reject the outputs obtained by the models. Accordingly, the Applicant considers there to be room for improvement.
The preceding discussion of the background to the invention is intended only to facilitate an understanding of the present invention. It should be appreciated that the discussion is not an acknowledgment or admission that any of the material referred to was part of the common general knowledge in the art as at the priority date of the application.
SUMMARY OF THE INVENTION
In accordance with an aspect of the invention there is provided a computer-implemented method of training a machine learning system for image processing using a prediction model for object detection of inference classes, comprising: providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; collecting moderated data associated with the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
The method may include validating the performance metrics of the updated model against the performance metrics of the initial model; and replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
Validating the performance metrics of the updated model may include a technical validation. The technical validation may include using a test dataset including moderated processed images for validating the updated model. Validating the performance metrics of the updated model may include a staged mode of test use of the updated model and collecting feedback via the user interface.
The method may include repeating the method with the updated model as the initial model to provide continuous model training using moderated data and reducing the focused dataset to be used as the new representative sample training dataset. The burst of additional model training may be of reduced duration in terms of training epochs, wherein the reduced epoch burst executes a limited epoch iteration of training. The method may include obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
The step of collecting moderated data may include: receiving feedback data received from multiple users; transmitting the feedback data to a moderator user interface; using secondary moderation if feedback disagreements exist in the original feedback; and receiving the moderated data from the moderator user interface that provides a ground truth for the model prediction. The step of collecting moderated data may include collecting one or more of the group of: moderated data of a single moderator user; moderated data as a consensus of multiple moderator users; and moderated data as an outcome of a hierarchy of moderator users. The moderated data may include adjusted processed images with annotations relating to the initial model prediction outcomes. The annotations of the processed images may replace and/or add to the model prediction outcome of the processed image.
The initial model prediction outcomes may include superimposed frames or bounding boxes on the processed images to provide an indication of an area of the image to which the model prediction relates.
The feedback data may be received in one or more of the following manners: deselecting false positive results included in the one or more prediction outcomes; adding false negative results that are not included in the one or more prediction outcomes; moving an area of a model prediction to a more accurate location in the processed image; leaving additional notes associated with one or more of the model predictions. The method may include generating the initial model prediction outcomes. Each of the model prediction outcomes may be the result of a machine learning algorithm being executed on a set of training data including a full set of training epochs.
In accordance with a further aspect of the invention there is provided a system for training a machine learning system for image processing using a prediction model for object detection of inference classes, the system including a memory for storing computer-readable program code and a processor for executing the computer-readable program code, the system comprising: a processed image providing component for providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; a moderated data collecting component for collecting, via user input to a moderator user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; a data merging component for merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and an additional burst training component for using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
The system may include a validating component for validating the performance metrics of the updated model against the performance metrics of the initial model. The system may include a model updating component for replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
The validating component may include a technical validation component for using a test dataset including moderated processed images for validating the updated model. The validating component may further include a user validation component for providing a staged mode of test use of the updated model and collecting feedback via the user interface.
The system may include a continuous training component for repeating the method with the updated model as the initial model to provide continuous model training using moderated data.
The additional burst training component may provide a reduced epoch burst of training. The reduced epoch burst may execute a limited epoch iteration of training.
The system may include a replacement training dataset component for obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
The moderated data collecting component may include receiving secondary moderation from the moderator user interface for resolving disagreement in initial moderator feedback. The system may further include a moderator user interface for receiving feedback data relating to the processed images with initial prediction outcomes and providing moderated data relating to the processed images with initial prediction outcomes.
The system may include an imaging platform including a user interface for display of processed images with initial model prediction outcomes and for collection of user input. The collection of user input may include receiving input from a user device having access to the user interface of the imaging platform.
The imaging platform may be configured to enable a user to provide feedback in one or more of the following manners: deselecting false positive results included in the one or more prediction outcomes; adding results that are not included in the one or more prediction outcomes; moving the superimposed frame associated with each model prediction to a more accurate location; leaving additional notes associated with one or more of the model predictions for another user, such as a moderation team.
In accordance with a further aspect of the invention there is provided a computer program product for training a machine deep learning system for image processing using a prediction model for object detection of inference classes, the computer program product comprising a computer-readable medium having stored computer-readable program code for performing the steps of: providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display; collecting, via user input to the user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias; merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
Further features provide for the computer-readable medium to be a non-transitory computer-readable medium and for the computer-readable program code to be executable by a processing circuit.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings:
Figure 1 is a schematic diagram which illustrates an exemplary embodiment of a system of training a machine learning system for image processing according to aspects of the present disclosure;
Figure 2 is a flow diagram which illustrates an example embodiment of a method of training a machine learning system for image processing;
Figures 3A-3B are schematic diagrams illustrating an example embodiment of a graphical user interface of a user feedback application for the described system;
Figure 4 is a schematic diagram illustrating an example embodiment of a graphical user interface of a user moderation application for the described system;
Figure 5 is a high-level component diagram of a computing device according to aspects of the present disclosure;
Figure 6 is a flow diagram which illustrates an example embodiment of a method of producing a machine learning model according to aspects of the present disclosure;
Figure 7 is a schematic diagram which illustrates an example representation of providing a representative sample of a dataset according to aspects of the present disclosure;
Figure 8 shows an example graphical representation of an image that may be used for calculating a centre of each of the locations of a given class according to aspects of the present disclosure;
Figure 9 is a flow diagram showing an example method of producing a machine learning model to obtain initial model predictions and using the initial model predictions in the ongoing training of a machine learning model according to aspects of the present disclosure; and Figure 10 illustrates an example of a computing device in which various aspects of the disclosure may be implemented.
DETAILED DESCRIPTION WITH REFERENCE TO THE DRAWINGS
Aspects of the present disclosure relate to a system and method for training a machine learning system using deep learning for image processing using a learned prediction model to provide predictions relating to areas of detected objects in the images. In particular, the system and method relate to ongoing training of a machine learning system using moderated user feedback for focused training of the prediction model. The disclosed system and method may find particular application and are described below in the radiology field for pathology detection, although other applications are also anticipated.
The method may include creating an initial dataset for training a machine deep learning system to perform image processing and applying the trained machine learning model to input images for processing to predict areas of detected objects in the images for output to a user interface. The input images may be medical diagnosis images of a part of a body in order to identify locations and types of pathologies.
The system may be configured to provide results displayed on the processed images to a user of the system via an imaging platform. The results may be provided via a user interface configured to display processed images with initial model predictions. In some embodiments, the initial model predictions may be superimposed on the processed images to provide an indication of an area of the image to which the model prediction relates. The user interface may further be configured to enable multiple users to provide feedback on the initial model prediction outcomes.
This may include the user checking the initial model prediction outcomes and deselecting any false positives that were detected via the initial model. For example, this may include a user, such as a radiologist, inspecting the results and deselecting any pathologies provided in a list or marked on a processed image that were incorrectly detected. Further feedback that may be provided by the user may include adding results, such as pathologies, that were not detected on the processed images (false negatives), but that are present. Such further results may be added by selecting relevant results from a drop-down list and marking the relevant area on a processed image, or the like. It should be appreciated that in some cases no false negatives or false positives may be detected (or not detected) by the initial model. In such cases, no user feedback may be required.
The system may include a moderator user interface for receiving the feedback from multiple users on the initial predictions of processed images and collecting moderated data relating to the initial predictions. The moderated data may provide an overall output from the feedback of the multiple users. This is referred to as initial moderation feedback. The provided user feedback may be collected and transmitted to a further group consisting of one or more users acting as a secondary moderator, such as radiologists with a required sub specialization or super specialization, via the moderator user interface for moderation, including analysis and ground truthing (in other words checking the initial model predictions for accuracy against the feedback from the users).
There can often be disagreement between users when providing feedback. When an image is annotated by only one user, there will be a learning bias towards the pathology annotation and reporting style of this user. Using multiple users to provide feedback for the same image overcomes this learning bias. When considering feedback on the same image from multiple users, identification of ground truth may use a consensus approach. For example, if the majority of users agree on the same set of findings, this becomes the ground truth for an image. However, there may be disagreement between the multiple users completing initial moderation feedback and, in such instances, a secondary moderator, usually in the form of a sub- or a super-specialist, is used to provide the final annotations and reading. For example, if users provide different or contradictory feedback on the AI detections for a particular image, with no majority consensus, it may be considered a disagreement and the moderator (a sub- or a super-specialist) may resolve the disagreement by providing their expert opinion. Therefore, moderated data is defined as data annotated by multiple users for the same image, with disagreements resolved by using a secondary moderation process.
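A minimal sketch of this consensus-and-escalation logic follows. Each reader's feedback is reduced to a set of findings and the secondary moderator is represented as a callable; both representations are assumptions for illustration.

```python
from collections import Counter

def ground_truth_findings(reader_findings, secondary_moderator):
    # `reader_findings` is a list of frozensets of findings, one per reader;
    # `secondary_moderator` is a callable standing in for the sub- or
    # super-specialist who resolves disagreements.
    votes = Counter(reader_findings)
    consensus, count = votes.most_common(1)[0]
    if count > len(reader_findings) / 2:  # majority agree on the same findings
        return consensus
    return secondary_moderator(reader_findings)  # no consensus: escalate
```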
Ground truthing is a well-known concept and may be read to mean collecting consensus or expert based interpretation of images based on the same set of labels and classes as used by the Artificial Intelligence model in order to validate the results of the model for accuracy against these pre-defined real world results. During inference, briefly discussed below, a machine learning model may predict a label, which can be compared with the ground truth label, if it is available. The ground truth label is a label defined by a user of the system. Ground truthing is normally performed by using a ground truth dataset. Developing a ground truth dataset may require important tasks to be performed under user supervision, such as model design, data labelling, algorithm design and training/testing. Ground truth labels for datasets are mostly moderated by a group of moderators and then later compared using different techniques to set target labels for the dataset.
The moderated, ground-truthed data may be collected. The collected dataset may, for example, include a set of images that have been reviewed and ground-truthed by the users of the system. The collected dataset may then be merged with a representative sample training dataset of the initial prediction model to produce a focussed dataset. In other words, the collected dataset including the reviewed data may be merged with a representative sample of the dataset that was used to train the initial machine learning model. A representative sample training dataset is constructed following the method depicted in Figure 7 and described in detail further in this document. A representative sample training dataset is constructed in a way that includes images from all the inference classes in a balanced or equal proportion across the entirety of the dataset that was used to train the initial machine learning model. However, the representative sample dataset may not include every single image from this initial dataset, but enough images to have all the inference classes, in all the annotated locations and all the different sizes of the inference classes, represented in a balanced or equal proportion to ensure uniformity of the resultant dataset.
After creation of the focused dataset, containing all the images from the collected dataset and only a sample of the images from the initial training dataset, the focussed dataset may be used for a short burst of additional model training of the initial model in order to obtain an updated model. The short burst of additional model training may, for example, include performing accelerated training over fewer epochs than would typically be required for normal training or than were used for the initial training. The accelerated training may include graphics processing unit (GPU) or tensor processing unit (TPU) accelerated training.
Training epochs are used to measure the duration of training through the model set and typically refer to a single cycle of training through the entire training dataset. Training a prediction model may take hundreds of epochs to achieve acceptable levels of accuracy and precision. Some complex or unbalanced datasets may require even more than that; however, care needs to be taken to ensure the resultant model does not “overfit”, in other words learn the features and noise of the training dataset so well that it is unable to generalise to new data. A reduced number of epochs will typically be a portion of the usual number of epochs, typically calculated using a formula that takes into consideration the number of inference classes, the number of images in the training dataset, the learning rate of the deep learning algorithm and several other characteristics of the previous training cycle.
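The disclosure does not give the formula itself, so the following heuristic is purely illustrative of the kind of calculation described, with an assumed weighting of the listed factors.

```python
def reduced_epoch_count(initial_epochs, n_classes, n_images_initial,
                        n_images_focused, learning_rate, floor=5):
    # The disclosure says only that the formula considers the number of
    # inference classes, dataset sizes, the learning rate and characteristics
    # of the previous training cycle; the weighting below is an assumption.
    scale = n_images_focused / max(n_images_initial, 1)
    epochs = initial_epochs * scale * (1 + 0.01 * n_classes)
    epochs /= max(learning_rate / 1e-4, 1.0)  # faster learners need fewer epochs
    return max(int(epochs), floor)
```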
Once the short burst of training has been completed, the updated model may be evaluated. This may include validating the current performance metrics and accuracy metrics of the updated model against those of the initial model. The updated model may replace the initial model if the results of the performance and accuracy metrics are an improvement over those of the initial model. It should be appreciated that an updated model having a worse performance than the original model may be discarded. In some embodiments, where the performance metrics and accuracy metrics of the updated model are borderline improvements over the initial model, the updated model may be submitted for manual validation.
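A minimal sketch of this evaluation decision is shown below, for illustration only; the single "accuracy" metric and the borderline margin of 0.01 are assumptions made for the example, not values prescribed by the method.

    # Illustrative decision logic; the "accuracy" metric and the 0.01 margin
    # are assumed values, not prescribed by the method.
    def decide_deployment(updated_metrics, initial_metrics, margin=0.01):
        """Decide whether the updated model replaces the initial model."""
        delta = updated_metrics["accuracy"] - initial_metrics["accuracy"]
        if delta > margin:
            return "replace"             # clear improvement: deploy updated model
        if delta >= 0:
            return "manual_validation"   # borderline improvement: manual review
        return "discard"                 # worse performance: keep initial model

    print(decide_deployment({"accuracy": 0.93}, {"accuracy": 0.90}))   # replace
    print(decide_deployment({"accuracy": 0.905}, {"accuracy": 0.90}))  # manual_validation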
If the updated model is deemed to be a better model, it will replace the initial model. The method may also include de-duplicating the focussed dataset used for training of the updated model to form a training dataset that replaces the training dataset of the initial model. If the updated model is rejected, the process of collecting feedback, moderating, and creating a collected feedback dataset will continue without the deployment of the updated model.
It should be appreciated that this process is cyclic in nature and will iterate at intervals, for example, once one hundred or more images have been assessed by the users of the system and feedback recorded. Training of the models, although short in nature, is usually scheduled to run overnight and will immediately be suspended if new images are received for processing by the current model. When iterating, the focused training dataset becomes the initial training dataset. In this way, the continuous learning of the deep learning model is periodically updated using a focused dataset that is representative across the learning field.
The present method of continuous training addresses the problem of “forgetting” that frequently occurs with other methods of continuous or additional training of deep learning models that do not include the entire initial training dataset together with the additional datasets. The present method significantly reduces the “forgetting” of any of the learned object predictions for the inference classes of the model whilst not using the entire dataset in each learning cycle. Training is carried out for shorter periods of time and without the entire dataset, but the method ensures the model remains consistent with what it could identify previously. The representative sample training set, with which the moderated data is merged, is balanced and representative as it includes a representation of each inference class and its permutations.
The method selects and splits a dataset into different segments, selecting different images to reconstruct a smaller dataset that is balanced and representative of different image objects in different conditions and scenarios. Newly annotated candidates are added to previously learned data to keep refreshing the model. Previously selected data is representative, so the model keeps on learning across the full range.
Figure 1 is a schematic diagram showing an example system (100) for training a machine learning-based system including a deep learning model (140) for image processing according to aspects of the present disclosure.
Deep learning models that may be used for this method, and that have been tested, are 2D deep convolutional neural networks with set input and output dimensions for the purposes of computer vision inference. However, other deep learning algorithms may benefit from this method as well.
The system (100) may include a computing device (102) and one or more user devices (104, 106) in data communication with each other via an appropriate communication network (108), such as the Internet or any other suitable communication network. The computing device (102) and one or more user devices may include processors capable of processing data, memory units capable of storing data and communication components capable of sending data to other devices and locations. The computing device (102) may for example be a server computer, which may be in the form of a cluster of server computers, a distributed server computer, cloud-based server computer or the like. The physical location of the server computer may be unknown and irrelevant to users of the system and method described herein. The user devices may be any suitable computing devices such as a mobile phone, laptop computer, tablet, or the like.
It should be appreciated that even though only two user devices (104, 106) are shown, a plurality of users controlling a plurality of user devices may be present in a practical implementation. In the example embodiment shown, one of the user devices is a feedback device (104) and the other user device is a moderator device (106).
The computing device (102) may include a processor (110) for executing the functions of components described below, which may be provided by hardware or by software units executing on the computing device (102). The software units may be stored in a memory component (112) and instructions may be provided to the processor (110) to carry out the functionality of the described components. In some cases, for example in a cloud computing implementation, software units arranged to manage and/or process data on behalf of the computing device may be provided remotely or in a distributed fashion.
The computing device (102) supports a training system (150) for training a machine deep learning system for image processing using a prediction model for object detection of inference classes. The training system (150) may have access to or may maintain one or more training database(s) (114) in which training data may be stored and from which training data may be accessed. The training databases (114) may include an initial training dataset (116) and a representative sample training dataset (118).
The initial training dataset (116) may include data acquired from one or more sources. For example, the data in the dataset may be acquired from public sources, internal databases, or third parties (such as third parties with which an agreement for sharing and acquisition of data is established). The acquired data may include data, such as images, relating to proposed inference classes. In most machine learning systems, inference classes are determined using an inference engine that applies logical rules to a knowledge base for evaluating and analysing new information. In the process of machine learning, determining the inference classes may include developing intelligence by recording, storing, and labelling data. If, for example, the system is being trained for pathology detection, the machine-learning algorithm may be fed with different images, such as x-rays, used for medical diagnosis. The computing device may then use the intelligence gathered and stored to understand new data acquired by the system. This may include the system using inference to identify and categorise new images based on features detected in the images. The initial training dataset may, for example, be required to include at least 500 images per inference class to establish a baseline and allow for eventually determining viability of a selected machine learning model, as will be explained in more detail below.
In order to enable the training system (150) to process an image to detect pathology in the image, the computing device requires the initial training dataset (116) to train the various models and algorithms to evaluate data elements of the training data against, for example, predefined thresholds associated with pathology detection. The initial training dataset may be analysed and prepared for use in the system as described in more detail below.
The training dataset (116) may be continuously re-evaluated and/or re-calibrated in order to achieve greater consistency and accuracy of outputs. This may for example include updating the data in the dataset in response to new methods of detecting pathology.
The training system (150) may further be configured to execute one or more algorithms, such as deep learning algorithms, on a set of data, including the training dataset (116) stored in the training database (114) for outputting models (140). The one or more models may be configured for machine learning based evaluation and processing of images for pathology detection. Accordingly, the training system (150) may include or have access to a model repository in and from which one or more models may be stored and accessed.
The computing device (102) may include one or more data sources (120) in which input data elements, such as newly acquired real life images, may be stored and from which input data elements may be retrieved for processing. In some embodiments, different data sources may be under the control of different entities, and may for example be physically and/or logically separated. The computing device may include or have access to one or more third party data repositories from which supplemental data relating to the input data elements may be retrieved. The third party data repositories may be web-addressable repositories, such as third party websites or the like. It should further be appreciated that the computing device (102) may form part of a network of medical devices, such as ultrasound and magnetic resonance imaging (MRI) machines, positron emission tomography (PET) and computed tomography (CT) scanners, x-ray machines, or the like, and that the one or more data sources may be in network communication with the medical devices, so as to store data captured from these devices for processing by the computing unit.
The computing device (102) may be configured to communicate with one or more of the user devices, the training database, third party data repositories, model repository and data sources (120) via a suitable communication network, such as the Internet.
Each of the user devices (104, 106) may include a processor for executing the functions of an application, such as a feedback application (122) and/or a moderator application (124), which may be provided by hardware or by software units executing on the respective user device (104, 106). The software units may be stored in a memory component and instructions may be provided to the processor to carry out the functionality of the described components in cooperation with an operating system of the device. In some cases, for example in a cloud computing implementation, software units arranged to manage and/or process data on behalf of the mobile electronic device may be provided remotely. Some or all of the components may be provided by a software application downloadable onto and executable on the respective user device.
The feedback device (104) may include a user interface (123) and display (126) for displaying and interacting with the relevant application (122). Similarly, the moderator device (106) may include a moderator interface (125) and display (128) for displaying and interacting with the relevant application (124). The moderator user interface (125) may be configured to receive feedback data relating to the processed images with initial prediction outcomes (as discussed in more detail below) and to provide moderated data relating to the processed images with initial prediction outcomes.
For example, initial model predictions may be displayed to a user of the feedback device (104) via the display (126) of the feedback device. The feedback application (122) may be configured to allow the user of the feedback device (104) to provide feedback on the initial model predictions via the user interface (123). This may include, for example, the user, such as a radiologist in a medical implementation, correcting false positives or false negatives by providing an input via the user interface.
The provided user feedback and initial model predictions may at least be temporarily stored and transmitted to the moderator device (106), operated by another user, such as a more qualified radiologist, or team of experts, or the like. In some embodiments, this may include the user feedback and initial model predictions first being sent to the computing device (102) and then transferred to the moderator device (106).
The moderator device (106) may be configured to receive the initial model predictions together with the feedback from the computing device (102) or the feedback device (104), whichever the case may be. The user of the moderator device (106) may then analyse and ground truth the initial model predictions including the user feedback, displayed to the user on the display (128) of the moderator device, via a moderator interface (125) provided by the moderator application (124). This process will be explained in more detail below.
In one embodiment, the feedback device (104) may be provided for input from a specialised user during their workflow, such as a radiologist, and the moderator device (106) may be provided for input from a moderator or team of moderators, for example, an internal team of experts in the relevant field who review the feedback.
The user interface (123) and/or moderator interface (125) may be part of, or provided by, an existing imaging platform (130) that provides the necessary tools for trained specialists that use the platform to provide feedback on the models’ performance. The existing imaging platform (130) may be maintained by the computing device (102). For example, in one embodiment an existing imaging platform (130) may be a medical imaging platform for use by radiologists or other medical personnel for receiving and reviewing prediction results of medical images. Other uses of an existing imaging platform may include other imaging modalities and medical professions, such as use by a dermatologist or a general practitioner for receiving and reviewing prediction results of dermatology related images. In another embodiment, an imaging platform may be used by engineers and construction workers to receive and review artificial intelligence model analysis of a continuous or intermittent stream of images and/or video of a structure or a machine, such as windmill blades, for any damage. Other uses of such an imaging platform may include a platform used by farmers for receiving and reviewing prediction results of analysis of crop images.
In some embodiments, each of the one or more user devices (104, 106) may be in communication with a storage component. The storage component may be an on-board storage component, or it may be a remote storage component, such as a cloud storage network, or a database, which is maintained at the computing device, or an alternative server computer, for example, a medical server.
It should be appreciated that, in some embodiments, the computing device (102) and the one or more user devices (104, 106) may be a single device capable of being used by one or more users. For example, the computing device may be a centrally located computing device in a medical facility.
Figure 2 is a flow diagram which illustrates an example embodiment of a method (200) for training a machine deep learning system for image processing to provide predictions relating to areas of the images. The predictions provide object detection of inference classes. Inference classes are a representation of a single type of object or “thing” that the process wants to detect on an image. In one embodiment, the image processing is medical image processing for detection of objects and the classes are the different pathologies or abnormalities, such as consolidation, mass, pleural effusion, etc. The method may be executed by a training system (150) operating on a computing device (102).
The method may include generating (202) one or more initial model prediction outcomes of areas of detected objects of a set of current inference classes from a prediction model having initial training and outputting the processed images. The one or more initial prediction outcomes of processed images may be the result of a machine learning model executed on a set of data, such as input data elements. For example, the input data elements may be unprocessed images that are used as input to a machine learning model to detect the occurrence of an area of interest in the image. The input data elements may be retrieved from the one or more data sources (120) of the computing device (102) for processing.
The method may include providing (204) the processed images with the one or more initial model prediction outcomes to the user via display in a user interface. The user interface may be part of an existing imaging platform (for example, a medical imaging platform) that provides the necessary tools for users such as trained specialists that use the platform to provide feedback on the models’ performance. The feedback is collected in a simplified, streamlined process that does not add a significant overhead to the radiologist as part of their daily workflow.
The one or more prediction model outcomes may be displayed to multiple users of the system via a feedback application (122) executed on the feedback device (104). The users may review the one or more model prediction outcomes and provide feedback (206) on the model prediction outcomes via the user interface. The user feedback may replace the model prediction outcomes, or may be shown as corrections made to the model prediction outcomes. It should be appreciated that, in use, the model prediction outcomes may contain some inaccuracies. For example, the model prediction outcomes may indicate detection of one or more pathologies that are incorrectly detected. The user may then, via the user interface, deselect the incorrectly detected predictions and thereby update/replace the model prediction outcomes with feedback of the user.
It should be appreciated that the computing device (102) may be configured to send prompts and/or notifications to the one or more user devices (104, 106) via the imaging platform (130) and the relevant application (122, 124) installed and/or resident on the respective user devices. Similarly, the one or more users of the user devices may provide feedback to the prompts and/or notifications via the user interface of the relevant application.
The computing device (102) may receive the users’ feedback and transmit the model prediction outcomes including the users’ feedback (206) to a moderator user interface of a moderator device (106) for review of the updated model prediction outcomes for collecting moderated data (208). This may also include the computing device (102) providing a moderator user interface for display of the processed images with the model prediction outcomes. The moderated data is moderated feedback from at least two sources to avoid user bias. A moderating user, such as a single moderator or a moderation team of experts in a field, may review the updated model prediction outcomes and ground truth the model prediction outcomes and feedback. Collecting moderated data may include moderated data of a single moderator user, moderated data as a consensus of multiple moderator users, or moderated data as an outcome of a hierarchy of moderator users.
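As an illustration of consensus moderation, the sketch below resolves a per-image label by majority vote between moderators; the escalation behaviour on a tie is an assumption made for the example, not a requirement of the method.

    from collections import Counter

    # Illustrative sketch: majority-vote consensus between moderators;
    # returning None to signal escalation is an assumption for the example.
    def consensus_label(moderator_labels):
        """Resolve a single label from multiple moderators' labels."""
        label, votes = Counter(moderator_labels).most_common(1)[0]
        if votes > len(moderator_labels) / 2:
            return label  # a strict majority agrees
        return None       # no majority: escalate for secondary moderation

    print(consensus_label(["consolidation", "consolidation", "mass"]))  # consolidation
    print(consensus_label(["consolidation", "mass"]))                   # None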
It should be appreciated that, in some embodiments, the user devices, being the feedback device and the moderator device, may be in network communication with one another and the feedback obtained via the feedback device may be transmitted to the moderator device directly. In other words, the one or more model prediction outcomes including the user feedback may be transmitted to the moderator device from the feedback device instead of from the computing device.
The reviewed, ground-truthed model prediction outcomes and feedback may be grouped into a single data packet herein referred to as moderated data. The moderated data may be associated with the initial model prediction outcomes. The moderated data may include a plurality, such as 100 or more, images that have been reviewed and ground-truthed by one or more users. It should be appreciated that, in practice, the data may be reviewed and moderated numerous times and a plurality of user devices may form part of the network. The moderated data may take the form of annotated processed images with one or more areas of interest in the images highlighted and/or annotated (for example, with labels).
The training system (150) may collect (208), via the user input to the user interface, the moderated data associated with the initial model prediction outcomes. The training system (150) may merge (210) the collected moderated data with a representative sample training dataset (118) of the initial prediction model to produce a focussed dataset. In other words, the initial model prediction outcomes including the feedback thereon may be merged with a reduced set of the training data that was previously used to train the initial machine learning model. The representative sample training dataset is representative across the set of current inference classes and class permutations. Class permutations may include one or more of: size, location, interrelationship with other objects of the same or different class, and other distribution factors. This may include a subset of images of each of the current inference classes.
The aim is to reduce a large dataset to a smaller, more manageable dataset that includes a representation of each of the instances of what the model should look for. In other words, if there are 1000 images of a consolidation of about 2cm in diameter in the top corner of the right lung, the method may reduce these to 10 images (or 50 images, or whichever sample size is configured for the specific requirement) to provide a representative sample.
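The sketch below illustrates reducing one such bucket of near-equivalent images to a configured sample size; the seeded random draw is an assumption standing in for the fuller selection criteria described with reference to Figure 7.

    import random

    # Illustrative sketch: a seeded random draw stands in for the fuller
    # selection criteria described with reference to Figure 7.
    def sample_bucket(images, sample_size=10, seed=0):
        """Reduce a bucket of near-equivalent images (same class, similar
        size and location) to a configured representative sample."""
        if len(images) <= sample_size:
            return list(images)
        return random.Random(seed).sample(images, sample_size)

    bucket = [f"img-{i:04d}" for i in range(1000)]
    print(len(sample_bucket(bucket)))  # 10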
The method may include using (212) the focused dataset for a short burst of additional model training of the initial model to obtain an updated model. The burst of additional model training is of shorter duration than the initial training. The short burst of additional model training may provide an accelerated training for a reduced number of training epochs. In an example embodiment the computing unit may perform a short burst of 10 to 20 epochs of GPU or TPU accelerated training. Using a focussed dataset, including the updated model prediction outcomes (i.e. the model prediction outcomes that are reviewed and ground-truthed) together with the representative sample training dataset enables the system to be trained with a focus on the collected moderated data without disregarding the initial training data.
In one embodiment, the number of epochs is determined based on a number of inference classes and a number of permutations of a single class, in combination with a learning rate between iterations of training.
In one such formula that may be used, the required number of epochs for the training burst is:
e = \frac{n \cdot \sum_{i=1}^{n} p_i}{\left( \sum_{i=1}^{n} p_i \right) / n} \times \frac{l_c \cdot d_c}{l_p \cdot d_p}
where e is the number of epochs of the short burst of training, n is the number of inference classes, p_i is the number of permutations of inference class i, l is the learning rate, d is the number of images in the dataset, and the subscripts c and p denote the current and previous training sessions respectively. Permutations of an inference class may be defined by variability of various factors including: size, location, object shape, greyscale intensity, etc. These factors are described further in the discussion of Figure 7 below.
For example, given 5 inference classes with {3, 4, 5, 4, 3} permutations per inference class, a current learning rate of 0.0004 and a previous learning rate of 0.0002, and 10,000 images in the current dataset and 100,000 in the previous dataset, the formula evaluates as follows:
e = \frac{5 \times (3 + 4 + 5 + 4 + 3)}{(3 + 4 + 5 + 4 + 3) / 5} \times \frac{0.0004 \times 10000}{0.0002 \times 100000} = \frac{95}{3.8} \times 0.2 = 5
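A direct Python transcription of this formula and worked example is shown below for illustration; the function name and argument layout are illustrative only and do not form part of the method.

    # Transcription of the epoch formula above; names are illustrative only.
    def burst_epochs(n, perms, lr_current, d_current, lr_previous, d_previous):
        """Number of epochs for a short training burst: the first factor uses
        the class/permutation counts, the second the learning-rate and
        dataset-size ratio between the current and previous sessions."""
        p_sum = sum(perms)  # total permutations across all inference classes
        return ((n * p_sum) / (p_sum / n)) * \
               ((lr_current * d_current) / (lr_previous * d_previous))

    # Worked example from the text: 95 / 3.8 * 0.2 = 5 epochs.
    print(burst_epochs(5, [3, 4, 5, 4, 3], 0.0004, 10_000, 0.0002, 100_000))  # ≈ 5.0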
After the model has been trained using the focused dataset, the training system (150) may validate (214) the updated model. Validating the updated model may include validating (216) performance metrics of the updated model against the initial model. If the updated model provides an improved performance, the training system (150) may replace (218) the initial model with the updated model. Similarly, if the updated model performs worse than the initial model, the updated model may be rejected (220) and discarded. In some embodiments, the performance metrics of each model may be associated with an accuracy score. The accuracy score may be determined by a mathematical calculation for determining the accuracy of the model. This may include the computing device (102) determining the amount of feedback required by the user of the system. The more feedback required, the lower the accuracy score and vice versa. Accordingly, the higher the accuracy score, the higher the performance metrics.
The validation may include a technical validation and a user validation. For technical validation, a pre-selected test dataset may be used that includes images with ground truth classes and their locations. The new model is used to run detection on this set and provide the results, which may then automatically be analysed in terms of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). The results, in terms of an AI bounding box, may be compared to the ground truth bounding box to evaluate against the TP, FP, FN and TN measurements, as well as to calculate the intersection over union (IoU), which indicates not only whether the detection was in the correct location, but also whether it was of the correct size. This may be used to automatically calculate the accuracy, precision and other model performance related metrics and compare these to the current model metrics to evaluate improvement. This is all automated and performed immediately post training, but it can also be performed “ad hoc” when evaluating a model that performed “slightly worse”, however with a replaced or augmented testing dataset.
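For illustration, a minimal IoU computation between an AI bounding box and a ground truth bounding box could look like the following sketch; the (x1, y1, x2, y2) box convention is an assumption made for the example.

    # Illustrative IoU sketch; boxes are assumed to be (x1, y1, x2, y2) tuples.
    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned bounding boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - intersection
        return intersection / union if union else 0.0

    # A detection partially overlapping the ground truth box:
    print(round(iou((10, 10, 50, 50), (30, 30, 70, 70)), 3))  # 0.143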
User validation may involve technical field users and experts using the new model “in staged mode” to validate its performance via the same feedback user interface. The feedback is used to calculate model performance metrics and then compare these to the current model. User validation may be particularly useful in complex cases, where there are borderline predictions made by the model. By having technical field users and experts validate the performance metrics, borderline cases may be more accurately validated. If the updated model provides an improved performance compared to the initial model, the focused dataset (used for training the updated model) may be reduced (222) again to a representative sample to replace (224) the initial representative sample training dataset. If the updated model is rejected (220), the method of continuously training the model as set out above may be repeated and re-evaluated. If the updated model is still rejected (220), additional moderated data may be required for further training as set out in this method. The method may repeat with the updated model as the initial model to provide continuous model training using moderated data and reducing the focused dataset to be used as the new representative sample training dataset.
Figures 3A and 3B are schematic diagrams illustrating an example embodiment of a graphical user interface of a user feedback application executing on a feedback device. As discussed above, the user interface may be configured to display the results of one or more model prediction outcomes and to collect user feedback on the one or more model prediction outcomes.
For the purposes of this description, some of the functionality provided by the application will be discussed in no particular order. In other words, it should be appreciated that the functions elaborated on below do not have to be used by a user in sequential order, unless the description suggests otherwise.
As can be seen from the schematic diagrams, the user interface may present a representation of the image being processed to the user together with one or more indications of the detected pathologies. The area as well as the type of detected pathology may be indicated to the user. For example, as can be seen in both Figures 3A and 3B, the detected pathologies (being consolidation and infiltrates in the lungs) are shown on the processed image with one or more overlaid frames and/or indicators (250) showing where the pathologies are detected. The overlaid frames and/or indicators may be coloured to easily distinguish between the detection of different pathologies. In addition, the user interface may include an information pane/window listing all of the pathologies detected, and a confidence score associated with the detected pathology. In some embodiments, the listed pathologies may also be colour coded (written in colour or have a colour indicator, for example), to easily associate the listed pathology with an area marked on the processed image.
The user may provide feedback on the one or more predictions by, for example, selecting and/or deselecting detected or undetected pathologies, moving the overlaid frame associated with each detected pathology to a more accurate location, leaving additional notes for the moderation team, or the like. For example, Figure 3B provides open text boxes for a user to enter notes and/or comments. Additional functionalities may include attaching voice files to a particular comment for a more detailed explanation thereof, or the like.
It should be appreciated that Figures 3A and 3B are simply examples of a user interface and should not be considered as limiting. There are various other functionalities that may be included that are not discussed in detail herein. The information displayed on the user interface may also be unique to a particular user, as a user may, for example, adjust various settings and display options according to a user’s preference.
An example embodiment of a user interface of a moderator application executing on a moderator device is illustrated in Figure 4.
As shown, the user may be provided with a large representation of the processed image together with the detected pathologies, similar to Figures 3A and 3B. However, in addition to the aforementioned, the user interface may include the feedback provided by the user of the feedback device. For example, as can be seen in Figure 4, additional overlaid comments (252) are provided by the user (such as a radiologist) who conducted the review of the initial model prediction outcomes. For ease of reference, the pathologies detected by the model and the feedback provided during the feedback process may be labelled differently. For example, the colour may differ, or a user name may be provided in brackets or the like.
In addition, the user interface may provide the user with a plurality of categories to select which, upon selection, navigates the user to additional information associated with the particular category. The information associated with the particular category may be unique to the user and based on information provided by the user during setup of the application.
The categories to be selected may, for example, include one or more of: details, including detailed information of the patient such as medical history, or the like; images, including alternative images of the same patient or similar images of other patients for reference; report, including the model prediction outcomes and feedback; messages, providing access to a message functionality to discuss matters with colleagues. It should be appreciated that alternative embodiments with different categories may be implemented.
Functional components of a computing device (300) are shown in the high-level block diagram in Figure 5, in which an example embodiment is shown of a system for training a machine deep learning model for image processing. The computing device (102) includes a processor (110) for executing the functions of components described below, which may be provided by hardware or software units executing on the computing device. The software units are stored in a memory (112) which provides instructions to the processor (110) to carry out the functionality of the described components.
The training system (310) may include a processed image providing component (306) arranged to provide processed images with initial model prediction outcomes to a user interface for display by the one or more user devices (104, 106). The processed image providing component (306) may be for providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display.
The processed image providing component (306) may include a moderated data collection component (308) configured to collect moderated data associated with the initial model prediction outcomes. The moderated data may be collected via a user input to the moderator interface. The moderated data may include moderated feedback data from at least two sources to avoid user bias. The moderated feedback data may be ground-truthed feedback. The moderated data collection component (308) may further be configured to receive secondary moderation from the moderator user interface for resolving disagreement in initial moderator feedback.
The training system (310) may include a data merging component (312) arranged to merge the collected moderated data with a representative sample training dataset obtained by a representative sample component (314), to produce a focused dataset. The representative sample training dataset may be representative across the set of current inference classes and class permutations. The training system (310) may also include a replacement training dataset component (316) arranged to replace the training dataset with a representative sample focussed dataset.
The training system (310) may further include an additional burst training component (315) configured for using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, where the burst of additional model training is of shorter duration than the initial training. The training data may be updated regularly in accordance with aspects of the present disclosure. The additional burst training component (315) may be configured to provide a reduced epoch burst of training. The reduced epoch burst may execute a limited epoch iteration of training.
The training system (310) may include a continuous training component (317) to continuously update the model in response to the additional bursts of model training using iterations of focussed datasets. The continuous training component (317) may repeat the training method with the updated model as the initial model to provide continuous model training using moderated data.
The training system (310) may include a validating component (320) arranged to validate the performance metrics of the updated model against the initial model. Validating the performance metrics of the updated model against the initial model may include replacing the initial model with the updated model if the updated model provides improved performance, or rejecting the updated model if the updated model provides worse performance metrics. The validating component (320) may include a technical validation component (322) for using a test dataset including moderated processed images for validating the updated model. The validating component (320) may further include a user validation component (324) for providing a staged mode of test use of the updated model and collecting feedback via the user interface. The model updating component (318) may replace the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
Figure 6 is a flow diagram showing an example method of producing a machine learning model. It should be appreciated that, in some embodiments, the dataset that is used for the initial training of machine learning models has to be of high quality in order to provide the desired results. Training machine learning models using substandard datasets, or only partially training the models, may not reach adequate levels of performance for the intended use of the models.
It should be appreciated that the method set out below is just one example of how a machine learning model may be trained for initial use of the machine learning model. Various other methods may be envisaged and applied in practice. The method may include the following steps:
Step 1: Initial data acquisition (402) - During this step, data related and appropriate to the proposed inference classes is collected from public open sources, internal datasets, as well as any third parties with which the machine learning system has data sharing and/or acquisition agreements. For the initial dataset, approximately 500 images per class are collected to establish a baseline and the viability of proceeding with the proposed model(s).
Step 2: Data analysis and preparation (404) - All acquired data is analysed for the presence of required classes and classified per inference class. Further data analysis is performed on the quality, format, encoding and size of images (in the acquired data) to ensure uniformity of data for the initial training of the machine learning models. If the dataset to be used for training is not uniform or cannot be made uniform, more data needs to be acquired. When the training data is uniform, the data is inspected for duplicated images, which are then discarded. Should the data now contain fewer than the required minimum of images per inference class, more data needs to be acquired.
This step further includes annotation validation and labelling (405). Annotation validation and labelling is only performed when existing data annotations cannot be confirmed to have come from legitimate and respected sources and/or if the data has no annotations whatsoever. During this step, the data is loaded into a pre-selected labeller platform with prepopulated annotations. Once the data is loaded, an internal medical team may work through the data, correcting, augmenting, and providing new annotations for the images in the data.
Step 3: Initial training of best suited algorithms, based on base hypothesis (406) - This step includes training selected algorithms on the initial dataset, being the data acquired and discussed above, for a predefined time period, such as 48 hours, with precision metrics collected at set intervals, such as every 2 hours. The initial selection of algorithms is performed using the latest artificial intelligence (AI) research and literature as well as existing internal experience for a proposed usage hypothesis. Initial training is generally performed on GPU- or TPU-enabled hardware for best performance; however, if the selected algorithms do not support GPU acceleration or the proposed usage of the models excludes the ability to use accelerated hardware, models are trained purely using CPUs with adjusted training duration, performance metrics collection and evaluation.
Step 4: Initial model validation and algorithm selection (408) - Performance metrics of trained models from the previous step are collected and compared with each other as well as against pre-defined base acceptance criteria. Models not meeting the required acceptance criteria are discarded and the best performing model or models are selected for further development. Should (409) no models meet the acceptance criteria, the process moves back to the previous step and the previous step is repeated.
Step 5: API-enablement of the selected models (410) - Before the models can be submitted to an internal medical team for validation, pipeline application programming interfaces (APIs) need to either be written to allow the models to be accessed via a user-friendly interface, or the models must be wrapped with existing APIs. This activity will produce a testable set of services that are available and deployed via container images in a selected deep AI platform. For every service created during this step, a new record, such as a GitLab Project, is created to ensure tracking and monitoring of the development effort.
Step 6: Internal model validation by the medical team (412) - The internal medical team or medical domain experts may validate model performance during this stage using a subset of the initial dataset acquired in Step 1. This dataset is purposefully excluded from the training done in Step 3 and is representative of as many scenarios of the inference classes as possible. It is possible to use a different dataset for validation purposes during this activity, if the dataset has not been used during Step 3 and ground truth for the inference classes can be established via consensus between two or more medical experts. If (413) one or more models are rejected, these models are moved back to Step 3, either for additional training or algorithm selection.
Step 7: Further model training on additional datasets (414) - Approved models are trained on additional datasets, which have undergone the same data preparation as the initial dataset in Step 2. The training time varies and is dependent on the size of the additional dataset, the learning rate of the models, the hardware used for training and other parameters. As with the initial training, precision metrics are taken at set intervals, such as every 2 hours, and remediation actions may be taken should the models fail to perform well. If (415) the models are altogether not responsive to further training, the process is moved back to Step 3 for algorithm selection.
Step 8: Final model validation by the medical team (416) - Trained models are again validated by the internal medical team or domain experts; however, during this activity the accuracy requirement for the models is higher than during Step 6. For example, models may now be required to achieve a score over 90% to be moved to clinical trials. The dataset used during this stage also includes images that were not pre-processed and often includes poor quality images, images differing in size and format, as well as very complex scenarios that test the models’ ability to correctly detect inference classes. Rejected models are moved (417) back to Step 7 in the process.
Step 9: Model deployment for clinical trials (418) - During this step approved models are deployed to new or existing sites for clinical trials. Cloud deployments are preferred for the trials as hardware can be managed and replaced more easily than with on-premises deployments; however, if cloud deployment is not possible, hardware is procured, configured, and deployed at a trial site.
Step 10: Performance monitoring and metrics collection (420) - During the clinical trials, the machine learning models as well as the underlying API services, operating systems, integration components and other hardware are closely monitored, and performance metrics are collected. Samples of the models’ predictions are validated as they are completed by the internal medical team as well as medical professionals involved in the clinical trials. Post completion of clinical trials, all images and model predictions are analysed for inference accuracy and performance; all other software, integration and hardware metrics are analysed by a technical team to ensure performance, usage and other characteristics expected from the platform. Should (421) the models not perform as expected during or post clinical trials, the process is moved back to Step 8 for image and prediction analysis as well as further remediation activities.
Step 11: Models are stabilized and moved to continuous learning mode (422) - Models which complete clinical trials and are approved for production deployments are now available for deployment at new and existing production sites. They are also moved to production container repositories and machine learning model weights’ buckets. Continuous learning mode, discussed with reference to Figure 2 above, is also enabled on these models.
Figure 7 is a schematic diagram which illustrates an example representation of providing a representative sample of a dataset. It should be appreciated that only the high-level steps are discussed below.
For the first step in the process, the data is split into separate datasets per class that is present on an image. Images that contain multiple classes are duplicated for each class-specific dataset. This is a fairly simple process and does not require any intricate calculations or data manipulations. In practice, for single-class models, this step is generally omitted.
It should be appreciated that from here on, every following step is performed on every dataset generated by the previous step. The diagram illustrated in Figure 7 is simplified to include the flow of only the first output dataset to the next step in order to keep the graphical representation manageable.
Even though a number of classes detected by models can occur almost anywhere on the image, it is an important part of the process to segment the dataset based on the location of the class.
This may be done by using a 4-quadrant location segmentation. Note, however, that the true centre of the image is not taken as the (0, 0) coordinate of the four quadrants. Instead, the centre of each of the locations of the given class is calculated and the distribution of these centre points is plotted. An example graphical representation hereof is illustrated in Figure 8.
Based on the distribution, the (0, 0) coordinate of the four quadrants will most likely be offset from the true centre of the image. The set is then divided into 5 subsets - upper left, lower left, upper right, lower right and cross segment. The last subset includes all the images where the class location extends beyond a single quadrant.
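A simplified sketch of this location segmentation follows; placing the origin at the mean of the class-location centres is an assumption made for the example (the method derives the origin from the plotted distribution), as is the bounding-box representation of annotations.

    # Illustrative sketch: the origin is taken as the mean of the
    # class-location centres (an assumption), and each annotation carries
    # a (x1, y1, x2, y2) bounding box.
    def quadrant_subsets(annotations):
        """Split annotations into the five location subsets described above."""
        centres = [((a["box"][0] + a["box"][2]) / 2,
                    (a["box"][1] + a["box"][3]) / 2) for a in annotations]
        ox = sum(x for x, _ in centres) / len(centres)
        oy = sum(y for _, y in centres) / len(centres)
        subsets = {"upper_left": [], "upper_right": [], "lower_left": [],
                   "lower_right": [], "cross_segment": []}
        for a in annotations:
            x1, y1, x2, y2 = a["box"]
            if x1 < ox < x2 or y1 < oy < y2:
                # The class location straddles the origin on an axis, so it
                # extends beyond a single quadrant.
                subsets["cross_segment"].append(a)
            else:
                vertical = "upper" if y2 <= oy else "lower"
                horizontal = "left" if x2 <= ox else "right"
                subsets[f"{vertical}_{horizontal}"].append(a)
        return subsets

    anns = [{"box": (0, 0, 10, 10)}, {"box": (40, 40, 60, 60)}]
    print({k: len(v) for k, v in quadrant_subsets(anns).items()})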
The next step in the process is to take each of the images in the resultant quadrant datasets and subdivide them into further sets based on the class appearance. This may be the most intensive part of the entire de-duplication process. Several edge tracing techniques may be used to define the outline/edges of the object that the class represents within its location. Then, a set of metrics may be used to categorise the objects into a finite number of categories based on the shape, size and appearance of the object.
Below are 3 broad categories/metrics that are used for this dataset segmentation (an illustrative sketch follows the list):
- Number of objects within the specified location, as at times edge detection may reveal several objects rather than a single object within the specified area. Especially if the objects are small, the user would typically mark multiple objects near each other in a single class location. This segmentation would typically result in two subsets of data - single or multiple objects.
- Object area is calculated by estimating the centre of the shape and the average radius from this centre. This is a simpler and more computationally efficient method of calculating an approximation of the object’s area, as some of the objects may have very peculiar outlines with hundreds of edges.
- Regularity quality of the edges approximates how regular (or smooth) or irregular (or jagged) an object’s edges are. This is typically calculated using the number of path points, and the distances and angles of path points from one another. To keep the number of subsets generated by this step in the process to a manageable minimum, the number of sets generated by this metric is limited to at most five. It should be appreciated that this threshold is a parameter and can be adjusted should the need arise.
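The sketch below illustrates the area approximation and a simple regularity bucket; the regularity heuristic (spread of consecutive path-point distances) is an assumption standing in for the fuller distance-and-angle analysis described above.

    import math

    # Illustrative appearance metrics; the regularity heuristic below is an
    # assumption standing in for the path-point distance/angle analysis.
    def approximate_area(path_points):
        """Approximate object area from traced edge points: estimate the
        centre, average the radius, and treat the object as a circle."""
        cx = sum(x for x, _ in path_points) / len(path_points)
        cy = sum(y for _, y in path_points) / len(path_points)
        radius = sum(math.hypot(x - cx, y - cy)
                     for x, y in path_points) / len(path_points)
        return math.pi * radius ** 2

    def regularity_bucket(path_points, buckets=5):
        """Bucket edge regularity into at most `buckets` categories based on
        the spread of distances between consecutive path points."""
        distances = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2)
                     in zip(path_points, path_points[1:] + path_points[:1])]
        mean = sum(distances) / len(distances)
        jaggedness = (max(distances) - min(distances)) / mean if mean else 0.0
        return min(int(jaggedness), buckets - 1)

    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(round(approximate_area(square), 1))  # ≈ 157.1 for a square outline
    print(regularity_bucket(square))           # 0: perfectly regular edges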
Finally, the datasets of the previous step may be divided based on the “greyscale intensity” of the objects, whose edges were defined in the previous step. Although this example step deals only with greyscale, since radiological images are greyscale in nature, the process can be modified to segment by colour as well.
The aim of this segregation is to divide the set based on how dense or solid the object of interest is; less dense objects would be represented by a darker shade of grey. This segregation is achieved by calculating the average of each pixel’s greyscale value (between 0 and 255) within the object’s edges.
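For illustration, the averaging step could be implemented as follows; the boolean mask standing in for the traced object edges is an assumption made for the example.

    import numpy as np

    # Illustrative sketch using NumPy; the boolean mask standing in for the
    # traced object edges is an assumption for the example.
    def mean_greyscale_intensity(image, object_mask):
        """Average greyscale value (0-255) of the pixels inside the object."""
        return float(image[object_mask].mean())

    image = np.full((64, 64), 40, dtype=np.uint8)  # darker background
    image[20:40, 20:40] = 200                      # denser, brighter object
    mask = np.zeros((64, 64), dtype=bool)
    mask[20:40, 20:40] = True
    print(mean_greyscale_intensity(image, mask))   # 200.0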
From the resulting datasets at least 3 images are selected - 2 for the training and 1 for the testing sets. If a set contains more than 30 images, 10% of the images would be selected; while for large subsets of 1,000 or more images, a configured maximum of 100 images would be selected - 80 for training and 20 for testing. It should be appreciated that these thresholds are configurable parameters; a sketch of the selection counts follows the criteria below. Candidate images are chosen based on the following broad criteria, which can be adjusted from time to time to reduce any potential bias of the resultant dataset:
- Source of data, to ensure that at least a single image is chosen from each of the available data sources - these would include geographical and other location specific identifiers for the sources.
- Demographics, especially the patient’s age group, to ensure that at least the following broad age groups are equally represented, if available in the dataset: 0 - 18 months, 18 months - 16 years, and 16 years and above.
- Image quality, to ensure that both high and low image qualities are included in the resultant dataset. If the set includes analogue images that were subsequently digitized, these are included as well.
For other imaging modalities, the selection of the images may differ from the above; for example, for mammography data, demographics may be excluded from the qualifying criteria for image selection.
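As referenced above, the selection counts could be transcribed as follows for illustration; the function and its 80/20 split rule mirror the configurable defaults described in the text.

    # Illustrative transcription of the configurable selection thresholds.
    def selection_counts(subset_size, max_total=100):
        """Return (training, testing) image counts for a subset."""
        if subset_size >= 1000:
            total = max_total                    # capped for large subsets
        elif subset_size > 30:
            total = round(subset_size * 0.10)    # 10% of the subset
        else:
            total = 3                            # minimum of 3 images
        testing = max(1, total // 5)             # roughly an 80/20 split
        return total - testing, testing

    print(selection_counts(12))    # (2, 1)
    print(selection_counts(200))   # (16, 4)
    print(selection_counts(5000))  # (80, 20)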
As represented in Figure 7, the sets are merged in reverse order, with the following two conditions checked (a sketch of this merge follows the conditions):
1. Removing images that are duplicated. It is possible for the same image to appear in multiple subsets; for example, a particularly complex case with multiple different classes may appear in two or more class subsets. Since only a single presence of the image is required in the resultant dataset, once it is added as part of one subset, it will be ignored for all others.
2. At least 1,000 images must be available in the resultant set per each class category (CDx datasets).
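An illustrative sketch of this reverse-order merge follows; the "image_id"/"labels" record structure is an assumption made for the example.

    # Illustrative sketch of the reverse-order merge; the image_id/labels
    # record structure is assumed for the example.
    def merge_subsets(subsets, min_per_class=1000):
        """Merge subsets in reverse order, keeping a single copy of every
        image and reporting classes short of the configured minimum."""
        seen, merged = set(), []
        for subset in reversed(subsets):
            for image in subset:
                if image["image_id"] not in seen:  # condition 1: de-duplicate
                    seen.add(image["image_id"])
                    merged.append(image)
        counts = {}
        for image in merged:
            for cls in image["labels"]:
                counts[cls] = counts.get(cls, 0) + 1
        shortfall = [c for c, n in counts.items() if n < min_per_class]
        return merged, shortfall                   # condition 2: class minimum

    a = [{"image_id": "i1", "labels": ["mass"]}]
    b = [{"image_id": "i1", "labels": ["mass"]},
         {"image_id": "i2", "labels": ["mass"]}]
    merged, short = merge_subsets([a, b], min_per_class=2)
    print(len(merged), short)  # 2 []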
Figure 9 is a flow diagram showing an example method of producing a machine learning model to obtain initial model predictions and using the initial model predictions in the ongoing training of a machine learning model.
The method may include identifying (500) an initial training dataset for training a machine learning model. This may include the computing device selecting the dataset from a database maintained by the computing device, or to which the computing device has access. The method may include determining whether there is sufficient data available to ground truth the training data. If (502) there is no ground truth data available, the initial training dataset may be ground-truthed by a consortium. In other words, the training data may be ground-truthed by one or more specialists. When the dataset is ground-truthed, the dataset may be pre-processed (504). Pre-processing (504) the dataset may, for example, include annotation validation and labelling of the dataset. The pre-processed dataset may be stored as an initial training dataset (D1). The initial training dataset may be processed (505) to form a reduced representative sample dataset (RSD1), using the steps described with reference to Figure 7.
The method may include selecting (506) a suitable machine learning algorithm to train a machine learning model for image processing. Once an algorithm has been selected, initial training (508) of the model may commence. The best suited algorithms may be based on a base hypothesis associated with the purpose of the system.
After the initial model has been sufficiently trained, the model may undergo (510) technical and/or clinical validation. This may include determining whether the model is ready for deployment for clinical trials.
If the model is determined to be ready for deployment, the model is released (512) for clinical trials and production. If, however, the model is not ready for deployment, the model is investigated (514) and the dataset as well as training metrics associated with the model are identified. After identifying the dataset and training metrics, it can be determined if more training is required. If more training is required, the model may be trained (508) further. However, if more training is not required, the method may include determining if more data is required.
If more data is required, the method may move back to the initial step of identifying (500) the training dataset. At this point, a new/improved dataset may be used for training of the model, and all the steps may be repeated. If it is determined that new data is not required, the method may include determining if another machine learning algorithm is required. When a new algorithm is required, the method may return to the step of selecting a suitable algorithm candidate. If a new algorithm is not required, the method flow may be abandoned (516), and the model discarded.
The disclosure provides a method and system enabling deep learning algorithms, which are used to perform pathology detection, to enhance and improve at a much more rapid rate than is possible with traditional methodologies for training and improving deep learning algorithms.
As set out above, when the model is determined to be ready for clinical deployment, it may be released (512) for clinical trials and production.
It is at this point where the continual learning process is implemented. The steps in this part of the process are described in detail with reference to Figure 2. The method may include collecting (518) feedback data on the model predictions from a user, such as a radiologist. The feedback may be received from a user device under the control of the radiologist. The method may include determining if enough feedback data is provided by the radiologist. If enough data is provided, the method may include transmitting the model predictions and feedback data for moderation and ground truthing; alternatively, the previous step of collecting feedback data may be repeated.
The method may include moderating and ground truthing (520) the collected data. The data may be moderated and ground-truthed by one or more moderators, such as a group of one or more radiologists. The method may include determining if further feedback is required. This may, for example, include determining if a disagreement between moderators exists, and, if so, the method may include providing feedback and additional training (522). This may include sending the model predictions to secondary moderation for further feedback and ground truthing. If no feedback is needed, the method may include preparing a moderated dataset (CD1), including the moderated and ground-truthed feedback data and model predictions. The moderated dataset (CD1) may be merged (524) with a representative sample dataset of the initial training dataset to create a focussed dataset (MD1). The initial training dataset may be processed to obtain a representative sample dataset once the training dataset has been pre-processed or at any time before the focussed dataset (MD1) is created.
The focussed dataset (MD1) may be used to train (526) the model for short bursts of time so as to train the model using the new focussed dataset. Once the model has been trained (526) using the focussed dataset (MD1), the model may undergo further technical (528) and/or clinical (530) validation. During this step, the performance and accuracy metrics of the model may automatically be validated against the current model’s performance and accuracy metrics. A model that is performing worse than the original or previous model should not be allowed to replace a better performing current model. For borderline improvements, or improvements in some areas with a minor decrease in performance in others, the model will be submitted for manual clinical validation.
If the model is deemed to be a better model, the model may be considered ready for deployment, and it may replace the original model. In other words, the model may be deployed (532). The dataset used for training this model, being the focussed dataset (MD1), may be processed (534) to provide a representative sample dataset again and replace RSD1. Alternatively, if the model is deemed not to be ready for deployment, the model may be rejected and the steps of collecting feedback, moderating, and adding to the new dataset will continue without the deployment of the new model.
It should be appreciated that this process is cyclic in nature and will repeat at pre-defined intervals, typically once one hundred or more images have been assessed by the radiologist users and feedback recorded. Training of the models, although short in nature, is usually scheduled to run overnight and will immediately be suspended if new images are received for processing by the current models.
The method and system described use a combination of user interface and user experience elements to collect feedback from users, such as radiologists, via a medical imaging platform, a process to ground truth the collected feedback, and a unique training data preparation process. These features are configured to convert newly inferred images into new data sources to train existing machine learning models on an ongoing basis.
The system and method described herein may improve the efficiency and speed of training new deep learning models. However, it should be appreciated that the efficiency and accuracy of the deep learning models may be directly related to the accuracy of the data used to train the initial models. In other words, the initial models used as input into the system and method must be trained to a certain predefined level of accuracy, i.e., must already have pre-trained weights and a substantial initial training and testing dataset.
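For example, in a PyTorch-style workflow (used here only as an assumed illustration, not as part of the disclosure), starting the continual learning loop from pre-trained weights rather than random initialisation might look as follows.

```python
import torch


def load_initial_model(model, weights_path):
    """The continual learning loop assumes an initial model already trained
    to a predefined level of accuracy, i.e. one with pre-trained weights."""
    state = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state)
    return model
```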
Figure 10 illustrates an example of a computing device (600) in which various aspects of the disclosure may be implemented. The computing device (600) may be embodied as any form of data processing device including a personal computing device (e.g. laptop or desktop computer), a server computer (which may be self-contained or physically distributed over a number of locations), a client computer, or a communication device, such as a mobile phone (e.g. cellular telephone), satellite phone, tablet computer, personal digital assistant or the like. Different embodiments of the computing device may dictate the inclusion or exclusion of various components or subsystems described below.
The computing device (600) may be suitable for storing and executing computer program code. The various participants and elements in the previously described system diagrams may use any suitable number of subsystems or components of the computing device (600) to facilitate the functions described herein. The computing device (600) may include subsystems or components interconnected via a communication infrastructure (605) (for example, a communications bus, a network, etc.). The computing device (600) may include one or more processors (610) and at least one memory component in the form of computer-readable media. The one or more processors (610) may include one or more of: central processing units (CPUs), graphics processing units (GPUs), microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs) and the like. In some configurations, a number of processors may be provided and may be arranged to carry out calculations simultaneously. In some implementations, various subsystems or components of the computing device (600) may be distributed over a number of physical locations (e.g. in a distributed, cluster or cloud-based computing configuration) and appropriate software units may be arranged to manage and/or process data on behalf of remote devices.
The memory components may include system memory (615), which may include read only memory (ROM) and random access memory (RAM). A basic input/output system (BIOS) may be stored in ROM. System software may be stored in the system memory (615) including operating system software. The memory components may also include secondary memory (620). The secondary memory (620) may include a fixed disk (621), such as a hard disk drive, and, optionally, one or more storage interfaces (622) for interfacing with storage components (623), such as removable storage components (e.g. magnetic tape, optical disk, flash memory drive, external hard drive, removable memory chip, etc.), network attached storage components (e.g. NAS drives), remote storage components (e.g. cloud-based storage) or the like.
The computing device (600) may include an external communications interface (630) for operation of the computing device (600) in a networked environment enabling transfer of data between multiple computing devices (600) and/or the Internet. Data transferred via the external communications interface (630) may be in the form of signals, which may be electronic, electromagnetic, optical, radio, or other types of signal. The external communications interface (630) may enable communication of data between the computing device (600) and other computing devices including servers and external storage facilities. Web services may be accessible by and/or from the computing device (600) via the communications interface (630).
The external communications interface (630) may be configured for connection to wireless communication channels (e.g., a cellular telephone network, wireless local area network (e.g. using Wi-Fi™), satellite-phone network, Satellite Internet Network, etc.) and may include an associated wireless transfer element, such as an antenna and associated circuitry.
The computer-readable media in the form of the various memory components may provide storage of computer-executable instructions, data structures, program modules, software units and other data. A computer program product may be provided by a computer-readable medium having stored computer-readable program code executable by the one or more processors (610). A computer program product may be provided by a non-transient or non-transitory computer-readable medium, or may be provided via a signal or other transient or transitory means via the communications interface (630).
Interconnection via the communication infrastructure (605) allows the one or more processors (610) to communicate with each subsystem or component and to control the execution of instructions from the memory components, as well as the exchange of information between subsystems or components. Peripherals (such as printers, scanners, cameras, or the like) and input/output (I/O) devices (such as a mouse, touchpad, keyboard, microphone, touch-sensitive display, input buttons, speakers and the like) may couple to or be integrally formed with the computing device (600) either directly or via an I/O controller (635). One or more displays (645) (which may be touch-sensitive displays) may be coupled to or integrally formed with the computing device (600) via a display or video adapter (640).
The foregoing description has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Any of the steps, operations, components or processes described herein may be performed or implemented with one or more hardware or software units, alone or in combination with other devices. Components or devices configured or arranged to perform described functions or operations may be so arranged or configured through computer-implemented instructions which implement or carry out the described functions, algorithms, or methods. The computer-implemented instructions may be provided by hardware or software units. In one embodiment, a software unit is implemented with a computer program product comprising a non-transient or non-transitory computer-readable medium containing computer program code, which can be executed by a processor for performing any or all of the steps, operations, or processes described. Software units or functions described in this application may be implemented as computer program code using any suitable computer language such as, for example, Java™, C++, or Perl™ using, for example, conventional or object-oriented techniques. The computer program code may be stored as a series of instructions or commands on a non-transitory computer-readable medium, such as a random access memory (RAM), a read-only memory (ROM), a magnetic medium such as a hard-drive, or an optical medium such as a CD-ROM. Any such computer-readable medium may also reside on or within a single computational apparatus and may be present on or within different computational apparatuses within a system or network.
Flowchart illustrations and block diagrams of methods, systems, and computer program products according to embodiments are used herein. Each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, may provide functions which may be implemented by computer-readable program instructions. In some alternative implementations, the functions identified by the blocks may take place in a different order to that shown in the flowchart illustrations. Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations, such as accompanying flow diagrams, are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. The described operations may be embodied in software, firmware, hardware, or any combinations thereof.
The language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Finally, throughout the specification and accompanying claims, unless the context requires otherwise, the word ‘comprise’ or variations such as ‘comprises’ or ‘comprising’ will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

Claims

CLAIMS:
1. A computer-implemented method of training a machine deep learning system for image processing using a prediction model for object detection of inference classes, comprising:
providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display;
collecting moderated data associated with the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias;
merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and
using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
2. The method as claimed in claim 1, including validating performance metrics of the updated model against performance metrics of the initial model, and replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
3. The method as claimed in claim 2, wherein the validating includes a technical validation using a test dataset including moderating processed images for validating the updated model.
4. The method as claimed in claim 2 or claim 3, wherein the validating includes a staged mode of test use of the updated model and collecting feedback via the user interface.
5. The method of any one of claims 2 to 4, including repeating the method with the updated model as the initial model to provide continuous model training using moderated data and reducing the focused dataset to be used as the new representative sample training dataset.
6. The method as claimed in any one of the preceding claims, wherein the burst of additional model training is of a reduced duration in terms of training epochs, wherein the reduced epoch burst executes a limited epoch iteration of training.
7. The method as claimed in any one of the preceding claims, including obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
8. The method as claimed in any one of the preceding claims, wherein the step of collecting moderated data includes:
receiving feedback data from multiple users;
transmitting the feedback data to a moderator user interface;
using secondary moderation if feedback disagreements exist in the original feedback; and
receiving the moderated data from the moderator user interface that provides a ground truth for the model prediction.
9. The method as claimed in any one of the preceding claims, wherein the moderated data includes adjusted processed images with annotations relating to the initial model prediction outcomes, wherein the annotations replace and/or add to the model prediction outcomes of the processed image.
10. The method as claimed in any one of the preceding claims, wherein the initial model prediction outcomes include bounding boxes on the processed images to provide an indication of an area of the image to which the model prediction relates.
11. The method as claimed in any one of the preceding claims, wherein receiving moderated data includes feedback in one or more of the following manners:
deselecting false positive results included in the one or more prediction outcomes;
adding false negative results that are not included in the one or more prediction outcomes;
moving an area of a model prediction to a more accurate location in the processed image;
leaving additional notes associated with one or more of the model predictions.
12. The method as claimed in any one of the preceding claims, including generating the initial model prediction outcomes, wherein each of the model prediction outcomes is the result of a machine learning algorithm being executed on a set of training data including a full set of training epochs.
13. A system for training a machine deep learning system for image processing using a prediction model for object detection of inference classes, the system including a memory for storing computer-readable program code and a processor for executing the computer-readable program code, the system comprising:
a processed image providing component for providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display;
a moderated data collecting component for collecting, via user input to a moderator user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias;
a data merging component for merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and
an additional burst training component for using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
14. The system as claimed in claim 13, including:
a validating component for validating the performance metrics of the updated model against the performance metrics of the initial model; and
a model updating component for replacing the initial model with the updated model when the performance metrics of the updated model are an improvement compared to the performance metrics of the initial model.
15. The system as claimed in claim 14, including a continuous training component for repeating the method with the updated model as the initial model to provide continuous model training using moderated data.
16. The system as claimed in any one of claims 13 to 15, wherein the additional burst training component provides a reduced epoch burst of training, wherein the reduced epoch burst executes a limited epoch iteration of training.
17. The system as claimed in any one of claims 13 to 16, including a replacement training dataset component for obtaining a representative sample of the focused dataset and replacing the representative sample training dataset with the representative sample focused dataset.
18. The system as claimed in any one of claims 13 to 17, wherein the moderated data collecting component includes receiving secondary moderation from the moderator user interface for resolving disagreement in initial moderator feedback.
19. The system as claimed in any one of claims 13 to 18, including an imaging platform including a user interface for display of processed images with initial model prediction outcomes and for collection of user input, wherein the collection of user input includes receiving input from a user device having access to the user interface of the imaging platform.
20. A computer program product for training a machine deep learning system for image processing using a prediction model for object detection of inference classes, the computer program product comprising a computer-readable medium having stored computer-readable program code for performing the steps of:
providing processed images with initial model prediction outcomes of areas of detected objects of a set of current inference classes from the prediction model having initial training and outputting the processed images to a user interface for display;
collecting, via user input to the user interface, moderated data on the initial model prediction outcomes, wherein the moderated data is moderated feedback data from at least two sources to avoid user bias;
merging the collected moderated data with a representative sample training dataset of the initial prediction model to produce a focused dataset, wherein the representative sample training dataset is representative across the set of current inference classes and class permutations; and
using the focused dataset for a burst of additional model training of the initial model to obtain an updated model, wherein the burst of additional model training is of shorter duration than the initial training.
PCT/IB2023/062119 2022-12-02 2023-12-01 Method and system for training a machine learning system for image processing WO2024116137A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2218138.2A GB2625069A (en) 2022-12-02 2022-12-02 Method and system for training a machine learning system for image processing
GB2218138.2 2022-12-02

Publications (1)

Publication Number Publication Date
WO2024116137A1 true WO2024116137A1 (en) 2024-06-06

Family

ID=84926541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2023/062119 WO2024116137A1 (en) 2022-12-02 2023-12-01 Method and system for training a machine learning system for image processing

Country Status (2)

Country Link
GB (1) GB2625069A (en)
WO (1) WO2024116137A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10607114B2 (en) * 2018-01-16 2020-03-31 Siemens Healthcare Gmbh Trained generative network for lung segmentation in medical imaging
EP3657514A1 (en) * 2018-11-22 2020-05-27 Koninklijke Philips N.V. Interactive iterative image annotation
CN112733932A (en) * 2021-01-08 2021-04-30 北京匠数科技有限公司 Model accelerated training method and device based on training data similarity aggregation
US20220300769A1 (en) * 2021-03-19 2022-09-22 Arizona Board Of Regents On Behalf Of Arizona State University Systems, methods, and apparatuses for actively and continually fine-tuning convolutional neural networks to reduce annotation requirements
WO2022251317A1 (en) * 2021-05-27 2022-12-01 Rutgers, The State University Of New Jersey Systems of neural networks compression and methods thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220328189A1 (en) * 2021-04-09 2022-10-13 Arizona Board Of Regents On Behalf Of Arizona State University Systems, methods, and apparatuses for implementing advancements towards annotation efficient deep learning in computer-aided diagnosis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUOTAI WANG ET AL: "Interactive Medical Image Segmentation using Deep Learning with Image-specific Fine-tuning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 October 2017 (2017-10-11), XP081147423, DOI: 10.1109/TMI.2018.2791721 *
JI WEI ET AL: "Learning Calibrated Medical Image Segmentation via Multi-rater Agreement Modeling", 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 20 June 2021 (2021-06-20), pages 12336 - 12346, XP034008245, DOI: 10.1109/CVPR46437.2021.01216 *
TAJBAKHSH NIMA ET AL: "Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation", MEDICAL IMAGE ANALYSIS, OXFORD UNIVERSITY PRESS, OXFORD, GB, vol. 63, 3 April 2020 (2020-04-03), XP086156644, ISSN: 1361-8415, [retrieved on 20200403], DOI: 10.1016/J.MEDIA.2020.101693 *

Also Published As

Publication number Publication date
GB2625069A (en) 2024-06-12
GB202218138D0 (en) 2023-01-18

Similar Documents

Publication Publication Date Title
Schwendicke et al. Artificial intelligence in dental research: Checklist for authors, reviewers, readers
Rajpurkar et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists
US20180144244A1 (en) Distributed clinical workflow training of deep learning neural networks
US20210042643A1 (en) Active surveillance and learning for machine learning model authoring and deployment
CN114787832A (en) Method and server for federal machine learning
CN109191451B (en) Abnormality detection method, apparatus, device, and medium
Castillo T et al. A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: high grade vs. low grade
KR102460257B1 (en) Method or apparatus for providing diagnostic results
US20230386026A1 (en) Spatial analysis of cardiovascular imaging
US20220083814A1 (en) Associating a population descriptor with a trained model
WO2020056372A1 (en) Multimodal learning framework for analysis of clinical trials
US11017572B2 (en) Generating a probabilistic graphical model with causal information
Engle et al. Performance of Qure. ai automatic classifiers against a large annotated database of patients with diverse forms of tuberculosis
US20220351863A1 (en) Method and System for Disease Quantification of Anatomical Structures
Waqas et al. Revolutionizing digital pathology with the power of generative artificial intelligence and foundation models
Uemura et al. Weakly unsupervised conditional generative adversarial network for image-based prognostic prediction for COVID-19 patients based on chest CT
Mamalakis et al. A transparent artificial intelligence framework to assess lung disease in pulmonary hypertension
EP4012667A2 (en) Data preparation for artificial intelligence models
WO2024116137A1 (en) Method and system for training a machine learning system for image processing
WO2020167156A1 (en) Method for debugging a trained recurrent neural network
US11263481B1 (en) Automated contrast phase based medical image selection/exclusion
US11321843B1 (en) Adaptive machine learning system for image based biological sample constituent analysis
US11727559B2 (en) Pneumothorax detection
WO2021198766A1 (en) Method and system for anomaly detection and report generation
Mese et al. ChatGPT-assisted deep learning model for thyroid nodule analysis: beyond artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23822109

Country of ref document: EP

Kind code of ref document: A1