WO2023056507A1

WO2023056507A1 - System and method using machine learning algorithm for vital sign data analysis

Info

Publication number: WO2023056507A1
Application number: PCT/AU2022/051182
Authority: WO
Inventors: Robert Antle McNamara; Shiv Akarsh Meka
Original assignee: East Metropolitan Health Service
Priority date: 2021-10-07
Filing date: 2022-10-04
Publication date: 2023-04-13

Abstract

A system and method of vital sign data analysis is disclosed. It includes an arrangement for receiving and transmitting vital sign data in real time, as well as a machine learning algorithm for processing the vital sign data in real time and predicting, forecasting, optimising, prognosticating and/or diagnosing a patient condition based on analysis of the vital sign data. A pipeline and method for pre-processing the vital sign data prior to processing by the machine learning algorithm is also disclosed. Pre-processing the vital sign data includes data cleaning as biomedical signals sampled at a high-frequency are prone to various noise forms. Advantageously the system and method of vital sign data analysis is employed in m-sTBI ICU neuroprotective management for predicting when a tIH episode is likely to occur.

Description

“SYSTEM AND METHOD USING MACHINE LEARNING ALGORITHM FOR VITAL SIGN DATA ANALYSIS” Field of the Invention The present invention relates to a system and method using a machine learning algorithm for vital sign data analysis and relates particularly, though not exclusively, to such a method for predicting traumatic intracranial hypertension (tIH). Background to the Invention One of the most common complications of moderate to severe traumatic brain injury (m-sTBI) is a rise in intracranial pressure (ICP). If significant, rises in ICP cause a condition known as traumatic intracranial hypertension (tIH) which can inflict further damage to the injured brain and is strongly associated with poor outcomes. tIH is defined as a rise in ICP above 22 mmHg which lasts for 5 or more minutes. Accordingly, a cornerstone of m- sTBI ICU neuroprotective management is the monitoring of ICP and treatment of tIH using a treatment strategy which escalates in intensity in response to ICP rises. Due to the pathophysiology of tIH, wherein rapid ICP rises occur as skull capacity is reached, prior art methods for the treatment of tIH are universally reactive. Despite numerous observational trials demonstrating the benefit of ICP-guided m-sTBI treatment algorithms, multiple interventional trials aimed at reducing ICP values have failed to demonstrate benefits in terms of patient outcomes. One potential explanation for this apparent lack of benefit is the reactive application of interventions tested, wherein treatments were administered after damaging tIH episodes, often lasting for several hours, had occurred. This has led many to theorise that pre-empting ICP rises may offer better results and lead to improved outcomes in m-sTBI patients. It has long been known that prevention of a disease or early detection and intervention into a disease process can lead to better outcomes. This maxim of medicine has led to the hypothesis that prevention and/or early intervention into episodes of tIH will lead to improved outcomes. In this effort several groups have attempted to develop the ability to forecast ICP values and/or predict tIH episodes over the past 45 years. In 2013, Guiza et al., using data collected in 2005, built predictive models using multivariate logistic regression and the machine learning technique known as Gaussian Processes.¹ Using these techniques, the team developed an algorithm which in computer modelling demonstrated a sensitivity of 82% and specificity of 75% for predicting an tIH event 30 minutes prior to the episode. In this and all subsequent studies by Guiza et al.’s group, a tIH event was classified as a rise in ICP above 30 mmHg which lasted for at least 10 minutes. In 2016, Myers et al. described their use of the Autoregressive-Ordinal Regression (AR-OR) machine learning technique to approach the problem of tIH prediction.² In this study, tIH events were defined as an ICP greater than or equal to 20 mm Hg which lasted for at least 15 minutes. With the use of the AR-OR methodology, Myers demonstrated the ability to predict an ICP event 30 minutes prior with an Area Under the Receiver Operator Curve (AUROC) of 0.85. These examples and all other similar efforts utilised retrospective data. None of these prior efforts have been successful in reliably predicting when a tIH episode is likely to occur in real time. The present invention was developed with a view to providing a more reliable and accurate method of predicting tIH in real time using a machine learning algorithm. However, it will be understood that the same method may be readily adapted to other kinds of vital sign wave data analysis. ¹ Güiza F, Depreitere B, Piper I, Van den Berghe G, Meyfroidt G. Novel methods to predict increased intracranial pressure during intensive care and long-term neurologic outcome after traumatic brain injury: development and validation in a multicenter dataset. Critical care medicine.2013;41(2):554-564. ² Myers RB, Lazaridis C, Jermaine CM, Robertson CS, Rusin CG. Predicting Intracranial Pressure and Brain Tissue Oxygen Crises in Patients With Severe Traumatic Brain Injury. Crit Care Med.2016;44(9):1754-1761. Therefore, the method of the present invention is not limited in its application to predicting tIH but can be applied more broadly to other kinds of vital sign wave data analysis for facilitating timely medical intervention and providing more accurate and timely prognostic information. References to prior art in this specification are provided for illustrative purposes only and are not to be taken as an admission that such prior art is part of the common general knowledge in Australia or elsewhere. Summary of the Invention According to one aspect of the present invention there is provided a system of vital sign data analysis, the system comprising: an arrangement for receiving and transmitting vital sign data in real time; and, a machine learning algorithm for processing the vital sign data in real time and predicting, forecasting, optimising, prognosticating and/or diagnosing a patient condition based on analysis of the vital sign data. Advantageously the machine learning algorithm is a Transformer algorithm. Preferably the Transformer algorithm is selected from the group of Bidirectional Encoder Representations from Transformers (BERT)-derived algorithms. Preferably the arrangement for receiving and transmitting vital sign data comprises a pipeline for pre-processing the vital sign data prior to processing by the machine learning algorithm. Advantageously the pre-processing pipeline comprises an autoencoder artificial neural network architecture for embedding high frequency vital sign data for algorithm processing. Preferably the autoencoder architecture comprises an encoder-decoder combination in the form of a machine learning model. Preferably the encoder is used for ‘compressing’ and the decoder for ‘decompressing’. Typically, edge-based and lightweight autoencoders are used in the architecture, in which latent space information from an ensemble of layers is gathered and stacked. Advantageously the vital sign data is multimodal data. Preferably multimodal data comprises receiving string, numerical data of patient’s medical history which can include medication, pathology reports, past diagnoses and image data, together with vital sign wave data. Preferably the BERT-derived algorithm employs a numerical latent space representation of each “feature of interest” which are learnt as the algorithm is trained. Each modality/feature (i.e., wave data, image data, text data, structured data) typically has its own latent space representation. Preferably the pre-processing pipeline further comprises data cleaning means in which data is initially prepared by being passed to denoising autoencoders and clustered using their cosine similarity. Preferably data clusters that have a representative presence of outliers and jitter are removed. Advantageously samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted. Preferably the data cleaning means further comprises, separately, networks comprising convolutional autoencoders and U-net/Sparse Fast Fourier Transformation (SFFT) algorithms which are trained to segment individual waveforms. Preferably models that operate on data generated are trained to predict unaugmented target data. Preferably, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and is represented back in timeseries. Typically, a normalized distance metric in the frequency domain is used as the loss function when training the model. Advantageously the system further comprises a cloud computing facility to develop swarm learning networks and allow continuous machine learning algorithm improvement while eliminating the need for the transmission of patient data from contributing facilities. In a preferred embodiment of the present invention, the system of vital sign data analysis is employed in m-sTBI ICU neuroprotective management for predicting when a tIH episode is likely to occur, and the arrangement for receiving and transmitting vital sign data includes receiving and transmitting ICP data. According to another aspect of the present invention there is provided a method of vital sign data analysis, the method comprising: receiving and transmitting vital sign data in real time; and, processing the vital sign data in real time in a machine learning algorithm for predicting, forecasting, optimising, prognosticating and/or diagnosing a patient condition based on analysis of the vital sign data. Advantageously the machine learning algorithm is a Transformer algorithm. Preferably the Transformer algorithm is selected from the group of Bidirectional Encoder Representations from Transformers (BERT)-derived algorithms. Preferably the method further comprises: pre-processing the vital sign data prior to processing in the machine learning algorithm. Advantageously the step of pre-processing the vital sign data comprises a method of ‘compressed sensing’ which involves embedding high frequency vital sign data for algorithm processing. Preferably the step of pre- processing the vital sign data compresses “features-of-interest” which are stored and optimized for temporal addressing. Typically, in the pre- processing step the vital sign data is simultaneously combined and compressed at the transmitter and reconstructed at the receiver. Advantageously the vital sign data is multimodal data. Preferably the method further comprises receiving string and image data in real time, together with vital sign wave data, and processing the string and image data together with the vital sign wave data in the machine learning algorithm that uses multimodal inputs. Advantageously the method further comprises near complete decentralisation of a cloud computing facility for processing the vital sign data and the establishment of a swarm of local nodes at each participating site. Preferably local data is processed by a respective local node, with algorithm performance parameters being shared across the swarm to allow continuous machine learning algorithm improvement while eliminating the need for the transmission of patient data from contributing facilities. In a preferred embodiment of the present invention, the method of vital sign data analysis is employed in m-sTBI ICU neuroprotective management for predicting when a tIH episode is likely to occur, and the process of receiving and transmitting vital sign data includes receiving and transmitting ICP data. According to a further aspect of the present invention there is provided a pre-processing pipeline for vital sign data, the pre-processing pipeline comprising: an autoencoder artificial neural network architecture for embedding high frequency vital sign data for algorithm processing. Preferably the autoencoder architecture comprises an encoder-decoder combination in the form of a machine learning model. Preferably the encoder is used for ‘compression’ and the decoder for ‘decompression’. Typically, edge-based and lightweight autoencoders are used in the autoencoder architecture, in which latent space information from an ensemble of layers is gathered and stacked. Preferably the pre-processing pipeline further comprises data cleaning means in which data that is initially prepared by passing all data to denoising autoencoders, is clustered using their cosine similarity. Preferably data clusters that have a representative presence of outliers and jitter are removed. Advantageously samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted. Preferably the data cleaning means further comprises, separately, networks comprising convolutional autoencoders and U-net/SFFT which are trained to segment individual waveforms. Preferably models that operate on data generated are trained to predict unaugmented target data. Preferably, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and represented back in timeseries. Typically a normalized distance metric in the frequency domain is used as loss function when training the model. According to a still further aspect of the present invention there is provided a pre-processing method of vital sign data, the pre-processing method comprising: embedding high frequency vital sign data using an autoencoder artificial neural network architecture wherein the vital sign data is pre-processed for algorithm processing. Typically, latent space information from an ensemble of layers is gathered and stacked using edge-based and lightweight autoencoders in the autoencoder architecture. Preferably data is initially cleaned by passing all data to denoising autoencoders and clustering using their cosine similarity. Preferably data clusters that have a representative presence of outliers and jitter are removed. Advantageously samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted. Preferably individual waveforms are separately segmented using networks comprising trained convolutional autoencoders and U-net/SFFT. Preferably models that operate on data generated are trained to predict unaugmented target data. Preferably, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and is represented back in timeseries. Typically a normalized distance metric in the frequency domain is used as the loss function when training the model. Throughout the specification, unless the context requires otherwise, the word “comprise” or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers. Likewise, the word “preferably” or variations such as “preferred”, will be understood to imply that a stated integer or group of integers is desirable but not essential to the working of the invention. Brief Description of the Drawings The nature of the invention will be better understood from the following detailed description of several specific embodiments of the method of vital sign wave data analysis, given by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of a first embodiment of a machine learning (ML) model employed in the present invention in the form of a stacked Long Short-Term Memory (LSTM) using multiple sampling frequencies; Figure 2 is a schematic diagram of a second embodiment of a machine learning (ML) model employed in the present invention in the form of a LSTM encoder-decoder using a combination of convolutional filters and multi-layer perceptron layers to smooth waves and low frequency data; Figure 3 is a schematic diagram of a third embodiment of a machine learning (ML) model employed in the present invention in the form of a LSTM / GRU with memory elements; Figure 4 is a schematic diagram of a fourth embodiment of a machine learning (ML) model employed in the present invention in the form of a BERT-like Transformer model used to process multimodal data; Figure 5 is a schematic diagram of one embodiment of an autoencoder architecture employed in the present invention for compressed sensing, that combines variable-bitrate samples into a compressed form in medium-bandwidth applications; Figure 6 is a schematic diagram of a second embodiment of an autoencoder architecture employed in the present invention for compressed sensing, that contains an additional module used to generate and transmit cluster identification (ID) and cluster distance for low-bandwidth applications; Figure 7 is a data flow diagram of one embodiment of the overall data collection process employed in the present invention, which comprises a combination of Pawsey Supercomputing Cloud services and Amazon lambda functions to process sensor data at scale; and, Figure 8 is a process flow diagram illustrating a preferred arrangement for parsing of requests from various systems and partitioning independent processes based on the research study participation; Detailed Description of Preferred Embodiments The present inventors began development of a tIH prediction algorithm in 2020 and were successful in this endeavour by March 2021. During the research and development phase of the Artificial Intelligence-enhanced Management of Severe Traumatic Brain Injury (AIMS-TBI) research project, twenty-eight different machine learning (ML) algorithms were developed and trialled using retrospective patient data. Of these twenty-eight algorithms, four were successful in terms of being able to both predict ICP rises with a high degree of accuracy and operate in real time. The best performing of these AIMS-TBI algorithms are the BERT-derived and RoBERTa³-derived Transformer algorithms. Preferred embodiments of the four different ML algorithm models developed and successfully trialled will now be described with reference to Figures 1 to 4 of the accompanying drawings. Long Short-Term Memory/GRU architecture (stacked - LSTM) LSTMs are stateful architectures that are commonly used in time-series processing. “Stateful” architectures operate on the principle of carrying- forward-from-where-last-left. In other words, as most data transmission happens asynchronously i.e., several times an hour, data flow is designed to store the present state of the LSTM and be used at a later stage without requiring supplying the LSTM with past “n”-minute history to generate forecasts. In the context of AIMS-TBI, LSTMs were initially trialled and optimized using a three-layer stacked LSTM/GRU model – one for each signal – ECG, ABP, ICP, as shown in Figure 1. Two separate models were developed for Royal Perth Hospital (RPH) and Royal Melbourne Hospital (RMH). The models were trained using the binary cross entropy loss function. Gated-Recurrent Units (GRUs) yielded better results compared to LSTM in terms of precision-recall. Outputs from the GRU feeds into a ³ Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. dropout layer (dropout=0.2) during training. Input features are binned into (ICP, ABP, ECG: 100,100,300) for RMH and (ICP, ABP, ECG: 125,125,250) for RPH. The size of the model ranges between 70,000-85,000 parameters. Although the model boasts of fewer parameters than any of the other methods or models in this work, it doesn’t support multi-modal input i.e., simultaneous processing of image, sensor, and text data. Positives • Supports variable sensors sampled at various frequencies. • Very few parameters which allows it to be implemented on light computer architectures. • Best results for 3-min rollback window to predict for the next fifteen minutes. Drawbacks • Doesn’t support multimodal input data. • LSTMs/GRUs are known for “short-memory” and the pipeline is therefore limited by the maximum look-back period of the LSTM/GRU. LSTM/GRU encoder-decoder architecture LSTM/GRU encoder-decoder models are auto-regressive methods that are built around leveraging the power of several 1D-convolution layers operating individually on continuous sensor streams. Latent space vectors from the filters are passed to a Bidirectional LSTM (BiLSTM)/GRU encoder to produce time-series hidden representations: (ht). A multi-layered perceptron (MLP) model combines these hidden representations with past decoder values along with low frequency sensor readings such as ETCO₂, SPO₂, and HR data (see Figure 2). Normalized attention values modulate the past encoder hidden states to produce a single value that feeds into the decoder. Based on the past state and the current input, a decoder is trained to predict a time resolved hazard function of tIH survival. Positives • Light weight model that can be implemented on light computer architectures. • Support for sensors sampled at variable frequencies. • In theory, autoregressive models can be used to forecast for a “variable” time period. Negatives • The model, in its current form, doesn’t support multimodal data. • We are limited by LSTM’s maximum look-back period. LSTM/GRU with Memory elements This model is similar to a standard LSTM with a minor difference of an additional memory element and a counter added to the architecture. Although LSTMs/GRUs are known to address the problem of traditional RNNs, the lookback is still miniscule compared to time between successive ICH events. In this model, we implement two differentiable modules: a memory element (called the stack) and counter element. Stack is further segregated into local and global (see Figure 3). Local stack and Global stack have predefined memory allocations. Continuous hidden representations from LSTM/GRU stack enter the local stack and the machine learning model is trained to push “relevant” values into the global stack. The values that the model deems as relevant are discovered during the training process. Counter is used to give a temporal sense of events of interest. The model is trained both on survival analysis function as well as binary cross-entropy. It is trained end-to-end. Positives • Operates on continuous waves and asynchronous wave data. • Can be trained on data as a whole with little need to filter out anomalies. • Overcomes the maximum look back period by storing information to look back upon. • Counter object stores difference of the timeframes between subsequent events. Drawbacks • Slow to train. • Doesn’t support multimodal data. • Can be subject to overfitting for large lookbacks. Transformers (BERT) Transformer architectures are traditionally used in natural language processing (NLP). The construct is based on a relationship between queries and keys, which are learnt according to the underlying task. Words are encoded into letters and subsequently transformed into attention vectors. In our approach, a BERT-like transformer model is used to process multimodal data. Queries and keys are obtained from multimodal representations (as shown in Figure 4): past state information, sensor data, image, text, and structured data. The architectures are different for the different input modes. For sensors, we use the traditional convolution blocks as in LSTM/GRU encoder-decoder architecture. ResNet, used for images, generates a low dimensional output. Each mode has a latent space representation that feeds into a transformer encoder as an embedding. Fully dense networks are used for both structured data and low frequency sensors. We use a four- headed, one-layer encoder network. Different to the conventional network is an output dense layer which is modified to carry-over state information to the frame. Positives • Supports variable sensors sampled at various frequencies. • Model with an accuracy of 0.89 on RPH data and 0.93 on RMH data. • Supports multi-modal data. • Makes use of highly parallel GPU. Drawbacks • Large model with 3-5 million parameters. Transformer algorithms (‘Transformers’) were first described in 2017 by Vaswani et al.⁴ This family of Machine Learning (ML) algorithms can be characterised by their combined use of attention analysis and parallel processing. The first operational Transformer, known as the Bidirectional Encoder Representations from Transformers (BERT) algorithm, was released by Google’s Devlin et al. in 2018.⁵ BERT was designed as a natural language processor (NLP) algorithm. Subsequently, several Transformer algorithms (see Table 1 below) have been developed, either de novo or BERT-derived, for a variety of tasks.

In July 2021, Google released the Perceiver algorithm (Google LLC, Mountain View, USA).⁶ This BERT-derived algorithm is the first ML algorithm with multimodal input processing capabilities. The Perceiver algorithm can process audio wave, video, image, string, and Light Distance and Ranging (LIDAR) data simultaneously. Prior to the Perceiver, each ⁴ Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I. (2017). Attention is all you need. In: Advances in neural information processing systems, pps.5998-6008. ⁵ Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. ⁶ Jaegle A, Gimeno F, Brock A, Zisserman A, Vinyals O, Carreira J. Perceiver: General perception with iterative attention. arXiv preprint arXiv:210303206.2021. input modality required a separate ML algorithm. Google is planning on expanding the Perceiver algorithm to enable ‘end-to-end input modalities’ according to published reports. The present invention is the only other ML algorithm currently capable of multimodal input processing; it was developed by the project team for the purposes of traumatic intracranial hypertension (tIH) prediction. This BERT-derived algorithm, known as the AIMS⁷ algorithm, is capable of processing language (NLP), image, and vital sign wave data simultaneously. While the AIMS algorithm is designed for use in m-sTBI patients undergoing invasive neuromonitoring, it can be readily adapted for a variety of other clinical tasks. In 2019 three Australian metropolitan intensive care units (ICU) near simultaneously adopted the use of ICM+ neuromonitoring software (Cambridge Enterprise Ltd., Cambridge, UK), a technology which allows for the capture and real time analysis of high-resolution patient monitoring data. With the adoption of ICM+ software, a large amount of data generated in the process of caring for m-sTBI patients became available for further analysis. To facilitate data management and analysis, the ICUs formed the Monitoring with Advanced Sensors, Transmission and E-Resuscitation in Traumatic Brain Injury (MASTER-TBI) collaborative project in early 2020. Machine learning algorithms rely on large datasets to enable training of the ML algorithms. Therefore, a major focus of the MASTER-TBI project is data collection, analysis, and transmission of the ICM+ datasets to facilitate use in the AIMS algorithm. Typical data captured by the ICM+ software includes arterial blood pressure (ABP), ECG, oxygen saturation (SpO2), end tidal CO2, respiratory rate, temperature, central venous pressure (CVP), intracranial pressure (ICP) and cerebral perfusion pressure (CPP). The MASTER-TBI project standardised data capture frequencies to the following: • 300 and 500 Hz – ECG ⁷ Artificial Intelligence-enhanced Management of Severe Traumatic Brain Injury (AIMS-TBI or AIMS for short) • 250 Hz – ABP, CVP, CPP & ICP • 1 Hz – Temperature, Respiratory Rate, SpO2, PbtO2 (if available) With the advent of additional neuromonitoring devices, brain tissue oxygenation and temperature, cerebral microdialysis and/or quantitative EEG data may be added in the future. The ICM+ system captures data for the duration of ICP monitoring (typically between 3-10 days). From the captured data a wide variety of secondary information is derived which helps guide ICU clinicians in managing patients with severe TBI. Inherent to the collection of vital sign monitoring data is the issue of artefact signals. For example, signal artefacts may arise from routine access to a patient's arterial line, patient shivering, accidental removal of monitoring probes, blockage of end tidal carbon dioxide monitors, zero drift of the ICP monitor, etc. All these occurrences can lead to erroneous conclusions if not taken into account. Thus, MASTER-TBI requires ongoing interaction between machine learning / artificial intelligence experts and clinicians to ensure accuracy of any algorithm developed. The MASTER-TBI team has developed automated cleaning algorithms for ICM+ data processing. The MASTER-TBI project identified the need for an algorithm that was capable of real time vital sign wave processing/analysis, in addition to processing string, numerical data of patient’s past history which can include medication, pathology reports, past diagnoses, and image data. As noted above, this eventually resulted in the de novo development of the preferred algorithm in the form a Transformer algorithm, known as the AIMS algorithm, that was capable of multimodal vital sign data processing. To achieve this, the MASTER-TBI team developed a unique pre-processing pipeline comprising a method of ‘compressed sensing’ which involves embedding high frequency vital sign waves (typically between 200-500 Hz) for algorithm processing. This method overcame the significant data latency problems that occur when processing multiple high frequency waves simultaneously. This enabled the processing of vital sign waves by a Transformer algorithm for the first time. Compressed Sensing Memory limited case ICM+ is configured to post updates to sensor-data at user-defined intervals. Each listener (a unique ICM+ installation) downloads a profile from the cloud and determines the sampling window based on parameters defined for the study. Sensor data is packed as a JSON object with sensor type assuming the key, and the corresponding readings gathered by the minute filling the values. The object is processed as a POST request over the secure HTTP protocol (HTTPS). A sample request looks like: {“timestamp”:8488484, “icp”:[8,12,12.5,12.4..], “abp”:[…]} Bandwidth limited case Streaming high frequency wave data over the network may not be tenable for some locations. For medium/low and ultralow bandwidth applications, compression is used to transmit data from the bedside laptop to the cloud where the signals are reconstructed. Edge-based and lightweight auto- encoders are used in the process. Latent space information from an ensemble of layers is gathered and stacked. Depending on the bandwidth, two approaches have been implemented for compressed sensing: medium/low bandwidth and ultralow bandwidth. Medium/low bandwidth implementations are designed to cater to venues with a minimum bandwidth of 10 kbps. In medium/low bandwidth applications, machine learning models are trained to produce a low dimensional representation of the actual data. For example, in the medium/low bandwidth regime: {“timestamp”:8488484, “icp”:[8,12,12.5,12.4],”abp”:[..]} is converted to an array of size 1/20^th the original size: {“timestamp”:8488484,”data”:[0.1,0.2,-1.8,..]} As shown in Figure 5, compression ratios of around 20x may be achieved using this encoding technique. The encoding schemes that are used are lossy in nature. An autoencoder architecture is employed in the pre- processing pipeline which comprises an encoder-decoder combination in the form of a monolithic machine learning model. The encoder is used for ‘compression’ and the decoder for ‘decompression’. Overall, the model is penalized to predict output that is same as input – a process called “training” in autoencoders (a class of machine learning architectures). Autoencoders are trained in isolation and are generally compatible with most predictive models when used in conjunction with an appropriate data engineering pipeline. Once trained, the encoder of the model is split from the decoder and implemented at the transmitter (i.e. bedside laptop) and the decoder resides at the cloud. On the receiver end, data streams are reconstructed using the decoder. The decoder stack reconstructs the original signal into individual sensor Internet of Things (IoT) streams. This method of compression is achieved through “pooling” layers within a convolutional neural network. A convolutional network can be thought of as filters in an audio equalizer with the difference being that the position of the knobs in the equalizer are learnt as the model is trained. In the context of autoencoders, ‘training/learning’ is the process of ensuring that the model’s output and input predictions match. The purpose of matching the predictions is that as the signal passes through the model (goes through a bottleneck layer), it is naturally compressed and penalized if there is a discrepancy between the input and the output. Penalizing an autoencoder network involves changing its parameters until a point when the model produces accurate reconstructed waves that are the same as the input. This is illustrated in Figure 5. Furthermore, when data is passed into a convolutional layer and pooled, the original input becomes a portion of its original size during each pass. The output at each pass is a low-dimensional representation of the signal. Several of these low-dimensional representations unique to each signal types (like ICP, ABP, ECG) combine to form a bottleneck output which is still 1/20^th the size of the original input. This whole process is part of encoding a signal. At the decoder, the model expands and filters again, increasing the size of the output at each pass until a point where the size of the signal assumes the size of the original input. An ultralow bandwidth architecture, as shown in Figure 6, employs a slightly different approach. Instead of passing a set of arrays, compressed forms are clustered into representations that are similar in nature. For example, a short ECG burst combined with a range of high ICP values can be clustered differently to another set of ECG/ICP ranges. The former can assume an index 0, and latter, an index 1. Indices are sent across to the decoder to reconstruct signals back into their original form. In our example, index 0 may simply imply {“ecg”: [-0.5, -0.34, etc.], “icp”:[20, 20.1, etc.]} and index 1 : {“ecg”:[-0.22, -0.4, etc.], “icp”:[6, 7, 9]}. Machine learning determines the number of clusters that are ought to be created and the reconstruction penalty is imposed on the decoding process which maps the indices back to the original waveforms. In this way, latent space vectors undergo clustering and only the relevant cluster ID and cluster distance is transmitted. The receiver reconstructs the waves based on the relevant cluster ID and distance. For example, in the ultralow bandwidth regime: {“timestamp”:8488484, “icp”:[8,12,12.5,12.4], “abp”:[..]} is converted to an array up to 1/200^th of the original size: {“timestamp”:8488484, “clusters”:[0,8,100,..], “cluster_d”:[0.11,5.5,..]} High compression ratios (for example greater than 200x) can be achieved using this technique. Autoencoders are also trained on transformed data with induced temporal errors – phase shifts, missing data, and random noise processes. In the above-described method of pre-processing the vital sign waves are simultaneously combined and compressed at the transmitter and reconstructed at the receiver. Apart from high compression ratio, the process is also resilient to minor errors in phase shifts. The additive drop in accuracy is largely model based. For low bandwidth applications, the additive accuracy drop varies between 6% - 8% for the various models tested in low bandwidth regime and between 10% - 22% for ultralow bandwidth. The current state of the art pulse wave data compression is 4x; using the above method, pulse wave compression ratios of 20x and 200x were achieved. Automated cleaning Biomedical signals sampled at a high-frequency are prone to various noise forms. This can lead to errors in subsequent analysis pipelines and adversely affect rule-based decision support systems when used in clinical settings. Traditionally, signals had to be pre-processed using computationally intensive data flows that may involve cascades of various time-series processing algorithms such as spectral filtering, and statistical techniques. The algorithms could be used to improve the resolution of low frequency data, or to fill in gaps in data where isolated waveforms are unavailable. Data is initially prepared by passing all data to denoising autoencoders and clustered using their cosine similarity. Clusters that have a representative presence of outliers and jitter are removed. Samples are prepared from the leftover clusters in a variety of ways listed below in varying portions: i) Data is augmented using adaptive spectral mixing ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block iii) Outliers and unrealistic derangements are added to isolated signals such as ABP and CPP. iv) Phase between ECG and rest of the signals are shifted Separately, networks comprising of convolutional autoencoders and U- net/SFFT are trained to segment individual waveforms. The models that operate on the generated data are trained to predict unaugmented target data. In the case of U-net/SFFT, the output undergoes also inverse Fourier transform and represented back in timeseries. Normalized distance metric in the frequency domain is used as the loss function when training the model. The AIMS algorithm A key advantage of the AIMS algorithm is its memory-based architecture. Employing differentiable memory and counter elements in tandem with a traditional LSTM stack assists in associating time-distant dependencies of cause and effect in neural networks. In the past, Neural Turing Machines (NTM) were proposed as distinct differentiable memory elements. In contrast to NTMs, which require read-out of the entire address space and are spatially addressed, the AIMS architecture uses compressed forms of the “events-of-interest’ which are stored and optimized for temporal addressing. The Transformer algorithm used for processing the ICM+ data to make the tIH prediction can be characterised by its combined use of attention analysis and parallel processing. Crucial to an understanding of how the AIMS TBI algorithm processes the multimodal vital sign data to make a tIH prediction is the concept of latent space. Latent space is a numerical representation of “features of interest” which are learnt as the model is trained to forecast ICH events. Each modality/feature (i.e. waves, images, text, structured data) has their own latent space representation. Latent space contextual information comprises information of the feature(s) of interest. For example, latent space representation of a CT scan with a haemorrhage in a frontal lobe can be represented as [-0.3, 0.2, 0.55, 0.3333] and a normal CT may be represented as [0.4, 0.1, 0.33, 0.33]. Similarly, a representation for ECG with and without arrythmia may assume: [0.0, -0.3, 0.4, 0.6] and [0.2, 0.4, 0.1, 0.4]. Simply put, these vectors can be thought of as alphabets, that when stitched together, act as an input to the language model that is used to predict a yes/no answer. For example, if the input to the language model is [A] – [K] – [N] –[O], the set of alphabets can mean something like: [arrythmia in the last 5min]-[haemorrhage in the frontal lobe]-[icp high]-[cpp low] The algorithm then gives an output as [1, 0]: 1 implying an impending ICH event and 0 otherwise. Unlike the compressed autoencoder form used previously, these latent space representations are not trained to generate input features. Instead, the model is trained to predict ICH events on a training/validation dataset split in 70/30 ratio and the training process maps the model’s feature(s) of interest determinations to outcomes (ICP rises). In the context of multimodal processing for vital signs, the AIMS algorithm is the only ML algorithm that works on multimodal vital sign data. Currently, image, text and structured input are used as place holders. Low frequency input (such as images of the CT scan / structured data of pathology metrics / diagnosis text) will also be encapsulated and passed on as latent representations directly to the Transformer. Note that computation on the network pertaining to a modal / feature is only performed during a feature update thus saving the computational cost of having to reanalyse existing latent space numerics. In other words, if a CT scan were supplied to the model, the model pertaining to the CT scan (like the ResNet in Figure 4) computes embeddings for the available CT scan. As explained above, the latent space vectors are concatenated and passed to the Transformer architecture. In our case, we currently use Gaussian noise for images, structured data, and text. The MASTER-TBI project contains several cloud resident modules and methods that are used in data collection, processing, and analysis. Patient vital signs are continuously monitored using the ICM+ software. An add-on was developed within the construct of ICM+ to interface with MASTER-TBI and stream data directly to the AIMS algorithm. The project preferably uses a web interface to monitor patients, store data, and perform seamless analytics. It encompasses a collection of various data cleaning methods, machine learning models, and visualization routines to quantify disease burdens and outcomes across various hospitals. It also allows for multi- venue study-based segregation of processes wherein each study may be architected to employ specific process pipelines i.e., data and machine learning models independent of others. A hybrid cloud facility has been created by Western Australian Department of Health (WA DOH) data scientists utilising Pawsey Supercomputing Centre (PSC) cloud and Amazon Web Services Government Cloud (AWS- GC) resources. The MASTER-TBI project makes use of the flexibility offered by the AWS-GC to process lambda functions (aka anonymous functions) and to develop swarm learning networks. Lambda functions (λ) are processing algorithms which are instantly created to process data and then immediately deleted once the task is complete. In the case of the MASTER- TBI project, the ICP prediction algorithm (or any other algorithm developed) would be the lambda function wherein, when required, the algorithm is copied to the temporary lambda cloud facility, the streamed data is processed, and subsequently deleted upon completion of the required processing task. The lambda system allows for instant expansion of processing capabilities in a highly efficient manner. Swarm learning networks allow for continuous machine learning algorithm improvement while eliminating the need for the transmission of patient data from contributing facilities. Algorithm coding and processing data will be stored on the AWS S3 (Simple Storage Solution) cloud storage facility. No identifiable patient data is planned to be stored on the AWS-GC. A schematic diagram of the hybrid MASTER-TBI cloud network is illustrated in Figure 7. The current embodiment employs ICM+ neuromonitoring software for the capture and real time analysis of high-resolution patient monitoring data. However, it will be understood that data acquisition from a bedside monitor may also be performed using other software technology. For example, Raspberry Pi mini-computers have been developed as an alternative option to ICM+ in streaming data to the cloud. Under the current system, the monitoring data is streamed to ICM+ and then, using a python script, the ICM+ software is directed to stream the data to the cloud for processing. The Raspberry Pi machines provide an alternative approach to set up a data relay system. The Raspberries are plugged directly into the monitor and are programmed to establish a unique SSH datalink with the hybrid cloud facility. The Raspberry processors poll monitoring signals, capture the relevant waves, and encrypt and stream the data at speeds of up to 1000 Hz. In addition, the Raspberry Pi machines can stream data over cellular, Wi-Fi, and/or via direct cable links. Data can also be streamed from the Pi to ICM+ for routine ICM+ operation. There are typically two types of requests: software and user. Software requests are automated web requests initiated by a software module (such as ICM+) that post wave data directly to the server minutely. User requests are those that pertain to users visiting the project’s web-portal for live monitoring or web visualization or file upload. In all cases, HTTPS is a de- facto protocol used to process requests. A cloud-based load-balancer (see Figure 7) appropriately routes requests to specific resources based on the request type. This is required because the software request and browser request structures differ. A software request is a JSON (JavaScript Object Notation) GET object that is packaged with a timestamp along with other data metrics such as waves i.e., ECG, blood pressure, intracranial pressure, etc. Software requests are mapped to an algorithms stack that participate in the studies assigned to the location. AWS Lambda/serverless calls are extensively used in our pipeline because they are modular and reduce redundant processing. The framework allows for customization in terms of user-defined code blocks and may be used mostly in any phase of the process lifecycle. Users may also submit their own cleaning and prediction routines to integrate with AIMS. Containerised code blocks that contain these granularized routines are verified and wrapped into serverless calls for use in the AIMS dashboard. These calls serve at a granular level and allow for scalability across venues and studies. User requests are handled separately by a lambda call stack which tracks both the session and visual analytics requests. Specifically, serverless calls serve a variety of functions: i) Process selection • Lambda functions that identify and spawn subsequent lambda call cascades based on the request type ii) Data ingestion • The process stores data into an S3 bucket or an unstructured database that resides on the cloud iii) Data cleaning • A sequence of lambda functions that pre-process data iv) Machine learning models • Predictive models that: a) load the model, b) fetch data from step iii) and c) generate predictions v) Analytics • Predictions in step iv) are recorded, displayed on live-monitor, and statistics of the study cohort are updated There are two distinct methods of uploading data to the AIMS website for processing. The first of these methods is described in Case study 1 below, which describes the processes involved in the real time transmission and processing of vital sign wave data from multiple patients. The second method, described in Case Study 2, describes the processes involved in processing HDF5 files uploaded retrospectively. Parsing of software requests: Case Study 1 Live monitoring of TBI patients from two different venues As shown in Figure 8, a patient at Royal Perth Hospital (RPH) and another patient at Royal Melbourne Hospital (RMH) are to be monitored simultaneously. RPH is enrolled in both Study 1 and Study 2 while RMH participates in Study 2 only. Both RPH and RMH use cloud-based processing of data. Tier III models are used in both locations (See Model tiers in section below). Table 2. Summary of algorithms used in each case

Journey of software request from RPH: • ICM+ samples ECG at 250 Hz, SPO₂ at 1 Hz, ICP at 125 Hz, ABP at 125 Hz • The python module written specifically for RPH packages it into the following format: {timestamp: 10039948, ecg:[0.2,0.3,0.4,…], abp:[40, 40.4,..]…} • The request is sent to the AWS front-end • AWS routes request to a non-blocking asynchronous serverless call (called Process Selection) that identifies relevant studies that the location participates in. It then spawns several parallel lambda calls and passes the wave metrics alongside successive anonymous patient IDs. • Assuming RPH participates in Study 1 and Study 2, the parallel sequence of calls fired are: Request 1: outlier_removal (waves, patientID, [imputation, beat_alignment, bert, analytics]) Request 2: outlier_removal (waves, patientID, [imputation, imputation, lstm_mem, analytics]) • Request 1 and Request 2 are processed as asynchronous lambda calls, and within each request, subsequent calls are daisy chained. • In Request 1, outliers are removed, and the processed set of values are passed to an imputation serverless call to fix for missing data. The processed data is further transferred to a beat alignment call where ICP and ECG waveforms are aligned. A BERT-derived algorithm is then used to forecast before applying clean-up routines where predictions and cohort statistics are updated. All the serverless functions are called in an asynchronous mode. Case Study 2 Uploading Wave files User calls are handled by the load-balancer and branched processed differently to software calls. • Situation: User chooses a study/location and uploads files using the web interface. • The load balancer forwards the request to the Pawsey cloud. • A resident scheduler on the Pawsey cloud removes identifiers, stores a local copy, and mirrors the copy to the Amazon S3 object store facility. • Depending on the study, predictive models are applied, and the cohort statistics are updated. Data preparation for ML models • Outlier removal module Sampling discrepancies result in error codes outside the normal metric range of the sensor. It is there required to remove data points that fall outside the range. • Data augmentation for ML training Low bandwidth spectral noise introduced to ICP, ABP waves during training to increase the number of training samples to the machine learning model. • Data imputation module Missing data points can be filled with continuous moving average imputation provided that the amount of missing data is less than 20% of the frame. The imputation module allows users to set thresholds and filtering rules for the missing data. • ICP/ECG pulse alignment module Clock synchronisation is an issue when sensors operate independently on various signals. Overtime, clock discrepancies lead to cumulative phase lag. To synchronize waveforms, the pulse alignment module time aligns the ICP/ECG pulse signals. • Auto-frequency estimation module (ETCO₂/SPO₂) The module automatically computes frequency ratio of wave combinations present in the data. Such an exercise is useful in short-listing compatible models. • Named Entity Recognition module to remove identifiers The module uses NLP language models to remove identifiable information from the stored files and procedures. Models used in the AIMS algorithm are trained on data from RPH, Alfred and RMH. Data from each hospital is cleaned according to predefined cleaning rules. Cleaned data and data free of distortion (for e.g. large missing signal values) is prepared for training. Regular Expression (Regex) rules were then used to create training datasets of events and non-events. The prepared datasets are then passed to various models for training and validation. The data is split into a 70/30 ratio. Training, in machine learning jargon, involves the process of penalizing an algorithm for i) missing out on ICH events and ii) predicting an event as ICH when none exists. We trained the algorithm separately using two loss functions for penalty: i) survival function based and ii) binary classification. The degree of penalty is based on the error rate in predictions. The model is iteratively trained until the model’s accuracy showed a saturation. We thus achieved an average of ~90% accuracy on validation on back-testing using Transformers architecture. To address the issue of algorithm ‘explainability’ required the development of a novel method of Heart Rate Variability (HRV) analysis. From in silico testing, it is known that the BERT-derived (and RoBERTa) tIH prediction algorithms are heavily reliant on the ECG wave for the purposes of ICP prediction. However, it is unclear exactly which part of the ECG wave is focused on and if accuracy is compromised on models trained only on ECG. During the effort to understand how the BERT-derived AIMS algorithms made tIH predictions, a novel method of dynamic heart rate variability (HRV) analysis which involves the optimisation of HRV frequency analysis ranges for each patient, was developed. Using this method, depending on the cohort selection, approximately 60-70% of ICH causality can be explained. This new method of HRV analysis may be used in other settings. Work is planned to further investigate and characterize this novel method of HRV analysis. As noted above, swarm learning networks facilitate improved efficiencies in machine learning algorithm implementation and the transmission of patient data from contributing facilities. An article in Nature⁸ describes a swarm learning architecture for decentralised and confidential clinical machine learning. This approach provides a way to eliminate problems associated with transmission of patient data across jurisdictional lines. Essentially it involves the near complete decentralisation of the project’s cloud-based infrastructure and the establishment of local nodes at each participating site. Local data is processed by the respective local node, with algorithm performance parameters (not patient data) being shared across the swarm. In this way, algorithm performance is continuously enhanced as each swarm node learns from all the other nodes in the network, all while keeping patient data secure at the respective site of origin. Patient data at each site is de-identified and encrypted. It is planned that access to the system will be via blockchain authentication. Once active, the system will only process data labelled with a validated blockchain sequence. In this way, the MASTER-TBI team will be provided with a highly efficient and secure method of access control. The algorithm is provided to each swarm node using homomorphic encryption, thus keeping the source code secure while allowing it to process data. ⁸ Warnat-Herresthal S, Schultze H, Shastry KL, et al. Swarm Learning for decentralized and confidential clinical machine learning. Nature.2021;594(7862):265-270. A further development currently being investigated is the option of legacy swarming. Simply put, legacy swarming involves using historical data files and presenting them to an algorithm as if they are being acquired in real time. Once processed, it is then possible to use the legacy-file assigned algorithm(s) to swarm with other algorithms processing actual real time acquired patient data. In doing so, it is possible to enhance the performances of the algorithms overall by increasing the size of the swarm. This will theoretically allow for the use of historical medical records to enhance the performance of current, live data-processing algorithms. One implication of legacy swarming is that data being collected in real time along with retrospective data, can be used to assist in improving care for future patients. In conjunction with the development of ICM+ capabilities, the development of automated data extraction systems for CT images and ICU CIS data is progressing. The project aims to extract and align in real time the CT image and CIS text and tabular data which corresponds with the ICM+ wave data. For CIS data, this will involve periodic copying of the patient CIS file to a temporary folder on the MASTER-TBI hybrid cloud, which then allows for machine learning algorithm processing of the file using PSC compute (and not the CIS host) resources. In this process, CIS data is first encrypted and stripped of any identifying demographic data (and potentially blockchain- labelled in the future) and/or irrelevant data. The file is then aligned with the corresponding ICM+ wave data and CT image data (if available). The elements of the CIS file deemed relevant will be identified using transformer machine learning processing. As the computational power required for processing is proportional to the number of data fields present, it is essential from a computing resource, environmental, and ethical perspective to reduce the amount of data fields as much as possible. From experience with ICM+ wave data analysis, by allowing the machine to determine what is or is not relevant eliminates human preconceptions and allows for more efficient and accurate analyses. Once the processing algorithms have identified the CIS fields of interest, the MASTER-TBI team will be able to advise of the precise data fields and parameters being utilised by the algorithms for CIS data processing. It is expected that each algorithm developed will have different data requirements. The AIMS algorithm is the world’s first operational tIH prediction ML algorithm. It can detect a rise in intracranial pressure up to 30 minutes prior to the tIH event with an accuracy of up to 93%. This will, for the first time, provide clinicians with the ability to intervene or prevent a rise in ICP before it occurs. Now that preferred embodiments of the method of vital sign wave data analysis have been described in detail, it will be apparent that the described embodiments provide a number of advantages over the prior art, including the following: (i) The ability to predict with high accuracy when a tIH episode is likely to occur. (ii) The consequent ability to proactively prevent and/or intervene early into episodes of tIH, potentially leading to improved patient outcomes. (iii) The ability to adapt the AIMS algorithm to any number of clinical tasks, particularly for acute care specialties (ICU, ED, Cardiology, operating theatres, etc) where wave data is routinely generated. (iv) The AIMS algorithm is the only other transformer algorithm that is capable of multimodal input processing (apart from Google’s Perceiver algorithm). It will be readily apparent to persons skilled in the relevant arts that various modifications and improvements may be made to the foregoing embodiments, in addition to those already described, without departing from the basic inventive concepts of the present invention. For example, the AIMS TBI algorithm currently is based on a BERT-like Transformer algorithm, which has produced excellent results. However, it will be understood that other suitable ML algorithms may also be used to achieve similar results. The system and method according to the invention typically involves a patient connected to a monitor and some mechanism to stream the monitoring data to a computer powerful enough to run the ML algorithm. The described embodiment is cloud-based because it allows for flexibility and the ability to deliver the algorithm to anywhere in the country (or world). In addition, the cloud format allows for swarming. Currently AWS is used for this purpose, but it could easily be done using the Microsoft Azure Cloud, the Google cloud, or any of the other private clouds that exist. In theory, a single ICP prediction algorithm could be run for a single patient in a unit not connected to the intranet, providing there is a desktop with sufficient RAM available and a method of transmitting and pre-processing the patient monitoring data fast enough to the computer for processing. Furthermore, it is not necessary to use a supercomputer to run the algorithm; but a supercomputer provides a vast amount of flexibility like the ability to train and retrain algorithms almost at will, and the ability to run multiple algorithms all at once. Therefore, it will be appreciated that the scope of the invention is not limited to the specific embodiments described.

Claims

Claims 1. A system of vital sign data analysis, the system comprising: an arrangement for receiving and transmitting vital sign data in real time; and, a machine learning algorithm for processing the vital sign data in real time and predicting, forecasting, optimising, prognosticating and/or diagnosing a patient condition based on analysis of the vital sign data.

2. A system of vital sign data analysis as claimed in claim 1, wherein the machine learning algorithm is a Transformer algorithm.

3. A system of vital sign data analysis as claimed in claim 2, wherein the Transformer algorithm is selected from the group of Bidirectional Encoder Representations from Transformers (BERT)-derived algorithms.

4. A system of vital sign data analysis as claimed in any one of claims 1, 2 or 3, wherein the arrangement for receiving and transmitting vital sign data comprises a pre-processing pipeline for pre-processing the vital sign data prior to processing by the machine learning algorithm.

5. A system of vital sign data analysis as claimed in claim 4, wherein the pre-processing pipeline comprises an autoencoder artificial neural network architecture for embedding high frequency vital sign data for algorithm processing.

6. A system of vital sign data analysis as claimed in claim 5, wherein the autoencoder architecture comprises an encoder-decoder combination in the form of a machine learning model.

7. A system of vital sign data analysis as claimed in claim 6, wherein the encoder is used for ‘compressing’ and the decoder for ‘decompressing’.

8. A system of vital sign data analysis as claimed in any one of claims 5, 6 or 7, wherein edge-based and lightweight autoencoders are used in the architecture, in which latent space information from an ensemble of layers is gathered and stacked.

9. A system of vital sign data analysis as claimed in claim 4, wherein the vital sign data is multimodal data.

10. A system of vital sign data analysis as claimed in claim 9, wherein the multimodal data comprises receiving string, numerical data of patient’s past history which can include medication, pathology reports, past diagnoses, and image data, together with vital sign wave data.

11. A system of vital sign data analysis as claimed in claim 10, wherein the BERT-derived algorithm employs a numerical latent space representation of each “feature of interest” which are learnt as the algorithm is trained, wherein each modality/feature (i.e., wave data, image data, text data, structured data) has its own latent space representation.

12. A system of vital sign data analysis as claimed in claim 4, wherein the pre-processing pipeline further comprises data cleaning means in which data that is initially prepared by passing all data to denoising autoencoders, is clustered using their cosine similarity.

13. A system of vital sign data analysis as claimed in claim 12, wherein data clusters that have a representative presence of outliers and jitter are removed.

14. A system of vital sign data analysis as claimed in claim 13, wherein samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted.

15. A system of vital sign data analysis as claimed in claim 14, wherein the data cleaning means further comprises, separately, networks comprising convolutional autoencoders and U-net/SFFT which are trained to segment individual waveforms.

16. A system of vital sign data analysis as claimed in claim 15, wherein models that operate on data generated are trained to predict unaugmented target data.

17. A system of vital sign data analysis as claimed in claim 16, wherein, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and is represented back in timeseries.

18. A system of vital sign data analysis as claimed in claim 17, wherein a normalized distance metric in the frequency domain is used as the loss function when training the model.

19. A system of vital sign data analysis as claimed in claim 1, the system further comprises a cloud computing facility to develop swarm learning networks and allow continuous machine learning algorithm improvement while eliminating the need for the transmission of patient data from contributing facilities.

20. A system of vital sign data analysis as claimed in any one of the preceding claims, wherein the system is employed in m-sTBI ICU neuroprotective management for predicting when a tIH episode is likely to occur, and the arrangement for receiving and transmitting vital sign data includes receiving and transmitting ICP data.

21. A method of vital sign data analysis, the method comprising: receiving and transmitting vital sign data in real time; and, processing the vital sign data in real time in a machine learning algorithm for predicting, forecasting, optimising, prognosticating and/or diagnosing a patient condition based on analysis of the vital sign data.

22. A method of vital sign data analysis as claimed in claim 21, wherein the machine learning algorithm is a Transformer algorithm.

23. A method of vital sign data analysis as claimed in claim 22, wherein the Transformer algorithm is selected from the group of Bidirectional Encoder Representations from Transformers (BERT)-derived algorithms.

24. A method of vital sign data analysis as claimed in any one of claims 21, 22 or 23, wherein the method further comprises: pre-processing the vital sign data prior to processing in the machine learning algorithm.

25. A method of vital sign data analysis as claimed in claim 24, wherein the step of pre-processing the vital sign data comprises a method of ‘compressed sensing’ which involves embedding high frequency vital sign data for algorithm processing.

26. A method of vital sign data analysis as claimed in claim 25, wherein the step of pre-processing the vital sign data compresses “features-of-interest” which are stored and optimized for temporal addressing.

27. A method of vital sign data analysis as claimed in claim 26, wherein in the step of pre-processing the vital sign data is simultaneously combined and compressed at the transmitter and reconstructed at the receiver.

28. A method of vital sign data analysis as claimed in claim 24, wherein the vital sign data is multimodal data.

29. A method of vital sign data analysis as claimed in claim 28, wherein the method further comprises receiving string, numerical data of patient’s past history which can include medication, pathology reports, past diagnoses, and image data in real time, together with vital sign wave data, and processing the string and image data together with the vital sign wave data in the machine learning algorithm using multimodal input processing.

30. A method of vital sign data analysis as claimed in claim 21, wherein the method further comprises near complete decentralisation of a cloud computing facility for processing the vital sign data and the establishment of a swarm of local nodes at each participating site.

31. A method of vital sign data analysis as claimed in claim 30, wherein local data is processed by a respective local node, with algorithm performance parameters being shared across the swarm to allow continuous machine learning algorithm improvement while eliminating the need for the transmission of patient data from contributing facilities.

32. A method of vital sign data analysis as claimed in any of claims 21 to 31, wherein, the method is employed in m-sTBI ICU neuroprotective management for predicting when a tIH episode is likely to occur, and the process of receiving and transmitting vital sign data includes receiving and transmitting ICP data.

33. A pipeline for pre-processing vital sign data, the pre-processing pipeline comprising: an autoencoder artificial neural network architecture for embedding high frequency vital sign data for algorithm processing.

34. A pipeline for pre-processing vital sign data as claimed in claim 33, wherein the autoencoder architecture comprises an encoder-decoder combination in the form of a machine learning model.

35. A pipeline for pre-processing vital sign data as claimed in claim 34, wherein the encoder is used for ‘compressing’ and the decoder for ‘decompressing’.

36. A pipeline for pre-processing vital sign data as claimed in claim 33 or claim 34, wherein, edge-based and lightweight autoencoders are used in the autoencoder architecture, in which latent space information from an ensemble of layers is gathered and stacked.

37. A pipeline for pre-processing vital sign data as claimed in claim 33, further comprising data cleaning means in which data that is initially prepared by passing all data to denoising autoencoders, is clustered using their cosine similarity.

38. A pipeline for pre-processing vital sign data as claimed in claim 37, wherein data clusters that have a representative presence of outliers and jitter are removed.

39. A pipeline for pre-processing vital sign data as claimed in claim 38, wherein samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted.

40. A pipeline for pre-processing vital sign data as claimed in claim 39, wherein the data cleaning means further comprises, separately, networks comprising convolutional autoencoders and U-net/SFFT which are trained to segment individual waveforms.

41. A pipeline for pre-processing vital sign data as claimed in claim 40, wherein models that operate on data generated are trained to predict unaugmented target data.

42. A pipeline for pre-processing vital sign data as claimed in claim 41, wherein, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and represented back in timeseries.

43. A method of pre-processing of vital sign data, the pre-processing method comprising: embedding high frequency vital sign data using an autoencoder artificial neural network architecture wherein the vital sign data is pre-processed for algorithm processing.

44. A method of pre-processing of vital sign data as claimed in claim 43, wherein latent space information from an ensemble of layers is gathered and stacked using edge-based and lightweight autoencoders in the autoencoder architecture.

45. A method of pre-processing of vital sign data as claimed in claim 44, wherein data that is initially cleaned by passing all data to denoising autoencoders, is clustered using their cosine similarity.

46. A method of pre-processing of vital sign data as claimed in claim 45, wherein data clusters that have a representative presence of outliers and jitter are removed.

47. A method of pre-processing of vital sign data as claimed in claim 46, wherein samples from the leftover data clusters are prepared in a variety of ways selected from the following list in varying portions: i) Data is augmented using adaptive spectral mixing; ii) Values from individual sensors are clipped and blacked out for portions between 10-30% of data for a 15 min interval block; iii) Outliers and unrealistic derangements are added to isolated signals such as ABP, ICP and CPP; and iv) Phase between ECG and rest of the signals are shifted.

48. A method of pre-processing of vital sign data as claimed in claim 47, wherein individual waveforms are separately segmented using networks comprising trained convolutional autoencoders and U-net/SFFT.

49. A method of pre-processing of vital sign data as claimed in claim 48, wherein models that operate on data generated are trained to predict unaugmented target data.

50. A method of pre-processing of vital sign data as claimed in claim 49, wherein, in the case of U-net/SFFT, the output also undergoes inverse Fourier transform and is represented back in timeseries.

51. A method of pre-processing of vital sign data as claimed in claim 50, wherein a normalized distance metric in the frequency domain is used as the loss function when training the model.