WO2023144396A1 - Method and hardware architecture for event-based classification of electrocardiogram (ecg) signals - Google Patents

Method and hardware architecture for event-based classification of electrocardiogram (ecg) signals Download PDF

Info

Publication number
WO2023144396A1
Authority
WO
WIPO (PCT)
Prior art keywords
values
confidence
hardware architecture
memory
event
Prior art date
Application number
PCT/EP2023/052209
Other languages
French (fr)
Inventor
Johannes Partzsch
Matthias Jobst
Chen Liu
Liyuan GUO
Christian Georg Mayr
Original Assignee
Spinncloud Systems GmbH
Technische Universität Dresden
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spinncloud Systems GmbH and Technische Universität Dresden
Publication of WO2023144396A1 publication Critical patent/WO2023144396A1/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271 Specific aspects of physiological measurement analysis
    • A61B5/7282 Event detection, e.g. detecting unique waveforms indicative of a medical condition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/318 Heart-related electrical modalities, e.g. electrocardiography [ECG]
    • A61B5/346 Analysis of electrocardiograms
    • A61B5/349 Detecting specific parameters of the electrocardiograph cycle
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

The invention relates to a method and a hardware architecture for event-based classification. One of the objectives of the invention is to build a hardware architecture that, once designed and manufactured, can be configured for a variety of NN and preprocessing hardware architectures. This is satisfied by a method and a hardware architecture processing the following steps: - feature extraction, - sparsification of the feature extraction output, - relation of the features in time, - confidence-based adaptive classification, estimating confidence in the classification and finishing processing as soon as a confidence threshold is reached.

Description

METHOD AND HARDWARE ARCHITECTURE FOR EVENT-BASED CLASSIFICATION OF ELECTROCARDIOGRAM (ECG) SIGNALS
Existing machine-learning-based solutions for classification of sensor signals typically have high demands on memory space and computational load. Correspondingly, hardware solutions for these classifiers take a lot of silicon area and energy per classification.
Still, solutions based on machine learning can be trained automatically for a wide variety of classification tasks, which makes them very attractive for hardware implementation: systems can be re-used for different tasks, and no re-design or manufacturing is needed. As a prerequisite for this, hardware implementations must allow for flexible configuration of the classifier architecture. This flexibility, however, may cause a significant overhead.
Classifiers often contain pre-processing of the sensor data, also referred to as feature extraction. This often reduces the effort of the subsequent classification. However, fixed, pre-defined processing operations are often hand-crafted and have to be tuned manually to a new type of sensor signal. In contrast, end-to-end machine learning approaches allow for training of the whole classifier based on the raw sensor signal, eliminating hand-tuning of the feature extraction, but resulting in higher memory and computing demands.
Recent approaches for reducing memory and processing load of machine-learning-based classifiers employ neural networks (NNs) with sparse weights and/or neuron outputs (also called activations). These approaches are very effective, but often require a more complex hardware architecture for handling sparse data. A sparsity exploitation that results in low hardware overhead compared to a conventional solution is required.
Hardware for time series classification, e.g. electrocardiogram (ECG) signal classification, typically relies on extraction of fixed, hand-crafted features to achieve a high energy efficiency. For example, arrhythmia in ECG can be detected via joint optimization of analog-digital blocks for data acquisition to achieve the minimum energy point. This is described by Chen, Yen-Po, et al., "An injectable 64 nW ECG mixed-signal SoC in 65 nm for arrhythmia monitoring", IEEE Journal of Solid-State Circuits 50.1 (2014): 375-390.
As described by Kiranyaz, Serkan, et al., "Convolutional neural networks for patient-specific ECG classification", 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2015, alternative, end-to-end convolutional neural network (CNN) solutions exist, but typically result in high computational load and memory requirements.
Several approaches try to reduce computations and memory effort by exploiting properties of the neural network and/or the input data, like dynamically throttleable neural networks (see H. Liu et al., "Dynamically Throttleable Neural Networks (TNN)", arXiv:2011.02836v1 (2020)), gating approaches (see Z. Chen et al., "You look twice: GaterNet for Dynamic Filter Selection in CNNs", arXiv:1811.11205v2 (2019)) or slimmable neural networks (see J. Yu et al., "Slimmable Neural Networks", arXiv:1812.08928 (2018)).
As published in US 2021/0166113 A1 and by Neil, Daniel, et al., "Delta networks for optimized recurrent network computation", International Conference on Machine Learning, PMLR, 2017, recurrent neural networks with delta encoding for processing serial events have been investigated before.
A similar classification approach to the invention has been published by M. Jobst et al., "Event-based Neural Network for ECG Classification with Delta Encoding and Early Stopping", 2020 6th International Conference on Event-Based Control, Communication and Signal Processing (EBCCSP), 2020, pp. 1-4, doi: 10.1109/EBCCSP51266.2020.9291357, and by M. Jobst et al., "ZEN: A flexible energy-efficient hardware classifier exploiting temporal sparsity in ECG data", 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Incheon, Korea, Republic of, 2022, pp. 214-217, doi: 10.1109/AICAS54282.2022.9869958. Some recent hardware architectures allow exploitation of sparsity, for example the ESE hardware as described in Han, S., et al., "ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA", arXiv:1612.00694 (2017). However, they are intended for higher processing loads than targeted by this invention.
It is the object of the invention to build a hardware architecture that, once designed and manufactured, can be configured for a variety of NN and preprocessing hardware architectures.
It is further an object of the invention to develop a classifier with preprocessing operations that are tuned during training of the classifier to allow automatic adaptation to new types of sensor signals and classification tasks while reducing the classification effort.
Another object of the invention is to exploit sparsity with low hardware overhead in both the preprocessing and the NN-based classifier to reduce the processing overhead. The characteristics of the classification task should be used to finish the classification as early as possible, further reducing the number of processing steps.
The objectives are solved by a method according to claims 1 to 6 and by a hardware architecture according to claims 7 to 15.
The inventive method for event-based classification is mainly based on a combination of feature extraction, an event-based recurrent neural network and early stopping based on a confidence estimate. Thereby the following steps are performed:
- feature extraction step, e.g. realized by a combination of filters with trainable coefficients,
- sparsification of the feature extraction output, e.g. realized via subsampling or delta encoding,
- a relation of the features in time, e.g. realized by a recurrent neural network,
- a confidence-based adaptive classification, estimating confidence in the classification and
- finishing processing as soon as a confidence threshold is reached, e.g. realized for binary classification by a low-pass filter of the difference between class outputs.

The inventive hardware architecture for realization of the event-based classification method is very flexible and mainly based on a programmable dataflow, with scalability achieved by storing parameters, activations and dataflow in one or several memory blocks. It consists of the following elements:
- one or several memory blocks for storing model parameters and intermediate values,
- one or several calculation blocks for performing preprocessing (e.g. filtering),
- one or several calculation blocks for performing the classification,
- one or several calculation blocks to generate and output a confidence-based classification result and
- a programmable control module for configuring and starting the calculation blocks.
The programmable control module supports the following operations (a software model follows this list):
- it reads commands from one of the memory blocks,
- it is started whenever new input data arrives,
- it starts the calculation modules one by one and
- it supports fixed and conditional jump commands.
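For illustration, the behaviour of this control module can be modelled in software as a small fetch-execute loop. The following Python sketch is a non-authoritative model; the command encoding (OP_RUN, OP_JUMP, the wait_for_sample flag) and the module names are illustrative assumptions, not the command format of the actual hardware.

```python
# Minimal software model of the programmable control loop.
# The command encoding below is an illustrative assumption.
OP_RUN, OP_JUMP = 0, 1

def control_loop(program, modules, wait_for_next_sample):
    pc = 0                                    # command pointer into memory
    while pc < len(program):
        cmd = program[pc]
        if cmd["op"] == OP_RUN:               # start one calculation block
            modules[cmd["module"]]()
            pc += 1
        else:                                 # OP_JUMP: fixed or conditional
            if cmd.get("wait_for_sample"):
                wait_for_next_sample()        # stall until next input sample
            pc = cmd["target"]

# Example program: run feature extraction, then the classifier,
# then jump back and wait for the next input sample.
program = [
    {"op": OP_RUN, "module": "feature_extraction"},
    {"op": OP_RUN, "module": "classifier"},
    {"op": OP_JUMP, "target": 0, "wait_for_sample": True},
]
```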
A feature extraction module consists of (a software sketch follows this list):
- a MAC unit for sequential calculation of finite-impulse response (FIR) filters,
- support for strided FIR filter execution for simple down-sampling,
- a unit for activation function calculation and range reduction,
- a down-sampling unit, keeping the maximum within a configurable number of successive input values, and
- control logic for address calculation and read and write operations from/to memory, according to virtual, configurable input and output ring buffers whose data values are stored in one of the memory blocks.
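The following Python sketch models one feature extraction channel. It is illustrative only: the activation is assumed here to be abslog in the sense of log2(|x| + 1) with range limiting, and the filter coefficients (trainable in the real system) are made-up values.

```python
import numpy as np

def fir(x, coeffs, stride=1):
    """Strided FIR filtering; in hardware, a single MAC unit computes
    these taps sequentially for all filter stages."""
    taps = len(coeffs)
    return np.array([np.dot(coeffs, x[i:i + taps][::-1])
                     for i in range(0, len(x) - taps + 1, stride)])

def abslog(x, max_val=255.0):
    """Range-reducing activation; assumes 'abslog' means log2(|x| + 1),
    limited to a fixed output range (an assumption for illustration)."""
    return np.clip(np.log2(np.abs(x) + 1.0), 0.0, max_val)

def maxpool(x, batch):
    """Down-sampling unit: keep the maximum of each batch of successive values."""
    n = len(x) // batch
    return x[:n * batch].reshape(n, batch).max(axis=1)

# One feature channel with made-up coefficients:
samples = np.random.randn(256)
features = maxpool(abslog(fir(samples, np.array([0.25, 0.5, 0.25]))), batch=4)
```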
A data reduction structure consists of a delta encoding module that (a sketch follows this list):
- loads input values from memory,
- computes the difference between two time-successive values,
- compares the difference to a threshold and sets difference values (deltas) with an absolute value below a configurable threshold to zero,
- performs rounding of values,
- stores the resulting thresholded delta values to memory and updates the current values in memory and
- forwards only non-zero delta values together with their position indices to the gated recurrent unit and fully-connected computation module.
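A minimal software model of this delta encoding is sketched below. One detail is an assumption: the stored reference is updated only at positions where a delta fired, a common variant (cf. Neil et al.) that lets sub-threshold changes accumulate until they cross the threshold.

```python
import numpy as np

def delta_encode(x_new, x_ref, threshold):
    """Thresholded delta encoding: deltas below the threshold are zeroed;
    the stored reference is only updated where a delta fired. Returns the
    sparse (index, delta) events forwarded to the GRU/FC compute module."""
    delta = np.round(x_new - x_ref)        # rounding of values
    fired = np.abs(delta) >= threshold
    delta[~fired] = 0.0                    # thresholded delta vector
    x_ref[fired] = x_new[fired]            # update current values in memory
    idx = np.flatnonzero(fired)
    return list(zip(idx.tolist(), delta[idx].tolist()))

# Usage: x_ref persists in memory across timesteps.
x_ref = np.zeros(8)
events = delta_encode(np.array([0., 3., 0.4, -2., 0., 0., 1.6, 0.]), x_ref, 1.5)
# events -> [(1, 3.0), (3, -2.0), (6, 2.0)]
```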
A gated recurrent unit (GRU) and fully-connected (FC) layer compute module consists of
- a multi-function multiply-accumulate (MAC) array of flexible multi-function MAC units which perform all computations of a delta-encoded GRU, a delta-encoded fully-connected layer, a fully-connected layer or other recurrent neural network layers such as LSTM or LMU, and
- a control module which configures the MAC array to perform the wanted computation steps and controls the loading and storing of input and output data, whereby
- the supported computation intrinsics are multiplication of two values, addition of two values, limiting, shifting and rounding of values, and combinations thereof,
- input data is loaded from the data reduction structure together with the indices of non-zero values, or directly from memory,
- computation of non-linear functions is supported by utilizing a combination of the computation intrinsics to apply a piecewise linear approximation of said non-linear functions, for example the sigmoid and hyperbolic tangent functions, and
- utilizing multiple internal registers in each MAC unit, multiple different values can be computed without having to reload the input data from memory, saving memory bandwidth.
If the layer to compute has the same or a lower number of output neurons than there are MAC units available, all outputs of one timestep are computed in one pass over the input data. Otherwise, the compute module iterates over the input data, computing a set of outputs equal to the number of MAC units at a time. The sketch below illustrates this event-driven update and output tiling.
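The following sketch shows the event-driven update for the simplest supported case, a delta-encoded fully-connected layer; the same column-wise update underlies the matrix products of the delta-encoded GRU. Array shapes and values are made up for illustration.

```python
import numpy as np

def delta_fc(events, acc, W):
    """Event-driven fully-connected update (sketch): the pre-activation
    accumulator 'acc' persists across timesteps, and only the weight
    columns of inputs that actually changed are touched, one MAC column
    update per (index, delta) event."""
    for i, d in events:
        acc += W[:, i] * d
    return acc

rng = np.random.default_rng(0)
W = rng.standard_normal((20, 16))      # 20 output neurons, 16 inputs
acc = np.zeros(20)                     # persistent accumulator in memory
acc = delta_fc([(3, 0.5), (11, -1.25)], acc, W)

# Output tiling as described above: with n_mac MAC units, a layer with
# n_out outputs needs one pass if n_out <= n_mac, else ceil(n_out / n_mac)
# passes over the input events.
n_out, n_mac = 20, 8
passes = -(-n_out // n_mac)            # ceiling division -> 3 passes
```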
A confidence and decision control module consists of logic to compute a confidence level for the classification result.
For example, in the case of binary classification, the difference between the two classification outputs is low-pass filtered and then its absolute value is compared to a decision threshold.
When the low-pass filtered absolute value is below the threshold, the confidence is low. When it is above the threshold, the confidence is considered high.
The classification result is output together with a signal showing that the classification threshold is reached.
The signal showing that the classification threshold is reached is forwarded to the global control module to terminate the classification task in order to save power.
The low-pass filtered absolute value is output and can be used by an application as a confidence measure for further purposes.
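For binary classification, this decision logic can be modelled as below. The first-order low-pass filter is one plausible realization of the described filtering, and the coefficient alpha and decision threshold are made-up values, not parameters from the patent.

```python
def confidence_step(out_a, out_b, state, alpha=0.5, threshold=0.4):
    """One update of the confidence logic (sketch): first-order low-pass
    filter of the class-output difference; its absolute value is compared
    to the decision threshold to produce the 'classification done' signal."""
    state = (1.0 - alpha) * state + alpha * (out_a - out_b)
    confidence = abs(state)
    done = confidence >= threshold
    return state, confidence, done

# Early stopping: the global control stops the per-timestep computation
# as soon as 'done' is signalled, saving the remaining processing steps.
state = 0.0
for out_a, out_b in [(0.2, 0.1), (0.9, 0.0), (1.0, 0.1), (1.0, 0.0)]:
    state, confidence, done = confidence_step(out_a, out_b, state)
    if done:
        break          # stops after the second timestep in this toy run
```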
The advantages of the invention are minimized memory and processing effort, realized by a combination of feature extraction, calculation only at value changes and early stopping of calculations when reaching sufficient confidence in the classification result. Additionally, processing input samples one by one, as well as the use of recurrent neural networks, alleviates the need for storing an extensive history of past values, reducing it to the absolutely necessary values.
These optimizations result in a highly energy-efficient hardware solution.
A flexible definition of the classifier's architecture via the programmable control module with minimum hardware overhead is achieved by implementing only the control commands absolutely necessary for realizing the method for event-based classification.
Scalability of the hardware, from tiny classification tasks with a few filters and recurrent neurons to bigger time-series classification tasks, is enabled by only changing the size of the memory blocks.
The invention will be explained in more detail below using an example of an embodiment. The associated drawings show:
Fig. 1 a general scheme of the inventive method,
Fig. 2 a general scheme of the inventive hardware architecture,
Fig. 3 a programmable control module,
Fig. 4 a feature extraction module,
Fig. 5 a delta encoding structure,
Fig. 6 a MAC unit and
Fig. 7 a confidence and decision control.
As shown in Fig. 1, the inventive method for event-based classification of a time series consists of a feature extraction 1, a sparsification 2 of the feature extraction output, a relation 3 of the sparse features in time and a confidence-based adaptive classification 4.
As shown in Fig. 2, a hardware architecture for event-based classification consists of a memory module 5, calculation modules for feature extraction 6, classification 7 and confidence-based decision 8, a programmable control module 9 and an I/O interface 10.
As shown in Fig. 3, in the programmable control module 9, commands are loaded 11 from memory and executed after receiving the external start signal (together with the first input sample), either resulting in an operation 12 (starting the calculation modules 6; 7; 8) or a jump 13 (continuing command loading at a specified memory address). A jump command can be configured to wait for the next input sample.
As shown in Fig. 4, the feature extraction module 6 is used to realize FIR filters. It consists of three components. A MAC unit 14 performs filtering sequentially; all samples in the different filter stages pass through this single unit to save silicon area. An activation function unit computes a nonlinear function (e.g. abslog) 15 and constrains data to a limited range. A down-sampling unit (maxpool) 16 preserves the local maximum value within a mini-batch of samples. The filter length, filter stage, shift width and down-sampling batch size are all under the control 17 of the received command 18.
The delta encoding module 19 as shown in Fig. 5 computes the delta between two vectors of successive time steps. Only deltas 20 whose absolute value lies above a threshold are forwarded together with their index. In addition to forwarding the thresholded delta values to the next module, it also stores the computed deltas in a memory 21 for future use. The delta encoding module 19 is also able to fetch these precomputed thresholded deltas in a future cycle as required by the GRU module as shown in Fig. 6. The GRU unit is considered one implementation of a classification module 7 in Fig. 2. The GRU unit contains three submodules: a delta encoding module as described above and shown in Fig. 5, an array of MAC units, of which one MAC unit 22 is shown in Fig. 6, and a control module. Depending on the command given to the control module, it configures and controls the other modules to compute a delta GRU layer, a fully-connected layer or other supported layer types.
Each MAC unit 22 contains a very flexibly configurable data path with a multiplier 23, an adder 24, a set of registers 25 and a shift and quantize unit 26. Controlled by the associated control unit, all functions required for GRU execution with sparse inputs and for fully-connected layers are supported. These include matrix multiplication with dense and sparse inputs, element-wise multiplication and addition operations, scaling, rounding, requantization, limiting, and piecewise linear non-linear activation functions such as hard tanh, hard sigmoid and ReLU. All MAC units within the array work in parallel, speeding up the computation.
The piecewise linear non-linear activation functions are implemented using multiplications and additions with constant factors, combined with shifting and limiting, as in the following sketch.
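The fixed-point sketch below shows how such activations reduce to the named intrinsics, assuming the common piecewise linear variants hard sigmoid = clip(x/4 + 1/2, 0, 1) and hard tanh = clip(x, -1, 1); the Q-format and constants are illustrative assumptions, not the hardware's actual parameters.

```python
def hard_sigmoid_q(x, frac_bits=8):
    """Hard sigmoid on fixed-point integers using only constant multiply
    (as a shift), constant add and limiting. Values are in Q-format with
    'frac_bits' fractional bits."""
    one = 1 << frac_bits                 # 1.0 in fixed point
    y = (x >> 2) + (one >> 1)            # 0.25 * x + 0.5 via shift and add
    return max(0, min(one, y))           # limit to [0, 1]

def hard_tanh_q(x, frac_bits=8):
    """Hard tanh needs only the limiter: clip(x, -1, 1)."""
    one = 1 << frac_bits
    return max(-one, min(one, x))

# Q8 examples: input 0.0 maps to 0.5, input 4.0 saturates to 1.0
assert hard_sigmoid_q(0) == 128 and hard_sigmoid_q(4 * 256) == 256
```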
Fig. 7 shows the confidence and decision control unit computing the confidence in the classification output by low-pass filtering the difference between the two classification outputs. When the confidence exceeds a set threshold, the module signals classification done and outputs the classification result. In this case, the global control can terminate all further computations.
Reference mark list
1 feature extraction
2 sparsification
3 relation
4 classification
5 memory module
6 calculation module for feature extraction
7 calculation module for classification
8 calculation module for confidence-based decision
9 programmable control module
10 I/O interface
11 load command
12 operation
13 jump
14 MAC unit
15 nonlinear function
16 down-sampling unit
17 control
18 command
19 delta encoding module
20 delta between two vectors
21 memory
22 MAC unit
23 multiplier
24 adder
25 set of registers
26 shift and quantize unit

Claims

1. Method for event-based classification of a time series in hardware, comprising the steps:
- feature extraction,
- sparsification of the feature extraction output,
- relation of the features in time,
- confidence-based adaptive classification, estimating confidence in the classification and finishing processing as soon as a confidence threshold is reached.

2. Method for event-based classification according to claim 1, wherein feature extraction is realized by a combination of filters with trainable coefficients.

3. Method for event-based classification according to claim 1 or 2, wherein sparsification of the feature extraction output is realized via subsampling.

4. Method for event-based classification according to claim 1 or 2, wherein sparsification of the feature extraction output is realized via delta encoding.

5. Method for event-based classification according to one of the claims 1 to 4, wherein the relation of the features in time is realized by a recurrent neural network.

6. Method for event-based classification according to one of the claims 1 to 5, wherein finishing processing is realized for binary classification by a low-pass filter of the difference between class outputs.

7. Hardware architecture for realization of the event-based classification method, consisting of
- one or several memory blocks for storing model parameters and intermediate values,
- one or several calculation blocks for performing preprocessing (e.g. filtering),
- one or several calculation blocks for performing the classification,
- one or several calculation blocks to generate and output a confidence-based classification result and
- a programmable control module for configuring and starting the calculation blocks.

8. Hardware architecture according to claim 7, comprising a programmable control module supporting the following operations:
- it reads commands from one of the memory blocks,
- it is started whenever new input data arrives,
- it starts the calculation modules one by one and
- it supports fixed and conditional jump commands.

9. Hardware architecture according to claim 7 or 8, comprising a feature extraction module, consisting of:
- a MAC unit for sequential calculation of finite-impulse response (FIR) filters,
- support for strided FIR filter execution for simple down-sampling,
- a unit for activation function calculation and range reduction,
- a down-sampling unit, keeping the maximum within a configurable number of successive input values and
- control logic for address calculation and read and write operations from/to memory, according to virtual, configurable input and output ring buffers whose data values are stored in one of the memory blocks.

10. Hardware architecture according to one of the claims 7 to 9, comprising a data reduction structure, consisting of a delta encoding module that
- loads input values from memory,
- computes the difference between two time-successive values,
- compares the difference to a threshold and sets difference values (deltas) with an absolute value below a configurable threshold to zero,
- performs rounding of values,
- stores the resulting thresholded delta values to memory and updates the current values in memory and
- forwards only non-zero delta values together with their position indices to the gated recurrent unit and fully-connected computation module.

11. Hardware architecture according to one of the claims 7 to 10, comprising a gated recurrent unit (GRU) and fully-connected (FC) compute module, consisting of
- a multi-function multiply-accumulate (MAC) array of flexible multi-function MAC units which perform all computations of a delta-encoded GRU, a delta-encoded fully-connected layer, a fully-connected layer or other recurrent neural network layers such as LSTM or LMU and
- a control module which configures the MAC array to perform the wanted computation steps and controls the loading and storing of input and output data, wherein
- the supported computation intrinsics are multiplication of two values, addition of two values, limiting, shifting and rounding of values and combinations thereof,
- input data is loaded from the data reduction structure together with the indices of non-zero values or directly from memory,
- computation is only performed on non-zero input data,
- computation of non-linear functions is supported by utilizing a combination of the computation intrinsics to apply a piecewise linear approximation of said non-linear functions, for example the sigmoid and hyperbolic tangent functions,
- utilizing multiple internal registers in each MAC unit, multiple different values can be computed without having to reload the input data from memory, saving memory bandwidth and
- all outputs of one timestep are computed in one pass over the input data if the layer to compute has the same or a lower number of output neurons than there are MAC units available, or otherwise the compute module iterates over the input data, computing a set of outputs equal to the number of MAC units at a time.

12. Hardware architecture according to one of the claims 7 to 11, comprising a confidence and decision control module, consisting of logic to compute a confidence level for the classification result, outputting the classification result together with a signal showing that the classification threshold is reached and forwarding this signal to the global control module to terminate the classification task in order to save power.

13. Hardware architecture according to claim 12, comprising a low-pass filter outputting a filtered absolute value to be used by an application as a confidence measure for further purposes.

14. Hardware architecture according to claim 11 or 12, comprising logic to compute a confidence level for the classification result in the case of binary classification.

15. Hardware architecture according to claim 14, designed for low-pass filtering the difference between two classification outputs and comparing its absolute value to a decision threshold, considering the confidence low when the low-pass filtered absolute value is below the threshold and considering the confidence high when it is above the threshold.
PCT/EP2023/052209 2022-01-28 2023-01-30 Method and hardware architecture for event-based classification of electrocardiogram (ecg) signals WO2023144396A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102022102064 2022-01-28
DE102022102064.9 2022-01-28

Publications (1)

Publication Number Publication Date
WO2023144396A1 true WO2023144396A1 (en) 2023-08-03

Family

ID=85225238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/052209 WO2023144396A1 (en) 2022-01-28 2023-01-30 Method and hardware architecture for event-based classification of electrocardiogram (ecg) signals

Country Status (1)

Country Link
WO (1) WO2023144396A1 (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210166113A1 (en) 2016-07-13 2021-06-03 Samsung Electronics Co., Ltd. Method for neural network and apparatus performing same method

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
CHEN, YEN-PO ET AL.: "An injectable 64 nW ECG mixed-signal SoC in 65 nm for arrhythmia monitoring", IEEE JOURNAL OF SOLID-STATE CIRCUITS, vol. 50, no. 1, 2014, pages 375 - 390, XP011568750, DOI: 10.1109/JSSC.2014.2364036
H. LIU ET AL.: "Dynamically Throttleable Neural Networks (TNN)", ARXIV:2011.02836V1, 2020
HAN, S. ET AL.: "ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA", ARXIV:1612.00694, 2017
J. YU ET AL.: "Slimmable Neural Networks", ARXIV:1812.08928, 2018
JOBST, MATTHIAS ET AL.: "Event-based Neural Network for ECG Classification with Delta Encoding and Early Stopping", 2020 6TH INTERNATIONAL CONFERENCE ON EVENT-BASED CONTROL, COMMUNICATION, AND SIGNAL PROCESSING (EBCCSP), 23 September 2020 (2020-09-23), pages 1 - 4, XP093041791, ISBN: 978-1-7281-9581-0, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=9291357&ref=> [retrieved on 20230425], DOI: 10.1109/EBCCSP51266.2020.9291357 *
KIRANYAZ, SERKAN ET AL.: "Convolutional neural networks for patient-specific ECG classification", 2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), IEEE, 2015
M. JOBST ET AL.: "ZEN: A flexible energy-efficient hardware classifier exploiting temporal sparsity in ECG data", 2022 IEEE 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS), Incheon, 2022, pages 214 - 217, DOI: 10.1109/AICAS54282.2022.9869958
M. JOBST ET AL.: "Event-based Neural Network for ECG Classification with Delta Encoding and Early Stopping", 2020 6TH INTERNATIONAL CONFERENCE ON EVENT-BASED CONTROL, COMMUNICATION AND SIGNAL PROCESSING (EBCCSP), 2020, pages 1 - 4, DOI: 10.1109/EBCCSP51266.2020.9291357
NEIL, DANIEL ET AL.: "Delta networks for optimized recurrent network computation", INTERNATIONAL CONFERENCE ON MACHINE LEARNING, PMLR, 2017
Z. CHEN ET AL.: "You look twice: GaterNet for Dynamic Filter Selection in CNNs", ARXIV:1811.11205V2, 2019


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23704707

Country of ref document: EP

Kind code of ref document: A1