HRP20140414B1

HRP20140414B1 - System and computer implemented method of detection and recognition of wave forms in time series

Info

Publication number: HRP20140414B1
Application number: HRP20140414AA
Authority: HR
Inventors: Marko Velić
Original assignee: Sveučilište u Zagrebu, Fakultet organizacije i informatike Varaždin
Priority date: 2014-05-08
Filing date: 2014-05-08
Publication date: 2017-02-10
Also published as: HRP20140414A2

Abstract

The present invention relates to a system and a computer-implemented method for detecting and recognizing waveforms in time series, in particular in a physiological signal. The method comprises preprocessing of the original signal to remove noise, extraction of morphological features of the signal, detection of interesting parts of the signal, extraction of dynamic features, recognition/classification of waveforms, and grouping/clustering of similar signal segments. The morphological features are extracted by transforming the original one-dimensional signal into a series of characteristic perception vectors VPT that summarize the morphology of the signal in the vicinity of the point at time T. Each vector VPT consists of a set of features m; m; …; m, where some of these features describe geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal-morphological determinants (VMD). The transformation of the series is performed both forward and backward with respect to the time sequence. The result of the signal transformation is used as input to a classifier in order to build Model I for detecting QRS segments. The dynamic features, together with the morphological features of the signal (the result of the transformation), form a set of extended features, that is, vectors with extended features, which, together with the corresponding labels of the original signal, become the basis for a supervised learning process aimed at waveform recognition. The result of this machine learning process is Model II, which is used to classify, that is, recognize, waveforms (Figure 10 c). Model II can then be used to recognize the waveforms of new signals (Figure 10 d), yielding the recognized ECG waves. Model I and Model II are stored on a computer or can be stored on any computer-readable medium.

Description

Technical field

The present invention relates to a system and a computer-implemented method for detecting and recognizing waveforms in time series, in particular in physiological signals. The method enables the detection of interesting (predetermined) curve shapes and the recognition (classification) of curve shapes. Its results provide automatic signal processing that does not require a human expert. A computer-implemented method allows large amounts of data to be processed quickly. Automatic diagnostics based on models for detecting interesting segments of the ECG signal (QRS complexes) and for classifying waveforms speeds up diagnosis and eases the work of physicians. The present invention also relates to a computer program for detecting and recognizing waveforms in time series.

Technical problem

Although very precise and fast QRS detectors have been developed, and quite good results have been achieved on the existing tests for such algorithms, automated ECG diagnostics has not yet been adopted in practice to a sufficient extent. An electrocardiogram (ECG) is a tracing produced by an electrocardiograph, a device that records the electrical activity of the heart over time. Analysis of the various waves and of the depolarization and repolarization vectors yields information that is significant for diagnosing disease. The ECG consists of the P wave, the QRS complex (made up of the Q, R and S waves), the T wave and sometimes the U wave; see Figures 1 and 2.

In this technical field, the most valuable achievements are methods that reach very high QRS complex detection results on the ECG signal, measured according to established standards. Methods are considered "state of the art" if their sensitivity, positive predictivity and overall accuracy of QRS complex detection are above 95%. Besides detecting interesting segments (heartbeats, i.e. QRS complexes), a further challenge is recognizing waveforms, which enables more precise diagnosis of cardiac disorders. Waveform recognition (classification) methods are only now approaching these thresholds.
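The three measures named above can be illustrated with a minimal sketch. It assumes the usual confusion-matrix definitions (sensitivity = TP/(TP+FN), positive predictivity = TP/(TP+FP), accuracy = (TP+TN)/total); the function name `detection_metrics` and the example counts are hypothetical and do not come from the patent.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int, tn: int = 0) -> Dict[str, float]:
    """Compute detection quality measures from confusion-matrix counts.

    tp: beats correctly detected, fp: false detections,
    fn: missed beats, tn: correctly rejected non-beats.
    All names here are illustrative assumptions.
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("at least one true beat and one detection are required")
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),            # share of true beats that were found
        "positive_predictivity": tp / (tp + fp),  # share of detections that are true beats
        "accuracy": (tp + tn) / total,            # share of all decisions that are correct
    }


if __name__ == "__main__":
    # Made-up counts, only to show the call.
    print(detection_metrics(tp=950, fp=20, fn=50, tn=980))
```

Under these definitions a detector clears the 95% threshold mentioned in the text only when all three values are above 0.95.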

The technical problem solved by the present invention is to provide a computer-implemented method for the early diagnosis of heart disease that gives greater accuracy and reliability of ECG diagnostics than known algorithms.

A further aim of the invention is to create a knowledge base containing the set of extended features obtained by extracting dynamic and morphological characteristics of the signal. This knowledge base is processed with unsupervised machine learning algorithms and is used to find the signal segments most similar to a new signal segment, or the waveforms most similar to a new wave.

State of the art

The paper "Perception-based approach to time series data mining", BATYRSHIN, I. Z., AND SHEREMETOV, L. B., Appl. Soft Comput. 8, 3 (June 2008), 1211–1221, discloses in general terms a method of describing shapes in time series that is based on the principles of human shape description, and the use of such descriptions for time series data mining. Similar concepts of representing time series, based on ideas of human perception and shape description, are presented in KLEPAC, G., Otkrivanje zakonitosti temeljem jedinstvenoga modela transformacije vremenske serije (Detecting regularities based on a unique model of time series transformation), doctoral dissertation, Faculty of Organization and Informatics, Varaždin, University of Zagreb, 2005. That work describes a time series transformation model similar to the one described in the present invention (REF II). Compared with the present invention, the procedure described there provides a lower level of detail in its shape descriptions and covers smaller curve segments. The present invention describes the shapes of time series more precisely and therefore allows them to be applied more reliably and precisely to technical problems in this domain.

The paper referred to below as reference [1], "Automatic classification of heartbeats using ecg morphology and heartbeat interval features", DE CHAZAL, P., O'DWYER, M., AND REILLY, R. B., IEEE Transactions on Biomedical Engineering 51, 7 (2004), 1196–1206, presents a method for the automatic processing of electrocardiograms (ECG) for heartbeat classification that comprises three phases: a pre-processing phase, a processing phase and a classification phase. The method according to the present invention differs from that paper by an additional step in the signal processing phase: morphological features are extracted by transforming the original one-dimensional signal into a series of characteristic perception vectors VPT that summarize the morphology of the signal in the vicinity of the point at time T. With this step, a higher accuracy (ACC) of the algorithm is achieved than in the results of reference [1].

The paper referred to below as reference [2], "Heartbeat classification using feature selection driven by database generalization criteria", LLAMEDO, M., AND MARTÍNEZ, J. P., Biomedical Engineering, IEEE Transactions on 58, 3 (2011), 616–625, presents a simple heartbeat classifier based on ECG feature models selected with an emphasis on improved generalization capability. As with reference [1], the method according to the present invention achieves a higher accuracy of the algorithm.

The paper referred to below as reference [3], "Heartbeat classification using morphological and dynamic features of ecg signals", YE, C., BHAGAVATULA, V., AND COIMBRA, M., presents a new approach to heartbeat classification based on morphological and dynamic features.

To the best of current knowledge, the most successful approaches in the published literature that report results according to the AAMI standard (ANSI/AAMI EC57:1998/(R)2008), namely references [1], [2] and [3], show a lower accuracy (ACC) of the algorithm than the method according to the present invention. Table 18 compares three embodiments of the method according to the present invention with some of the most successful approaches that have published results according to the AAMI standard. The computer-implemented method for detecting and recognizing waveforms in time series, in particular in the ECG physiological signal, achieves a higher accuracy (ACC) of the algorithm in all three embodiments of the invention.

In addition to the works above, a less relevant paper can also be cited: "A novel method for detecting r-peaks in electrocardiogram (ecg) signal", MANIKANDAN, M. S., AND SOMAN, K., Biomedical Signal Processing and Control 7, 2 (2012), 118–128, which presents an algorithm for detecting QRS segments.

In the patent literature, document US2008/0103403 discloses a method for diagnosing silent and/or symptomatic heart disease in humans, based on the extraction and analysis of hidden factors, or of a combination of hidden and known factors, of the ECG signal. The diagnostic method uses the resting ECG signal of a group of diagnosed patients. The group consists of patients with an a priori diagnosis of being ill and patients with an a priori diagnosis of being healthy. Furthermore, the described method uses neural networks (NN) that receive a block of the signal as input (after filtering, without any transformation).

The main difference between the present invention and document US2008/0103403 is that the cited document describes a method that classifies only whether a patient is healthy or ill. The method according to the present invention also distinguishes the types of disorder according to the AAMI standard. Furthermore, the method according to the present invention does not take a sequential block of the signal as input but a transformed signal, i.e. the result of transforming a part of the signal, which summarizes its morphological characteristics. In addition, document US2008/0103403 does not report exact tests against which the performance of its algorithm could be compared with that of the algorithm according to the present invention.

Subject of the invention

The present invention relates to a system and a computer-implemented method for detecting and recognizing waveforms in time series, in particular in physiological signals. The method enables the detection of interesting (predetermined) curve shapes and the recognition (classification) of curve shapes. The method relates in particular to the transformation of the original time series signal into perception vectors, which is the basis for several applications: building a model for QRS detection and classifying QRS complexes, building a model for waveform recognition and classifying waves, and clustering (grouping) similar waves and finding the waves most similar to an observed wave.

After noise has been removed with known signal filtering techniques, the original signal is converted into a series of vectors by the transformation according to the present invention. The transformed time series is the basis for training a classifier to detect QRS complexes in the signal. Since this is a supervised learning process, the training signal is assumed to be labeled (annotated), i.e. to contain marks showing where the QRS complexes are located. The result of the machine learning process is Model I for QRS detection (a classifier), which can be stored on a computer in the form of program code (Figure 10 a). Model I can then be used to classify, that is, detect, QRS segments in new signals (Figure 10 b). The classification algorithm applied in the described invention is the Random Forest algorithm.

The detected QRS complexes are the basis for extracting the dynamic features of the signal, which are based on the intervals between QRS complexes. The dynamic features, together with the morphological features (the result of the transformation), form a set of extended features which, together with the corresponding labels of the original signal, become the basis for a supervised learning process aimed at waveform recognition. The result of this machine learning process is a model used to classify, that is, recognize, waveforms (Figure 10 c). The model can then be used to recognize the waveforms of new signals (Figure 10 d).
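As an illustration of dynamic features based on the intervals between QRS complexes, the following minimal sketch derives interval features from detected R-peak positions. The text does not list the exact dynamic features, so the choice of preceding interval, following interval and mean interval, the function name `rr_features` and the 360 Hz sampling rate in the example are all assumptions.

```python
from statistics import mean
from typing import Dict, List


def rr_features(r_peaks: List[int], fs: float) -> List[Dict[str, float]]:
    """Derive interval-based features for each beat that has two neighbours.

    r_peaks: sample indices of detected QRS complexes, in ascending order.
    fs: sampling frequency in Hz (assumed known for the recording).
    The feature set (pre_rr, post_rr, mean_rr) is a hypothetical example.
    """
    # Intervals between consecutive detected beats, in seconds.
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    if not rr:
        return []
    avg = mean(rr)
    features = []
    for i in range(1, len(r_peaks) - 1):
        features.append({
            "beat": float(r_peaks[i]),
            "pre_rr": rr[i - 1],   # interval to the previous beat
            "post_rr": rr[i],      # interval to the next beat
            "mean_rr": avg,        # average interval over the whole record
        })
    return features


if __name__ == "__main__":
    # Hypothetical peak positions sampled at an assumed 360 Hz.
    for row in rr_features([0, 360, 720, 900], fs=360.0):
        print(row)
```

In a pipeline like the one described, each such row would be appended to the morphological vector of the same beat to form the extended feature vector.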

The set of extended features obtained by extracting dynamic and morphological characteristics is stored as a knowledge base that is processed with unsupervised machine learning algorithms. Such algorithms create groups of similar waves (clusters). Examples are the K-means algorithm, hierarchical clustering, self-organizing maps and the like. The same knowledge base is also used to find the waves most similar to a new wave. Such waves are found by computing a mathematical distance between the vectors that represent the waves in the knowledge base and the vector of the wave for which the most similar waves are sought. Examples of distances are the Euclidean distance, the Manhattan distance, the Mahalanobis distance and the like.
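The similarity search described above can be sketched as follows. This is a minimal illustration that assumes the knowledge base is a mapping from a wave identifier to its feature vector; the names `most_similar`, `euclidean` and `manhattan` and the toy vectors are hypothetical. The Mahalanobis distance is omitted because it additionally needs a covariance matrix of the features.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def most_similar(
    query: Vector,
    knowledge_base: Dict[str, Vector],
    k: int = 3,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[Tuple[str, float]]:
    """Return the k stored waves closest to the query as (id, distance) pairs."""
    scored = [(wave_id, distance(query, vec)) for wave_id, vec in knowledge_base.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


if __name__ == "__main__":
    # Toy knowledge base with made-up two-dimensional feature vectors.
    kb = {"wave_a": [0.0, 0.0], "wave_b": [3.0, 4.0], "wave_c": [1.0, 1.0]}
    print(most_similar([0.0, 0.0], kb, k=2))
    print(most_similar([0.0, 0.0], kb, k=2, distance=manhattan))
```

Because the extended features may live on different scales, a real system would probably normalize the vectors first; that step is left out of the sketch.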

Developing a specialized classifier is beyond the scope of this invention. Existing classifiers were therefore considered with regard to the problem of highly imbalanced data, execution speed and several other parameters described below. After various classifiers had been tested on a limited data set, the Random Forest (RF) algorithm was selected. RF is a relatively fast and very precise classifier whose capabilities have been confirmed in earlier, similar research. The Random Forest algorithm belongs to the group of so-called classifier ensembles (several separate classifiers combined into one model) and is based on decision trees.

Stablo odlučivanja je jednostavan klasifikator koji se dobije izgradnjom strukture stabla na način da se u svakoj grani stabla podatkovni skup dijeli prema varijabli koja podskup u toj grani najbolje razdvaja s obzirom na ciljnu varijablu. Izračun "najbolje varijable" u pojedinoj grani temelji se na entropiji ili Gini mjeri nečistoće. Postupak se ponavlja rekurzivno dok se ne zadovolje uvjeti prestanka rasta stabla. Prednosti stabala odlučivanja su jednostavna matematika na kojoj se temelje, mogućnost interpretacije tj. očitavanja postupka donošenja odluke slijedom grananja stabla od korijena do listova te mogućnost interakcije s pojedinim dijelovima stabla. Zamke koje kriju modeli stabala odlučivanja su činjenica da dobro pogađaju klasu ali ne i vjerojatnosti za klasu, nestabilnost modela, sklonost pretreniranju (zbog čega se ugrađuju pravila ograničavanja rasta ili podrezivanje) te činjenica da loše podnose mnogo varijabli. U svrhu prevladavanja ovih problema Leo Breiman predložio je model Slučajne šume RF. Općenito, ansamble klasifikatora možemo podijeliti na skupine nezavisnih klasifikatora (engl. bagging) i sekvencijalne klasifikatore (engl. boosting). Kod nezavisnih odluka modela donosi po nekom principu glasovanja. klasifikatora, svaki pojedini klasifikator samostalno donosi odluku i onda se konačna Sekvencijalni klasifikatori ovise jedan o drugom tj. svaki sljedeći klasifikator koji se gradi usavršava općeniti model temeljem rezultata (pogrešaka) prethodnog. RF algoritam je primjer nezavisnih stabala odlučivanja koja su izgrađena na način da je za svako pojedino stablo u šumi iskorišten samo dio podatkovnog skupa (uzorkovanje s ponavljanjem, engl. bootstrap) te je za svako grananje pojedinog stabla razmatran slučajan podskup atributa. Na ovaj način uvedena su dva izvora slučajnosti u model što pomaže prilikom sprečavanja pretreniranosti. Svako stablo šume na ovaj način nosi dio informacije, a ukupna tj. prosječna ocjena prilikom predikcije je vrlo točna. 
A decision tree is a simple classifier built as a tree structure in which, at each branch, the data set is split on the variable that best separates the subset in that branch with respect to the target variable. The "best variable" in a particular branch is chosen using entropy or the Gini impurity measure. The process is repeated recursively until the conditions for stopping tree growth are met. The advantages of decision trees are the simple mathematics on which they are based, their interpretability (the decision process can be read along the sequence of branches from the root to the leaves), and the possibility of interacting with individual parts of the tree. Their drawbacks are that they predict the class well but not the class probabilities, that the model is unstable, that they tend to overfit (which is why growth-limiting rules or pruning are built in), and that they handle many variables poorly. To overcome these problems, Leo Breiman proposed the Random Forest (RF) model.

In general, ensembles of classifiers can be divided into groups of independent classifiers (bagging) and sequential classifiers (boosting). With independent classifiers, each individual classifier makes its decision independently, and the final model then decides by voting. Sequential classifiers depend on each other: each subsequent classifier improves the overall model based on the results (errors) of the previous one. The RF algorithm is an ensemble of independent decision trees built so that only part of the data set is used for each individual tree in the forest (sampling with replacement, i.e. bootstrap) and a random subset of attributes is considered at each split of an individual tree. This introduces two sources of randomness into the model, which helps prevent overfitting. Each tree of the forest thus carries part of the information, and the overall (averaged) prediction is very accurate.

According to many authors, the RF algorithm is one of the best machine learning algorithms in general: it tolerates many variables well and gives a good estimate of the model's future error (data that did not enter the bootstrap sample remain available for testing, the so-called out-of-bag error). Furthermore, the RF algorithm itself performs a variable importance analysis by building many randomized trees and internally measuring the performance of each tree on the subset of data that was not used to train it. In addition, RF is not sensitive to non-normality in the distributions of the predictor variables or to their different scales. The disadvantage of the RF algorithm is that the model cannot be interpreted, but for the technical problem of the invention the overall accuracy of the model was more important.

Neural networks are certainly another logical candidate when choosing a machine learning algorithm. Several test classifications were performed, and the results were no better than those of the RF algorithm. In addition, neural networks require "sacrificing" part of the training set for model validation (as a safeguard against overfitting), and given the already relatively small number of observations belonging to some target classes in this invention, neural networks were not the final choice.

In the subject invention, implementations of the RF algorithm in the machine learning packages Orange and Weka were used. The Orange implementation was used for QRS detection on unfiltered data, while in later tests (subject-based testing and shape classification) the "Fast Random Forest" implementation was used because of the size of the data sets and the execution time. As for the unbalanced data, the test on filtered data used a combination of the previously mentioned solutions. Specifically, the data were subsampled by selecting 5 percent of the negative observations; this sample, together with all positive observations (QRS complexes), formed the basic sample on which the subject-based test described below was performed. The classifier was also created through a so-called meta-classifier, meaning that during training of the RF algorithm the cost function took into account modified misclassification costs that favor correct detection of QRS complexes. Furthermore, given the unbalanced data, the probability threshold that the classifier output must reach to conclude that a beat has occurred was lowered from the default 0.5 to 0.4.
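The handling of unbalanced data described above (keeping all QRS observations, sampling 5 percent of the negative ones, and lowering the decision threshold to 0.4) can be sketched as follows. This is only an illustration in plain Python: the function names, the fixed random seed, and the use of `>=` at the threshold are assumptions, and the classifier that would produce the probabilities is not shown.

```python
import random

def subsample_negatives(labels, fraction=0.05, seed=0):
    """Return sorted indices of all positives plus a random fraction of negatives."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = max(1, round(len(negatives) * fraction)) if negatives else 0
    kept = rng.sample(negatives, n_keep)
    return sorted(positives + kept)

def detect_beats(probabilities, threshold=0.4):
    """Flag a beat wherever the classifier's QRS probability reaches the threshold."""
    return [p >= threshold for p in probabilities]

# Toy data: 20 positive (QRS) observations among 1000 samples.
labels = [1 if i % 50 == 0 else 0 for i in range(1000)]
training_indices = subsample_negatives(labels)
beat_flags = detect_beats([0.35, 0.40, 0.45, 0.90])
```

Lowering the threshold trades some additional false detections for fewer missed beats, which matches the stated preference for correct QRS detection.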

For the purpose of comparison with the aforementioned studies, in both QRS complex detection and arrhythmia classification, digital filters were implemented: a median filter to remove baseline wander, and a Butterworth filter to remove noise caused by mains (power-line) interference. Most recent studies describe the same or similar preprocessing techniques, which enables a fair comparison in terms of feature extraction and classification of the processed signals.
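As a rough illustration of the median-filter step, the sketch below estimates the baseline with a sliding median and subtracts it from the signal. The window width, the truncated windows at the edges, and the function names are assumptions for illustration; the text does not specify the filter parameters, and the Butterworth stage is not shown.

```python
from statistics import median

def median_filter(signal, width):
    """Sliding median with an odd window width; edges use a truncated window."""
    half = width // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def remove_baseline(signal, width=5):
    """Subtract the median-filtered signal (the baseline estimate) from the signal."""
    baseline = median_filter(signal, width)
    return [x - b for x, b in zip(signal, baseline)]

# A constant offset of 10 with one narrow spike: the offset is removed, the spike stays.
corrected = remove_baseline([10] * 5 + [15] + [10] * 5, width=5)
```

The window must be wider than the waves that should be preserved, otherwise the median follows the wave itself and the subtraction removes it.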

Brief description of the drawings

In the following, the invention is described in detail with reference to the drawings, in which:

Figure 1 shows the morphological and dynamic characteristics of different ECG waves,

Figure 2 shows the characteristic parts of a normal ECG segment,

Figure 3 shows typical steps in computer processing of ECG signals,

Figure 4 shows several segments of a curve illustrating the morphological differences that the vectors resulting from the transformation are intended to capture,

Figure 5 shows a conceptual representation of the method according to the subject invention,

Figure 6 shows the basic steps of the method according to the subject invention,

Figure 7 shows the sliding window method,

Figure 8 shows the hopping window method,

Figure 9 shows the transformation of the ECG signal according to the subject invention,

Figure 10 shows a flowchart of the method according to the subject invention, and

Figure 11 shows a flowchart of unsupervised learning.

Detailed description of the invention

The human ECG signal shown in Figures 1 and 2 consists of several characteristic points, or waves, that reflect physiological processes in the heart. The ECG consists of a P wave, a QRS complex (made up of the Q, R, and S waves), a T wave, and sometimes a U wave. Most currently known algorithms aim to recognize the QRS complex of the ECG signal. Such algorithms are called QRS detectors because recognizing the R wave effectively means recognizing the QRS complex, which includes the local minima to the left and right of the R wave. The QRS complex reflects one heartbeat, and correct detection of the QRS complex is the basis for heart rhythm analysis. Current methods achieve very high QRS detection performance; all of them operate on filtered signals and implement a search-back phase. The search-back phase addresses several problems that arise during beat detection. The first is a negative consequence of filtering: the morphology of the original signal changes, and the extremes of the curve are very often shifted by several time units, most often in the direction of the filter pass. Because of this shift, once a potential R wave has been identified in the filtered signal, the ANSI standard allows the actual R wave to be searched for in the original signal within 150 ms before or after the point found in the filtered signal. The second is the existence, in some patients, of sharp, high-amplitude T waves on the one hand and small, wide QRS complexes on the other. Additional logic is therefore needed to avoid detecting T waves during QRS detection while still detecting small QRS complexes.

In general, most methods of computer processing of ECG signals, whether for QRS detection or arrhythmia classification, involve two basic steps before the actual detection or classification. The first step is signal preprocessing (noise removal), and the second is feature extraction (shown in Figure 3). Figure 6 shows the basic steps of the method according to the subject invention: preprocessing the original signal to remove noise, extracting morphological features of the signal, detecting interesting parts of the signal (QRS detection), extracting dynamic features, recognizing/classifying waveforms, and grouping/clustering similar signal segments. According to the subject invention, the noise-reduced time-domain signal is transformed by a series of calculations so that the one-dimensional signal yields a multidimensional signal whose dimensions describe different morphological characteristics of the original signal. Morphological features are extracted by transforming the original one-dimensional signal into a series of characteristic perception vectors VP_T, which summarize the morphology of the signal in the neighborhood of the point at time T. Each vector VP_T consists of a set of features m_1T; m_2T; …; m_nT, of which some describe geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal morphological determinants (VMD). The transformation of the series is carried out both forward and backward with respect to the time sequence, as shown in Figure 9.
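The search-back step mentioned earlier in this section (looking for the actual R wave in the original signal within 150 ms of the point found in the filtered signal) can be sketched as below. The sketch assumes that the R wave is the largest positive sample in the search interval; the function name and parameters are hypothetical.

```python
def refine_r_peak(original, candidate, fs, tolerance_ms=150):
    """Return the index of the largest sample within +/- tolerance_ms of candidate.

    original  -- unfiltered signal samples
    candidate -- index of the R wave found in the filtered signal
    fs        -- sampling frequency in Hz
    """
    radius = int(round(fs * tolerance_ms / 1000))
    start = max(0, candidate - radius)
    stop = min(len(original), candidate + radius + 1)
    # Assumption: the true R wave is the maximum of the original signal in the window.
    return max(range(start, stop), key=lambda i: original[i])

# At 20 Hz the 150 ms tolerance spans 3 samples on each side of the candidate.
signal = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
r_index = refine_r_peak(signal, candidate=5, fs=20)
```

For recordings in which the QRS complex is predominantly negative, the maximum of the absolute value would be a more suitable criterion than the plain maximum used here.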

The result of the signal transformation is used as input to a classifier in order to build model I for detecting interesting signal segments. Model I is then used to detect such segments in new signals; Figures 10a and 10b show the construction and use of model I. Model I is built as follows. The source (training) signal is preprocessed to remove noise. The noise-reduced one-dimensional signal is transformed into a series of characteristic perception vectors VP_T that summarize the morphology of the signal in the neighborhood of the point at time T, using the procedure outlined above and described in more detail later. The result of the transformation is a series of characteristic vectors VP_T, i.e. features, that are used for learning QRS detection. The outcome is a classifier, model I, i.e. program code for detecting QRS complexes or, more generally, interesting signal segments. Model I is used to classify segments in new signals that have first undergone noise removal and the above transformation. Because this is supervised learning, the training signal must be annotated, i.e. it must contain labels marking where the QRS complexes are located. The result of the machine learning process is model I, stored on the computer in the form of program code (Figure 10a). This model is then used to classify, i.e. detect, QRS segments in new signals (Figure 10b).

Furthermore, the result of the above transformation and the result of the detection of interesting shapes, i.e. QRS detection (model I), are used as input to a classifier in order to build model II for recognizing the shapes of signal segments, both in the source (training) signal and in new signals. Model II is built as follows. The source (training) signal is preprocessed to remove noise. The noise-reduced signal is subjected to segment classification using model I, after which the dynamic features are extracted. At the same time, the noise-reduced signal undergoes the above transformation into a series of characteristic perception vectors VP_T. These two parallel steps yield vectors with extended features, from which the classifier, model II for shape recognition, is built and stored on the computer in the form of program code.

Model II is used to recognize waveform shapes in new signals as follows. The new signal undergoes noise removal and then the above transformation into a series of characteristic perception vectors VP_T. Model I detects the interesting signal segments, i.e. classifies the signal segments (QRS detection). Dynamic signal features are extracted from the detected QRS segments, yielding the extended features, i.e. vectors with extended features. The previously built model II then classifies the waveforms, producing the recognized ECG waves.

The detected QRS complexes are the basis for extracting the dynamic features of the signal (based on the intervals between QRS complexes). The dynamic features together with the morphological features (the result of the transformation) form a set of extended features, i.e. vectors with extended features, which together with the corresponding labels of the source signal become the basis for supervised learning of waveform recognition. The result of this machine learning process is model II, which is used for classification, i.e. recognition, of waveforms (Figure 10c). Model II can then be used to recognize the waveforms of new signals (Figure 10d), yielding the recognized ECG waves.
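The text states that the dynamic features are based on the intervals between detected QRS complexes and that they are joined with the morphological features to form vectors with extended features. A minimal sketch of that idea follows. The particular choice of features (the interval to the previous and to the next beat) and the function names are assumptions, since the exact feature set is not listed in this passage.

```python
def rr_features(qrs_samples, fs):
    """For each detected beat return (previous RR, next RR) in seconds.

    The first beat has no previous interval and the last beat has no next
    interval; those entries are None.
    """
    times = [s / fs for s in qrs_samples]
    features = []
    for k, t in enumerate(times):
        prev_rr = t - times[k - 1] if k > 0 else None
        next_rr = times[k + 1] - t if k + 1 < len(times) else None
        features.append((prev_rr, next_rr))
    return features

def extend_vectors(morphological, dynamic):
    """Concatenate each beat's morphological features with its dynamic features."""
    return [list(m) + list(d) for m, d in zip(morphological, dynamic)]

dynamic = rr_features([100, 300, 600], fs=100)           # beats at 1 s, 3 s, 6 s
extended = extend_vectors([[0.1, 0.2]] * 3, dynamic)     # placeholder morphology
```

In this arrangement model I supplies `qrs_samples`, and the resulting extended vectors, paired with the annotations, would form the training set for model II.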

Model I and model II are stored in the form of program code on a computer, or may be stored on any computer-readable medium. The dynamic and morphological characteristics extracted into the series of characteristic vectors VP_T, i.e. the result of the transformation, are used to group signal segments of similar shape, which creates groups (clusters) of similar signal segments. Grouping similar-shaped signal segments produces new information associated with the source signal, and this information is stored. The result of the transformation is also used to find the signal segments most similar to a new signal segment, or the waveforms most similar to a new wave.

The set of extended features obtained by extracting dynamic and morphological characteristics is stored as a knowledge base and processed with unsupervised machine learning algorithms. Such algorithms create groups (clusters) of similar waves. Examples are the K-means algorithm, hierarchical clustering, self-organizing maps, etc. The same knowledge base is also used to find the waves most similar to a new wave. Such waves are found by computing a mathematical distance between the vectors that represent the waves in the knowledge base and the vector of the wave for which the most similar waves are sought. Examples of distances are the Euclidean, Manhattan, and Mahalanobis distances, etc.
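The similarity search over the knowledge base can be illustrated with a small nearest-neighbor sketch using two of the distances named above. The function names and the toy vectors are assumptions; the Mahalanobis distance is omitted because it additionally requires the covariance matrix of the stored vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance between two equally long feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def most_similar(knowledge_base, query, k=1, distance=euclidean):
    """Return the indices of the k stored vectors closest to the query vector."""
    ranked = sorted(range(len(knowledge_base)),
                    key=lambda i: distance(knowledge_base[i], query))
    return ranked[:k]

knowledge_base = [[0, 0], [3, 4], [1, 1]]   # toy feature vectors of stored waves
nearest = most_similar(knowledge_base, [0.9, 0.9], k=2)
```

Because the features may have different scales, they would normally be rescaled before distances are computed, otherwise the feature with the largest range dominates the result.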

The perception vectors VP_T, which summarize the morphology of the signal in the neighborhood of the point at time T, consist of a set of features m_1T; m_2T; …; m_nT, of which some describe geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal morphological determinants (VMD). The dynamic cumulatives of component x of the vector [image] are calculated separately for the duration of the signal trend and for the duration of the signal concavity.

The series of calculations that turns a one-dimensional signal into a multidimensional signal, whose dimensions describe different morphological characteristics of the original signal, is described in detail below. Also described below are the procedure for calculating the cumulative of component x of the vector [image] for the duration of the trend, the procedure for calculating the cumulative of component x of the vector [image] for the duration of the concavity, and the procedure for calculating the VMD value of component x of the vector [image] for the duration of the trend.

A time series represents the value of one continuous variable over time. Because the same property is measured at different moments, the value measured at one moment depends directly on the value measured a moment earlier. This property prevents the application of standard statistical methods, since measurements at different moments (observations) are not mutually independent. It is precisely this mutual dependence of the observations that forms the basis of the new method presented below. As described earlier, human interpretation of a time series consists of describing the shapes we notice, that is, segments of the series or, more precisely, adjacent observations and the relationships between their values. Recent findings in neuroscience show that neurons in the brains of higher mammals function in a similar way: groups of neurons behave in a coordinated manner and activate simultaneously when specific patterns appear in the subject's field of vision. These ideas were the starting point for a concept intended to capture the basic morphological characteristics of time series segments. First, it was necessary to decide how the time series would be segmented, and then to devise a way to describe the morphology of the segments. Inflection points appeared to be natural segmentation points, and the morphological characteristics were summarized in a set of simple and fast calculations. The concept of using specialized neurons to describe a time series is shown in Figure 5. Both the segmentation and the morphological description take place in the time domain, i.e. on the original series, without conversion to the frequency domain or any other transformation (e.g. wavelet).

The time-domain signal is transformed by the series of calculations mentioned above so that the one-dimensional signal yields a multidimensional signal whose dimensions describe different morphological characteristics of the original signal. We call the set of values of this multidimensional representation that corresponds to exactly one moment in the original series the "perception vector" [image]. The perception vector [image] at time T consists of a set of values m_1T; m_2T; …; m_nT ∈ M that summarize the morphology of the signal in the neighborhood of the point at time T.
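Segmentation in the time domain at extremes and inflection points can be approximated on sampled data with first and second differences. The sketch below is one possible discrete interpretation and not the procedure defined later in the document; the function names are hypothetical, and flat sections (zero differences) are ignored for simplicity.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def segment_points(series):
    """Return (extremes, inflections) as lists of sample indices.

    extremes    -- samples at which the trend (first difference) changes sign
    inflections -- last sample before the concavity (second difference) changes sign
    """
    d1 = [b - a for a, b in zip(series, series[1:])]   # trend
    d2 = [b - a for a, b in zip(d1, d1[1:])]           # concavity
    extremes = [i + 1 for i in range(len(d1) - 1)
                if sign(d1[i]) * sign(d1[i + 1]) < 0]
    inflections = [i + 1 for i in range(len(d2) - 1)
                   if sign(d2[i]) * sign(d2[i + 1]) < 0]
    return extremes, inflections

# Rising convex, then rising concave, then falling: one inflection, one maximum.
extremes, inflections = segment_points([0, 1, 4, 9, 13, 15, 14, 10])
```

Everything is computed directly on the samples, without any frequency-domain or wavelet transformation, in line with the time-domain approach described above.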

[image] [image]

In terms of machine learning, the perception vector [image] represents the transformed series at time T, which is in effect a set of variables (features). It can be associated with a class K that serves as the dependent or target variable in supervised learning or statistical modeling, or it can be considered on its own as an observation of the original series at time T and used as a separate observation in unsupervised learning. The transformation that fills all the dimensions of the vector [image] simultaneously segments the series and fills the vector's components from the set M, in such a way that some elements of M represent the morphology in the nearer vicinity, M1, and other elements represent the morphology in the farther vicinity, M2, of the same segment, [image] and [image] . The transformation for creating the set M is described below.

In order to "capture" the morphology of the signal and fill all the elements of the set M, the transformation of the original series takes place sequentially in the direction of the passage of time, i.e. the arrival of observations (recordings or measurements). This means that at different times t1 and t2 two vectors, VPt1 and VPt2, are filled. The transformation captures the morphological characteristics through a series of cumulative calculations described below, and it can be implemented in real time using the sliding window method. As can be seen in Figure 7, the window in which the time series is observed shifts by a constant amount, so we say figuratively that the window "slides" through time. As will be shown later, the transformation does not have to run in only one direction (the direction of the passage of time): an equivalent procedure can be performed in the opposite direction, generating elements of the set M that form the subset MRev. The set of morphological characteristics filled in this way from both directions can be called the metaset MM, i.e. [image] and [image] .

In such a case, implementation is not possible in real time, but only in approximately real time using the hopping window method. Figure 8 shows the windows in which the time series is observed. They are not evenly spaced; instead, each subsequent window starts from a specific point of the time series detected in the previous window (an extremum or an inflection). Figuratively speaking, the window "jumps" through time. In practice, this concept is equivalent to a doctor who watches the ECG signal in real time and, before making a judgment, waits for the whole QRS complex to be drawn on the screen. The doctor then focuses on the drawn wave (the most prominent inflection, i.e. the extremum or peak of the wave) and makes a judgment, i.e. a diagnosis, based on the shape of the entire wave (its left and right surroundings). The described concept is shown in Figure 9. The procedure for creating a perception vector by generating sets from both directions is described below, given that the problem under consideration is shape recognition, where it is important to look at the entire shape, i.e. its left and right sides. For the sake of simplicity, the rest of the text uses the label M for the metaset MM of morphological characteristics (both the left and the right direction of transformation), i.e. the set of values that make up the vector [image] .

The transformation that fills the elements of the vector [image] can be applied to the original signal, i.e. the original series, or to a series that has previously been preprocessed (transformed) or resampled by some other method, e.g. min-max transformation, z-scaling or resampling (interpolation). Such preprocessing is needed to make series from different sources comparable, for example when signals from different measuring devices are to be compared or processed with the same algorithm. When the signals are recorded at different scales, i.e. amplitudes, the min-max transformation can be applied

[image] [image]

or z-scaling (reduction to standard deviation units)

[image] [image]

where μ is the mean value and σ is the standard deviation of the signal amplitude. The min-max transformation allows an arbitrary scale, but it has the disadvantage that the minimum and maximum values of the measured series must be known, which is often impossible in real-time implementations. In such situations, we can periodically calculate the standard deviation of the signal amplitude and express the original series in standard deviation units using the z-scaling transformation above. This kind of transformation is needed to handle the variable amplitudes of the QRS complex. If the series to be processed were sampled at different frequencies, they must be resampled before transformation into perception vectors, because the values of individual vectors [image] depend directly on the number of observations included in a given transformation cycle, as explained below. This research used data from a single reference database, so such transformations were not necessary, except when calculating the increment of the function, i.e. the angular deflection (angle), which is explained below. The normalized value is also stored in a separate variable, normalized, as a component of the vector [image] .
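The two rescaling options discussed above can be sketched as follows. This is a minimal illustration that assumes the standard textbook forms of min-max scaling and z-scaling; the function names `min_max_scale` and `z_scale` and the parameters `new_min` and `new_max` are hypothetical and do not come from the source, and the handling of a constant series is a choice made here only to avoid division by zero.

```python
from statistics import mean, pstdev


def min_max_scale(series, new_min=0.0, new_max=1.0):
    """Rescale a series to [new_min, new_max]; needs the global min and max."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant series: no range to scale (assumed convention).
        return [new_min for _ in series]
    span = new_max - new_min
    return [new_min + (x - lo) * span / (hi - lo) for x in series]


def z_scale(series):
    """Express a series in units of its (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # Constant series: every value equals the mean (assumed convention).
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]
```

In a streaming setting, `z_scale` would be applied to a recent block of samples and recomputed periodically, since the global minimum and maximum required by `min_max_scale` are generally not known in advance.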

Considering how human perception might function, and based on experiential knowledge (linguistic considerations and findings from neuroscience), the inflection points of the signal were taken as the dividers between individual segments. Differential calculus makes it possible to find extrema and inflection points by applying the first and second derivatives, i.e. by measuring the angle of the tangent to the curve at time T

[image] [image]

that is, when Δx tends to zero

[image] [image]

Since these are digital or digitized signals, the inflection points cannot be calculated by differentiating a continuous function; instead, they are approximated as follows:

[image] [image]

where x can be normalized to the range 0-1, and ΔX is 1, so the denominator disappears. This is exactly how the first component of the vector [image] was calculated, which we will refer to below as angle. Alternatively, when we know that the sampling frequency used to discretize the original signal was constant, the derivative at time T can be taken as the angular deflection of the secant of the curve that passes through the points T − 1 and T + 1, given that this secant and the tangent at point T are approximately parallel:

[image] [image]
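The two discrete approximations of the derivative described above can be sketched as follows. This is an illustration under stated assumptions: the original formulas are not recoverable from the text, so the one-sided version is assumed to be a backward difference, and the function names `angle_backward` and `angle_central` are hypothetical.

```python
def angle_backward(x, t):
    """Increment of the series between samples t-1 and t (unit spacing,
    so the denominator is 1 and is omitted)."""
    return x[t] - x[t - 1]


def angle_central(x, t):
    """Slope of the secant through samples t-1 and t+1, which are
    two sampling steps apart."""
    return (x[t + 1] - x[t - 1]) / 2.0
```

For the samples `[0, 1, 4, 9]` at index 2, the backward difference gives 3 and the secant slope gives 4.0. The secant version needs the sample at T + 1, so it lags the signal by one sample.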

When the transformation process encounters an extremum of the function, it resets the associated cumulatives, and in this way reflects the morphological characteristics of the signal. In addition to the extrema themselves, finer changes in the trend, i.e. changes in the detailed trend, were taken as dividers for the more detailed morphological characteristics (cumulatives). This makes it possible to store the morphological characteristics of the signal for both the near and the far "neighborhood". To identify these changes in the trend, the system of limits, i.e. the ranges between values of the angular deflection, described in the REF II model was adopted. In addition to the gradation described by the REF II model, a more general trend component was also introduced, as shown in Table 1 below.

[image] [image]

Table 1: Value ranges for the trend and detailed trend (REF) labels

The values of the limits g1-g6 were chosen empirically. The limit values applied in the subject invention are shown in Table 1. Different limit values suit different types of problems to be solved with the described method. These ranges were chosen so that they capture well the differences in the steep parts of the QRS complex of ECG signals with different pathologies. It cannot be ruled out that the stated limits are not optimal. Table 2 shows the limits for angular deflection applied in the research.

[image] [image]

Table 2

Thanks to the approximation of the first derivative, we can find the local extrema, i.e. the minima and maxima of the curve. Furthermore, with the second derivative we can find the inflection points.

[image] [image]

This fills in another component of the vector [image] , which we will call the stationary point (stationary_point).
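One way to locate local extrema on a discrete series is to look for a sign change in consecutive first differences. The sketch below illustrates that idea only. The encoding (1 for a maximum, -1 for a minimum, 0 otherwise) and the function name `stationary_points` are assumptions, not the encoding of the stationary_point component used in the source, and flat plateaus are deliberately left unlabeled.

```python
def stationary_points(x):
    """Label each sample: 1 = local maximum, -1 = local minimum, 0 = neither.

    The first and last samples are always 0 because they lack a neighbour.
    """
    labels = [0] * len(x)
    for t in range(1, len(x) - 1):
        before = x[t] - x[t - 1]  # first difference entering sample t
        after = x[t + 1] - x[t]   # first difference leaving sample t
        if before > 0 and after < 0:
            labels[t] = 1
        elif before < 0 and after > 0:
            labels[t] = -1
    return labels
```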

The area under the curve was also taken from the REF II model, and the other morphological components described below were additionally included in the vector [image] . These components are an original contribution of the subject invention. The basic idea of the area calculation comes from the calculation of a definite integral, i.e. the trapezoid rule. For the general case of a function over n points (x1, f(x1)), (x2, f(x2)), …, (xn, f(xn)), where [image] are arranged in ascending order, the approximate value of the integral [image] is given by

[image] [image]

Since the signal is discretized with a uniform step of 1, the area between two points (T and T − 1) can be approximated simply by the formula

[image] [image]
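For unit sample spacing, the trapezoid rule reduces to the average of two neighbouring samples. The sketch below assumes that standard form, since the original formula is not recoverable from the text; the function names `surface` and `trapezoid_area` are hypothetical.

```python
def surface(x, t):
    """Trapezoid area between samples t-1 and t for unit spacing."""
    return (x[t - 1] + x[t]) / 2.0


def trapezoid_area(x):
    """Approximate integral over the whole series as a sum of trapezoids."""
    return sum(surface(x, t) for t in range(1, len(x)))
```

For the samples `[0, 1, 2, 3]`, the three trapezoids have areas 0.5, 1.5 and 2.5, so `trapezoid_area` returns 4.5.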

This fills in the third component of [image] , which we will call surface. An additional morphological characteristic to be included is the concavity of the curve. On a discrete signal, the concavity of the curve at point T can be determined by observing the values of the points before and after T and the relationship between the value at T and those neighboring values. For this purpose, we introduce the term midpoint, which denotes a fictitious mean value between the points that precede and follow point T. The midpoint is calculated as an interpolation between the previous and the next point, i.e. the value that the point would take if the curve were actually a straight line passing through the points T − 1 and T + 1. Since the transformation is calculated in a single pass through the values of the time series, all calculations are performed for the previous point (T − 1 is calculated on the basis of the adjacent points T and T − 2).

[image] [image]

Furthermore, we define the difference between the previous value and the calculated midpoint, Δ midpoint, as

[image] [image]

and the difference between the current (actual) value of the series and the previously calculated midpoint, Δ actual, as

[image] [image]

We take the ratio of these calculated values and call the result the concavity coefficient

[image] [image]

Based on the relationship between the midpoint and the value at the previous moment (T − 1), we determine the concavity of the curve at the moment T − 1:

[image] [image]

The calculated coefficient forms a component of the vector [image] , which we will call coefficient_concavity. As in the case of the trend, this variable is discretized by dividing it into classes, as shown in Table 3. This fills in another component of the vector [image] , which indicates the detailed concavity and which we will call concavity_detail. In the case of a linear curve, i.e. no concavity, the concavity and detailed concavity variables take the value 0. As with the limits for trends, the limits for concavity were chosen arbitrarily on the basis of empirical conclusions. For these limits, too, it is possible to design a model for their optimization.
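The midpoint construction described above can be sketched as follows. This illustration covers only the midpoint and the sign of the concavity at T − 1. It does not reproduce the concavity coefficient or its class limits, whose exact formulas are not recoverable from the text. The function names and the label encoding (1, -1, 0) are assumptions.

```python
def midpoint(x, t):
    """Value that sample t-1 would take on the straight line through
    samples t-2 and t (unit spacing)."""
    return (x[t - 2] + x[t]) / 2.0


def concavity(x, t):
    """Concavity sign at sample t-1, computed once sample t has arrived.

    1  = sample lies above the chord (curve bulges upward),
    -1 = sample lies below the chord (curve bulges downward),
    0  = sample lies on the chord (locally linear).
    """
    diff = x[t - 1] - midpoint(x, t)
    return (diff > 0) - (diff < 0)
```

A locally linear stretch such as `[0, 1, 2]` yields 0, which matches the statement that the concavity variables take the value 0 when there is no concavity.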

The method of creating the perception vector [image] is based on the effort to describe a one-dimensional curve, i.e. its shape, using multidimensional vectors that are unique for each point of the curve (time series), yet similar to other vectors that represent points of the curve whose surroundings have a similar shape. The procedure proposed here for creating the vector [image] is such an effort. To the best of the author's knowledge, the procedure described below has not been described previously in the literature and represents an original contribution. The idea of capturing the morphology of the signal "in the neighborhood" of the observed point to which the vector [image] is assigned relies on so-called dynamic cumulatives, which use simple and fast mathematical calculations to capture specific morphological characteristics of the curve sections. These cumulatives are dynamic because the duration of their accumulation is not fixed but is determined by the trend and the detailed trend on the section of the curve being processed. We therefore distinguish between cumulatives calculated for the duration of the trend and those calculated for the duration of the detailed trend. The discretization of the angular deflection variable into these two trend variables was important precisely because of these cumulatives, i.e. to achieve "resistance" to small changes in the trend and to enable adequate accumulation of the variables. The general process of accumulating values, which can be applied to any continuous variable from the transformation, is described below.

[image] [image]

Table 3: Value ranges for the concavity and detailed concavity labels

[image] [image]

Table 4: Limits for concavity applied in the research

The cumulative of a component x of the vector [image] for the duration of the trend is calculated as follows. If the trend at point T is equal to the trend at point T-1, the trend duration is increased by one and the cumulative is increased by the value of the variable x. If the trend at point T is not equal to the trend at point T-1, the trend duration is set to 1 and the cumulative is set to the value of the variable x. If the detailed trend at point T is equal to the detailed trend at point T-1, the duration of the detailed trend is increased by one and the detailed cumulative is increased by the value of the variable x. If the detailed trend at point T is not equal to the detailed trend at point T-1, the duration of the detailed trend is set to 1 and the detailed cumulative is set to the value of the variable x.
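The accumulation rule above can be sketched as a single pass over the series. This is a minimal illustration: the function name `dynamic_cumulative` is hypothetical, the `labels` argument stands for either the trend or the detailed trend label of each sample, and treating the first sample as the start of a new run is an assumption, since the text does not say how the first point is handled.

```python
def dynamic_cumulative(values, labels):
    """Return (durations, cumulatives) that reset whenever the label changes.

    values: the component x at each time step.
    labels: the trend (or detailed trend) label at each time step.
    """
    durations, cumulatives = [], []
    for t, (x, label) in enumerate(zip(values, labels)):
        if t > 0 and label == labels[t - 1]:
            # Same run continues: extend the duration and accumulate x.
            durations.append(durations[-1] + 1)
            cumulatives.append(cumulatives[-1] + x)
        else:
            # Label changed (or first sample): restart the run.
            durations.append(1)
            cumulatives.append(x)
    return durations, cumulatives
```

Calling the function twice, once with the trend labels and once with the detailed trend labels, yields the trend-level and detail-level cumulatives of the same variable. The same function applies unchanged when the labels are concavity degrees instead of trends.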

In addition to accumulating for the duration of the trend, we also fill the vector [image] with components that describe the morphology for the duration of a degree of concavity.

The cumulative of a component x of the vector [image] for the duration of a degree of concavity is calculated as follows. If the degree of concavity at point T is equal to the degree of concavity at point T-1, the duration of the degree of concavity is increased by one and the cumulative is increased by the value of the variable x. If the degree of concavity at point T is not equal to the degree of concavity at point T-1, the duration of the degree of concavity is set to 1 and the cumulative is set to the value of the variable x. If the detailed degree of concavity at point T is equal to the detailed degree of concavity at point T-1, the duration of the detailed degree of concavity is increased by one and the detailed cumulative is increased by the value of the variable x. If the detailed degree of concavity at point T is not equal to the detailed degree of concavity at point T-1, the duration of the detailed degree of concavity is set to 1 and the detailed cumulative is set to the value of the variable x.

The following variables and cumulatives were calculated in this way: duration of the trend (duration_trend), cumulative of the surface (cum_surface), cumulative of the concavity coefficient for the duration of the trend (cum_concavity_trend), cumulative of the surface change (cum_change_surface_trend), cumulative of the angular deflection (cum_angle), duration of the detailed trend (duration_ref), cumulative of the surface change for the duration of the detailed trend (cum_change_surface_ref), cumulative of the angular deflection for the duration of the detailed trend (cum_angle_detail), cumulative of the concavity coefficient for the duration of the degree of concavity (cum_concavity), and cumulative of the concavity coefficient for the duration of the detailed degree of concavity (cum_concavity_detail). Detailed cumulatives can also be created for the cumulative surface, the cumulative concavity coefficient, and so on.

In addition to the cumulatives described above, another non-linear method was devised for creating features, i.e. components of the vector [image] , with the aim of describing the morphology of the curve in the vicinity of a particular point of the series. Because this calculation for the processed value x depends directly on the duration of the calculation (the number of points of the series included in it) and on the shape of the curve (the calculation is based on morphological characteristics), the resulting values are called "temporal morphological determinants" (abbreviated VMD). Unlike dynamic cumulatives, which give some information about what shape lies in the immediate vicinity, temporal morphological determinants are also meant to indicate where a shape is located, and to ensure that the further a shape lies in the past (or future), the less its deviation (morphology) affects the value of the VMD variable at the observed moment.

The VMD value of a component x of the vector [image] for the duration of the trend is calculated as follows. If the trend at point T is equal to the trend at point T-1, the VMD counter is increased by one and the VMD component is increased by the product of the variable x and the VMD counter. If the trend at point T is not equal to the trend at point T-1, the VMD counter is set to 1 and the VMD component is set to the value of the variable x. If the detailed trend at point T is equal to the detailed trend at point T-1, the detailed VMD counter is increased by one and the detailed VMD component is increased by the product of the variable x and the detailed VMD counter. If the detailed trend at point T is not equal to the detailed trend at point T-1, the detailed VMD counter is set to 1 and the detailed VMD component is set to the value of the variable x.
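The VMD rule above differs from the plain cumulative only in that each new value is weighted by the running counter. The sketch below illustrates this under two assumptions that the text leaves open: the product uses the counter after it has been incremented, and the first sample starts a new run. The function name `vmd` is hypothetical.

```python
def vmd(values, labels):
    """Return (counters, components) of a temporal morphological determinant.

    values: the component x at each time step.
    labels: the trend (or detailed trend) label at each time step.
    """
    counters, components = [], []
    for t, (x, label) in enumerate(zip(values, labels)):
        if t > 0 and label == labels[t - 1]:
            # Same run: bump the counter, then add x weighted by it
            # (assumed order of operations).
            counter = counters[-1] + 1
            component = components[-1] + x * counter
        else:
            # Label changed (or first sample): restart the run.
            counter = 1
            component = x
        counters.append(counter)
        components.append(component)
    return counters, components
```

Within a run, the k-th sample contributes k times its value, so samples nearer the current point carry more weight than samples at the start of the run.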

If we observe an extremum of the curve and assume that the trend preceding it lasted for some time, the linear growth of the VMD counter and its multiplication by the target value x introduce a nonlinearity. More precisely, thanks to this nonlinearity (multiplication), morphological changes closer to the observed point influence the temporal morphological determinant more than those "further in the past" (i.e. further to the left or right).

The following temporal morphological determinants (VMD) were created in this manner: VMD of the surface change over the duration of the trend (cum_vmd_surface_trend), VMD of the first derivative over the duration of the trend (cum_vmd_trend), VMD of the concavity coefficient over the duration of the trend (cum_vmd_concavity_trend), VMD of the surface change over the duration of the detailed trend (cum_vmd_surface_ref), VMD of the first derivative over the duration of the detailed trend (cum_vmd_ref), and VMD of the concavity coefficient over the duration of the detailed trend (cum_vmd_concavity_ref). In the backward transformation the same features are generated with the prefix rev. As with dynamic cumulatives, the possibility of creating new temporal morphological determinants is limited only by the creativity of the researcher. The entire transformation described was also carried out backwards, and the list of created features (variables, i.e. components of the vector [image]) is given in the table below.

[image] [image]

Table 5: Transformation variables applied in the research

Figure 4 shows several curve segments illustrating the morphological differences that the vectors produced by the transformation are intended to capture. Differences in slope and surface (a), concavity (b), dynamic cumulatives (c) and VMD vector components (d) are highlighted. To illustrate the idea behind the individual vector components, several pairs of curves are shown in Figure 3. The first pair of curves (panel a) differs clearly in slope and in the area under the curve. The second pair (panel b) shows pronounced differences in concavity. The third pair (panel c) shows two sigmoidal curves which, after processing with the described transformation, differ greatly in their dynamic cumulative components. The fourth pair (panel d) shows two similar segments with a similar "belly" that is located on different parts of the curves. This difference is captured by the VMD components of the vector [image].

METHOD TESTING

A well-known existing data set, the MIT-BIH Arrhythmia Database (hereinafter MIT-BIH AD), was used for this research. The database is widely accepted and most papers in the field report results on it, which makes it a good benchmark for comparing different approaches and algorithms. The MIT-BIH AD was created through the collaboration of MIT (Massachusetts Institute of Technology) and Beth Israel Hospital (BIH) in Boston, USA. It was developed as a standard test for evaluating algorithms for computerized ECG processing. The database contains 48 half-hour ECG records obtained from 47 patients. The signals are digitized at 360 Hz. The records were selected so that 23 contain normal sinus rhythm (NSR) and representative arrhythmias, while the other 25 contain rarer but clinically significant pathological ECG waves. The database is annotated, which means that labels of individual ECG events accompany the signal. These annotations were created by two or more cardiologists who labeled the signals independently, then compared their labels and agreed on the final ones by consensus. Each record contains data from two ECG leads. In 45 cases the first lead is modified limb lead II (MLII) and the second lead is usually modified lead V1 (sometimes V2 or V5, and in one case V4). In the other three cases the first lead is V5 and the second lead is V2 (two records) or MLII (record 114 has reversed leads). The shapes of individual waves differ between the first and second lead. In general, normal sinus rhythm has clearer waves in the first lead, while some pathological conditions are clearer in the second lead.

TEST DESCRIPTION

Most QRS detection algorithms presented in the literature do not require a learning phase. Their authors designed an algorithm and tested it on every record of the MIT-BIH AD database with the same algorithm settings. With such approaches there is no problem of selecting samples for learning and testing. The approach presented here requires a learning phase, and for such algorithms the most realistic test is the so-called subject-based test. In this test, each record in turn is removed from the learning sample and used to test the method, while the other records are used for learning (training) and, if necessary, for model validation (regularization and prevention of overtraining). This approach is the closest to a potential clinical application of the method, because during the test the algorithm is exposed to the ECG signal of a patient "whom it has never seen before" (the data were used neither for learning nor for model validation). In practice this means that 48 separate models were built and tested (one model for each record). Each model was built (trained) on a sample containing 5% of the irrelevant extrema and all QRS complexes from 47 records, and was then tested on the entire data set of the 48th record. This checks in the best possible way how well the model generalizes, i.e. how it performs on new data. For algorithms that do not require learning, the parameters may have been tuned on all records of the test database, so even though no learning is required, their ability to generalize to new data cannot be assessed reliably. The Fast Random Forest implementation was used, and the resulting model had 1000 trees in the forest, each built by considering 15 random variables. Because the data are extremely unbalanced, experiments showed that the best results were achieved by applying a meta-classifier with the cost matrix shown in Table 6.
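The subject-based test described above amounts to a leave-one-record-out loop combined with subsampling of the irrelevant class. The following Python sketch illustrates only the data split, assuming unique record identifiers; the function names, the placeholder record identifiers and the fixed random seed are hypothetical and are not taken from the described implementation.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def leave_one_record_out(
    record_ids: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training records, held-out test record) for every record."""
    for held_out in record_ids:
        training = [r for r in record_ids if r != held_out]
        yield training, held_out


def subsample_irrelevant(
    n_irrelevant: int, fraction: float = 0.05, seed: int = 0
) -> List[int]:
    """Pick indices of a random fraction of the irrelevant extrema.

    All QRS complexes are kept for training; only the irrelevant
    (non-QRS) extrema are reduced, here to 5% by default.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = round(n_irrelevant * fraction)
    return sorted(rng.sample(range(n_irrelevant), k))
```

With 48 records the loop produces 48 splits, each training on 47 records and testing on the remaining one, which matches the 48 separate models mentioned in the text.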

[image] [image]

Table 6: Cost matrix of individual classifications

The value 3.2 for the cost of an FN error was chosen for two reasons. First, after subsampling the class of irrelevant waves, the ratio was still 1.6 to 1 in favor of the irrelevant class. Second, tall and sharp T waves were observed in some records, as opposed to low-amplitude QRS complexes in others (e.g. record 108). For this reason the cost of an FN error is twice the ratio of distributions within the beat class (2 × 1.6 = 3.2).

Table 7 below gives the results of QRS detection on filtered data (360 ms).

[image] [image]

Table 7

The results show that the method distinguishes relevant from irrelevant waves of the ECG signal very well, at a level comparable to state-of-the-art methods published in the literature. Errors are noticeable in records that the literature also reports as difficult and that contain a lot of noise or small QRS complexes (such as record 108). The 30 FP errors in record 113 stand out. Inspection of these errors showed that they are T waves lying outside the 360 ms threshold. Because the sensitivity of the classifier was increased by lowering the probability threshold for accepting a QRS complex to 0.4, a larger number of T waves can be expected to be recognized as QRS complexes. These errors would be eliminated by increasing the threshold (e.g. to 375 ms), but a fair comparison with work from the literature would then no longer be possible. A comparison with the most successful approach in the available literature is given in Table 8.

[image] [image]

Table 8: Comparison of the proposed method and approaches from the literature

When discussing the classification of ECG waveforms, it is important to distinguish between the testing methodologies used when results are published. The testing methodology, i.e. the way samples are selected for learning and testing, significantly affects the test results. A common approach in recent work on the classification of arrhythmias, i.e. of ECG waveforms, is class-based testing. In such studies the entire ECG database is divided into a learning data set and a testing data set. Cross-validation with several folds is the usual method: the performance of the algorithm is measured on different test samples and average performance measures are then calculated. The problem with this approach is that it is biased: the variation in ECG waveforms between patients, mentioned earlier, is not adequately covered by such a test.

The AAMI standard describes procedures and metrics for testing algorithms that recognize pathologies in ECG signals. In the last few years papers have appeared that follow these recommendations, which makes a meaningful comparison of different approaches possible. Surprisingly few papers in the literature report results according to this standard. This part presents test results obtained in line with the recommendations of the standard and the practice of the published studies mentioned. The standard recommends omitting ECG records that contain paced rhythm (pacemaker), specifically records 102, 104, 107 and 217. The remaining records are divided into two subsets of 22 records each. The division of the records into the learning and testing subsets is shown in Table 9. Because this approach ensures that no record included in the test data set is also in the learning set, it is likewise referred to in the literature as "subject-based testing". Some published papers use this methodology with one additional exception: a small portion of the test signal (e.g. the first 5 minutes) is included in the learning process in order to adapt the algorithm to the patient in question. Although the results are then considerably better, the classifier is no longer fully automatic, because in real clinical use an expert would have to annotate part of the signal manually for every patient during recording. The results presented here include no such adaptation and come from a fully automatic classifier.

[image] [image]

Table 9: Division of the MIT-BIH AD database into a learning subset and a testing subset

This division is the usual one in the papers mentioned and was made so that both subsets contain an approximately equal number of observations of each class. The original 16 beat classes are grouped according to the AAMI standard into 5 classes, namely "N" (any label that does not belong to the "S", "V", "F" or "Q" classes), "S" (supraventricular ectopic beat), "V" (ventricular ectopic beat), "F" (fusion beat) and "Q" (unknown beat). The original 16 classes are shown in Table 10. The mapping of MIT-BIH AD classes to AAMI classes is shown in Table 11. The algorithm is trained on the first subset and tested on the second. In this way the algorithm is exposed to one half of the test database while none of the records on which it is tested is included in learning, so this test methodology is a good indicator of how well the algorithm generalizes and of its potential for clinical application. Because some waves that belong to different classes are very similar in shape but differ in their dynamic characteristics (e.g. N and A waves), certain dynamic characteristics also have to be included as components of the vectors [image].
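The grouping rule described above ("N" is any label not in "S", "V", "F" or "Q") can be sketched as a small lookup. The symbol sets below follow the grouping commonly used with the AAMI recommendation for MIT-BIH annotation symbols and are an assumption of this sketch; Table 11 remains the authoritative mapping for the described method, and the names AAMI_GROUPS and to_aami are hypothetical.

```python
from typing import Dict, Set

# Assumed symbol sets (commonly used AAMI grouping of MIT-BIH beat symbols).
AAMI_GROUPS: Dict[str, Set[str]] = {
    "S": {"A", "a", "J", "S"},  # supraventricular ectopic beats
    "V": {"V", "E"},            # ventricular ectopic beats
    "F": {"F"},                 # fusion beats
    "Q": {"/", "f", "Q"},       # paced, fused-paced and unknown beats
}


def to_aami(symbol: str) -> str:
    """Map a MIT-BIH beat symbol to its AAMI class.

    Any beat symbol outside the S, V, F and Q sets falls into "N".
    Non-beat annotations (rhythm or noise markers) should be filtered
    out before calling this function.
    """
    for aami_class, symbols in AAMI_GROUPS.items():
        if symbol in symbols:
            return aami_class
    return "N"
```

For example, to_aami("A") gives "S", while to_aami("L") falls through to "N".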

[image] [image]

Table 10: Original MIT-BIH AD classes (entire database, without excluding paced records)

[image] [image]

Table 11: Mapping of MIT-BIH AD classes to AAMI classes

Because the present invention focuses on the analysis of morphological characteristics, only an arbitrary minimum of such dynamic components is included. Numerous other dynamic characteristics can be found in the literature; including them may be the subject of future research and could improve the recognition results. The dynamic characteristics included as features are shown in Table 12. The time of the observed point, expressed in milliseconds, is denoted T0, the time of the previous point T−1, and so on.

[image] [image]

Table 12: Dynamic features included in the vector

To "capture" the variability of the rhythm, i.e. the position of the observed wave relative to its immediate neighbors, the local average duration of the intervals between the previous ten QRS complexes was used. In addition, only one future wave is included, which leaves open a near-real-time implementation using the jumping-window method described earlier. The ratios of the previous interval and of the next interval to the average over the last ten waves were then put into an additional ratio with each other, and because the distribution of this variable is asymmetric, a logarithmic transformation was applied.
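A possible reading of the rhythm features just described is sketched below in Python. The exact feature definitions are those of Table 12, which is not reproduced here, so the function name, the returned keys, the use of the last ten intervals (rather than the intervals between ten complexes) and the direction of the final ratio are all assumptions of this illustration.

```python
import math
from typing import Dict, Sequence


def rhythm_features(
    qrs_times_ms: Sequence[float], i: int, window: int = 10
) -> Dict[str, float]:
    """Rhythm features for the QRS complex at index i.

    qrs_times_ms must be strictly increasing QRS times in milliseconds.
    Only one future complex (index i + 1) is used.
    """
    if i < 1 or i + 1 >= len(qrs_times_ms):
        raise ValueError("need one previous and one following QRS complex")

    prev_rr = qrs_times_ms[i] - qrs_times_ms[i - 1]   # T0 - T-1
    next_rr = qrs_times_ms[i + 1] - qrs_times_ms[i]   # T+1 - T0

    # Local average over (at most) the last `window` intervals up to T0.
    start = max(0, i - window)
    past = [qrs_times_ms[k + 1] - qrs_times_ms[k] for k in range(start, i)]
    local_mean = sum(past) / len(past)

    prev_ratio = prev_rr / local_mean
    next_ratio = next_rr / local_mean
    return {
        "local_mean_rr": local_mean,
        "prev_ratio": prev_ratio,
        "next_ratio": next_ratio,
        # Log of the mutual ratio, to reduce the skew of its distribution.
        "log_ratio": math.log(prev_ratio / next_ratio),
    }
```

Under this reading, a premature beat followed by a compensatory pause yields a short previous interval and a long next interval, so log_ratio becomes clearly negative.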

Two models were built for waveform recognition, one based on the records of the first ECG lead and the other on the records of the second lead. Both models have the same structure: each is a forest of 500 trees, each tree built by considering 15 random variables.

[image] [image]

Table 13: Confusion matrix for the recognition of ECG waveforms based on the first lead and all attributes

The overall accuracy of the first-lead model on the test set is 95.812%. The classification of "F" waves is the worst. This agrees with the findings of other authors, whose approaches also make a considerable number of errors when recognizing "F" beats [2].

[image] [image]

Table 14: Confusion matrix for the recognition of ECG waveforms based on the second lead

The model based on the second lead has a considerably lower overall accuracy of 90.6795%, but classifies "F" waves significantly better. This is to be expected, because the second lead (which views the heart from a different angle) shows such beats with a larger amplitude, so the classifier can distinguish them more easily. Considering only the classification results obtained on the first lead and comparing them with the current state of the art, i.e. with work published in the literature, the method can be counted among the current state-of-the-art methods: despite its inability to recognize the "F" class, it achieves very good results on the other classes. The conclusions from the two leads can also be combined using conditional probabilities (Bayesian product), as already described in the literature. For comparison with some of the most successful algorithms available in the literature, the confusion matrices of the studies presented in [2], [3] and [1] are shown.
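The combination of the two lead-specific models by a Bayesian product, mentioned above, can be illustrated as follows. This is only a sketch under the assumptions that each model outputs a probability for every AAMI class, that the two leads are treated as conditionally independent and that class priors are equal; the function name bayes_product and the example probabilities are hypothetical.

```python
from typing import Dict


def bayes_product(
    p_lead1: Dict[str, float], p_lead2: Dict[str, float]
) -> Dict[str, float]:
    """Combine per-class probabilities of two lead models by their product."""
    if set(p_lead1) != set(p_lead2):
        raise ValueError("both models must score the same classes")

    # Multiply the class probabilities of the two leads ...
    joint = {c: p_lead1[c] * p_lead2[c] for c in p_lead1}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the two models assign zero probability everywhere")
    # ... and renormalize so that the result sums to one.
    return {c: v / total for c, v in joint.items()}
```

In such a scheme a beat that the first-lead model weakly labels "N" can still end up as "F" when the second-lead model, which separates "F" beats better, is confident enough.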

[image] [image]

Table 15: Confusion matrix for the recognition of ECG waveforms, ref. [2]

[image] [image]

Table 16: Confusion matrix for the recognition of ECG waveforms, ref. [3]

[image] [image]

Table 17: Confusion matrix for the recognition of ECG waveforms, ref. [1]

[image] [image]

Table 18: Comparison of the performance of different approaches

Table 18 compares three variants of the proposed method with some of the most successful approaches that have published results according to the AAMI standard. The variant marked "a" is the classification based on the first ECG lead using all features. The variant marked "b" is also based on lead 1 but uses only the 30 most relevant attributes, while variant "c" is the classification based on the second lead. Bold marks the performance figures on which a given variant is the best of the three proposed, while italics mark a higher performance reported in the literature. Other studies in the literature also use the AAMI testing standard, but include the first 5 minutes of the test ECG records in the learning phase. Such tests are not an entirely fair indicator of potential clinical application, since 5 minutes of signal would have to be labeled for every patient, which is time-consuming and not always practical in clinical situations. For this reason no comparison with such studies was made.

Below is a brief description of the system for detecting and recognizing waveforms in time series, in particular the physiological ECG signal, which is based on the signal transformation, together with the computer program and the computer-readable medium.

A system for detecting and recognizing waveforms in time series, especially a physiological signal, comprising an input circuit arranged to receive, record and store waves in time series and a device arranged to remove noise from the original signal, which further comprises:

- a processor arranged to extract the morphological features of the signal by transforming the original one-dimensional signal into a series of characteristic perception vectors VPT that summarize the morphology of the signal in the vicinity of the point at time T, which vectors VPT consist of a set of features m1T; m2T; …; mnT, where some of these features describe the geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal morphological determinants (VMD), and where the transformation of the series is carried out forward and backward with respect to the time sequence;

- a model I classifier, stored on the computer in the form of program code, which uses the result of the transformation to classify, i.e. detect, interesting parts of the signal in new signals;

- a model II classifier, stored on the computer in the form of program code, which uses the result of the transformation and the result of the detection of interesting shapes to classify, i.e. recognize, waveforms, and which is used to recognize the waveforms of new signals; and

- a knowledge base containing a set of extended features, obtained by extracting the dynamic and morphological characteristics of the signal, for processing by unsupervised machine learning algorithms and for finding the waves most similar to a new wave.

The subject invention also relates to a computer program for detecting and recognizing waveforms in time series, especially a physiological signal, which is adapted so that, when executed on a computer, it causes the computer to perform the computer-implemented method of detecting and recognizing waveforms in time series. According to the subject invention, the waves in the time series are ECG waves. The subject invention also relates to a computer-readable medium on which said computer program is stored.

The subject invention is a computer-implemented method in the form of program code, which makes it suitable for various applications such as software on a computer, a mobile phone or a portable ECG device (Holter). The scientific and professional literature describes systems in which mobile phone applications receive ECG data over a wireless connection (e.g. Bluetooth), process the signal, and send critical parts of the signal or entire signals to a central database over the mobile network. However, the systems described in the literature do not demonstrate signal processing as accurate as that of the subject invention. Besides running as software on a mobile phone, the subject invention can also be implemented as embedded software in a specialized ECG device (e.g. a Holter). In that case the specialized ECG device, in addition to recording the ECG signal, also processes the signal, detects QRS segments and classifies ECG waveforms. Such a device may further include hardware and software for communicating with other systems and for sending signals and notifications to a central database.

APPLICATION OF THE METHOD IN UNSUPERVISED LEARNING

The way of training algorithms described earlier, where the data set contains, for each observation, information about the class to which it belongs, is known as supervised learning. Where no information about the class of an observation is available, the task is to determine which group it belongs to, i.e. which of the known observations are most similar to it. This problem is also called grouping or clustering. Only a conceptual model of such a process is presented below. Unsupervised machine learning algorithms are based on calculating the distance, i.e. the difference, between observations. The simplest example of a distance calculation is the Euclidean distance. If two observations a and b are described by the variables (a1; a2; a3) and (b1; b2; b3), the distance between them (Δ) can be calculated as

$$\Delta = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2}$$

For this kind of distance calculation, the normalization mentioned earlier must be applied so that all variables are on the same scale (min-max transformation, z-scaling, etc.). The Euclidean distance does not take into account the variability of the data within individual variables, and therefore more "advanced" methods of distance calculation exist. One of these, which takes the dispersion of the data into account, is the Mahalanobis distance. The dispersion is captured by means of the covariance matrix of the data. In a sense, the Mahalanobis distance can be regarded as a multidimensional z-scaling. Given a list X consisting of N observations, where each observation can be K-dimensional (the length of the vector), and the mean vector μx (consisting of the individual means μ1, …, μK), the covariance is the K × K matrix

$$\Sigma = \mathrm{E}\left[(X - \mu_x)(X - \mu_x)^{T}\right]$$

With the inverse of the matrix Σ, the Mahalanobis distance can be calculated analogously to the Euclidean distance, using the matrix Σ⁻¹ to factor the covariance of the space out of the calculation

$$\Delta_M(a, b) = \sqrt{(a - b)^{T}\, \Sigma^{-1}\, (a - b)}$$

The Mahalanobis distance is a single number that serves as a measure of similarity between two observations in a multidimensional space. Many other methods of distance calculation exist, but explaining them is beyond the scope of this work.
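As a rough illustration of the distance calculations described above, the following Python sketch computes the Euclidean and Mahalanobis distances and looks up the known observation closest to a new one. It is not taken from the source: the function names, the use of the sample (N - 1) covariance, and the small Gaussian-elimination helper used in place of an explicit matrix inverse are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two observations of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def covariance(X):
    """K x K covariance of N observations (rows of X), each of length K.

    Assumption: the sample (N - 1) normalization is used.
    """
    n, k = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
             for j in range(k)] for i in range(k)]


def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [y[i]] for i in range(n)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def mahalanobis(a, b, cov):
    """sqrt((a - b)^T cov^-1 (a - b)), computed without forming cov^-1."""
    d = [x - y for x, y in zip(a, b)]
    return math.sqrt(sum(di * zi for di, zi in zip(d, solve(cov, d))))


def most_similar(query, known, cov):
    """Index of the known observation closest to query (hypothetical helper)."""
    return min(range(len(known)),
               key=lambda i: mahalanobis(query, known[i], cov))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check; a singular covariance matrix (for example, perfectly correlated variables) is rejected rather than silently inverted.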

Claims

1. A computer-implemented method of detecting and recognizing waveforms in time series, especially the physiological ECG signal, which includes preprocessing the original signal for the purpose of removing noise, extracting morphological features of the noise-reduced signal, detecting interesting parts of the signal, extracting dynamic features, recognizing/classifying waveforms and grouping/clustering similar segments of the signal, characterized in that the extraction of morphological features of the signal takes place by transforming the original one-dimensional signal into a series of characteristic perception vectors VPT that summarize the morphology of the signal in the vicinity of the point at time T, which vectors VPT consist of a set of features m1T; m2T; …; mnT, where some of these features describe the geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal morphological determinants (VMD), and where the transformation of the series is carried out forward and backward with respect to the time sequence.

2. The method according to claim 1, characterized in that the result of the signal transformation is used as an input to the classifier for the purpose of building model I for the detection of interesting signal segments, model I is used to detect such segments in new signals, and model I is stored in the form of program code.

3. The method according to claims 1 and 2, characterized by the fact that the transformed signal segments are grouped by shape into groups/clusters with similar signal segments, whereby new associated information is generated about the grouping of segments with the original signal, which are used in the process of unsupervised machine learning to find similar shapes of signal segments for a new signal segment.

4. The method according to claims 1, 2 and 3, characterized in that the signal is simultaneously subjected both to classification of the shape of signal segments using model I, followed by extraction of the dynamic features, and to transformation into a series of characteristic perception vectors VPT, whereby the result of these two simultaneous steps is vectors with extended features for the purpose of building a waveform recognition model II, model II is used to recognize waveforms in new signals, and model II is stored in the form of program code.

5. The method according to claim 4, characterized in that the transformed signal forms are grouped by shape into groups/clusters with similar signal forms, whereby new associated information about the grouping of forms is generated in addition to the original signal, which in the process of unsupervised machine learning is used to find similar signal forms for a new form of signal.

6. The method according to claim 1, characterized in that the dynamic cumulative components x of the vector [image] are calculated separately for the duration of the signal trend and for the duration of the signal concavity.

7. The method according to claim 6, characterized in that the cumulative of the component x of the vector [image] for the duration of the trend is calculated as follows: if the trend at point T is equal to the trend at point T-1, the trend duration is increased by one and the cumulative is increased by the value of the variable x; if the trend at point T is not equal to the trend at point T-1, the trend duration is set to 1 and the cumulative is set to the value of the variable x; if the detailed trend at point T is equal to the detailed trend at point T-1, the detailed trend duration is increased by one and the detailed cumulative is increased by the value of the variable x; and if the detailed trend at point T is not equal to the detailed trend at point T-1, the detailed trend duration is set to 1 and the detailed cumulative is set to the value of the variable x.

8. The method according to claim 6, characterized in that the cumulative of the component x of the vector [image] for the duration of the concavity is calculated as follows: if the degree of concavity at point T is equal to the degree of concavity at point T-1, the duration of the degree of concavity is increased by one and the cumulative is increased by the value of the variable x; if the degree of concavity at point T is not equal to the degree of concavity at point T-1, the duration of the degree of concavity is set to 1 and the cumulative is set to the value of the variable x; if the detailed degree of concavity at point T is equal to the detailed degree of concavity at point T-1, the duration of the detailed degree of concavity is increased by one and the detailed cumulative is increased by the value of the variable x; and if the detailed degree of concavity at point T is not equal to the detailed degree of concavity at point T-1, the duration of the detailed degree of concavity is set to 1 and the detailed cumulative is set to the value of the variable x.

9. The method according to claim 1, characterized in that the VMD value of the component x of the vector [image] for the duration of the trend is calculated as follows: if the trend at point T is equal to the trend at point T-1, the VMD counter is increased by one and the VMD component is increased by the product of the variable x and the VMD counter; if the trend at point T is not equal to the trend at point T-1, the VMD counter is set to 1 and the VMD component is set to the value of the variable x; if the detailed trend at point T is equal to the detailed trend at point T-1, the detailed VMD counter is increased by one and the detailed VMD component is increased by the product of the variable x and the detailed VMD counter; and if the detailed trend at point T is not equal to the detailed trend at point T-1, the detailed VMD counter is set to 1 and the detailed VMD component is set to the value of the variable x.

10. The method according to claim 1, characterized in that the geometric characteristics of the signal segment include: angular deflection and backward angular deflection; normalized original value; area under the last segment and area under the last backward segment; approximation of the first derivative and backward approximation of the first derivative; last area change and last backward area change; trend and backward trend; detailed trend and detailed backward trend; trend duration and backward trend duration; detailed trend duration and detailed backward trend duration; concavity coefficient and backward concavity coefficient; concavity label and backward concavity label; detailed concavity label and detailed backward concavity label; type of stationary point; and duration of the degree of concavity and backward duration of the degree of concavity.

11. The method according to claims 1 to 6, characterized in that the dynamic cumulatives of the signal segment consist of: cumulative area for the duration of the trend, cumulative area for the duration of the backward trend, cumulative area change for the duration of the trend, cumulative area change for the duration of the backward trend, cumulative area change for the duration of the detailed trend, cumulative area change for the duration of the detailed backward trend, cumulative angle for the duration of the trend, cumulative angle for the duration of the backward trend, cumulative angle for the duration of the detailed trend, cumulative angle for the duration of the detailed backward trend, cumulative concavity for the duration of the concavity, cumulative concavity for the duration of the backward concavity, cumulative concavity for the duration of the detailed concavity, cumulative concavity for the duration of the detailed backward concavity, cumulative concavity for the duration of the trend and cumulative concavity for the duration of the backward trend.

12. The method according to claims 1 and 9, characterized in that the temporal morphological determinants (VMD) of the signal segment consist of: VMD of the first derivatives for the duration of the trend, VMD of the first derivatives for the duration of the backward trend, VMD of the first derivatives for the duration of the detailed trend, VMD of the first derivatives for the duration of the detailed backward trend, VMD of the concavity for the duration of the trend, VMD of the concavity for the duration of the backward trend, VMD of the concavity for the duration of the detailed trend, VMD of the concavity for the duration of the detailed backward trend, VMD of the area for the duration of the trend, VMD of the area for the duration of the backward trend, VMD of the area for the duration of the detailed trend and VMD of the area for the duration of the detailed backward trend.

13. The method according to the preceding claims, characterized in that the Random Forest algorithm is applied for signal classification.

14. The method according to the previous claims, characterized in that the interesting parts of the signal are the QRS complex in the ECG physiological signal.

15. A system for detecting and recognizing waveforms in time series, especially a physiological signal, comprising an input circuit arranged to receive, record and store waves in time series and a device arranged to remove noise from the original signal, characterized in that it further comprises: a processor arranged to extract the morphological features of the signal by transforming the original one-dimensional signal into a series of characteristic perception vectors VPT that summarize the morphology of the signal in the vicinity of the point at time T, which vectors VPT consist of a set of features m1T; m2T; …; mnT, where some of these features describe the geometric characteristics of the signal segment, some are dynamic cumulatives, and some are temporal morphological determinants (VMD), and where the transformation of the series is carried out forward and backward with respect to the time sequence; a model I classifier stored on the computer in the form of program code, which model I uses the result of the transformation to classify, i.e. detect, interesting signal segments in new signals; a model II classifier stored on the computer in the form of program code, which model II uses the result of the transformation and the result of the detection of interesting shapes to classify, i.e. recognize, waveforms, and is used to recognize the waveforms of new signals; and a knowledge base containing a set of extended features, obtained by extracting the dynamic and morphological characteristics of the signal, for processing by unsupervised machine learning algorithms and for finding the most similar signal segments for a new signal segment or the most similar waveforms for a new wave.

16. A computer program for the detection and recognition of waveforms in time series, in particular a physiological signal, characterized in that it is adapted to, when executed on a computer, cause the computer to perform the procedure according to claims 1 to 14.

17. Computer-readable medium, characterized in that it contains a stored computer program from claim 16.
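
The update rules recited in claims 7 and 9 for the duration of the trend can be sketched in Python as follows. This is only an illustrative reading of the claim wording, not the applicant's implementation: the function names and the list-based interface are hypothetical, and it is assumed that the VMD component is increased by x multiplied by the counter value after the counter has been incremented (which is consistent with the reset case, where the counter is 1 and the component equals x).

```python
def trend_cumulative(xs, trends):
    """Per-point (duration, cumulative) pairs following the rule of claim 7.

    xs     -- values of the variable x at successive points T
    trends -- trend label at each point (any comparable value)
    """
    out = []
    duration, cumulative = 0, 0
    for i, x in enumerate(xs):
        if i > 0 and trends[i] == trends[i - 1]:
            # Same trend as at T-1: extend the run.
            duration += 1
            cumulative += x
        else:
            # Trend changed (or first point): restart the run.
            duration, cumulative = 1, x
        out.append((duration, cumulative))
    return out


def trend_vmd(xs, trends):
    """Per-point (counter, VMD) pairs following the rule of claim 9.

    Assumption: the product uses the already incremented counter.
    """
    out = []
    counter, vmd = 0, 0
    for i, x in enumerate(xs):
        if i > 0 and trends[i] == trends[i - 1]:
            counter += 1
            vmd += x * counter
        else:
            counter, vmd = 1, x
        out.append((counter, vmd))
    return out
```

Under this reading, the same two functions would cover the detailed trend and the concavity variants simply by passing the corresponding label sequence in place of `trends`. Compared with the plain cumulative, the VMD weights later points of a run more heavily, so it reflects how long the current trend has lasted as well as its magnitude.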