GB2622458A

GB2622458A - Electrocardiographic (ECG) signal processing method and system

Info

Publication number: GB2622458A
Application number: GB2307060.0A
Authority: GB
Inventors: Liu Jinxin; Wang Huiquan; Shou Zhaobing; Feng Qiang; Song Yao; Han Mengting; Long Hua
Original assignee: Shenzhen Timeyaa Electronic Tech Co Ltd
Current assignee: Shenzhen Timeyaa Electronic Tech Co Ltd
Priority date: 2022-09-19
Filing date: 2023-05-12
Publication date: 2024-03-20
Also published as: CN115363598B; CN115363598A; GB202307060D0

Abstract

Heart rate variability (HRV) analysis is performed on one of a set of electrocardiogram (ECG) signals to obtain values of a feature set comprising a time-domain index, a frequency-domain index, and a nonlinear-domain index. Based on the feature values, calibration and classification is performed on the ECG signal to obtain a corresponding heart state. The feature values are processed using a ReliefF algorithm and a multicollinearity analysis algorithm in sequence to obtain an optimal feature subset of the ECG signal, which is a subset of the feature set. An optimal feature set of each of the set of ECG signals and the heart state are used as a training set to train a random forest model, to obtain a weight model. Weight values of features in the optimal feature set of each ECG signal are obtained based on the weight model. The optimal feature values and weight values for each ECG signal are determined, a weighted sum being a feature indicator formula, wherein a feature index formula is used to calculate a feature index.

Description

ELECTROCARDIOGRAPHIC (ECG) SIGNAL PROCESSING METHOD AND SYSTEM

TECHNICAL FIELD

[0001] The present disclosure relates to the field of electrocardiographic (ECG) signal analysis technologies, in particular, to an ECG signal processing method and system.

BACKGROUND

[0002] An ECG signal is one of the important physiological signals of the human body, and some diseases may be reflected by extracting a large quantity of effective features from the ECG signal. Heart rate variability (HRV) is the variation in heart rate caused by the interaction between the sympathetic nerve and the vagus nerve. HRV can reflect the activity of the autonomic nervous system and evaluate the tension balance between the cardiac sympathetic nerve and the vagus nerve, and some conditions, including cardiovascular disease, fatigue, and mood, may be determined based on HRV. Time-frequency domain analysis is a common method for HRV analysis, in which the ECG signal is characterized by extracting time-frequency domain features. However, some nonlinear information fails to be extracted by this method.

[0003] For this problem, the Chinese patent titled "ECG SIGNAL PROCESSING METHOD" with Application No. 201410852128.0 provides an ECG signal processing method based on a neural network. In this method, the features of a signal are not extracted, and the signal is directly processed by using an end-to-end learning method. In practice, it is difficult to know the functions of the components in such a model; in other words, the model becomes more of a black box. As a result, interpretability is reduced. In addition, this method is only applicable to a case with a large quantity of ECG signals. Moreover, Chinese patent Application No. 201811429915.9 provides "HEART RATE VARIABILITY MEASUREMENT METHOD, APPARATUS, AND DEVICE BASED ON TIME-FREQUENCY ANALYSIS". This patent proposes a method for extracting a time-frequency domain feature of an ECG signal, which is applicable to cases with various data amounts. However, this method only performs mathematical statistical analysis on feature parameters to construct a disease monitoring model based on feature parameter analysis, and does not further explore the feature parameters, resulting in low accuracy and poor actual test results.

SUMMARY

[0004] The present disclosure aims to provide an ECG signal processing method and system, to improve accuracy of a processing result of an ECG signal.

[0005] To achieve the foregoing objective, the present disclosure provides the following technical solutions:

[0006] An ECG signal processing method is provided, including:

[0007] obtaining a sample set, where the sample set includes a plurality of ECG signals;

[0008] performing heart rate variability analysis on any ECG signal in the sample set, to obtain values of features in a feature set of the ECG signal, where the feature set includes a time domain index, a frequency domain index, and a nonlinear domain index;

[0009] performing calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal;

[0010] processing the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal, where the optimal feature subset is a subset of the feature set;

[0011] training a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model;

[0012] obtaining weight values of features in the optimal feature subset of each ECG signal based on the weight model; and

[0013] determining the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being a feature index formula, where the feature index formula is used to calculate a feature index.

[0014] Optionally, the performing heart rate variability analysis on the ECG signal, to obtain values of features in a feature set of the ECG signal specifically includes:

[0015] performing signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal;

[0016] obtaining an RR interval sequence of the ECG signal based on the ECG waveform of the ECG signal;

[0017] removing abnormal values from the RR interval sequence of the ECG signal according to a 3σ principle, to obtain a processed RR interval sequence; and

[0018] obtaining the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.

[0019] Optionally, the processing the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal includes:

[0020] inputting the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set;

[0021] deleting a feature whose selection weight in the feature set is less than a first set threshold, to obtain a first feature set; and

[0022] processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.

[0023] Optionally, the processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset specifically includes:

[0024] calculating mutual information between every two features in the first feature set; and

[0025] for every two features between which the mutual information is greater than a second set threshold, deleting the feature with the smaller selection weight, to obtain the optimal feature subset.

[0026] An ECG signal processing system is provided, including:

[0027] an obtaining module, configured to obtain a sample set, where the sample set includes a plurality of ECG signals;

[0028] a heart rate variability analysis module, configured to perform heart rate variability analysis on any ECG signal in the sample set, to obtain values of features in a feature set of the ECG signal, where the feature set includes a time domain index, a frequency domain index, and a nonlinear domain index;

[0029] a calibration and classification module, configured to perform calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal;

[0030] an optimal feature subset determining module, configured to process the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal, where the optimal feature subset is a subset of the feature set;

[0031] a training module, configured to train a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model;

[0032] a weight value determining module, configured to obtain weight values of features in the optimal feature subset of each ECG signal based on the weight model; and

[0033] a feature index formula determining module, configured to determine the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being the feature index formula.

[0034] Optionally, the heart rate variability analysis module specifically includes:

[0035] a preprocessing unit, configured to perform signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal;

[0036] an RR interval sequence determining unit, configured to obtain an RR interval sequence of the ECG signal based on the ECG waveform of the ECG signal;

[0037] a removing unit, configured to remove abnormal values from the RR interval sequence of the ECG signal according to a 3σ principle, to obtain a processed RR interval sequence; and

[0038] a feature value determining unit, configured to obtain the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.

[0039] Optionally, the optimal feature subset determining module specifically includes:

[0040] a selection weight calculation unit, configured to input the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set;

[0041] a first feature set determining unit, configured to delete a feature whose selection weight in the feature set is less than a first set threshold, to obtain a first feature set; and

[0042] an optimal feature subset determining unit, configured to process the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.

[0043] Optionally, the optimal feature subset determining unit specifically includes:

[0044] a mutual information calculation subunit, configured to calculate mutual information between every two features in the first feature set; and

[0045] an optimal feature subset determining subunit, configured to, for every two features between which the mutual information is greater than a second set threshold, delete the feature with the smaller selection weight, to obtain the optimal feature subset.

[0046] According to the specific embodiments provided in the present disclosure, the present disclosure achieves the following technical effects: In the present disclosure, the optimal feature subset is selected by introducing the ReliefF algorithm and the multicollinearity analysis method, to improve the accuracy of the processing result of the ECG signal.

BRIEF DESCRIPTION OF THE DRAWINGS

[0047] To describe the technical solutions in embodiments of the present disclosure or in the related art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may still derive other accompanying drawings from these accompanying drawings without creative efforts.

[0048] FIG. 1 is a flowchart of an ECG signal processing method according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0049] The technical solutions in embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure. Apparently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.

[0050] To make the foregoing objectives, features, and advantages of the present disclosure clearer and more comprehensible, the present disclosure is further described in detail below with reference to the accompanying drawings and specific implementations.

[0051] As shown in FIG. 1, an embodiment of the present disclosure provides an ECG signal processing method, including the following steps.

[0052] Step 101: Obtain a sample set, where the sample set includes a plurality of ECG signals.

[0053] Step 102: Perform heart rate variability (HRV) analysis on any ECG signal in the sample set, to obtain values of features in a feature set D{f(1), f(2), ..., f(37)} of the ECG signal, where the feature set includes a time domain index, a frequency domain index, and a nonlinear domain index. The 10 time domain indexes, 7 frequency domain indexes, and 20 nonlinear indexes are specifically shown in Table 1.

[0054] Table 1 Feature set table

[0055]

| Category | Parameter | Meaning |
| --- | --- | --- |
| Time domain index | meanHR | Mean heart rate |
| Time domain index | MEAN | Mean of RR intervals |
| Time domain index | SDNN | Standard deviation of all normal sinus beat (RR) intervals |
| Time domain index | SDNNindex | Mean of standard deviations of RR intervals for all 5-minute segments |
| Time domain index | RMSSD | Square root of the mean of the squared differences between adjacent RR intervals |
| Time domain index | pNN50(%) | NN50 count divided by total quantity of RR intervals |
| Time domain index | SDSD | Standard deviation of differences between adjacent NN intervals |
| Time domain index | TINN | Baseline width of minimum square difference triangular interpolation of highest peak of histogram of RR intervals |
| Time domain index | HrvIndex | Total quantity of RR intervals divided by height of histogram of RR intervals |
| Time domain index | CV | Coefficient of variation, ratio of SDNN to MEAN |
| Frequency domain index | VLF | Frequency band: 0.003 Hz-0.04 Hz, reflecting auxiliary information of a sympathetic nerve |
| Frequency domain index | LF | Frequency band: 0.04 Hz-0.15 Hz, reflecting an activity between the sympathetic nerve and a vagus nerve |
| Frequency domain index | HF | Frequency band: 0.15 Hz-0.4 Hz, reflecting activity of the vagus nerve |
| Frequency domain index | TotalPower | Frequency band: 0.003 Hz-0.4 Hz, reflecting the activity of the dominant sympathetic nerve and overall activity of an autonomic nervous system |
| Frequency domain index | LFn(%) | Normalized low-frequency power |
| Frequency domain index | HFn(%) | Normalized high-frequency power |
| Frequency domain index | LFn/HFn | Reflecting an overall degree of balance between the sympathetic nerve and a parasympathetic nerve |

[0056] Step 103: Considering a base value of an ECG signal of a user, by using a relative ratio method and in combination with single-factor analysis of variance, perform calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal, where the state may be some heart-related abnormal states. For a continuous change feature of the ECG signal of a human body, the ECG signal is divided into "normal" or "abnormal" states. For example, the state may be set to "normal", "mild fatigue", "moderate fatigue", or "heavy fatigue" during fatigue detection. The state may be set to "normal", "atrial fibrillation", or the like during atrial fibrillation determining. Finally, a feature vector data set with state calibrations is formed; these calibrations serve as the labels of the optimal feature subset, and together with the optimal feature subset they are used as training samples.

[0057] Step 104: Process the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset D1{f(1), f(2), ..., f(11)} of the ECG signal, where the optimal feature subset is a subset of the feature set.

[0058] Step 105: Train a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model. Specifically, the training samples are input into the random forest model, and an initial value is set for a parameter of the random forest model; L optimal decision trees are then selected by means of a learning curve, and sampling and training are performed by using a Bootstrap algorithm.

[0059] Step 106: Obtain weight values of features in the optimal feature subset of each ECG signal based on the weight model. After the training of the random forest model is completed, direct analysis may be performed by using feature importance, to directly output a contribution rate of each feature parameter to the model, namely, the weight value of the feature.

[0060] Step 107: Determine the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being the feature index formula, where the feature index formula is used to calculate a feature index. The index includes all information of the ECG signal, changes with a change of the state, and may be subsequently used for determining heart-related diseases. After the formula is obtained, when an ECG signal is subsequently processed, only the corresponding HRV parameters in the formula need to be extracted, and no additional machine learning training is needed provided that these HRV parameters are input into the formula.

[0061] The index formula is:

[0062] Index = a1*meanHR + a2*LFn + a3*ED2 + a4*ED1 + a5*pNN50 + a6*PI + a7*FE + a8*GI + a9*ED3 + a10*CV + a11*Rpf, where a1, a2, ..., and a11 are weight coefficients of these 11 feature indexes in the model.
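By way of a non-limiting illustration, the weighted sum in the index formula can be sketched as follows. The function name `feature_index` and the numeric weights and values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the disclosure, and only three of the 11 features are shown.

```python
def feature_index(values, weights):
    """Weighted sum of HRV feature values.

    Both arguments are dicts keyed by feature name; every feature that has a
    weight must also have a value.
    """
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical weights and feature values, for illustration only.
weights = {"meanHR": 0.5, "LFn": 0.25, "CV": 0.25}
values = {"meanHR": 60.0, "LFn": 40.0, "CV": 8.0}
index = feature_index(values, weights)  # 0.5*60 + 0.25*40 + 0.25*8
```

Once the weights are fixed, evaluating the index for a new ECG signal under this sketch only requires the corresponding HRV parameters, with no further model training.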

[0063] In an optional implementation, the performing heart rate variability analysis on the ECG signal, to obtain values of features in a feature set of the ECG signal specifically includes:

[0064] performing signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal;

[0065] obtaining an RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} of the ECG signal based on the ECG waveform of the ECG signal;

[0066] removing abnormal values from the RR interval sequence of the ECG signal according to the 3σ principle, to obtain a processed RR interval sequence; and

[0067] obtaining the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.
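As a minimal sketch of the 3σ-based removal of abnormal values, the following assumes that the RR intervals are given in milliseconds and that an interval is abnormal when it lies more than three population standard deviations from the mean of the sequence. The function name `remove_outliers_3sigma` is hypothetical, and the disclosure does not specify whether the statistics are computed globally or within a window.

```python
import statistics


def remove_outliers_3sigma(rr_ms):
    """Keep only RR intervals within three standard deviations of the mean."""
    mean = statistics.fmean(rr_ms)
    sigma = statistics.pstdev(rr_ms)  # population standard deviation (assumption)
    return [rr for rr in rr_ms if abs(rr - mean) <= 3 * sigma]


# Twenty regular beats followed by one implausibly long interval.
rr = [800.0] * 20 + [2000.0]
cleaned = remove_outliers_3sigma(rr)
```

Note that with very short sequences a single extreme value cannot exceed three standard deviations, so a rule of this kind is only meaningful for reasonably long recordings.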

[0068] In an optional implementation, the obtaining the values of the features in the feature set of the ECG signal based on the processed RR interval sequence specifically includes:

[0069] performing statistical analysis on the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} to obtain time domain indexes: meanHR, MEAN, SDNN, RMSSD, TINN, SDNNindex, pNN50, HRVindex, SDSD, and CV;

[0070] performing Fourier transformation on the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} to obtain a frequency domain spectrogram, and calculating frequency domain indexes of the frequency domain spectrogram: very low frequency (VLF), low frequency (LF), high frequency (HF), normalized low-frequency power ratio (LFn), normalized high-frequency power ratio (HFn), LFn/HFn, and total power (TP); and

[0071] calculating nonlinear domain indexes (semi-minor axis of Poincaré scatterplot (SD1), semi-major axis (SD2), ratio of semi-minor axis to semi-major axis (index), area (S) of Poincaré scatterplot, vector length index (VLI), vector angle index (VAI), complex correlation measure (CCM), Guzik index (GI), Porta index (PI), Ehler index (EI), distribution entropies (ED1, ED2, ED3, and ED4) of four quadrants, positive feedback index (Rpf), negative feedback index (Rnf), total feedback index (Rtf), sample entropy (SE), approximate entropy (AE), and fuzzy entropy (FE)).
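Several of the time domain indexes listed above can be sketched with standard-library statistics as follows. This is an illustration under stated assumptions rather than the implementation of the disclosure: RR intervals are assumed to be in milliseconds, meanHR is approximated as 60000 divided by the mean RR interval, pNN50 follows the Table 1 wording (NN50 count divided by the total quantity of RR intervals), and the function name `time_domain_indexes` is hypothetical. The histogram-based indexes (TINN, HRVindex) and SDNNindex are omitted.

```python
import math
import statistics


def time_domain_indexes(rr_ms):
    """Compute a few time-domain HRV indexes from RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    mean_rr = statistics.fmean(rr_ms)
    sdnn = statistics.stdev(rr_ms)
    return {
        "MEAN": mean_rr,
        "meanHR": 60000.0 / mean_rr,  # beats per minute (approximation)
        "SDNN": sdnn,
        "RMSSD": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "SDSD": statistics.stdev(diffs),
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(rr_ms),
        "CV": sdnn / mean_rr,
    }


indexes = time_domain_indexes([800, 900, 800, 900])
```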

[0072] The Poincaré scatterplot is drawn based on the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)}, a scatter distribution diagram is drawn by using a variation (first-order difference) between two adjacent intervals in the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)}, and the scatterplot after this transformation can be divided into four quadrants. Therefore, the Poincaré scatterplot graphically shows correlation between adjacent points in a time sequence.

[0073] Based on a waveform feature of the scatterplot, the semi-minor axis (SD1), the semi-major axis (SD2), the ratio of the semi-minor axis to the semi-major axis (index), the area (S) of the Poincaré scatterplot, the vector length index (VLI), and the vector angle index (VAI) of the Poincaré scatterplot are calculated.
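The ellipse descriptors of the scatterplot can be sketched as follows. The sketch assumes the commonly used construction in which each point is (rr(n), rr(n+1)), SD1 is the dispersion perpendicular to the line of identity, SD2 is the dispersion along it, and the area is that of the fitted ellipse. The function name `poincare_descriptors` is hypothetical, and VLI and VAI are not covered.

```python
import math
import statistics


def poincare_descriptors(rr_ms):
    """Return SD1, SD2, their ratio, and the fitted-ellipse area S."""
    pairs = list(zip(rr_ms[:-1], rr_ms[1:]))  # points (rr(n), rr(n+1))
    sd1 = statistics.pstdev([(y - x) / math.sqrt(2) for x, y in pairs])
    sd2 = statistics.pstdev([(x + y) / math.sqrt(2) for x, y in pairs])
    ratio = sd1 / sd2 if sd2 else float("nan")
    return {"SD1": sd1, "SD2": sd2, "ratio": ratio, "S": math.pi * sd1 * sd2}


descriptors = poincare_descriptors([800, 810, 790, 820, 805])
```

A sequence that rises by a constant amount yields SD1 equal to zero under this definition, since every point lies at the same distance from the line of identity.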

[0074] Complex correlation measure (CCM): The complex correlation measure is a measure of temporal variability in a continuous three-point window in the scatterplot. For scatterplot coordinates $(R_n R_{n+1}, R_{n+1} R_{n+2})$, a sliding window method is used, where the window size is 3 and the window step is 1. $R_n$ is the current R peak, $R_{n+1}$ is the next adjacent R peak, and $R_{n+2}$ is the R peak following the next adjacent R peak. The CCM measure is formed from the results of all overlapping sliding windows, and the formula is as follows:

$$CCM(T) = \frac{1}{C_n (N-2)} \sum_{i=1}^{N-2} \lVert A(i) \rVert$$

[0075] where T represents a delay amount for the Poincaré scatterplot; N is a quantity of R peaks; $C_n$ indicates the area of the fitted ellipse of the scatterplot, namely, $C_n = \pi \cdot SD1 \cdot SD2$, where SD1 is the semi-minor axis of the Poincaré scatterplot and SD2 is the semi-major axis of the Poincaré scatterplot; and A(i) represents the area of the triangle formed by the three consecutive points in the ith sliding window.
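A minimal sketch of the sliding-window computation described above follows. It assumes scatter points (rr(n), rr(n+lag)), takes A(i) as the unsigned area of the triangle formed by three consecutive points, and normalizes by the fitted-ellipse area times the number of windows. The function names `triangle_area` and `ccm` and the parameter `lag` are hypothetical.

```python
import math
import statistics


def triangle_area(p, q, r):
    """Unsigned area of the triangle with vertices p, q, r (shoelace formula)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def ccm(rr_ms, lag=1):
    """Complex correlation measure over overlapping three-point windows."""
    pts = list(zip(rr_ms[:-lag], rr_ms[lag:]))
    if len(pts) < 3:
        raise ValueError("need at least three scatter points")
    sd1 = statistics.pstdev([(y - x) / math.sqrt(2) for x, y in pts])
    sd2 = statistics.pstdev([(x + y) / math.sqrt(2) for x, y in pts])
    ellipse_area = math.pi * sd1 * sd2
    if ellipse_area == 0:
        raise ValueError("degenerate scatterplot: fitted ellipse has zero area")
    windows = len(pts) - 2  # window size 3, step 1
    total = sum(triangle_area(pts[i], pts[i + 1], pts[i + 2]) for i in range(windows))
    return total / (ellipse_area * windows)


value = ccm([800, 810, 790, 820, 805, 815])
```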

[0076] Guzik index (GI), Porta index (PI), and Ehler index (EI) are described as follows:

$$GI = \frac{\sum_{i=1}^{C(P^{+})} (D_i^{+})^2}{\sum_{i=1}^{N-1} D_i^2} \times 100\%$$

$$PI = \frac{C(P^{-})}{C(P^{+}) + C(P^{-})} \times 100\%$$

$$EI = \frac{\sum_{i=1}^{N-1} (RR_i - RR_{i+1})^3}{\left(\sum_{i=1}^{N-1} (RR_i - RR_{i+1})^2\right)^{3/2}}$$

[0077] The distance between the ith point in the scatterplot and the contour line is $D_i$, where $D_i = |RR_i - RR_{i+1}| / \sqrt{2}$; $C(P^{+})$ represents the quantity of scatters above the contour line, and the distance between such a scatter and the contour line is $D_i^{+}$; $C(P^{-})$ represents the quantity of scatters below the contour line; and $RR_i$ represents the ith RR interval.

[0078] Positive feedback index (Rpf), negative feedback index (Rnf), and total feedback index (Rtf) are described as follows.

[0079] The positive feedback index represents a ratio of a quantity of scatters distributed in a first quadrant and a third quadrant to a total quantity of scatters. In other words, a change in the RR intervals changes in a same direction (where a heart rate continuously increases or decreases).

[0080] The negative feedback index represents a ratio of a quantity of scatters distributed in a second quadrant and a fourth quadrant to the total quantity of scatters. In other words, a change in the RR intervals changes in a reverse direction (where the heart rate alternately accelerates and decelerates).

[0081] The total feedback index is defined as the positive feedback index/negative feedback index, and reflects an overall mechanism of a cardiac dynamic system.
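The three feedback indexes can be sketched as follows, assuming that each scatter point is a pair of successive first-order differences of the RR sequence, so that the first and third quadrants hold pairs with the same sign and the second and fourth quadrants hold pairs with opposite signs. Points lying exactly on an axis are counted in the total but in neither group, which is an assumption of this sketch. The function name `feedback_indexes` is hypothetical.

```python
def feedback_indexes(rr_ms):
    """Return (Rpf, Rnf, Rtf) from the quadrant counts of the difference plot."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    pairs = list(zip(diffs, diffs[1:]))
    if not pairs:
        raise ValueError("need at least three RR intervals")
    same = sum(1 for x, y in pairs if x * y > 0)      # quadrants I and III
    opposite = sum(1 for x, y in pairs if x * y < 0)  # quadrants II and IV
    rpf = same / len(pairs)
    rnf = opposite / len(pairs)
    rtf = rpf / rnf if rnf else float("inf")
    return rpf, rnf, rtf


rpf, rnf, rtf = feedback_indexes([800, 810, 820, 815, 825])
```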

[0082] In the Poincaré scatterplot, N concentric circles are made by using an origin point as a center, to form N annular subregions. These annular subregions are divided based on the four quadrants; in other words, each quadrant is divided into N sector subregions. Then, a quantity of scatters in each sector subregion obtained through division is counted, and a ratio of the quantity of scatters to the total quantity of scatters in a total scatter region is further calculated. These ratios are denoted as p1, p2, ..., and pN respectively to approximately describe a distribution probability of the scatters in each region.

[0083] Distribution entropies of the four quadrants (a distribution entropy ED1 of the first quadrant, a distribution entropy ED2 of the second quadrant, a distribution entropy ED3 of the third quadrant, and a distribution entropy ED4 of the fourth quadrant) are defined as:

[0084] $$ED = -\sum_{i=1}^{N} p_i \log(p_i)$$

When calculating the distribution entropy ED1 of the first quadrant, the right-hand side is evaluated with the ratio values of the first quadrant, and the other quadrants are handled similarly.

[0085] The sample entropy (SE) is defined as:

$$SampEn(m, r) = \lim_{N \to \infty} \left\{ -\ln\left[ \frac{A^m(r)}{B^m(r)} \right] \right\}$$

[0086] When the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} is arranged into m-dimensional vectors, RR(i) and RR(j) are any two of these vectors, and d[RR(i), RR(j)] is the maximum value of the distances between the two.

$$B_i^m(r) = \frac{num\{d[RR(i), RR(j)] < r\}}{n1 - m}$$

[0087] $B_i^m(r)$ is the ratio of the quantity of d[RR(i), RR(j)] less than a threshold r to the total quantity of vectors; r (r > 0) is the given threshold; num{d[RR(i), RR(j)] < r} is the quantity of d[RR(i), RR(j)] less than r; and n1-m is the total quantity of vectors.

$$B^m(r) = \frac{1}{n1 - m} \sum_{i=1}^{n1-m} B_i^m(r)$$

[0088] $B^m(r)$ is the mean of $B_i^m(r)$.

[0089] $A^m(r) = B^{m+1}(r)$

[0090] $A^m(r)$ is the value obtained by increasing the dimension of the mean $B^m(r)$ to m+1.

[0091] When N is a finite value, the following formula may be used for estimation.

[0092] $$SampEn(m, r, N) = -\ln\left[ \frac{A^m(r)}{B^m(r)} \right]$$

[0093] An approximate entropy (AE) is used to describe irregularity of a complex system.

[0094] When the RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} is an N-dimensional vector, m is an integer defining the dimension of the reconstructed vectors, and r is a measure of "similarity".

[0095] m-dimensional vectors X(1), X(2), ..., X(N-m+1) are reconstructed, where X(i) = [rr(i), rr(i+1), ..., rr(i+m-1)].

[0096] For each i value, distances between the vector X(i) and the remaining vectors X(j) are calculated:

[0097] $$d[X(i), X(j)] = \max_{k} |rr(i+k) - rr(j+k)|$$

where k ranges from 0 to m-1.

[0098] Based on the given threshold r (r > 0), for each i, the quantity of d[X(i), X(j)] < r is counted, and the ratio of the quantity to the total quantity N-m+1 of vectors is denoted as:

$$C_i^m(r) = \frac{num\{d[X(i), X(j)] < r\}}{N - m + 1}$$

$$\varphi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r)$$

[0099] where $\varphi^m(r)$ is a mean over all i: a logarithm of $C_i^m(r)$ is taken, and a mean of the logarithm over all i is obtained.

[0100] The approximate entropy of the sequence is:

$$AE = \lim_{N \to \infty} \left[ \varphi^m(r) - \varphi^{m+1}(r) \right]$$

[0101] $\varphi^{m+1}(r)$ is the value obtained by increasing the dimension of $\varphi^m(r)$ to m+1.
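Finite-length estimates of the sample entropy and the approximate entropy defined above can be sketched in plain Python as follows. The sketch assumes the maximum (Chebyshev) distance between vectors and the strict comparison d < r used in the text. Self-matches are excluded for the sample entropy and included for the approximate entropy, following the usual conventions for the two measures. The function names are hypothetical, and the quadratic pairwise loops are intended for short sequences only.

```python
import math


def _max_distance(a, b):
    """Maximum absolute difference between corresponding elements."""
    return max(abs(x - y) for x, y in zip(a, b))


def sample_entropy(series, m, r):
    """-ln(A/B), where B and A count matching pairs of length m and m+1."""
    n = len(series)

    def matches(dim):
        # The same number of templates (n - m) is used for both dimensions.
        t = [series[i:i + dim] for i in range(n - m)]
        return sum(1 for i in range(len(t)) for j in range(len(t))
                   if i != j and _max_distance(t[i], t[j]) < r)

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # no matches: the estimate is undefined
    return -math.log(a / b)


def approximate_entropy(series, m, r):
    """phi(m) - phi(m+1), with self-matches included in each count."""
    n = len(series)

    def phi(dim):
        t = [series[i:i + dim] for i in range(n - dim + 1)]
        logs = [math.log(sum(1 for tj in t if _max_distance(ti, tj) < r) / len(t))
                for ti in t]
        return sum(logs) / len(logs)

    return phi(m) - phi(m + 1)


regular = [1, 2] * 10  # strictly periodic toy sequence
se = sample_entropy(regular, 2, 0.5)
ae = approximate_entropy(regular, 2, 0.5)
```

For the strictly periodic toy sequence, both estimates are at or near zero, which is consistent with the interpretation of these entropies as measures of irregularity.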

[0102] The fuzzy entropy (FE) has a physical meaning similar to that of the approximate entropy and the sample entropy: the fuzzy entropy measures the probability that a new pattern is generated. A larger measure value indicates a higher probability that a new pattern is generated, in other words, higher complexity of the sequence.

[0103] The RR interval sequence RR{rr(1), rr(2), ..., rr(n1)} is the N-dimensional vector.

[0104] With m used as a time window, the RR interval sequence is divided into k=N-m+1 sequences X(i)=[rr(i), rr(i+1), ..., rr(i+m-1)].

[0105] A distance between two time sequences X(i) and X(j) is defined as:

$$d_{ij}^m = d[X(i), X(j)] = \max_{k \in (0, m-1)} |rr(i+k) - rr(j+k)|$$

[0106] A fuzzy membership function is introduced to calculate a similarity between the time sequences X(i) and X(j):

$$D_{ij}^m = \exp\left[ -\frac{(d_{ij}^m)^n}{r} \right]$$

[0107] where r is a similarity tolerance, i, j = 1, 2, ..., N-m+1, and i ≠ j.

[0108] If a function is defined as:

$$\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \left( \frac{1}{N-m} \sum_{j=1, j \neq i}^{N-m+1} D_{ij}^m \right)$$

[0109] then the fuzzy entropy is $FE = \lim_{N \to \infty} \left[ \ln \Phi^m(r) - \ln \Phi^{m+1}(r) \right]$.
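A finite-length sketch of the fuzzy entropy follows. It rests on several assumptions that should be checked against the original formulas: the distance between two sequences is the maximum absolute element-wise difference, the membership function is exp(-d^n / r), the averaged similarity excludes i = j, and the entropy is the difference of the logarithms of the averaged similarities at dimensions m and m+1. The function name `fuzzy_entropy` and the default exponent n = 2 are hypothetical.

```python
import math


def fuzzy_entropy(series, m, r, n=2):
    """ln(Phi(m)) - ln(Phi(m+1)) with an exponential membership function."""

    def phi(dim):
        count = len(series) - dim + 1  # number of sub-sequences X(i)
        t = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i != j:
                    d = max(abs(a - b) for a, b in zip(t[i], t[j]))
                    total += math.exp(-(d ** n) / r)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


fe_constant = fuzzy_entropy([5.0] * 10, 2, 0.5)
fe_periodic = fuzzy_entropy([1, 2] * 10, 2, 0.5)
```

A constant sequence gives a fuzzy entropy of zero under this sketch, because every pair of sub-sequences has full similarity at both dimensions.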

[0110] In an optional implementation, the processing the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal includes:

[0111] inputting the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set;

[0112] deleting a feature whose selection weight in the feature set is less than a first set threshold (0.2), to obtain a first feature set; and

[0113] processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.

[0114] In an optional implementation, the processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset specifically includes:

[0115] calculating mutual information between every two features in the first feature set; and

[0116] for every two features between which the mutual information is greater than a second set threshold (0.8), deleting the feature with the smaller selection weight, to obtain the optimal feature subset. When the mutual information is greater than the second set threshold, the two features are considered to have multicollinearity; the feature with the larger weight is retained, and the other feature is deleted.
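The two-stage selection can be sketched as follows, assuming that the ReliefF selection weights and the pairwise mutual information values have already been computed elsewhere and are supplied as plain dictionaries. The function name `select_features`, the example feature names, and all numeric values are hypothetical. Pairs are visited from the strongest dependence downwards, which is a design choice of this sketch to make the result deterministic, not something stated in the disclosure.

```python
def select_features(weights, mutual_info, weight_threshold=0.2, mi_threshold=0.8):
    """Return the sorted names of the features kept after both stages.

    weights:      dict mapping feature name -> selection weight
    mutual_info:  dict mapping (feature_a, feature_b) -> mutual information
    """
    # Stage 1: drop features whose selection weight is below the first threshold.
    kept = {name for name, w in weights.items() if w >= weight_threshold}
    # Stage 2: for each strongly dependent pair, drop the lower-weighted feature.
    for (a, b), mi in sorted(mutual_info.items(), key=lambda item: -item[1]):
        if mi > mi_threshold and a in kept and b in kept:
            kept.discard(a if weights[a] < weights[b] else b)
    return sorted(kept)


# Hypothetical selection weights and mutual information, for illustration only.
weights = {"SDNN": 0.9, "RMSSD": 0.6, "SDSD": 0.5, "VLF": 0.1}
mutual_info = {("RMSSD", "SDSD"): 0.95, ("SDNN", "RMSSD"): 0.4}
subset = select_features(weights, mutual_info)
```

In this toy input, VLF falls below the first threshold, and SDSD is removed because it is strongly dependent on RMSSD and has the smaller weight.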

[0117] The essence of the random forest (RF) model is a bagging ensemble algorithm. A bagging ensemble algorithm averages the prediction results of its base estimators, or determines the result of the ensemble estimator according to a majority voting principle.

[0118] Bagging method: A part of the data is randomly selected from a dataset as a training set, so that each tree uses a different training set for training, and therefore a different classification decision tree is generated. This is the bagging method. In the bagging method, a different training set is formed each time by random sampling with replacement.

[0119] In an optional implementation, a specific method of the random forest model is as follows:

[0120] Assuming that the RF model has a total of L decision trees, the corresponding decision trees are constructed by using a classification and regression tree (CART) algorithm, and each decision tree is trained by the Bootstrap method. After the training is completed, a test set is input, each decision tree classifies the test set, and the final output result depends on voting among the decision trees.
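The two ingredients of bagging described above, sampling with replacement and majority voting, can be sketched as follows. The decision trees themselves are not implemented here. The function names `bootstrap_sample` and `majority_vote` are hypothetical, and the seeded generator is used only to make the illustration repeatable.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement: one training set per tree."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
training_set = bootstrap_sample(list(range(10)), rng)  # indices of drawn samples
label = majority_vote(["normal", "abnormal", "normal"])
```

Because the draw is made with replacement, a given training set typically repeats some samples and omits others, which is what makes the individual trees differ.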

[0121] S0010: Input the training samples into the random forest model, set an initial value for a parameter of the model by using a grid search method, and preset a training sample set (train) and a quantity N of training times.

[0122] S0011: Traverse the decision trees in the random forest from 1 to 200, draw a learning curve of the training samples, to finally obtain 200 learning rates, select L decision trees with highest learning rates from the 200 learning rates, and then sample training data by using the Bootstrap algorithm, to randomly generate L training sets, where x prediction samples selected from each training set are used for verification training.

101231 For the foregoing method, an embodiment of the present disclosure provides an ECG signal processing system, including: [0124] an obtaining module, configured to obtain a sample set, where the sample set includes a plurality of ECG signals; [0125] a heart rate variability analysis module, configured to perform heart rate variability analysis on any ECG signal in the sample set, to obtain values of features in a feature set of the ECG signal, where the feature set includes a time domain index, a frequency domain index and a non-linear domain index; 101261 a calibration and classification module, configured to perform calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal; [0127] an optimal feature subset determining module, configured to process the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal, where the optimal feature subset is a subset of the feature set; [0128] a training module, configured to train a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model; [0129] a weight value determining module, configured to obtain weight values of features in the optimal feature subset of each ECG signal based on the weight model; and [0130] a feature index formula determining module, configured to determine the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being the feature indicator formula, where the feature index formula is used to calculate a feature index. 
[0131] In an optional implementation, the heart rate variability analysis module specifically includes:
[0132] a preprocessing unit, configured to perform signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal;
[0133] an RR interval sequence determining unit, configured to obtain an RR interval sequence of the ECG signal based on the ECG waveform of the ECG signal;
[0134] a removing unit, configured to remove outliers from the RR interval sequence of the ECG signal according to a 3σ principle, to obtain a processed RR interval sequence; and
[0135] a feature value determining unit, configured to obtain the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.
[0136] In an optional implementation, the optimal feature subset determining module specifically includes:
[0137] a selection weight calculation unit, configured to input the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set;
[0138] a first feature set determining unit, configured to delete a feature whose selection weight in the feature set is less than a first set threshold, to obtain a first feature set; and
[0139] an optimal feature subset determining unit, configured to process the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.
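The 3σ processing performed by the removing unit in [0134] can be sketched as below. The sketch assumes a single pass, the population standard deviation, and RR intervals given in milliseconds; the disclosure does not specify these details, and the function name `remove_outliers_3sigma` is hypothetical.

```python
import statistics


def remove_outliers_3sigma(rr_intervals):
    """Keep only RR intervals within three standard deviations of the mean."""
    mean = statistics.fmean(rr_intervals)
    # Assumption: population standard deviation, computed once over the
    # whole sequence (no iterative re-estimation after removal).
    sigma = statistics.pstdev(rr_intervals)
    return [rr for rr in rr_intervals if abs(rr - mean) <= 3 * sigma]
```

Note that with very short sequences a single extreme value inflates σ enough that it may not be removed, so the rule is most meaningful on longer RR interval sequences.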

[0140] In an optional implementation, the optimal feature subset determining unit specifically includes: [0141] a mutual information calculation subunit, configured to calculate mutual information between every two features in the first feature set; and [0142] an optimal feature subset determining subunit, configured to delete, from any two features between which the mutual information is greater than a second set threshold, the feature with the smaller selection weight, to obtain the optimal feature subset.
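The two subunits in [0141] and [0142] can be illustrated with the following sketch. It assumes a simple equal-width histogram estimate of mutual information (in nats); the disclosure does not state which estimator is used. The names `mutual_information`, `prune_redundant`, and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram-based mutual information estimate between two features."""
    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def prune_redundant(features, selection_weights, mi_threshold):
    """Drop the lower-weighted feature of every highly dependent pair."""
    names = sorted(features)
    kept = set(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept and \
                    mutual_information(features[a], features[b]) > mi_threshold:
                # Delete the feature with the smaller selection weight
                # (on a tie, this sketch arbitrarily deletes `b`).
                kept.discard(a if selection_weights[a] < selection_weights[b] else b)
    return [name for name in names if name in kept]
```

Here `selection_weights` stands for the ReliefF selection weights of the first feature set and `mi_threshold` for the second set threshold.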

[0143] Beneficial effects of the present disclosure are as follows:

[0144] (1) The ECG signal is a common physiological signal. In the ECG waveform processing method in the present disclosure, the original waveform signal can be effectively preprocessed, to obtain high-quality ECG waveform data.

[0145] (2) The time domain indexes, the frequency domain indexes, and the nonlinear domain indexes are used in this method, so that the obtained feature indexes can reflect the real feature and meaning of the ECG waveform more effectively.

[0146] (3) A classification model is established in combination with machine learning, the feature index is calculated based on the weight values of the features, and the feature index may be subsequently used as a basis for determining heart-related diseases.

[0147] (4) In the present disclosure, the optimal feature subset is selected by introducing the ReliefF algorithm and the multicollinearity analysis method, to improve processing efficiency and precision, and a feature index is finally obtained by weighting the features in the optimal feature subset based on the weights in the model.

[0148] The embodiments in this specification are described in a progressive manner. Description of each of the embodiments focuses on differences from other embodiments, and reference may be made to each other for the same or similar parts among the embodiments. The system disclosed in the embodiments corresponds to the method disclosed in the embodiments, and therefore is described briefly. For related parts, refer to partial descriptions in the method for an associated part.

[0149] Although the principle and implementations of the present disclosure are described by using specific examples in this specification, the descriptions of the foregoing embodiments are merely intended to help understand the method and the core idea of the method of the present disclosure. In addition, a person of ordinary skill in the art may make modifications to the specific implementations and application range according to the idea of the present disclosure. In conclusion, the content of this specification shall not be construed as a limitation to the present disclosure.

Claims

WHAT IS CLAIMED IS:
1. An electrocardiographic (ECG) signal processing method, comprising: obtaining a sample set, wherein the sample set comprises a plurality of ECG signals; performing heart rate variability analysis on any ECG signal in the sample set, to obtain values of features in a feature set of the ECG signal, wherein the feature set comprises a time domain index, a frequency domain index, and a nonlinear domain index; performing calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal; processing the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal, wherein the optimal feature subset is a subset of the feature set; training a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model; obtaining weight values of features in the optimal feature subset of each ECG signal based on the weight model; and determining the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being a feature index formula, wherein the feature index formula is used to calculate a feature index.
2. The ECG signal processing method according to claim 1, wherein the performing heart rate variability analysis on the ECG signal, to obtain values of features in a feature set of the ECG signal specifically comprises: performing signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal; obtaining an RR interval sequence of the ECG signal based on the ECG waveform of the ECG signal; removing the RR interval sequence of the ECG signal according to a 3σ principle, to obtain a processed RR interval sequence; and obtaining the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.
3. The ECG signal processing method according to claim 1, wherein the processing the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal comprises: inputting the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set; deleting a feature whose selection weight in the feature set is less than a first set threshold, to obtain a first feature set; and processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.
4. The ECG signal processing method according to claim 3, wherein the processing the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset specifically comprises: calculating mutual information between every two features in the first feature set; and deleting a feature with a small selection weight from the two features between which mutual information is greater than a second set threshold, to obtain the optimal feature subset.
5. An ECG signal processing system, comprising: an obtaining module, configured to obtain a sample set, wherein the sample set comprises a plurality of ECG signals; a heart rate variability analysis module, configured to perform heart rate variability analysis on any ECG signal in the sample set, to obtain values of features in a feature set of the ECG signal, wherein the feature set comprises a time domain index, a frequency domain index, and a nonlinear domain index; a calibration and classification module, configured to perform calibration and classification on the ECG signal based on the values of the features in the feature set of the ECG signal, to obtain a heart state corresponding to the ECG signal; an optimal feature subset determining module, configured to process the values of the features in the feature set of the ECG signal by sequentially using a ReliefF algorithm and a multicollinearity analysis algorithm, to obtain an optimal feature subset of the ECG signal, wherein the optimal feature subset is a subset of the feature set; a training module, configured to train a random forest model by using an optimal feature subset of each ECG signal in the sample set and a heart state corresponding to the ECG signal as a training set, to obtain a weight model; a weight value determining module, configured to obtain weight values of features in the optimal feature subset of each ECG signal based on the weight model; and a feature index formula determining module, configured to determine the weight value of each feature in the optimal feature subset of each said ECG signal and the value of each feature in the optimal feature subset of each said ECG signal, the weighted sum of the two being a feature index formula, wherein the feature index formula is used to calculate a feature index.
6. The ECG signal processing system according to claim 5, wherein the heart rate variability analysis module specifically comprises: a preprocessing unit, configured to perform signal preprocessing on the ECG signal, to obtain an ECG waveform of the ECG signal; an RR interval sequence determining unit, configured to obtain an RR interval sequence of the ECG signal based on the ECG waveform of the ECG signal; a removing unit, configured to remove the RR interval sequence of the ECG signal according to a 3σ principle, to obtain a processed RR interval sequence; and a feature value determining unit, configured to obtain the values of the features in the feature set of the ECG signal based on the processed RR interval sequence.
7. The ECG signal processing system according to claim 5, wherein the optimal feature subset determining module specifically comprises: a selection weight calculation unit, configured to input the values of the features in the feature set of the ECG signal into a ReliefF analysis model, to obtain selection weights of the features in the feature set; a first feature set determining unit, configured to delete a feature whose selection weight in the feature set is less than a first set threshold, to obtain a first feature set; and an optimal feature subset determining unit, configured to process the first feature set by using the multicollinearity analysis algorithm, to obtain the optimal feature subset.
8. The ECG signal processing system according to claim 7, wherein the optimal feature subset determining unit specifically comprises: a mutual information calculation subunit, configured to calculate mutual information between every two features in the first feature set; and an optimal feature subset determining subunit, configured to delete a feature with a small selection weight from the two features between which mutual information is greater than a second set threshold, to obtain the optimal feature subset.