Disclosure of Invention
The technical problem to be solved by the present application is to provide a method and an apparatus for obtaining a PPG signal clustering center, and a method and an apparatus for processing a PPG signal.
The technical solution adopted by the present application to solve this technical problem is a PPG signal clustering center acquisition method, which comprises the following steps:
s11, obtaining a sample PPG signal with a preset length, preprocessing the sample PPG signal to obtain a plurality of first pulses, and obtaining the first pulses meeting a first preset condition to perform direct current removal to obtain second pulses, wherein the PPG signal is a photoplethysmography signal;
s12, respectively carrying out fast Fourier transform on the second pulse to obtain a frequency spectrum corresponding to the second pulse, obtaining a frequency component meeting a second preset condition in the frequency spectrum, obtaining a characteristic parameter corresponding to the second pulse according to the frequency component, and normalizing the characteristic parameter to obtain a third pulse;
s13, clustering the third pulse for multiple times based on a preset clustering algorithm, finishing clustering when the clustering process or the clustering result meets a third preset condition to obtain a preset number of initial clustering clusters, and acquiring an initial clustering center of each initial clustering cluster;
s14, constructing an MAE similarity matrix of any two initial clustering centers by taking the MAE values of the normalization parameters corresponding to the two initial clustering centers as matrix elements, wherein the MAE values are average absolute error values corresponding to the two initial clustering centers, the average absolute error values are average values of all absolute parameter difference values in the two initial clustering centers, the absolute parameter difference values are difference absolute values of two same normalization parameters in the two initial clustering centers, the matrix elements of each row vector in the MAE similarity matrix are the MAE values of the same selected initial clustering center and all initial clustering centers, and the selected initial clustering centers corresponding to different row vectors are different;
s15, acquiring a first MAE value of each row vector in the MAE similarity matrix, and judging whether a second MAE value which is smaller than a first preset value and corresponds to the same initial clustering center exists in the first MAE values; if yes, go to step S16, otherwise go to step S17;
s16, combining the initial cluster corresponding to the initial cluster center corresponding to the second MAE value into a new initial cluster, taking the mean pulse of the initial cluster center corresponding to the second MAE value as the initial cluster center of the new initial cluster, and executing the step S14;
and S17, taking the initial cluster center as a target cluster center of the PPG signal.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S11, the preprocessing the sample PPG signal to obtain a number of first pulses includes:
s111, filtering high-frequency noise in the sample PPG signal based on smoothing filtering to obtain a first PPG signal;
s112, acquiring a minimum value point of the first PPG signal, and removing baseline drift of the first PPG signal to obtain a second PPG signal;
and S113, performing pulse segmentation on the second PPG signal to obtain an initial pulse, and performing amplitude normalization on the initial pulse to obtain the first pulse.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S11, the first preset condition includes:
the number of peaks is less than or equal to 2;
when the number of the peak values is more than 2, the number of the peak values with the peak amplitude larger than a second preset value is less than or equal to 3;
the pulse width is within a first preset range;
and the peak-to-peak interval width between any two adjacent large peaks is less than or equal to a third preset value, wherein the peak amplitude of the large peak is greater than the second preset value.
Preferably, in the method for obtaining a PPG signal cluster center of the present application, the second preset value is less than or equal to 0.2;
the first preset range is obtained by a difference value based on a sampling frequency and an original signal frequency; and/or
The third preset value is greater than or equal to 60.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S11, obtaining the first pulses that satisfy the first preset condition and performing direct current removal to obtain the second pulses includes: acquiring the mean value of all first pulses meeting the first preset condition, and subtracting the mean value from each first pulse to obtain a second pulse; and/or
In the step S15, the first preset value ranges from 0.5 to 0.7.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S12, the obtaining frequency components in the frequency spectrum that satisfy a second preset condition includes:
and respectively extracting a frequency parameter, an amplitude parameter, a phase parameter, a real part constant and an imaginary part constant from the frequency components of the frequency spectrum, and acquiring the frequency components of which the amplitudes are greater than a fourth preset value as the frequency components meeting the second preset condition.
Preferably, in the PPG signal cluster center obtaining method of the present application, the fourth preset value is less than or equal to 0.1.
Preferably, in the PPG signal cluster center acquisition method of the present application, in step S12,
the obtaining of the characteristic parameter corresponding to the second pulse according to the frequency component includes:
respectively extracting the frequency parameter, the amplitude parameter, the phase parameter, the real part constant and the imaginary part constant of the second pulse as the characteristic parameters of the second pulse; and/or
The normalizing the characteristic parameter to obtain a third pulse comprises:
the characteristic parameters were normalized by the Z-Score normalization method.
Preferably, in the PPG signal cluster center acquisition method of the present application, in step S13,
the preset clustering algorithm comprises a K-Means clustering method;
the clustering process or the clustering result meeting a third preset condition comprises the following steps:
the number of clustering iterations is greater than a fifth preset value, or
the mean absolute difference between the current clustering center and the calculated clustering center is smaller than a sixth preset value, wherein the calculated clustering center is the mean pulse of the cluster corresponding to the current clustering center.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S13, the fifth preset value is less than or equal to 20, and the sixth preset value is less than or equal to $1 \times 10^{-8}$.
Preferably, in the PPG signal cluster center obtaining method of the present application, in step S13, the obtaining an initial cluster center of each initial cluster includes:
and acquiring the mean pulse of the initial clustering cluster as an initial clustering center corresponding to the initial clustering cluster.
Preferably, in the PPG signal cluster center acquisition method of the present application, further including:
S18, acquiring the Pearson correlation coefficient between each target cluster center and each pulse of its corresponding cluster, selecting the preset number of pulses with the largest correlation coefficients in the cluster, and taking the smallest correlation coefficient among the selected pulses as the clustering threshold of the target cluster center.
Preferably, in the PPG signal cluster center obtaining method of the present application, the preset number is 90% of the total number of pulses in the cluster corresponding to the target cluster center.
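The threshold selection of step S18 can be illustrated with a minimal sketch. It assumes that the correlation is taken between the cluster center and each member pulse in the normalized feature space, and that the members are non-constant vectors; the function names `pearson` and `cluster_threshold` and the parameter `keep_fraction` are illustrative and do not appear in the application.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cluster_threshold(center, members, keep_fraction=0.9):
    """Smallest correlation among the best-correlated fraction of the members.

    keep_fraction=0.9 mirrors the 90% figure given above (assumed reading).
    """
    corrs = sorted((pearson(center, m) for m in members), reverse=True)
    keep = max(1, int(len(corrs) * keep_fraction))
    return corrs[keep - 1]
```

With ten members of which nine match the center closely, the single worst-correlated member falls outside the retained 90% and does not lower the threshold.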
The application also constructs a PPG signal processing method, comprising the steps of:
s21, acquiring an original PPG signal in real time, preprocessing the original PPG signal to obtain a fourth pulse, and acquiring the fourth pulse meeting a first preset condition and performing direct current removal to obtain a fifth pulse, wherein the PPG signal is a photoplethysmography signal;
s22, performing fast Fourier transform on the fifth pulse respectively to obtain a frequency spectrum corresponding to the fifth pulse, obtaining a frequency component meeting a second preset condition in the frequency spectrum, obtaining a characteristic parameter corresponding to the fifth pulse according to the frequency component, and normalizing the characteristic parameter to obtain a sixth pulse;
and S23, acquiring a cluster center corresponding to the sixth pulse according to the normalization parameter of the sixth pulse and a preset cluster center, so as to acquire information corresponding to the sixth pulse according to the cluster center, wherein the preset cluster center is acquired by any one of the above PPG signal cluster center acquisition methods.
Preferably, in the PPG signal processing method of the present application, further comprising the steps of:
s24, acquiring a Pearson correlation coefficient of the sixth pulse and a corresponding clustering center thereof, judging whether the Pearson correlation coefficient is larger than or equal to a clustering threshold of the preset clustering center, if so, executing the step S23, otherwise, executing the step S25;
and S25, judging the original PPG signal as noise and ignoring the noise.
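Steps S23 to S25 can be sketched as follows. The application does not state which distance is used to match a sixth pulse to a preset cluster center, so the mean absolute error is assumed here; `classify_pulse`, `mae` and `pearson` are illustrative names, and `thresholds` is assumed to hold one clustering threshold per preset cluster center.

```python
from math import sqrt


def mae(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def classify_pulse(features, centers, thresholds):
    """Return the index of the matching cluster center, or None for noise."""
    # Nearest preset center (assumed metric: mean absolute error).
    best = min(range(len(centers)), key=lambda c: mae(features, centers[c]))
    # Accept only if the correlation reaches that center's clustering threshold.
    if pearson(features, centers[best]) >= thresholds[best]:
        return best
    return None  # treated as noise and ignored (step S25)
```

A pulse that lies close to a center but correlates poorly with it is rejected, which is the purpose of the threshold check.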
The present application also constructs a PPG signal clustering center acquisition device, comprising:
a first pulse extraction unit, configured to acquire a sample PPG signal with a preset length and preprocess the sample PPG signal to obtain a plurality of first pulses, wherein the PPG signal is a photoplethysmography signal;
the second pulse extraction unit is used for obtaining a first pulse meeting a first preset condition and carrying out direct current removal to obtain a second pulse;
a third pulse extraction unit, configured to perform fast fourier transform on the second pulse to obtain a frequency spectrum corresponding to the second pulse, obtain a frequency component that meets a second preset condition in the frequency spectrum, obtain a characteristic parameter corresponding to the second pulse according to the frequency component, and normalize the characteristic parameter to obtain a third pulse;
an initial clustering center obtaining unit, configured to perform multiple clustering on the third pulse based on a preset clustering algorithm and end clustering when a clustering process or a clustering result meets a third preset condition, so as to obtain a preset number of initial clustering clusters, and obtain an initial clustering center of each initial clustering cluster;
the MAE similarity matrix obtaining unit is used for constructing an MAE similarity matrix of any two initial clustering centers by taking the MAE values of the normalization parameters corresponding to the two initial clustering centers as matrix elements, wherein the MAE values are average absolute error values corresponding to the two initial clustering centers, the average absolute error values are average values of all absolute parameter difference values in the two initial clustering centers, the absolute parameter difference values are difference absolute values of two same normalization parameters in the two initial clustering centers, and the matrix elements of each row vector in the MAE similarity matrix are the MAE values of the same selected initial clustering center and all initial clustering centers and the selected initial clustering centers corresponding to different row vectors are different;
a determining unit, configured to obtain a first MAE value of each row vector in the MAE similarity matrix, the first MAE value being the second smallest value in that row vector, and to determine whether the first MAE values include a second MAE value that is smaller than a first preset value and corresponds to the same initial clustering center; if yes, a positive result is output, and otherwise a negative result is output;
a merging unit, configured to merge an initial cluster corresponding to the initial cluster center corresponding to the second MAE value into a new initial cluster when the determining unit outputs a positive result, and use a mean pulse of the initial cluster center corresponding to the second MAE value as an initial cluster center of the new initial cluster;
and the target clustering center acquisition unit is used for taking the initial clustering center as the target clustering center of the PPG signal when the judgment unit outputs a negative result.
The present application also constructs a PPG signal processing device, comprising:
the fourth pulse acquisition unit is used for acquiring an original PPG signal in real time, preprocessing the original PPG signal to obtain a fourth pulse, wherein the PPG signal is a photoplethysmography signal;
the fifth pulse acquisition unit is used for acquiring a fourth pulse meeting a first preset condition and carrying out direct current removal to obtain a fifth pulse;
a sixth pulse obtaining unit, configured to perform fast fourier transform on the fifth pulse to obtain a frequency spectrum corresponding to the fifth pulse, obtain a frequency component that meets a second preset condition in the frequency spectrum, obtain a characteristic parameter corresponding to the fifth pulse according to the frequency component, and normalize the characteristic parameter to obtain a sixth pulse;
and the execution unit is used for acquiring the clustering center corresponding to the sixth pulse according to the normalization parameter of the sixth pulse and a preset clustering center so as to acquire information corresponding to the sixth pulse according to the clustering center, wherein the preset clustering center is acquired by the PPG signal clustering center acquisition device.
The implementation of the method and the device for acquiring the PPG signal clustering center and the method and the device for processing the PPG signal have the following beneficial effects: the accuracy and the processing speed of PPG signal processing can be effectively improved.
Detailed Description
For a more clear understanding of the technical features, objects, and effects of the present application, specific embodiments thereof will now be described in detail with reference to the accompanying drawings.
As shown in fig. 1, in a first embodiment of the PPG signal cluster center acquisition method of the present application, the method includes the steps of: S11, obtaining a sample PPG signal with a preset length, preprocessing the sample PPG signal to obtain a plurality of first pulses, and obtaining the first pulses meeting a first preset condition to perform direct current removal to obtain second pulses, wherein the PPG signal is a photoplethysmography signal. Specifically, the sample PPG signals are sufficiently numerous to reflect the various scenarios that PPG signals can represent. The sample PPG signal is preprocessed in order to extract the first pulses, which are the pulses that carry the true information of the PPG signal. The first pulses are then subjected to direct current removal to remove the zero-frequency component, which avoids interference from that component during subsequent data processing, and the second pulses are finally obtained. In an embodiment, the sample PPG signal may be a raw PPG signal acquired at the start of data acquisition that is sufficiently numerous or sufficiently long in duration.
S12, respectively carrying out fast Fourier transform on the second pulse to obtain a frequency spectrum corresponding to the second pulse, obtaining a frequency component meeting a second preset condition in the frequency spectrum, obtaining a characteristic parameter corresponding to the second pulse according to the frequency component, and normalizing the characteristic parameter to obtain a third pulse; specifically, the fast Fourier transform is performed on the nth pulse of the second pulses, and the fast Fourier transform formula is as follows:
$$X_n(m) = \sum_{l=0}^{W_n-1} s_n(l)\, e^{-j 2\pi m l / W_n} = R_m + j\, I_m,$$
where $s_n(l)$ is the $l$-th sample of the nth second pulse, $W_n$ is its width in sampling points, $X_n(m)$ is the m-th frequency component of the sequence $X_n$ obtained after the fast Fourier transform, $R_m$ is the real part value, and $I_m$ is the imaginary part value. The frequency components that meet the requirements are obtained based on the transformation result.
S13, clustering the third pulse for multiple times based on a preset clustering algorithm, finishing clustering when a clustering process or a clustering result meets a third preset condition to obtain a preset number of initial clustering clusters, and acquiring an initial clustering center of each initial clustering cluster; specifically, the obtained third pulses are clustered, and the clustering process can be based on the preset number of clustering clusters. And updating the clustering center for clustering again based on each clustering result, thereby realizing multiple iterations. Judging whether clustering needs to be finished or not based on the iterative process or the clustering result in the iterative process, and defining the obtained cluster as an initial cluster when clustering is finished.
S14, constructing an MAE similarity matrix of any two initial clustering centers by taking the MAE values of the normalization parameters corresponding to the two initial clustering centers as matrix elements, wherein the MAE values are average absolute error values corresponding to the two initial clustering centers, the average absolute error values are average values of all absolute parameter difference values in the two initial clustering centers, the absolute parameter difference values are difference absolute values of two same normalization parameters in the two initial clustering centers, the matrix elements of each row vector in the MAE similarity matrix are the MAE values of the same selected initial clustering center and all initial clustering centers, and the selected initial clustering centers corresponding to different row vectors are different; specifically, the Mean Absolute Error (MAE) between each cluster center and all other cluster centers is calculated, and may be expressed as
$$d_{c,c'} = \frac{1}{F}\sum_{j=1}^{F} \left| u_{c,j} - u_{c',j} \right|,$$
where $u_{c,j}$ denotes the j-th feature of the c-th initial cluster center, with $c = 1, \ldots, k$, $u_{c',j}$ denotes the j-th feature of the $c'$-th initial cluster center, with $c' = 1, \ldots, k$, and $F$ is the feature dimension. An MAE similarity matrix is constructed from the obtained MAE values, and may take the following form:
$$D = \begin{pmatrix} d_{1,1} & \cdots & d_{1,k} \\ \vdots & \ddots & \vdots \\ d_{k,1} & \cdots & d_{k,k} \end{pmatrix},$$
where $d_{c,c} = 0$ is the distance of each cluster center from itself.
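The construction of the MAE similarity matrix in step S14 can be sketched in a few lines. Each cluster center is assumed to be a list of normalized feature values of equal length; the names `mae` and `mae_matrix` are illustrative.

```python
def mae(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mae_matrix(centers):
    """k x k matrix whose row c holds the MAE of center c against every center."""
    return [[mae(ci, cj) for cj in centers] for ci in centers]
```

The matrix is symmetric with a zero diagonal, so the second smallest entry of a row is the distance to the nearest other center, provided no two centers coincide.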
S15, acquiring a first MAE value of each row vector in the MAE similarity matrix, and judging whether a second MAE value which is smaller than a first preset value and corresponds to the same initial clustering center exists in the first MAE values; if yes, go to step S16, otherwise go to step S17; S16, combining the initial clusters corresponding to the initial cluster centers corresponding to the second MAE value into a new initial cluster, taking the mean pulse of the initial cluster centers corresponding to the second MAE value as the initial cluster center of the new initial cluster, and executing step S14; S17, taking the initial cluster centers as the target cluster centers of the PPG signal. Specifically, according to the MAE similarity matrix obtained above, the distances between the c-th cluster center and all k cluster centers can be defined as the row vector $D_c = (d_{c,1}, \ldots, d_{c,k})$, where $d_{c,c'}$ is the MAE value between the c-th and the $c'$-th cluster centers. The second smallest value of $D_c$ is examined, the smallest value being the zero distance of the center from itself. When the second smallest value of $D_c$ is $d_{c,c'}$ and it is less than the first preset value E, then, because $d_{c,c'} = d_{c',c}$, both entries are smaller than the first preset value E. Put simply, when the minimum distance between one cluster center and all other cluster centers (not including itself) is the distance between two particular cluster centers, and that distance is smaller than the first preset value E, the initial clusters corresponding to the two cluster centers are merged. That is, the c-th class is merged with the $c'$-th class, and the cluster centers of the c-th class and the $c'$-th class are averaged, the average being the cluster center of the new cluster. The same operation is performed for all cluster centers that meet the above requirement; when several groups of initial cluster centers meet the requirement, the average of each group is calculated separately to obtain the new initial cluster centers. Step S14 and the subsequent steps are then executed again on the merged initial clusters and initial cluster centers, and merging stops when no initial cluster center meets the requirement. In general, merging can be understood as reducing the k clusters to $k'$ clusters, with $k' \le k$. The final initial cluster centers are then the target cluster centers of the PPG signal. It is also possible, in the extreme case, that no initial cluster centers can be merged at the outset; the corresponding initial cluster centers may then also serve as the target cluster centers of the PPG signal. In one embodiment, the first preset value E ranges from 0.5 to 0.7.
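The loop formed by steps S14 to S17 can be sketched as below. This is one possible reading rather than a definitive implementation: each center is paired with its nearest other center when their MAE is below E, every center takes part in at most one merge per pass, and the new center is the unweighted average of the two merged centers. The names `merge_close_centers`, `mae_matrix` and the default `e=0.6` (inside the 0.5 to 0.7 range stated above) are illustrative.

```python
def mae_matrix(centers):
    """k x k matrix of mean absolute errors between all pairs of centers."""
    def mae(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return [[mae(ci, cj) for cj in centers] for ci in centers]


def merge_close_centers(centers, clusters, e=0.6):
    """Repeat S14 to S16 until no pair of centers is closer than e (S17)."""
    centers = [list(c) for c in centers]
    clusters = [list(m) for m in clusters]
    while len(centers) > 1:
        d = mae_matrix(centers)
        used, pairs = set(), []
        for c, row in enumerate(d):
            # Second smallest entry of the row: nearest center other than c.
            dist, other = min((row[o], o) for o in range(len(row)) if o != c)
            if dist < e and c not in used and other not in used:
                pairs.append((c, other))
                used.update((c, other))
        if not pairs:
            break  # nothing left to merge: current centers are the targets
        new_centers = [centers[i] for i in range(len(centers)) if i not in used]
        new_clusters = [clusters[i] for i in range(len(clusters)) if i not in used]
        for c, other in pairs:
            new_centers.append([(x + y) / 2 for x, y in zip(centers[c], centers[other])])
            new_clusters.append(clusters[c] + clusters[other])
        centers, clusters = new_centers, new_clusters
    return centers, clusters
```

Starting from a deliberately large k, this pass collapses near-duplicate clusters while leaving well-separated ones untouched.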
Optionally, as shown in fig. 2, in step S11, the specific process of preprocessing the sample PPG signal to obtain a plurality of first pulses may include the steps of: S111, filtering high-frequency noise in the sample PPG signal based on smoothing filtering to obtain a first PPG signal; S112, acquiring the minimum value points of the first PPG signal, and removing baseline drift of the first PPG signal to obtain a second PPG signal; and S113, performing pulse segmentation on the second PPG signal to obtain initial pulses, and performing amplitude normalization on the initial pulses to obtain the first pulses. Specifically, a Savitzky-Golay filter with a window size of 5 may be used to filter high-frequency noise in the PPG signal to obtain the first PPG signal. The filter performs polynomial fitting on the data in the window, and can keep the shape and width of the signal unchanged while filtering noise. It can be understood that the filter and its window size may be selected and adjusted according to the actual signal processing. For the filtered first PPG signal $x = \{x(1), \ldots, x(L)\}$, where L is the preset length, the local minima are detected, and the minimum value points can be represented by the following expression:
$$P = \{(t_n, v_n)\}, \quad n = 1, \ldots, N,$$
where $t_n$ is the abscissa of the nth minimum point and $v_n$ is the ordinate of the nth minimum point. In order to segment the cycles correctly, the effect of baseline drift needs to be removed from the first PPG signal. The baseline $b$ of the first PPG signal can be obtained by polynomial fitting through the minimum points $P$, and the second PPG signal obtained after removing the baseline drift can then be expressed as
$$y(t) = x(t) - b(t), \quad t = 1, \ldots, L.$$
Using the minimum value points $P$, the second PPG signal $y$ after baseline drift removal is divided into N-1 PPG pulse signals to obtain the initial pulses, which can be expressed as
$$\{p_1, p_2, \ldots, p_{N-1}\},$$
where the nth PPG pulse signal is expressed as $p_n = \{p_n(1), \ldots, p_n(W_n)\}$ and $W_n$ is the pulse width. The amplitude range of each initial pulse is normalized to 0-1 using the maximum-minimum normalization method; the $l$-th point $p_n(l)$ of the nth PPG pulse is normalized as
$$\hat{p}_n(l) = \frac{p_n(l) - p_n^{\min}}{p_n^{\max} - p_n^{\min}},$$
where $p_n^{\min}$ is the minimum value of the nth PPG pulse, $p_n^{\max}$ is the maximum value of the nth PPG pulse, and $\hat{p}_n(l)$ is the normalized value. The normalized nth PPG pulse is expressed as $\hat{p}_n = \{\hat{p}_n(1), \ldots, \hat{p}_n(W_n)\}$, which corresponds to the resulting first pulse.
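The preprocessing chain of steps S111 to S113 can be sketched with the standard library alone. Several choices here are assumptions rather than details from the application: the 5-point Savitzky-Golay smoother uses the quadratic-fit coefficients (-3, 12, 17, 12, -3)/35, the two samples at each end are left unsmoothed, and the baseline is approximated by straight lines between consecutive minima instead of a polynomial fit. All function names are illustrative.

```python
# 5-point Savitzky-Golay smoothing coefficients for a quadratic fit (assumed order).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def smooth(signal):
    """S111: smooth the signal; the two samples at each end are left as they are."""
    out = list(signal)
    for t in range(2, len(signal) - 2):
        out[t] = sum(w * signal[t + k - 2] for k, w in enumerate(SG5))
    return out


def local_minima(signal):
    """S112: indices of interior local minima."""
    return [t for t in range(1, len(signal) - 1)
            if signal[t] < signal[t - 1] and signal[t] <= signal[t + 1]]


def remove_baseline(signal, minima):
    """S112: subtract a piecewise-linear baseline through the minima (a
    simplification of the polynomial fit described in the text)."""
    out = list(signal)
    for a, b in zip(minima, minima[1:]):
        for t in range(a, b + 1):
            base = signal[a] + (signal[b] - signal[a]) * (t - a) / (b - a)
            out[t] = signal[t] - base
    return out


def segment_and_normalize(signal, minima):
    """S113: cut N-1 pulses between N minima and min-max normalize each to 0-1."""
    pulses = []
    for a, b in zip(minima, minima[1:]):
        seg = signal[a:b + 1]
        lo, hi = min(seg), max(seg)
        if hi > lo:  # skip flat segments, which cannot be normalized
            pulses.append([(v - lo) / (hi - lo) for v in seg])
    return pulses


def extract_first_pulses(raw):
    """Run S111 to S113 in order and return the normalized pulses."""
    x = smooth(raw)
    minima = local_minima(x)
    y = remove_baseline(x, minima)
    return segment_and_normalize(y, minima)
```

Samples before the first minimum and after the last one belong to incomplete cycles and are discarded by the segmentation.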
Optionally, in step S11, the first preset condition includes: the number of peaks is less than or equal to 2; when the number of peaks is greater than 2, the number of peaks whose peak amplitude is greater than a second preset value is less than or equal to 3; the pulse width is within a first preset range; and the peak-to-peak interval width between any two adjacent large peaks is less than or equal to a third preset value, wherein the peak amplitude of a large peak is greater than the second preset value. Specifically, the second pulses are obtained by eliminating those first pulses that do not meet the requirements, so as to obtain high-quality PPG pulse signals. First pulses with 2 or fewer peaks are retained. When the number of peaks is greater than 2, the peak amplitudes are examined and the peaks whose amplitude is greater than the second preset value are counted; the pulse is retained if that count is less than or equal to 3 and rejected if it exceeds three. The purpose of this process is to reject pulses in which the main peak is not prominent, since a pulse whose peaks are of comparable height can basically be judged to be noise. The second preset value may be set to be less than or equal to 0.2, i.e., peaks caused by noise are expected to stay below 20% of the main peak, and once too many peaks exceed 20% of the main peak the pulse is rejected directly. Pulses whose width is too large or too small are rejected at the same time, i.e., only pulses whose width is within the first preset range are acquired.
The first preset range is obtained from the sampling frequency and the original signal frequency. In a specific embodiment, the human heartbeat frequency corresponding to the PPG signal is 0.8 Hz to 4 Hz and a sampling frequency of 125 Hz is set in actual sampling, so that one heartbeat period corresponds to 31.25 to 156.25 sampling points (125/4 to 125/0.8), and pulses of 30 to 150 sampling points are taken as normal PPG pulses; that is, pulses with a pulse width smaller than 30 or larger than 150 are deleted. In addition, when the number of peaks with peak amplitude larger than the second preset value is equal to 2 or 3, each such peak is defined as a large peak, and the pulse is rejected when the distance between two adjacent large peaks is larger than a third preset value. In one embodiment, the third preset value is set to be greater than or equal to 60. The pulse width is expressed as a number of sampling points.
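The screening rules above can be sketched as a single predicate. The defaults follow the embodiment (width 30 to 150 sampling points, large-peak amplitude 0.2, maximum large-peak spacing 60), while the peak detector, the function names, and the exact way the conditions are combined are assumptions made for illustration.

```python
def width_range(fs=125.0, f_min=0.8, f_max=4.0):
    """Sampling points per heartbeat period for the given heart-rate band."""
    return fs / f_max, fs / f_min


def find_peaks(pulse):
    """Indices of interior local maxima of an amplitude-normalized pulse."""
    return [t for t in range(1, len(pulse) - 1)
            if pulse[t - 1] < pulse[t] >= pulse[t + 1]]


def is_valid_pulse(pulse, min_width=30, max_width=150, big=0.2, max_gap=60):
    """Apply the first preset condition to one normalized first pulse."""
    if not (min_width <= len(pulse) <= max_width):
        return False  # pulse width outside the first preset range
    peaks = find_peaks(pulse)
    large = [t for t in peaks if pulse[t] > big]
    if len(peaks) > 2 and len(large) > 3:
        return False  # too many large peaks: main peak not prominent
    if any(b - a > max_gap for a, b in zip(large, large[1:])):
        return False  # adjacent large peaks too far apart
    return True
```

For a 125 Hz sampling frequency and a 0.8 Hz to 4 Hz heart rate, `width_range()` gives the 31.25 to 156.25 sampling points quoted above, which the embodiment rounds to 30 to 150.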
Optionally, in step S11, obtaining the first pulses that satisfy the first preset condition and performing direct current removal to obtain the second pulses includes: acquiring the mean value of all first pulses meeting the first preset condition, and subtracting the mean value from each first pulse to obtain a second pulse. That is, the direct current component of the qualifying first pulses may be removed by averaging all of the acquired first pulses and subtracting that average from each qualifying first pulse, which yields the final second pulses.
Optionally, in step S12, acquiring the frequency components in the frequency spectrum that satisfy the second preset condition includes: respectively extracting a frequency parameter, an amplitude parameter, a phase parameter, a real part constant and an imaginary part constant from the frequency components of the frequency spectrum, and acquiring the frequency components whose amplitudes are greater than a fourth preset value as the frequency components meeting the second preset condition. That is, the spectral features of each frequency component are extracted based on the obtained transformation result: for the mth frequency component $X(m)$, the extracted features are the frequency parameter $f_m$ (Frequency), the amplitude parameter $A_m$ (Amplitude), the phase parameter $\varphi_m$ (Phase), the real part constant $R_m$ (the real value after Fourier transform, Real Value) and the imaginary part constant $I_m$ (the imaginary value after Fourier transform, Image Value). The calculation may, for example, follow the usual definitions:
$$f_m = \frac{m\, f_s}{W}, \qquad A_m = \sqrt{R_m^2 + I_m^2}, \qquad \varphi_m = \arctan\frac{I_m}{R_m}, \qquad R_m = \operatorname{Re} X(m), \qquad I_m = \operatorname{Im} X(m),$$
where $f_s$ is the sampling frequency and $W$ is the width of the pulse in sampling points.
Based on the obtained amplitude parameters, the frequency components whose amplitude parameter is relatively large, that is, larger than a fourth preset value, are taken as the frequency components that meet the requirement, wherein the fourth preset value may be less than or equal to 0.1. In one embodiment, when the second pulse is a single-peak pulse as shown in FIG. 3 (i.e., it has only one peak of significant amplitude) and is Fourier transformed as shown in FIG. 4, the real part constant $R_5$ and the imaginary part constant $I_5$ of the fifth frequency component approach zero, so that its amplitude $A_5$ is approximately zero. The first four frequency components of the spectrum are therefore used, and since the direct current component (zero frequency) is insignificant, the frequency components in this embodiment are $X(m)$ with $m$ in the range $1 \le m \le 4$.
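The feature extraction of step S12 can be illustrated with a direct discrete Fourier transform, which gives the same values as a fast Fourier transform for a single pulse. The sketch assumes that amplitudes are scaled by the pulse length before being compared with the fourth preset value and that the bin frequency is m times the sampling frequency divided by the pulse length; neither detail is stated in the application, and the name `dft_features` is illustrative.

```python
import cmath
import math


def dft_features(pulse, fs=125.0, amp_threshold=0.1):
    """Return (frequency, amplitude, phase, real, imag) for each non-DC
    component whose amplitude exceeds amp_threshold."""
    w = len(pulse)
    features = []
    for m in range(1, w // 2 + 1):  # m = 0 (direct current) is skipped
        x_m = sum(s * cmath.exp(-2j * math.pi * m * l / w)
                  for l, s in enumerate(pulse))
        x_m /= w  # assumed scaling so the threshold is independent of length
        amplitude = abs(x_m)
        if amplitude > amp_threshold:
            features.append((m * fs / w, amplitude,
                             math.atan2(x_m.imag, x_m.real),
                             x_m.real, x_m.imag))
    return features
```

`math.atan2` is used for the phase so that the quadrant is resolved correctly when the real part is zero or negative.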
Optionally, in step S12, obtaining the characteristic parameter corresponding to the second pulse according to the frequency component includes: respectively extracting the frequency parameter, the amplitude parameter, the phase parameter, the real part constant and the imaginary part constant of the second pulse as the characteristic parameters of the second pulse. Specifically, on the basis of the above, the frequency parameter, amplitude parameter, phase parameter, real part constant and imaginary part constant are extracted from the obtained frequency components and combined to form the characteristic parameters of the second pulse, and this series of characteristic parameters can represent the data information carried by the second pulse. In one embodiment, when 4 frequency components are obtained, since each frequency component has 5 features, the feature of one pulse is defined as:
$$\mathbf{q} = (f_1, A_1, \varphi_1, R_1, I_1, \ldots, f_4, A_4, \varphi_4, R_4, I_4),$$
from which the feature dimension $F = 4 \times 5 = 20$ is obtained. Define $q_{i,j}$ as the jth feature of the ith second pulse, where $i = 1, \ldots, N$, $j = 1, \ldots, F$, and N is the number of second pulses.
Optionally, in step S12, normalizing the characteristic parameter to obtain a third pulse includes: normalizing the characteristic parameters by the Z-Score normalization method. Since the value range of each frequency component is different and the distributions differ greatly, in one embodiment the above parameters are each normalized using the Z-Score normalization method:
$$z_{i,j} = \frac{q_{i,j} - \mu_j}{\sigma_j},$$
where $q_{i,j}$ is the jth feature of the ith second pulse, $\mu_j$ is the mean of the jth feature over all second pulses, $\sigma_j$ is the standard deviation of the jth feature over all second pulses, and $z_{i,j}$ is the normalized value of the jth feature of the ith second pulse. The third pulse can finally be characterized by $\mathbf{z}_i = (z_{i,1}, \ldots, z_{i,F})$.
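The column-wise Z-Score normalization can be sketched as follows. The population standard deviation is assumed, and a feature with zero spread is mapped to 0.0 to avoid division by zero; both choices, and the name `zscore_columns`, are illustrative. The means and standard deviations are returned as well, since pulses processed later would presumably be scaled with the same statistics.

```python
from math import sqrt


def zscore_columns(rows):
    """Normalize each feature (column) over all pulses (rows).

    Returns the normalized rows plus the per-feature means and standard
    deviations used for the normalization.
    """
    n, f = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(f)]
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(f)]
    normalized = [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(f)]
        for r in rows
    ]
    return normalized, means, stds
```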
Optionally, in the PPG signal cluster center obtaining method of the present application, in step S13, the preset clustering algorithm includes a K _ Means clustering method; the clustering process or the clustering result meeting a third preset condition comprises the following steps: and the clustering repetition times are greater than a fifth preset value, or the average absolute difference value of the current clustering center and a calculation clustering center is less than a sixth preset value, wherein the calculation clustering center is the mean pulse of the clustering cluster corresponding to the current clustering center. Specifically, a currently common K _ Means clustering method may be sampled, and K sample pulses are randomly selected from the third pulse to serve as a clustering center, where the K value may be increased as much as possible, in order to separate more aggregated clusters. In a general clustering problem, a proper k value needs to be selected, because an excessively large k value can result in the same clusters being separated (the inter-cluster distance is small), and an excessively small k value can result in some clusters not being separated (the intra-cluster distance is large). In the present application, a k value as large as possible is first selected. The value of K is chosen for clustering purposes and can be set according to the number of dominant morphologies of the PPG, e.g. a common PPG comprises a dozen different morphologies, which can set K based on 1.5 or 2 times the number of morphologies, e.g. setting the initial K to 30-60, or even more. Sequentially calculating the distance from each sample pulse to each clustering center, and dividing the distance into clusters closest to the clustering centers, wherein the distance formula is as follows:
wherein the distance is taken from the ith third pulse to the c-th cluster center pulse, with c ranging from 1 to k. After one round of clustering, each third pulse belongs to the cluster whose center is closest to it, which finally yields the cluster corresponding to each cluster center. The mean pulse of the sample pulses in each cluster is then calculated and serves as the next cluster center. The jth characteristic parameter of the mean pulse is the mean of the jth characteristic values of all pulses in the cluster, and the $N_F$ mean characteristic values so obtained form a new pulse, namely the mean pulse. Clustering is performed again with the mean pulses as the new cluster centers, and this is repeated until the maximum number of iterations Max is reached or the average absolute difference between the current cluster center and the calculated cluster center is less than the sixth preset value MinD. In other words, clustering can stop once the distance between the cluster center used for the current round and the corresponding cluster center obtained from that round has become sufficiently small; put simply, when the two closest centers obtained before and after a round of clustering are sufficiently close, the cluster centers no longer change appreciably. The average absolute difference is obtained by summing the differences between the current cluster center and the calculated cluster center and then averaging. In an embodiment, the maximum number of iterations Max is set to 20 or less and the average absolute difference threshold is set to $1 \times 10^{-8}$ or less; once either condition is met, no further iteration is performed, and the clusters obtained are the initial clusters.
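The iteration described above can be sketched as follows. This is an illustrative sketch only: the distance formula of the application is not reproduced in this text, so Euclidean distance is assumed here, and the names `nearest_center`, `kmeans_pulses`, `max_iter` and `min_d` are hypothetical stand-ins for Max and MinD.

```python
import numpy as np

def nearest_center(pulses, centers):
    """Index of the closest center for every pulse (Euclidean distance is an assumption)."""
    dist = np.linalg.norm(pulses[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans_pulses(pulses, k, max_iter=20, min_d=1e-8, seed=0):
    """K-Means over normalized pulses, stopped by iteration count or center shift."""
    pulses = np.asarray(pulses, dtype=float)
    rng = np.random.default_rng(seed)
    # k sample pulses are randomly selected as the first cluster centers
    centers = pulses[rng.choice(len(pulses), size=k, replace=False)]
    for _ in range(max_iter):
        labels = nearest_center(pulses, centers)
        new_centers = centers.copy()
        for c in range(k):
            members = pulses[labels == c]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                new_centers[c] = members.mean(axis=0)  # mean pulse of the cluster
        # average absolute difference between current and calculated centers
        shift = np.abs(new_centers - centers).mean()
        centers = new_centers
        if shift < min_d:
            break
    return centers, nearest_center(pulses, centers)
```

Calling `kmeans_pulses(third_pulses, k=48)` on an array of normalized pulses would return the initial cluster centers together with the cluster index of every pulse.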
Optionally, in step S13, obtaining the initial clustering center of each initial cluster includes: acquiring the mean pulse of the initial cluster as the initial clustering center corresponding to that initial cluster. Specifically, the mean pulse of each initial cluster is calculated by averaging each characteristic value over the pulses of the cluster and combining the resulting mean characteristic values into one pulse; this mean pulse is used as the initial clustering center of the initial cluster. That is, the class of each pulse is obtained, the characteristic values of each pulse after Z-Score normalization are recorded, and the mean characteristic values of each of the k clusters are calculated; these mean characteristic values form the clustering centers of the last clustering iteration.
Optionally, the PPG signal cluster center obtaining method of the present application further includes the step of: S18, acquiring the Pearson correlation coefficient between each target cluster center and the pulses of its corresponding cluster, selecting the preset number of pulses with the largest correlation coefficients, and taking the minimum correlation coefficient among the selected pulses as the clustering threshold of that target cluster center. Specifically, all the third pulses may be re-clustered based on the obtained target cluster centers. The K-Means clustering method may still be used: the distance from each third pulse to the cluster center of each cluster is calculated in turn, and if the distance from a pulse to the center of a certain cluster is the smallest, the pulse is reclassified into that cluster. For each cluster, the Pearson correlation coefficient between each third pulse and the corresponding target cluster center is calculated, the correlation coefficients are sorted from large to small, and the correlation coefficient at the position corresponding to the top 90% of the pulses in the cluster is taken as the threshold. This process filters out pulses that were assigned to their closest cluster only because no cluster center is actually close to them, and is thus used to reject some discrete pulse signals.
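The threshold selection of step S18 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `cluster_threshold` and the parameter `keep_fraction` are hypothetical, the 90% position is rounded to the nearest pulse count, and pulses are assumed not to be constant (the Pearson coefficient is undefined for a constant vector).

```python
import numpy as np

def cluster_threshold(members, center, keep_fraction=0.9):
    """Pearson threshold of one cluster and the coefficient of every member pulse."""
    members = np.asarray(members, dtype=float)
    center = np.asarray(center, dtype=float)
    # Pearson correlation coefficient of each pulse with the target cluster center
    corr = np.array([np.corrcoef(p, center)[0, 1] for p in members])
    ranked = np.sort(corr)[::-1]  # sorted from large to small
    n_keep = max(1, int(round(keep_fraction * len(ranked))))
    # the smallest coefficient among the retained pulses is the threshold
    return ranked[n_keep - 1], corr
```

Pulses whose coefficient falls below the returned threshold would then be treated as discrete pulses and removed from the cluster.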
After the target clustering centers are obtained by the above process, each target clustering center is numbered, so that the cluster corresponding to each target clustering center has its own number (the cluster number), and the waveforms of all PPG pulses within a cluster are approximately similar. PPG pulses of each morphology thus have a cluster of their own, which facilitates analysis of the PPG pulses.
The PPG signal can thereby be compressed: the number of PPG signals to be analyzed is reduced and representative PPG pulses are obtained. Within one cluster the uncertainty of the pulses is reduced, that is, the PPG pulses in the same cluster have essentially the same shape and a narrow blood pressure distribution range. Training a model on the PPG pulses of a single cluster therefore improves the accuracy of blood pressure prediction. By comparing the PPG pulses of different clusters and their blood pressure distributions, the relations and influencing factors among different diseases, blood pressure and PPG pulse morphology can be analyzed.
In one embodiment, a set of data is processed with k = 48, Max = 20, MinD = 20 and E = 0.6, and evaluation indexes are computed for the clustering result obtained. The internal index of the clusters is:
intra-cluster distance:
where is the number of samples in the c-th cluster.
The external indexes of the clusters are as follows:
(within clusters): the average over clusters of the standard deviation of systolic blood pressure (clusters with large standard deviations are excluded, so that clusters with significant results are better reflected).
(within clusters): the average of SBP_Delta, wherein
SBP_Delta = SBP_75% - SBP_25%, where SBP_75% is the systolic blood pressure at the 3/4 position in ascending order and SBP_25% is the systolic blood pressure at the 1/4 position in ascending order.
(between clusters): the difference between the maximum and minimum of the cluster medians of systolic blood pressure (cluster spread).
(between clusters): the standard deviation of the cluster medians of systolic blood pressure (degree of cluster dispersion).
(between clusters): the standard deviation of the number of samples per cluster (degree of dispersion of the sample counts).
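The external indexes listed above can be sketched as follows. The index symbols are not reproduced in this text, so the dictionary keys below are hypothetical names. The sketch also assumes that the between-cluster indexes are computed on the cluster medians, that quartiles use NumPy's default linear interpolation, and that no clusters are excluded.

```python
import numpy as np

def external_indexes(sbp_by_cluster):
    """External indexes from one array of systolic blood pressures per cluster."""
    stds = [np.std(s) for s in sbp_by_cluster]
    # SBP_Delta = SBP_75% - SBP_25% for each cluster
    deltas = [np.percentile(s, 75) - np.percentile(s, 25) for s in sbp_by_cluster]
    medians = [np.median(s) for s in sbp_by_cluster]
    sizes = [len(s) for s in sbp_by_cluster]
    return {
        "mean_std": float(np.mean(stds)),                  # within clusters
        "mean_delta": float(np.mean(deltas)),              # within clusters
        "median_range": float(max(medians) - min(medians)),  # between clusters
        "median_std": float(np.std(medians)),              # between clusters
        "size_std": float(np.std(sizes)),                  # between clusters
    }
```

Smaller within-cluster values and larger between-cluster values would indicate tighter, better separated clusters.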
The final results are shown in Table 1.
Table 1. Comparison of clustering methods and clustering results
FAPRI_17 denotes direct clustering with K set to 17, which yields 17 clusters. FAPRI_Corr_17 denotes the clustering method used by the present application, in which pulses are additionally deleted according to the correlation coefficient between each pulse in a cluster and the clustering center, using the correlation coefficient threshold of that cluster. The results in the table show that all intra-cluster indexes of FAPRI_Corr_17 are smaller than those of FAPRI_17 and all inter-cluster indexes are larger than those of FAPRI_17. This indicates that, compared with direct clustering into a specified number of clusters, the clustering method of the present application obtains the number of clusters without supervision, reduces the intra-cluster distance, increases the inter-cluster distance, and improves the clustering precision.
The results in Table 3 are obtained after the final unsupervised clustering: a total of 17 clusters represent PPG pulses of different morphologies, and a number is generated for each cluster without supervision. The sum of Num_Pulses over all clusters is 750458, and the number of pulses per cluster lies between 2000 and 120000; that is, the method compresses about 750,000 pulses into 17 representative pulses, so that the number of pulses to be analyzed is reduced. As can be seen from SBP_AVG, the mean blood pressure of the clusters ranges from 99 to 160, and the SBP_Delta value of each cluster is essentially within 35. In particular, the clustering method yields clusters whose SBP_Delta is within 20, with a minimum of 2, and the SBP_STD values are mostly less than 15. This shows that PPG signals of different morphologies have different blood pressure ranges, which can greatly improve the accuracy of a blood pressure prediction model.
By comparison, in the results of direct clustering into 17 clusters in Table 2, the number of samples per cluster is evenly distributed, and SBP_AVG shows that the mean blood pressure of the clusters ranges from 101 to 147: the distribution range of the clusters is narrower, the inter-cluster distance is small, and the clusters mainly cover low blood pressure. The SBP_25% values of the 17th, 6th and 1st clusters are all 99, the differences in distribution are not obvious, and no cluster has an SBP_STD of less than 15.
Wherein:
Num_Pulses: the number of samples in the cluster,
SBP_AVG: the average systolic blood pressure in the cluster,
SBP_STD: the standard deviation of systolic blood pressure in the cluster,
SBP_MAE: the average absolute error of systolic blood pressure in the cluster,
SBP_ME: the average error of systolic blood pressure in the cluster,
SBP_STD_ME: the standard deviation of the absolute error of systolic blood pressure in the cluster,
SBP_MIN: the minimum systolic blood pressure in the cluster,
SBP_25%: the systolic blood pressure at the 1/4 position when the systolic blood pressures in the cluster are sorted from small to large,
SBP_50%: the median systolic blood pressure in the cluster,
SBP_75%: the systolic blood pressure at the 3/4 position when the systolic blood pressures in the cluster are sorted from small to large,
SBP_MAX: the maximum systolic blood pressure in the cluster,
SBP_Delta: SBP_75% - SBP_25%.
Table 2. Analysis results corresponding to FAPRI_17
Table 3. Analysis results corresponding to FAPRI_Corr_17
In addition, as shown in Fig. 5, a PPG signal processing method of the present application includes the steps of:
S21, acquiring an original PPG signal in real time, preprocessing the original PPG signal to obtain a fourth pulse, and acquiring the fourth pulse meeting a first preset condition and performing direct current removal on it to obtain a fifth pulse, wherein the PPG signal is a photoplethysmography signal;
S22, performing fast Fourier transform on the fifth pulse to obtain a frequency spectrum corresponding to the fifth pulse, obtaining a frequency component meeting a second preset condition in the frequency spectrum, obtaining a characteristic parameter corresponding to the fifth pulse according to the frequency component, and normalizing the characteristic parameter to obtain a sixth pulse;
S23, acquiring a cluster center corresponding to the sixth pulse according to the normalization parameter of the sixth pulse and a preset cluster center, so as to acquire information corresponding to the sixth pulse according to the cluster center, wherein the preset cluster center is acquired by any one of the above PPG signal cluster center obtaining methods.
Specifically, in actual use, the original PPG signal, which is a PPG signal acquired in real time, is preprocessed to extract a pulse that represents the real information of the original PPG signal, namely the fourth pulse. Direct current removal is then performed on the fourth pulse to remove its zero-frequency component, which avoids interference from the zero-frequency component during data processing. The fourth pulse may be obtained by the same processing procedure as that applied to the sample PPG signal, and the fifth pulse is obtained from the fourth pulse.
The fifth pulse is processed in the same way as the second pulse to obtain its frequency spectrum, from which the frequency components meeting the requirement and the characteristic parameters corresponding to those components are obtained; the characteristic parameters of the fifth pulse are thus obtained, and finally the sixth pulse. The second preset condition may be determined in the same manner as the first preset condition. The sixth pulse is then clustered according to the target clustering centers obtained in the above process, which yields the clustering result of the pulse and its corresponding clustering center, so that the information of the pulse can be obtained quickly from the clustering center.
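Direct current removal and spectral feature extraction for a single pulse can be sketched as follows. The second preset condition is not specified in this passage, so the sketch simply assumes that the first `n_components` non-zero frequency bins are kept and that their amplitudes serve as characteristic parameters; the function name `pulse_features` and this selection rule are illustrative assumptions.

```python
import numpy as np

def pulse_features(pulse, n_components=5):
    """Amplitudes of the lowest non-zero frequency components of one pulse."""
    pulse = np.asarray(pulse, dtype=float)
    ac = pulse - pulse.mean()    # direct current removal (zero-frequency component)
    spectrum = np.fft.rfft(ac)   # fast Fourier transform of the real-valued pulse
    # assumption: keep bins 1..n_components and scale by the pulse length
    return np.abs(spectrum[1:n_components + 1]) / len(ac)

# One period of a sine wave riding on a constant offset of 3.
t = np.arange(64) / 64
features = pulse_features(3.0 + np.sin(2 * np.pi * t))
```

Because the mean is subtracted first, a change in the constant offset of the pulse leaves the features unchanged.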
Optionally, as shown in Fig. 6, the PPG signal processing method of the present application further includes the steps of: S24, acquiring the Pearson correlation coefficient between the sixth pulse and its corresponding clustering center, and judging whether the Pearson correlation coefficient is greater than or equal to the clustering threshold of the preset clustering center; if so, executing step S23, otherwise executing step S25; and S25, determining that the original PPG signal is noise and ignoring it. That is, whether a pulse signal is a discrete pulse signal is judged from the Pearson correlation coefficient between the pulse signal and its corresponding cluster center; when the pulse signal is judged to be a discrete pulse signal, the PPG signal can be ignored directly without signal extraction.
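The matching and noise check of steps S23 to S25 can be sketched as follows. This is an illustration under stated assumptions: the nearest preset center is found by mean absolute error, which is one possible distance and is not confirmed by this passage, and the name `assign_pulse` and its return convention (`None` for noise) are hypothetical.

```python
import numpy as np

def assign_pulse(sixth_pulse, centers, thresholds):
    """Index of the matching preset cluster center, or None if the pulse is noise."""
    sixth_pulse = np.asarray(sixth_pulse, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # assumption: the closest center is the one with the smallest mean absolute error
    mae = np.abs(centers - sixth_pulse).mean(axis=1)
    idx = int(mae.argmin())
    # Pearson correlation coefficient with the corresponding cluster center
    r = np.corrcoef(sixth_pulse, centers[idx])[0, 1]
    if r >= thresholds[idx]:
        return idx   # the cluster center is accepted for this pulse
    return None      # below the clustering threshold: treated as noise and ignored
```

A caller would use the returned index to look up the information associated with that cluster, and would skip the pulse when `None` is returned.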
As shown in fig. 7, a PPG signal cluster center obtaining apparatus of the present application includes:
a pulse extraction unit 111, configured to acquire a sample PPG signal of a preset length, and preprocess the sample PPG signal to obtain a plurality of first pulses, where the PPG signal is a photoplethysmography signal;
a second pulse extracting unit 112, configured to obtain a first pulse that meets a first preset condition and perform dc removal to obtain a second pulse;
a third pulse extracting unit 113, configured to perform fast fourier transform on the second pulse to obtain a frequency spectrum corresponding to the second pulse, obtain a frequency component meeting a second preset condition in the frequency spectrum, obtain a characteristic parameter corresponding to the second pulse according to the frequency component, and normalize the characteristic parameter to obtain a third pulse;
an initial clustering center obtaining unit 120, configured to perform multiple clustering on the third pulse based on a preset clustering algorithm, and end clustering when a clustering process or a clustering result meets a third preset condition, so as to obtain a preset number of initial clustering clusters, and obtain an initial clustering center of each initial clustering cluster;
an MAE similarity matrix obtaining unit 130, configured to construct an MAE similarity matrix of any two initial clustering centers by using MAE values of normalization parameters corresponding to the two initial clustering centers as matrix elements, where the MAE value is an average absolute error value corresponding to the two initial clustering centers, the average absolute error value is an average value of absolute parameter differences in the two initial clustering centers, the absolute parameter difference value is an absolute value of a difference between two same normalization parameters in the two initial clustering centers, and matrix elements of each row vector in the MAE similarity matrix are MAE values of the same selected initial clustering center and all initial clustering centers, and selected initial clustering centers corresponding to different row vectors are different;
a determining unit 140, configured to obtain a first MAE value of each row vector in the MAE similarity matrix, where the first MAE value is the second minimum value in the row vector, and to determine whether, among the first MAE values, there are second MAE values that are smaller than a first preset value and correspond to the same initial clustering centers; if yes, output a positive result, and otherwise output a negative result;
a merging unit 150, configured to merge an initial cluster corresponding to the initial cluster center corresponding to the second MAE value into a new initial cluster when the determining unit outputs a positive result, and use a mean pulse of the initial cluster center corresponding to the second MAE value as an initial cluster center of the new initial cluster;
and a target clustering center obtaining unit 160, configured to use the initial clustering center as a target clustering center of the PPG signal when the determining unit outputs a negative result.
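The work of the MAE similarity matrix obtaining unit and the determining unit can be sketched as follows. This is one possible reading, stated as an assumption: a pair of initial clustering centers is treated as a merge candidate when each is the other's closest center and their MAE is below the first preset value. The names `mae_matrix`, `merge_candidates` and `first_preset` are hypothetical.

```python
import numpy as np

def mae_matrix(centers):
    """MAE similarity matrix: element (a, b) is the MAE between centers a and b."""
    centers = np.asarray(centers, dtype=float)
    return np.abs(centers[:, None, :] - centers[None, :, :]).mean(axis=2)

def merge_candidates(centers, first_preset):
    """Pairs of centers that are mutually closest and closer than first_preset."""
    m = mae_matrix(centers)
    # The diagonal holds each center's MAE with itself (zero, the row minimum),
    # so masking it makes the row minimum equal to the second minimum value.
    np.fill_diagonal(m, np.inf)
    nearest = m.argmin(axis=1)
    pairs = []
    for i, j in enumerate(nearest):
        if i < j and nearest[j] == i and m[i, j] < first_preset:
            pairs.append((i, int(j)))
    return pairs
```

An empty list corresponds to the negative result, in which case the initial clustering centers are used as the target clustering centers; otherwise the clusters of each returned pair would be merged and the mean pulse recalculated.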
Specifically, for the cooperation among the units of the PPG signal cluster center obtaining apparatus, reference may be made to the PPG signal cluster center obtaining method described above, which is not repeated here.
As shown in fig. 8, a PPG signal processing device according to the present application includes:
a fourth pulse acquiring unit 211, configured to acquire an original PPG signal in real time, and pre-process the original PPG signal to obtain a fourth pulse, where the PPG signal is a photoplethysmography signal;
a fifth pulse obtaining unit 212, configured to obtain a fourth pulse that satisfies the first preset condition and perform dc removal to obtain a fifth pulse;
a sixth pulse obtaining unit 213, configured to perform fast Fourier transform on the fifth pulse to obtain a frequency spectrum corresponding to the fifth pulse, obtain a frequency component satisfying a second preset condition in the frequency spectrum, obtain a characteristic parameter corresponding to the fifth pulse according to the frequency component, and normalize the characteristic parameter to obtain the sixth pulse;
and an executing unit 220, configured to obtain a cluster center corresponding to the sixth pulse according to the normalization parameter of the sixth pulse and a preset cluster center, so as to obtain information corresponding to the sixth pulse according to the cluster center, where the preset cluster center is obtained by the PPG signal cluster center obtaining apparatus described above.
Specifically, for the cooperation among the units of the PPG signal processing apparatus, reference may be made to the PPG signal processing method described above, which is not repeated here. The processing apparatus may be an artificial intelligence device.
It is to be understood that the above examples merely represent preferred embodiments of the present application, and that although their description is specific and detailed, it is not to be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can freely combine the above technical features and make several variations and modifications without departing from the concept of the present application, all of which belong to the protection scope of the present application. Therefore, all equivalent changes and modifications made within the scope of the claims of the present application shall fall within the scope of the claims of the present application.