CN116959742A

CN116959742A - Blood glucose data processing method and system based on spherical coordinate kernel principal component analysis

Info

Publication number: CN116959742A
Application number: CN202311033031.2A
Authority: CN
Inventors: 袁昊; 邓兴华; 李霜; 陈江飞
Original assignee: Meide Medical Technology Shenzhen Co ltd
Current assignee: Meide Medical Technology Shenzhen Co ltd
Priority date: 2023-08-16
Filing date: 2023-08-16
Publication date: 2023-10-27

Abstract

The application discloses a blood glucose data processing method and system based on spherical coordinate kernel principal component analysis. The method comprises: obtaining physiological index data and the corresponding actual blood glucose values, constructing a sample data set for blood glucose prediction, performing normalization, and constructing an original feature matrix; defining the robust center of the sample data set as the one-norm median, mapping the feature vectors of the original feature matrix onto a hypersphere, and making the mean of the mapped data lie at the sphere center of the unit hypersphere; taking the sphere center as the origin, establishing a Cartesian coordinate system, and mapping the feature vectors of the original feature matrix onto it to generate a first feature matrix; replacing the Cartesian coordinates with spherical coordinates, and generating a manifold dimension-reduced second feature matrix by setting spherical angles or the radius to their mean values; and mapping the second feature matrix back onto the Cartesian coordinate system to complete the data processing of the sample data set. The computational complexity of the method is far lower than that of conventional principal component analysis and commonly used kernel principal component analysis.

Description

Blood glucose data processing method and system based on spherical coordinate kernel principal component analysis

Technical Field

The application relates to the technical field of signal generator circuit design, in particular to a blood glucose data processing method and system based on spherical coordinate kernel principal component analysis.

Background

Diabetes is a lifelong metabolic disease caused by factors such as impaired insulin secretion and impaired insulin action, and it is characterized by chronic hyperglycemia. With rising living standards and changing lifestyles, diabetes patients are becoming increasingly young, and the disease has become one of the important health care problems in China. At present there is no effective cure for diabetes, but regular monitoring of the blood glucose level can reduce or delay the occurrence of complications. Self-monitoring is considered one of the most direct and feasible methods for controlling diabetes.

A well-established detection technique measures the blood glucose level with a blood glucose analyzer from a blood sample obtained by pricking the patient's fingertip. Frequent blood sampling causes pain and burden to patients, and the method cannot provide real-time monitoring. Noninvasive blood glucose technology can overcome these drawbacks and has become a hot topic in intelligent medical research. In this approach, a photoplethysmography (PPG) pulse signal is acquired using near-infrared spectroscopy, features are extracted from the PPG signal and combined with a series of other features for prediction, and the synchronously acquired invasive blood glucose value is used as a reference, so that a high-accuracy blood glucose prediction model is built.

However, because of the complex composition of human blood, the limited precision of measuring instruments, changes in the measuring environment, and similar factors, the spectral data contain a large amount of noise, and some outliers in the acquired data are unavoidable, which affects the accuracy and robustness of the blood glucose model. An important way to improve model robustness is to apply data preprocessing methods that filter out noise and improve the signal-to-noise ratio, so that the differences among the sample feature data acquired at different times from the same subject are reduced.

Disclosure of Invention

In order to filter noise from the original data features related to blood glucose prediction, data preprocessing is performed on the original sample data set. The application aims to provide a blood glucose data processing method based on spherical coordinate kernel principal component analysis that reduces model complexity, improves model robustness, and effectively extracts the information that best explains the measured quantity. Meanwhile, the running time of the method is far lower than that of conventional principal component analysis and commonly used kernel principal component analysis.

In order to achieve the above technical aim, the application provides a blood glucose data processing method based on spherical coordinate kernel principal component analysis, which comprises the following steps:

acquiring physiological index data and the corresponding actual blood glucose value thereof, and constructing a sample data set for blood glucose prediction;

extracting a plurality of features based on the physiological index data as feature values of a sample data set, carrying out normalization processing, and constructing an original feature matrix according to the acquisition times of the sample data set and the types of the features;

defining the robust center of the sample data set as the one-norm median, mapping the feature vectors of the original feature matrix onto a hypersphere, and making the mean of the mapped data lie at the sphere center of the unit hypersphere, wherein the sphere center represents the robust center;

setting the sphere center as an origin, establishing a Cartesian coordinate system, mapping the feature vector of the original feature matrix onto the Cartesian coordinate system, and generating a first feature matrix;

based on the first feature matrix, replacing Cartesian coordinates with spherical coordinates, and generating a manifold dimension-reduced second feature matrix by setting spherical angles or radii as average values;

and mapping the second feature matrix back to the Cartesian coordinate system to finish the data processing of the sample data set.

Preferably, in constructing the sample data set, the physiological index data include: height, weight, eating habits, synchronized heart rate, blood pressure, blood lipids, electrocardiogram (ECG) signals, and photoplethysmography (PPG) pulse signals.

Preferably, in the process of constructing the sample data set, the relevant characteristics are extracted from the physiological index data, and the sample data set is constructed by taking the corresponding actual blood glucose value as a reference, wherein the sample data set is used for training the neural network as the input of the neural network after data processing, so as to generate a blood glucose prediction model for predicting blood glucose.

Preferably, in the process of mapping the feature vectors onto the hypersphere, each feature vector of the original feature matrix is mapped onto the hypersphere according to the mean of the feature vectors and the Euclidean distance between the feature vector and the sphere center.

Preferably, in the process of obtaining the mean of the feature vectors, iteration is performed according to an M-estimation algorithm so that the sphere center of the hypersphere approaches the mean of the mapped data, whereby the sphere center becomes the robust center.

Preferably, in the process of obtaining the second feature matrix, the manifold dimension is reduced to k dimensions: the variance and mean of each row of the first feature matrix are calculated, and, according to the variance, the row values are replaced by the corresponding row mean, so that the manifold dimension-reduced second feature matrix is generated.

Preferably, in the process of mapping the second feature matrix onto the Cartesian coordinate system, after the second feature matrix is reconstructed, the second feature matrix is mapped onto the Cartesian coordinate system to generate a final feature matrix, and data processing on the sample data set is completed.

The application also provides a blood glucose data processing system based on spherical coordinate kernel principal component analysis, which comprises:

the data acquisition module is used for acquiring the physiological index data and the corresponding actual blood glucose values thereof and constructing a sample data set for blood glucose prediction;

the first data processing module is used for extracting a plurality of characteristics based on the physiological index data to serve as characteristic values of a sample data set, and constructing an original characteristic matrix according to the acquisition times of the sample data set and the types of the characteristics after normalization processing;

the second data processing module is used for defining the robust center of the sample data set as the one-norm median, mapping the feature vectors of the original feature matrix onto a hypersphere, and making the mean of the mapped data lie at the sphere center of the unit hypersphere, wherein the sphere center represents the robust center;

the third data processing module is used for taking the sphere center as an origin, establishing a Cartesian coordinate system, mapping the feature vector of the original feature matrix onto the Cartesian coordinate system and generating a first feature matrix;

the third data processing module is used for replacing Cartesian coordinates with spherical coordinates based on the first feature matrix, and generating a manifold dimension-reduced second feature matrix by setting a spherical angle or a radius as a mean value; and mapping the second feature matrix back to the Cartesian coordinate system to complete the data processing of the sample data set.

The application discloses the following technical effects:

the application reduces the degrees of freedom of the feature vectors represented in the spherical coordinate system, and thereby the dimension of the feature-vector manifold represented in the Cartesian coordinate system, which reduces model complexity and effectively extracts the information that best explains the measured quantity; in addition, because the one-norm median is used as the robust center of the data, outliers influence the principal components obtained by this dimension reduction far less than they influence those of conventional principal component analysis, which improves model robustness; meanwhile, the computational complexity of the method is far lower than that of conventional principal component analysis and commonly used kernel principal component analysis.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of the blood glucose data processing method based on spherical coordinate kernel principal component analysis according to embodiment 1 of the present application;

FIG. 2 is a schematic diagram of the process of finding the robust center of the data according to embodiment 2 of the present application;

FIG. 3 is a schematic diagram comparing spherical coordinate kernel principal component analysis with conventional principal component analysis according to embodiment 3 of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

As shown in FIGS. 1-3, the application provides a blood glucose data processing method based on spherical coordinate kernel principal component analysis. First, features are extracted from the collected data and normalized to generate an original feature matrix X. To find the robust center of the data, namely the one-norm median, the data are first projected onto a unit hypersphere whose sphere center is the mean of the feature data. M-estimation iterations then move the sphere center of the hypersphere toward the mean of the projected data until convergence. Next, a Cartesian coordinate system is established with the robust center as the origin, and the original feature matrix X is mapped onto this coordinate system to generate a new feature matrix. The new feature values are then expressed in spherical coordinates, the variance and mean of each row of the feature matrix are calculated, and the rows with smaller variance are replaced by the mean of the corresponding row. Finally, the feature matrix processed in spherical coordinates is mapped back onto the Cartesian coordinate system to generate the final feature matrix, which completes the processing of the blood glucose data.

Embodiment 1: this embodiment provides a blood glucose data processing method based on spherical coordinate kernel principal component analysis. Referring to FIG. 1, the steps of the method include:

S1, performing N acquisitions to obtain the sample data set required for blood glucose prediction, wherein the sample data set comprises physiological index data and the actual blood glucose values acquired synchronously with the physiological index data;

the physiological index data include: height, weight, eating habits, synchronized heart rate, blood pressure, blood lipids, electrocardiogram (ECG) signals, and photoplethysmography (PPG) pulse signals; the corresponding actual blood glucose values are acquired synchronously with a blood glucose meter.

S2, the M features extracted at each data acquisition are taken as the physiological index data, and the actual blood glucose value is acquired at the same time as a reference. After N acquisitions, the sample data set required for blood glucose prediction is obtained. The data of each feature are normalized, and an original feature matrix X is formed from the physiological index data, characterized as:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{bmatrix}$$

where each row represents one feature (M features in total), each column represents one group of feature values (N groups in total), and $x_{MN}$ denotes the M-th feature of the N-th group of feature values. The synchronously acquired actual blood glucose values are characterized as: $\mathrm{REF} = [r_1 \; r_2 \; \cdots \; r_N]$.
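As an illustration of step S2, the following minimal sketch stacks N acquisitions of M features into an M x N matrix and normalizes each feature row. The function name `build_feature_matrix`, the sample values, and the choice of min-max scaling are assumptions made for illustration; the description only states that the data of each feature are normalized.

```python
import numpy as np

def build_feature_matrix(samples):
    """Stack N acquisitions of M features into an M x N matrix, one row per feature.

    `samples` is a hypothetical N x M array-like (one row per acquisition).
    Min-max scaling is an assumption; the source only says "normalized".
    """
    X = np.asarray(samples, dtype=float).T           # shape (M, N): rows are features
    lo = X.min(axis=1, keepdims=True)
    span = X.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0                            # constant features map to 0
    return (X - lo) / span

# Made-up values: 3 acquisitions (N = 3) of 3 features (M = 3).
samples = [[60.0, 120.0, 0.8], [75.0, 135.0, 1.1], [90.0, 110.0, 0.9]]
X = build_feature_matrix(samples)
```

Each column of `X` is then one group of feature values, matching the layout of the original feature matrix described above.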

S3, defining the robust center of the data as the one-norm median, mapping the N feature vectors of the original feature matrix X onto the hypersphere, and making the mean of the mapped data lie at the sphere center O of the unit hypersphere, wherein the sphere center O is the robust center of the data.

First, the original feature vectors are mapped onto a unit hypersphere whose sphere center is the mean of the original feature vectors, using the following expression:

where $x_i$ denotes the vector formed by the $i$-th group of original feature values, $\tilde{x}_i$ denotes the vector obtained by mapping the $i$-th group of original feature values onto the hypersphere, $O_0$ is the mean of the original feature vectors and also the sphere center of the hypersphere, and $\lVert x_i - O_0 \rVert_2$ denotes the Euclidean distance between the $i$-th feature vector and the sphere center of the unit hypersphere. Then, an M-estimation algorithm is used to iterate so that the sphere center of the hypersphere approaches the mean of the mapped data, according to the following formula:

where $O_l$ denotes the sphere center of the hypersphere after the $l$-th iteration, $w_i$ represents the feature values of the $i$-th group of original data, and $\tilde{x}_i^{(l-1)}$ denotes the feature vector obtained by mapping the $i$-th group of original feature values onto the hypersphere after the $(l-1)$-th iteration. The iteration ends when the number of iterations reaches 20 or the convergence condition is met. After the iteration ends, the resulting sphere center O of the hypersphere is taken as the robust center of the data.
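The search for the robust center in step S3 can be sketched as follows. The update formulas themselves are not reproduced in the text, so this sketch assumes a Weiszfeld-type update, a standard way to approximate the one-norm (spatial) median: each vector is mapped to the unit hypersphere around the current center, and the center is moved by the weighted mean of the mapped vectors, with weights equal to the inverse distances. The function name `robust_center`, the inverse-distance weights, and the tolerance `tol` are assumptions; only the limit of 20 iterations comes from the description.

```python
import numpy as np

def robust_center(X, max_iter=20, tol=1e-8):
    """Approximate the one-norm (spatial) median of the columns of X (shape M x N).

    Weiszfeld-type update (an assumption, not the source's exact formula).
    `tol` is a hypothetical convergence threshold.
    """
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=1)                          # O_0: ordinary mean of the feature vectors
    for _ in range(max_iter):
        diff = X - center[:, None]                   # x_i - O_{l-1}
        dist = np.maximum(np.linalg.norm(diff, axis=0), 1e-12)  # avoid division by zero
        unit = diff / dist                           # vectors mapped onto the unit hypersphere
        weights = 1.0 / dist                         # assumed inverse-distance weights
        new_center = center + unit.sum(axis=1) / weights.sum()
        shift = np.linalg.norm(new_center - center)
        center = new_center
        if shift < tol:
            break
    return center

# Made-up 2-D example: four points around the origin plus one far outlier.
points = np.array([[-1.0, 1.0, 0.0, 0.0, 50.0],
                   [0.0, 0.0, -1.0, 1.0, 50.0]])
center = robust_center(points)
```

In this made-up example the ordinary mean is pulled to (10, 10) by the outlier, while the iterated center stays close to the four clustered points.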

S4, taking the sphere center O of the unit hypersphere as the origin, establishing a Cartesian coordinate system, and mapping the feature vectors of the original feature matrix X onto this coordinate system to generate a feature matrix X', with the expression:

$$x'_i = x_i - O$$

where $x'_i$ is the $i$-th feature vector of the feature matrix $X'$.

S5, replacing the Cartesian coordinates with spherical coordinates, setting some spherical angles or the radius to their mean values to reduce the degrees of freedom of the vector manifold, and generating the manifold dimension-reduced feature matrix P.

Let $X' = [x'_1 \; \cdots \; x'_n]$, $x'_i = [x'_{1i} \; \cdots \; x'_{mi}]^T$, $i = 1, 2, \dots, n$. The Cartesian coordinates are replaced by spherical coordinates, with the expression:

where $\theta_{ji} \in [0, \pi]$ for $j = 1, 2, \dots, m-2$, and $\theta_{(m-1)i} \in [0, 2\pi)$. Through this coordinate transformation, the feature matrix X' is converted into a feature matrix expressed in spherical coordinates, characterized as:

If the manifold dimension is to be reduced to k dimensions, the variance and mean of each row of this feature matrix are calculated, and the values of the m-k rows with the smallest variance are replaced by the mean of the corresponding row, generating the manifold dimension-reduced feature matrix P.
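A minimal sketch of step S5, assuming the standard hyperspherical convention $x'_1 = r\cos\theta_1$, $x'_2 = r\sin\theta_1\cos\theta_2$, ..., $x'_m = r\sin\theta_1\cdots\sin\theta_{m-1}$, which is consistent with the angle ranges stated above. The names `to_spherical` and `reduce_manifold`, and the placement of the radius in the first row, are assumptions made for illustration.

```python
import numpy as np

def to_spherical(Xc):
    """Columns of Xc (m x n, already centered, m >= 2) -> rows [r, theta_1, ..., theta_{m-1}]."""
    Xc = np.asarray(Xc, dtype=float)
    m, n = Xc.shape
    S = np.empty((m, n))
    S[0] = np.linalg.norm(Xc, axis=0)                        # radius r_i
    for j in range(m - 2):                                   # theta_1 .. theta_{m-2} in [0, pi]
        tail = np.linalg.norm(Xc[j + 1:], axis=0)
        S[j + 1] = np.arctan2(tail, Xc[j])
    # Last angle theta_{m-1}, wrapped into [0, 2*pi).
    S[m - 1] = np.mod(np.arctan2(Xc[m - 1], Xc[m - 2]), 2 * np.pi)
    return S

def reduce_manifold(S, k):
    """Keep the k highest-variance rows of S; set the other m - k rows to their row means."""
    S = np.array(S, dtype=float)                             # copy so the input is not modified
    low = np.argsort(S.var(axis=1))[: S.shape[0] - k]        # m - k rows with smallest variance
    S[low] = S[low].mean(axis=1, keepdims=True)
    return S

# Made-up centered data: three 2-D feature vectors.
Xc = np.array([[1.0, 0.0, -1.0],
               [0.0, 2.0, 0.0]])
S = to_spherical(Xc)
P = reduce_manifold(S, k=1)
```

Here the radius row has the smaller variance, so it is replaced by its mean while the angle row is kept unchanged.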

S6, mapping the feature matrix P back to the Cartesian coordinate system to generate the final feature matrix X″, which completes the data processing for blood glucose prediction.

The manifold dimension-reduced feature matrix P is reconstructed and mapped onto the Cartesian coordinate system to generate the final feature matrix X″, with the expression:

$$x''_i \leftarrow x''_i + O$$

that is, each vector reconstructed from P in Cartesian coordinates is shifted back by the robust center O, where $X'' = [x''_1 \; \cdots \; x''_n]$, $x''_i = [x''_{1i} \; \cdots \; x''_{mi}]^T$, $i = 1, 2, \dots, n$.
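Step S6 can be sketched as the inverse of the spherical transformation followed by a shift back by the robust center. This sketch assumes the same standard hyperspherical convention as a typical forward transform (radius in the first row, angles in the remaining rows); the function name `to_cartesian` and the sample values are hypothetical.

```python
import numpy as np

def to_cartesian(P, center):
    """Rows [r, theta_1, ..., theta_{m-1}] of P (m x n) -> Cartesian columns shifted by `center`."""
    P = np.asarray(P, dtype=float)
    m, n = P.shape
    r, theta = P[0], P[1:]
    X = np.empty((m, n))
    sin_prod = np.ones(n)                        # running product sin(theta_1) ... sin(theta_{j-1})
    for j in range(m - 1):
        X[j] = r * sin_prod * np.cos(theta[j])
        sin_prod = sin_prod * np.sin(theta[j])
    X[m - 1] = r * sin_prod                      # last coordinate uses sines only
    return X + np.asarray(center, dtype=float)[:, None]   # shift back by the robust center O

# Made-up spherical data (radius row, one angle row) and a made-up center.
P = np.array([[1.0, 2.0, 1.0],
              [0.0, np.pi / 2, np.pi]])
X_final = to_cartesian(P, center=[10.0, 20.0])
```

With these values the three reconstructed vectors are (1, 0), (0, 2), and (-1, 0) before the shift, so the columns of `X_final` are (11, 20), (10, 22), and (9, 20).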

The feature values in the final feature matrix X″ and the corresponding actual blood glucose values are divided, according to the required ratio, into a training set ($X''_1$ and $\mathrm{REF}_1$) and a test set ($X''_2$ and $\mathrm{REF}_2$). A CNN neural network is used as the training model with the training set $X''_1$ as input, and the error cost function is the mean square error loss function, whose expression is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where $\hat{y}$ is the predicted value and $y$ is the true value. The test set $X''_2$ is input into the trained blood glucose prediction model to obtain the corresponding predicted blood glucose values, and the accuracy of the blood glucose prediction is verified by Clarke error grid analysis.
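The data split and the loss computation described above can be sketched as follows. The CNN itself is not shown. The function names `split_columns` and `mse`, the ratio of 0.8, and the sample values are assumptions for illustration; the description leaves the split ratio open.

```python
import numpy as np

def split_columns(X, ref, train_ratio=0.8):
    """Split the columns of X (one per acquisition) and the reference values into train/test parts.

    `train_ratio` is a hypothetical value; the source only says "the required ratio".
    """
    X = np.asarray(X, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_train = int(round(train_ratio * X.shape[1]))
    return (X[:, :n_train], ref[:n_train]), (X[:, n_train:], ref[n_train:])

def mse(y_pred, y_true):
    """Mean square error between predicted and reference blood glucose values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

X_final = np.arange(30, dtype=float).reshape(3, 10)   # made-up 3 x 10 final feature matrix
ref = np.linspace(4.0, 8.5, 10)                        # made-up reference values
(train_X, train_ref), (test_X, test_ref) = split_columns(X_final, ref)
```

Because each column holds one acquisition, the split is taken along the column axis so that features and reference values stay aligned.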

Embodiment 2: this embodiment concerns finding the robust center (one-norm median) of the data and visualizing it, illustrated with two-dimensional data.

The course of the experiment is shown in FIG. 2. The experiment uses 10 groups of two-dimensional features, giving a 2 x 10 feature matrix, and the 10 groups of feature values are plotted as data points. The first mapping is the upper-right circle, whose center is the mean of the feature values. The sphere center is marked by '×', the data projected onto the sphere are marked by '+', and the mean of the projected data is marked by 'Δ'. As can be seen from the figure, the sphere center of the first mapping is far from the mean of the projected data. The last mapping is the lower-left circle, where the sphere center already coincides with the mean of the projected data, so this point is the robust center of the 10 groups of data.

Embodiment 3: this embodiment compares the spherical coordinate kernel principal component analysis method with the conventional principal component analysis method. As shown in FIG. 3, the solid line is the first principal component obtained by conventional principal component analysis, and the broken line is the first principal component obtained by spherical coordinate kernel principal component analysis. It can be seen that the influence of an outlier on the first principal component of conventional principal component analysis can be arbitrarily large, whereas spherical coordinate kernel principal component analysis greatly reduces the influence of the outlier on the data features, so the accuracy of blood glucose prediction can be effectively improved. Meanwhile, the computational complexity of the spherical coordinate kernel principal component analysis method is far lower than that of conventional principal component analysis and commonly used kernel principal component analysis.
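The sensitivity of the first principal component to a single outlier can be illustrated with a small synthetic experiment. This is only a sketch of the general effect, not the procedure of this embodiment: it projects the data onto the unit circle around a robust center before computing the principal direction, and it uses the coordinate-wise median as a simple stand-in for the one-norm median. The function name `first_pc` and all data values are made up.

```python
import numpy as np

def first_pc(Z):
    """First principal direction of the columns of Z (d x n) about their mean."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    _, vecs = np.linalg.eigh(Zc @ Zc.T)          # eigenvalues in ascending order
    return vecs[:, -1]                           # eigenvector of the largest eigenvalue

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, 50)                   # main variation along the first axis
y = rng.normal(0.0, 0.1, 50)                     # small noise along the second axis
data = np.vstack([np.append(x, 0.0), np.append(y, 1000.0)])   # 2 x 51, last column is an outlier

# Simple robust center (coordinate-wise median), then projection onto the unit circle.
center = np.median(data, axis=1, keepdims=True)
diff = data - center
unit = diff / np.maximum(np.linalg.norm(diff, axis=0), 1e-12)

pc_classic = first_pc(data)                      # dominated by the outlier
pc_sphere = first_pc(unit)                       # follows the bulk of the data
```

The classical direction is pulled toward the second axis by the single outlier, whereas the direction computed from the projected data stays aligned with the first axis, where most of the variation lies.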

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A blood glucose data processing method based on spherical coordinate kernel principal component analysis, characterized by comprising the following steps:

extracting a plurality of features based on the physiological index data as feature values of the sample data set, carrying out normalization processing, and constructing an original feature matrix according to the number of acquisitions of the sample data set and the types of the features;

defining the robust center of the sample data set as the one-norm median, mapping the feature vectors of the original feature matrix onto a hypersphere, and making the mean of the mapped data lie at the sphere center of the unit hypersphere, wherein the sphere center represents the robust center;

establishing a Cartesian coordinate system by taking the sphere center as an origin, mapping the feature vector of the original feature matrix onto the Cartesian coordinate system, and generating a first feature matrix;

based on the first feature matrix, replacing the Cartesian coordinates with spherical coordinates, and generating a manifold dimension-reduced second feature matrix by setting a spherical angle or a radius as a mean value;

and mapping the second feature matrix back to the Cartesian coordinate system to complete data processing of the sample data set.

2. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 1, wherein:

in constructing the sample data set, the physiological index data include: height, weight, eating habits, synchronized heart rate, blood pressure, blood lipids, electrocardiogram (ECG) signals, and photoplethysmography (PPG) pulse signals.

3. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 2, wherein:

in the process of constructing a sample data set, relevant features are extracted from physiological index data, and the sample data set is constructed by taking corresponding actual blood glucose values as references, wherein the sample data set is used for training a neural network as input of the neural network after data processing, so as to generate a blood glucose prediction model for predicting blood glucose.

4. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 3, wherein:

in the process of mapping the feature vector to the hypersphere, the feature vector is mapped to the hypersphere based on the feature vector of the original feature matrix according to the mean value of the feature vector and the Euclidean distance between the feature vector and the sphere center.

5. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 4, wherein:

in the process of obtaining the mean of the feature vectors, iteration is performed according to an M-estimation algorithm so that the sphere center of the hypersphere approaches the mean of the mapped data, whereby the sphere center becomes the robust center.

6. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 5, wherein:

in the process of acquiring a second feature matrix, reducing manifold dimensionality to k-dimension, calculating variance and mean of each row of the first feature matrix, and replacing the row values with corresponding mean values according to the variance to generate the manifold dimensionality-reduced second feature matrix.

7. The blood glucose data processing method based on spherical coordinate kernel principal component analysis according to claim 6, wherein:

and in the process of mapping the second feature matrix back to the Cartesian coordinate system, after the second feature matrix is reconstructed, mapping the second feature matrix back to the Cartesian coordinate system to generate a final feature matrix, and finishing the data processing of the sample data set.

8. A blood glucose data processing system based on spherical coordinate kernel principal component analysis, comprising:

the first data processing module is used for extracting a plurality of features based on the physiological index data as feature values of the sample data set, and, after normalization processing, constructing an original feature matrix according to the types of the features and the number of acquisitions of the sample data set;

the second data processing module is used for defining the robust center of the sample data set as the one-norm median, mapping the feature vectors of the original feature matrix onto a hypersphere, and making the mean of the mapped data lie at the sphere center of the unit hypersphere, wherein the sphere center represents the robust center;

the third data processing module is used for replacing the Cartesian coordinates with spherical coordinates based on the first feature matrix, and generating a manifold dimension-reduced second feature matrix by setting a spherical angle or a radius as a mean value; and mapping the second feature matrix back to the Cartesian coordinate system to complete data processing of the sample data set.