CN114548259B

CN114548259B - A PISA fault identification method based on semi-supervised Semi-KNN model

Info

Publication number: CN114548259B
Application number: CN202210152424.4A
Authority: CN
Inventors: 于霞; 张占虎; 李鸿儒; 周健; 陆静毅; 马晓静
Original assignee: Shanghai Sixth Peoples Hospital; Northeastern University China
Current assignee: Shanghai Sixth Peoples Hospital; Northeastern University China
Priority date: 2022-02-18
Filing date: 2022-02-18
Publication date: 2023-10-10
Anticipated expiration: 2042-02-18
Also published as: CN114548259A

Abstract

The invention relates to a PISA fault identification method based on a semi-supervised Semi-KNN model, which includes: S10, obtaining blood glucose information to be measured within a preset time period and preprocessing it to obtain preprocessed blood glucose information to be measured; S20, based on a pre-established PISA constraint set and the preprocessed blood glucose information to be measured, obtaining a constraint relationship using similarity measurement processing; S30, inputting the preprocessed blood glucose information to be measured and the constraint relationship into a pre-trained semi-supervised Semi-KNN model, which outputs a classification result for the blood glucose information to be measured. The semi-supervised Semi-KNN model is obtained by training a KNN model in a semi-supervised manner with a training data set and the PISA constraint set, and is used to identify abnormal blood glucose information. The method improves the reliability of blood glucose information detection, the accuracy of fault diagnosis results, and processing efficiency.

Description

A PISA fault identification method based on semi-supervised Semi-KNN model

Technical Field

The present invention relates to PISA fault identification technology, and in particular to a PISA fault identification method based on a semi-supervised Semi-KNN model.

Background Art

In recent years, continuous glucose monitoring (CGM) systems have received growing attention. CGM signals are used as an aid in diagnosing and managing various types of diabetes. They are usually analyzed with data-driven methods, which suffer from several problems: the glucose signal is easily affected by noise, the monitor is prone to faults, and glucose prediction alarms lose accuracy because of data errors. Most fault identification methods for CGM systems are plagued by poor performance and high false positive rates, which limits their clinical usefulness as an aid.

Digital signal processing has developed rapidly in recent years, and finite and infinite impulse response filters have addressed the noise problem of CGM signals. CGM fault detection, however, remains a challenge and a very active area of applied research.

When the skin around the CGM sensor is under strong pressure, the CGM reading drops rapidly; this is a pressure-induced sensor attenuation (PISA) event. Algorithms based on CGM readings, such as predictive pump shut-off, rely on an estimate of the rate of change of the glucose sensor signal to shut off the insulin pump and avoid hypoglycemia. Because PISA events that occur at night cannot be noticed in time, they cause inappropriate pump shut-off; PISA faults also bias prediction algorithms low, with an even more serious impact on forecasts and warnings. How to distinguish PISA faults from other events that lower the signal, such as insulin events, hypoglycemia events, and exercise events, is therefore a technical problem that urgently needs solving. A semi-supervised PISA fault identification method whose execution time is fast enough for real-time operation is needed.

Summary of the Invention

(I) Technical problem to be solved

In view of the above shortcomings and deficiencies of the prior art, the present invention provides a PISA fault identification method based on a semi-supervised Semi-KNN model.

(II) Technical solution

To achieve the above object, the main technical solutions adopted by the present invention include:

In a first aspect, an embodiment of the present invention provides a PISA fault identification method based on a semi-supervised Semi-KNN model, which includes:

S10, obtaining blood glucose information to be measured within a preset time period, and preprocessing it to obtain preprocessed blood glucose information to be measured;

S20, based on a pre-established PISA constraint set and the preprocessed blood glucose information to be measured, using similarity measurement processing to obtain the constraint relationship to which the blood glucose information to be measured belongs;

the PISA constraint set is a set of ML constraints and CL constraints constructed from prior knowledge in the training phase of the semi-supervised Semi-KNN model, and each element of the set is the first-order difference feature information of a blood glucose subsequence;

S30, inputting the preprocessed blood glucose information to be measured and the constraint relationship into a pre-trained semi-supervised Semi-KNN model, which outputs a classification result for the blood glucose information to be measured;

the semi-supervised Semi-KNN model is a model, obtained by training a KNN model in a semi-supervised manner with a training data set and the PISA constraint set, for identifying abnormal blood glucose information, and the training data set includes blood glucose data processed by first-order differencing.

Optionally, before S10, the method further includes:

S01, acquiring multiple historical blood glucose data with a CGM device and preprocessing each of them to obtain blood glucose sequences; each blood glucose sequence includes blood glucose data with a PISA timestamp label and blood glucose data without a PISA timestamp label;

S02, dividing each blood glucose sequence into multiple subsequences and calculating the first-order difference of each subsequence to obtain a training data set;

S03, based on prior knowledge and the training data with PISA timestamp labels in the training data set, generating a PISA constraint set according to the rules for forming semi-supervised constraints;

S04, training the semi-supervised Semi-KNN model with the training data set and the PISA constraint set to obtain the trained semi-supervised Semi-KNN model;

the semi-supervised Semi-KNN model is an improved KNN model constructed in a semi-supervised manner.

Optionally, S04 includes:

S04-1, traversing all subsequences of the training data set and constructing an offline K-dimensional binary search tree, that is, a K-D tree;

S04-2, based on the K-D tree, traversing the PISA constraint set to obtain the abnormal threshold σ of the semi-supervised Semi-KNN model, whose boundary thresholds are σ1 and σ2, expressed as σ = [σ1, σ2];

here the DTW similarity measurement function is used to calculate the average distance between each PISA event in the PISA constraint set and the other events, giving a distance set;

the abnormal threshold σ = [σ1, σ2] is then obtained from formula (1):

σ1 = Q3 + 1.5(Q3 - Q1),    formula (1)

σ2 = Q1 - 1.5(Q3 - Q1),

if the distance from the blood glucose data to be measured to a sample in the ML relationship of the PISA constraints satisfies dist < σ2, the data is determined to be an abnormal sample; if its distance to a sample in the CL relationship of the PISA constraints satisfies dist > σ1, the data is determined to be an abnormal sample;

Q3 is the upper quartile of the distance set, and Q1 is the lower quartile of the distance set.
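The threshold computation in S04-2 can be illustrated with a minimal sketch. It assumes a textbook DTW with absolute-difference cost and quartiles taken by linear interpolation between order statistics; the patent does not state which DTW variant or quartile convention is used, and the function names `dtw_distance`, `quartile` and `anomaly_thresholds` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def quartile(sorted_vals: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation (an assumed convention, one of several)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def anomaly_thresholds(events: List[Sequence[float]]) -> Tuple[float, float]:
    """Return (sigma1, sigma2) per formula (1); needs at least two events."""
    mean_dists = []
    for i, ev in enumerate(events):
        others = [dtw_distance(ev, o) for j, o in enumerate(events) if j != i]
        mean_dists.append(sum(others) / len(others))  # average distance to the other events
    mean_dists.sort()
    q1, q3 = quartile(mean_dists, 0.25), quartile(mean_dists, 0.75)
    iqr = q3 - q1
    return q3 + 1.5 * iqr, q1 - 1.5 * iqr
```

With only a handful of labeled PISA events the quartiles are coarse, so the resulting thresholds depend noticeably on the quartile convention chosen.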

Optionally, S02 includes:

performing sliding window processing on each blood glucose sequence: in a blood glucose sequence X = {x1, x2, ..., xn}, a sliding window of size w forms a number of subsequences qi = {x_i, x_(i+1), ..., x_(i+k)};

a sequence subset is D = {q1, q2, ..., qm}, and a first-order difference is calculated for each subsequence qi according to formula (2),

where h is the increment in the first-order difference formula, with h taking a value of 0.8 to 1.2;

after the first-order differences of all subsequences have been calculated, the first-order difference values of each subsequence form the training data set.
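The windowing and differencing of S02 can be sketched as follows. Formula (2) is not reproduced in this text, so the sketch assumes the usual forward difference (x_(j+1) - x_j) / h; the window step of one sample and the function names are likewise assumptions made for illustration.

```python
from typing import List, Sequence


def sliding_windows(x: Sequence[float], w: int, step: int = 1) -> List[List[float]]:
    """Cut the sequence into overlapping windows of w samples."""
    if w < 2 or w > len(x):
        raise ValueError("window size must satisfy 2 <= w <= len(x)")
    return [list(x[i:i + w]) for i in range(0, len(x) - w + 1, step)]


def first_difference(q: Sequence[float], h: float = 1.0) -> List[float]:
    """Forward difference with increment h (assumed form of formula (2))."""
    return [(q[j + 1] - q[j]) / h for j in range(len(q) - 1)]


def build_training_set(x: Sequence[float], w: int, h: float = 1.0) -> List[List[float]]:
    """One first-order difference vector per sliding-window subsequence."""
    return [first_difference(q, h) for q in sliding_windows(x, w)]


# Example: five readings (mmol/L), window of three samples.
features = build_training_set([5.0, 5.5, 5.0, 4.0, 4.5], w=3)
```

Each window of w samples yields a feature vector of w - 1 differences, so every training point has the same dimension regardless of the absolute glucose level.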

Optionally, S10 includes:

obtaining, with a CGM device, blood glucose information to be measured that covers at least 30 to 45 minutes;

filtering the blood glucose information to be measured and preprocessing it with a sliding window to remove isolated noise points and fill missing values, giving the blood glucose sequence to be measured. That is, the information is filtered and then traversed with a sliding window of suitable size; when an isolated noise point (a tiny region that deviates strongly from its neighbours) falls within the window, it is averaged out. When sensor problems leave missing values in the information, the number of consecutive missing values is determined first, and the gaps are then filled by ordinary linear interpolation, giving the preprocessed blood glucose sequence to be measured.
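A minimal sketch of the two preprocessing operations described above follows. It assumes missing readings are represented as `None`, that a point counts as isolated noise when it differs from both immediate neighbours by more than a tolerance `tol`, and that gaps at the edges copy the nearest valid reading; none of these details is specified in the text.

```python
from typing import List, Optional, Sequence


def fill_missing(x: Sequence[Optional[float]]) -> List[float]:
    """Linearly interpolate runs of None between the nearest valid readings."""
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("no valid readings to interpolate from")
    out = list(x)
    for i in range(len(out)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # gap at the start: copy the first valid value
            out[i] = x[right]
        elif right is None:       # gap at the end: copy the last valid value
            out[i] = x[left]
        else:
            t = (i - left) / (right - left)
            out[i] = x[left] + t * (x[right] - x[left])
    return out


def smooth_isolated(x: Sequence[float], tol: float) -> List[float]:
    """Replace a point that jumps away from both neighbours by their mean."""
    out = list(x)
    for i in range(1, len(x) - 1):
        if abs(x[i] - x[i - 1]) > tol and abs(x[i] - x[i + 1]) > tol:
            out[i] = (x[i - 1] + x[i + 1]) / 2
    return out
```

Filling gaps before smoothing keeps the neighbour comparison well defined; a cap on the length of an interpolated run would be a natural extension.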

Optionally, S20 includes:

when the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be measured and a PISA event B in the PISA constraint set is less than a threshold λ, that is, f_SBD(A, B) < λ, a constraint relationship ML(A, B) is determined and the PISA constraint set is updated; λ is a preset value greater than 0; f_SBD denotes the function that calculates the SBD distance between two sequences;

when the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be measured and a sample in the CL constraint relationship of a PISA event B in the PISA constraint set is less than the threshold λ, that is, f_SBD(A, B) < λ, a constraint relationship CL(A, B) is determined and the PISA constraint set is updated;

every sequence in the blood glucose sequence to be measured is traversed, and the updated PISA constraint set is taken as the constraint relationship to which the blood glucose information to be measured belongs.
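The matching step of S20 can be sketched as below. The sketch assumes SBD is the shape-based distance commonly defined as one minus the maximum normalized cross-correlation over all shifts; the patent does not give its exact definition. `match_constraints` is a hypothetical helper, and it simplifies the CL case by comparing each candidate directly with reference sequences already known to be non-PISA.

```python
import math
from typing import List, Sequence, Tuple


def sbd(a: Sequence[float], b: Sequence[float]) -> float:
    """Shape-based distance: 1 - max over shifts of normalized cross-correlation."""
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    if norm == 0.0:
        # Assumed convention: two all-zero sequences are identical, otherwise maximally unclear.
        return 0.0 if not any(a) and not any(b) else 1.0
    n, m = len(a), len(b)
    best = -float("inf")
    for shift in range(-(m - 1), n):
        cc = sum(a[i] * b[i - shift] for i in range(max(0, shift), min(n, m + shift)))
        best = max(best, cc)
    return 1.0 - best / norm


def match_constraints(
    candidates: List[Sequence[float]],
    pisa_events: List[Sequence[float]],
    non_pisa_events: List[Sequence[float]],
    lam: float,
) -> List[Tuple[str, int, int]]:
    """Emit ("ML", candidate, event) or ("CL", candidate, reference) when SBD < lam."""
    found = []
    for ai, a in enumerate(candidates):
        for bi, b in enumerate(pisa_events):
            if sbd(a, b) < lam:
                found.append(("ML", ai, bi))
        for ci, c in enumerate(non_pisa_events):
            if sbd(a, c) < lam:
                found.append(("CL", ai, ci))
    return found
```

Because SBD is invariant to shifts and to amplitude scaling, a candidate whose drop has the same shape as a known PISA event but starts a few samples later still matches.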

Optionally, S30 includes:

based on the K-D tree and the blood glucose sequence to be measured, iteratively obtaining the K data points nearest to each data item in the blood glucose sequence to be measured, giving the K-D tree of the use stage;

based on the K-D tree of the use stage, traversing the constraint relationship to obtain the classification result of the PISA abnormality information of the blood glucose sequence to be measured;

using the DTW similarity measurement function to calculate the actual distance between each PISA event in the constraint relationship and the other events, and comparing this actual distance with the abnormal threshold σ = [σ1, σ2] to obtain the classification into PISA events and non-PISA events.

Optionally, after the actual distance has been compared with the abnormal threshold, the amount of data belonging to ML constraints in the constraint relationship is determined, and the abnormality level of the PISA event is determined from that amount.
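The K-nearest-neighbour lookup on a K-D tree used in S30 can be sketched as follows. This is a generic textbook K-D tree with Euclidean distance, given only to illustrate the data structure; how the patent combines the tree with the DTW and SBD measures is not described in this text, and the names `Node`, `build_kdtree` and `knn` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Split on the median along axes cycled by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def knn(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return the k stored points closest to query, nearest first."""
    heap: List[Tuple[float, Point]] = []  # max-heap on distance via negated keys

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the current worst hit.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, key=lambda t: -t[0])]
```

The pruning test relies on the Euclidean metric; with an elastic measure such as DTW the same bound does not hold directly, so the tree would serve as a candidate filter rather than an exact index.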

In a second aspect, an embodiment of the present invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor executes the computer program stored in the memory to perform the steps of the PISA fault identification method based on the semi-supervised Semi-KNN model according to any item of the first aspect.

In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the PISA fault identification method based on the semi-supervised Semi-KNN model according to any item of the first aspect.

(III) Beneficial effects

The method of the embodiments of the present invention performs anomaly classification with the Semi-KNN model, which resolves the uncertainty of the KNN model when it performs anomaly detection. For the first time, prior knowledge is introduced in the form of constraints to form a semi-supervised anomaly detection method, which makes maximal use of valid prior knowledge and improves the reliability of the detection results. Grading the results increases their credibility and helps doctors make clinical judgments.

The embodiments of the present invention propose, for the first time, CGM sensor fault diagnosis by a semi-supervised method, and improve the accuracy of the fault diagnosis result by introducing prior knowledge (such as expert experience). Compared with traditional unsupervised fault identification methods applied to CGM sensor fault diagnosis, the detection results of the present invention are more accurate, and the PISA constraint set introduced into the semi-supervised model ensures a high recognition rate for PISA faults and confidence in the detection of unlabeled anomalies (such as PISA events occurring at night).

In addition, because the semi-supervised model proposed in the present invention targets time series data, its distance measure can be changed from the original Euclidean distance to the DTW and SBD similarity measures, which improves measurement accuracy for time series data and also speeds up the whole computation.

The method of this embodiment can be applied in a continuous glucose monitoring (CGM) system to enhance CGM data. Fault detection not only improves the safety of CGM but also avoids fault-induced changes to treatment plans and loss of credibility in tasks such as forecasting and warning. A CGM using the method of the present invention detects pressure-induced sensor attenuation (PISA) artifacts with improved detection confidence.

Brief Description of the Drawings

FIG. 1 is a flow chart of a PISA fault identification method based on a semi-supervised Semi-KNN model provided by an embodiment of the present invention;

FIG. 2(a) is a schematic diagram of the process of constructing a sample k-dimensional binary search tree (K-d tree);

FIG. 2(b) is a schematic diagram of the K-d tree;

FIG. 3 is a representation of a new sample;

FIG. 4 is a schematic diagram of how constraint relationships guide the iterative anomaly detection process of KNN;

FIG. 5 is a flow chart of a PISA fault identification method based on a semi-supervised Semi-KNN model provided by another embodiment of the present invention.

Detailed Description

To better explain the present invention and make it easier to understand, the present invention is described in detail below through specific embodiments with reference to the accompanying drawings.

Embodiment 1

FIG. 1 shows a flow chart of a PISA fault identification method based on a semi-supervised Semi-KNN model. The method may be executed by any computer, electronic device, or CGM, and may include the following steps:

S10, obtaining blood glucose information to be measured within a preset time period, and preprocessing it to obtain preprocessed blood glucose information to be measured.

In this embodiment, a CGM can be used to obtain blood glucose information to be measured that covers at least 30 to 45 minutes; the information is filtered and preprocessed with a sliding window to remove isolated noise points and fill missing values, giving the blood glucose sequence to be measured.

It should be noted that the time period of the blood glucose information to be measured is adjustable and is set according to the parameter values of the sliding window used in preprocessing.

The PISA constraint set is a set of ML constraints and CL constraints constructed from prior knowledge in the training phase of the semi-supervised Semi-KNN model, and each element of the set is the first-order difference feature information of a blood glucose subsequence. The PISA constraint set always contains PISA information: the blood glucose data of the training phase is preprocessed according to the PISA timestamp labels, and the PISA constraint set is then created from it.

The semi-supervised Semi-KNN model is a model, obtained by training a KNN model in a semi-supervised manner with a training data set and the PISA constraint set, for identifying abnormal blood glucose information, and the training data set includes blood glucose data processed by first-order differencing. That is, the training data set is obtained by applying first-order differencing to the set of subsequences produced by sliding-window processing of the blood glucose data acquired in the training phase.

The method of this embodiment performs anomaly classification with the Semi-KNN model, which resolves the uncertainty of the KNN model when it performs anomaly detection. For the first time, prior knowledge is introduced in the form of constraints to form a semi-supervised anomaly detection method, which makes maximal use of valid prior knowledge and improves the reliability of the detection results. Grading the results increases their credibility and helps doctors make clinical judgments.

In practical applications, before step S10, the method shown in FIG. 1 may further include the following steps, which are not shown in the figure:

S01, acquiring multiple historical blood glucose data with a continuous glucose monitoring system (that is, a CGM device) and preprocessing each of them to obtain blood glucose sequences; each blood glucose sequence includes blood glucose data with a PISA timestamp label and blood glucose data without a PISA timestamp label;

For example, S02 may include:

h is the increment in the first-order difference formula, with h taking a value of 0.8 to 1.2, preferably 1;

For example, S04 may include:

σ1 = Q3 + 1.5(Q3 - Q1),    formula (1)

σ2 = Q1 - 1.5(Q3 - Q1),

To better understand step S20, it can be described in detail as follows:

when the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence to be measured and a PISA event B in the PISA constraint set is less than a threshold λ, that is, f_SBD(A, B) < λ, a constraint relationship ML(A, B) is determined and the PISA constraint set is updated; λ is a preset value greater than 0; f_SBD denotes the function that calculates the SBD distance between two sequences;

Accordingly, step S30 may include:

using the DTW similarity measurement function to calculate the actual distance between each PISA event in the constraint relationship and the other events, and comparing this actual distance with the abnormal threshold σ = [σ1, σ2] to obtain the classification into PISA events and non-PISA events. In particular, after the actual distance has been compared with the abnormal threshold, the amount of data belonging to ML constraints in the constraint relationship is determined, and the abnormality level of the PISA event is determined from that amount.
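The comparison against σ = [σ1, σ2] and the grading step can be sketched as follows. The sketch assumes the distances to ML-linked and CL-linked samples have already been computed, flags a window when either threshold rule fires, and uses the fraction of ML hits as the abnormality level; the text only says the level depends on the amount of ML-constrained data, so that mapping and the name `classify_window` are assumptions.

```python
from typing import Dict, Sequence


def classify_window(
    dists_to_ml: Sequence[float],
    dists_to_cl: Sequence[float],
    sigma1: float,
    sigma2: float,
) -> Dict[str, object]:
    """Apply dist < sigma2 (ML samples) and dist > sigma1 (CL samples)."""
    ml_hits = sum(1 for d in dists_to_ml if d < sigma2)
    cl_hits = sum(1 for d in dists_to_cl if d > sigma1)
    is_pisa = ml_hits > 0 or cl_hits > 0
    # Hypothetical grading: share of ML-linked samples that the window is close to.
    grade = ml_hits / len(dists_to_ml) if dists_to_ml else 0.0
    return {"label": "PISA" if is_pisa else "non-PISA", "ml_hits": ml_hits, "grade": grade}


result = classify_window([0.4, 2.0, 0.1], [7.0, 3.0], sigma1=6.0, sigma2=0.5)
```

Returning the hit count alongside the label keeps the grading rule replaceable without touching the threshold test.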

The method of this embodiment can be integrated into an electronic device such as an anomaly detector that identifies PISA anomalies. Used in clinical acute patient care, such a detector effectively monitors and reliably quantifies the patient's abnormal state, removing the lag of the prior art and enabling real-time monitoring and analysis.

This embodiment proposes, for the first time, CGM sensor fault diagnosis by a semi-supervised method, and improves the accuracy of the fault diagnosis result by introducing prior knowledge (such as expert experience). It also resolves the uncertainty of the KNN algorithm when it performs anomaly detection.

Embodiment 2

The method of this embodiment is described in detail in the order of preparation stage, training stage, and use stage, with reference to FIG. 5.

1. Preparation stage: historical CGM blood glucose data acquisition and preprocessing

The continuous glucose monitoring (CGM) system is one of the key components of the artificial pancreas. It continuously monitors the patient's blood glucose level, helping patients with type 1 diabetes (T1DM) keep their blood glucose concentration within a safe range.

1.1 CGM blood glucose data acquisition

CGM indirectly reflects the blood glucose level by monitoring the glucose concentration of subcutaneous interstitial fluid with a glucose sensor, and can provide continuous, comprehensive, and reliable blood glucose information throughout the day. The historical blood glucose data acquired should cover three complete days, with one blood glucose value collected every five minutes, for a total of 3 × 288 = 864 values. Besides normal physiological activities such as meals, exercise, and sleep, the data should contain several typical PISA faults obtained by experimental pressing. Each typical PISA fault can carry an accurate pressing label, which is used to build the subsequent semi-supervised Semi-KNN model.

1.2 CGM blood glucose data preprocessing

The historical blood glucose data collected by the CGM is saved in a storage device and can be preprocessed by digital signal analysis, including filtering, missing value filling, and labeling. Filtering removes isolated noise points from the CGM blood glucose sequence, that is, values that deviate too much from the blood glucose values just before and after them. A sliding window traverses the sequence, and when such a tiny region falls within the window it is averaged out. This step is optional; the filtered blood glucose sequence is smoother and easier to process afterwards. Missing value filling prevents gaps in the blood glucose values. The blood glucose sequence that is used is normally the one obtained after missing value filling; otherwise breaks in the blood glucose curve easily appear in later processing and make the model output inaccurate.

In actual processing, the filled blood glucose sequence also needs to be labeled: the experimental intervals in which pressing took place are marked and all other times are left unmarked, to distinguish PISA events from other unknown events. This effectively safeguards the accuracy of the semi-supervised Semi-KNN model.

The above describes the preprocessing of the blood glucose data collected by the CGM. Preprocessing the continuously collected CGM blood glucose sequences, which contain the PISA pressing-experiment events, yields filtered, gap-filled blood glucose sequences and the PISA timestamp labels.

2. Preparation stage: constructing features and adding initial seeds based on prior knowledge

Because the time periods of the experimental pressing (PISA) events are known, part of the data in the blood glucose sequence can be labeled. Given a PISA timestamp label obtained from the above preprocessing, for example the interval 9:00–9:45, the 9 blood glucose values in the sequence that fall in this interval are taken as the values during the PISA event, and their first-order difference features are accordingly the PISA event features.
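The timestamp-based labeling can be sketched as follows. The sketch assumes one reading every five minutes and treats each labeled interval as half-open, [start, end), which is the reading under which the 9:00–9:45 example contains exactly 9 samples; the helper `label_pisa`, the start time and the date are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def label_pisa(
    timestamps: Sequence[datetime],
    intervals: Sequence[Tuple[datetime, datetime]],
) -> List[bool]:
    """True for every sample whose timestamp lies in a pressing interval [start, end)."""
    return [any(start <= t < end for start, end in intervals) for t in timestamps]


# Two hours of 5-minute samples starting at 08:30 (illustrative date).
t0 = datetime(2022, 2, 18, 8, 30)
stamps = [t0 + timedelta(minutes=5 * i) for i in range(24)]
pressing = [(datetime(2022, 2, 18, 9, 0), datetime(2022, 2, 18, 9, 45))]
labels = label_pisa(stamps, pressing)
```

Samples outside every pressing interval stay unlabeled rather than being marked normal, matching the semi-supervised setting in which only PISA events are known.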

2.1构造特征2.1 Structural features

通常，血糖序列中血糖数据较多，无法使用几个数值表达整个序列的特征，本实施例中采用滑动窗口方式对血糖序列进行处理。具体地，在血糖序列X＝{x1,x2,…,xn}中，经大小为w滑动窗口后形成若干子序列qi＝{x_i,x_i+1,…,x_i+k}，一个序列子集定义为D＝{q1,q2,…,qm}，对每一个子序列qi进行一阶差分计算，计算公式(1)如下，其中h为一阶差分公式的改变量，本实施例在血糖序列特征构造中h取值0.8至1.2，优选取1；Usually, there are many blood sugar data in the blood sugar sequence, and it is impossible to use a few numerical values to express the characteristics of the entire sequence. In this embodiment, a sliding window method is used to process the blood sugar sequence. Specifically, in the blood sugar sequence X = {x1, x2, ..., xn}, a number of subsequences qi = { _xi , xi ₊₁ , ..., xi _+k } are formed after a sliding window of size w. A sequence subset is defined as D = {q1, q2, ..., qm}. A first-order difference calculation is performed on each subsequence qi. The calculation formula (1) is as follows, where h is the change in the first-order difference formula. In this embodiment, h takes a value of 0.8 to 1.2 in the blood sugar sequence feature construction, and preferably takes 1;

After the first-order difference is calculated for all subsequences, the result is used as the input of the subsequent model, namely the semi-supervised Semi-KNN model. The first-order difference feature is used in this embodiment for two reasons. First, it avoids the effect that the varying shapes of raw blood glucose sequences would have on the generality of the model. Second, it suppresses the volatility of the time series well, which makes the input of the semi-supervised Semi-KNN model simpler.
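The feature construction above can be sketched as follows. This is a minimal illustration, not the patent's implementation: formula (1) is not reproduced in the text, so the forward difference (q[i+h] - q[i]) / h with an integer step h = 1 is an assumption, and the function names `sliding_windows` and `first_difference` and the sample values are hypothetical.

```python
import numpy as np

def sliding_windows(x, w):
    """Return every length-w subsequence of x, moving one sample at a time."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + w] for i in range(len(x) - w + 1)])

def first_difference(q, h=1):
    """Forward difference (q[i+h] - q[i]) / h; this form is an assumption."""
    q = np.asarray(q, dtype=float)
    return (q[h:] - q[:-h]) / h

# Hypothetical CGM readings (one value per sampling interval).
glucose = [5.0, 5.2, 5.1, 4.6, 4.0, 4.3, 5.0]
features = np.array([first_difference(q) for q in sliding_windows(glucose, w=4)])
```

Each row of `features` is the difference feature of one subsequence and would serve as one input sample for the model.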

2.2 Adding initial seeds based on expert experience (prior knowledge)

In semi-supervised learning, the representative samples of several data points in the data set are called seeds. A seed can be expressed as the sample points themselves or in the form of constraints. Pairwise constraints consist of must-link (ML) and cannot-link (CL) constraints: the two data points in an ML constraint must be in the same cluster, while the two data points in a CL constraint must be in different clusters.

In this embodiment, the known PISA event labels are taken as the initial seeds of the semi-supervised model. CL constraints are constructed between PISA events and the blood glucose fluctuations caused by other physiological events (such as eating and exercise), and ML constraints are constructed between different PISA events.

In this embodiment, constructing features and adding initial seeds means that, after the originally collected blood glucose sequence and PISA timestamp labels are preprocessed, the first-order difference features of the blood glucose sequence and the pairwise constraints associated with PISA are obtained.

That is, the above processing yields a training data set and a PISA constraint set.
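As a rough sketch of how the initial seeds could be turned into a constraint set: every pair of labeled PISA subsequences becomes an ML pair, and every pair of one PISA subsequence and one subsequence labeled with another physiological event becomes a CL pair. The label strings, the use of `None` for unlabeled subsequences, and the function name `build_constraints` are assumptions made for illustration.

```python
from itertools import combinations

def build_constraints(labels):
    """Build ML/CL index pairs from per-subsequence labels.

    labels[i] is "PISA", the name of another event (e.g. "meal"),
    or None when the subsequence is unlabeled.
    """
    pisa = [i for i, lab in enumerate(labels) if lab == "PISA"]
    other = [i for i, lab in enumerate(labels) if lab not in (None, "PISA")]
    must_link = set(combinations(pisa, 2))                # PISA with PISA
    cannot_link = {(p, o) for p in pisa for o in other}   # PISA with other events
    return must_link, cannot_link

ml, cl = build_constraints(["PISA", "meal", "PISA", None, "exercise"])
```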

3. Training phase: training the semi-supervised Semi-KNN model

The preprocessed training data set and the PISA pairwise constraints are input into the Semi-KNN model to obtain a semi-supervised fault detection model.

3.1 Base model: KNN

The existing KNN is a non-parametric supervised classifier that assigns a test sample to the majority class of its closest training samples. Taking a binary classification problem on a blood glucose sequence as an example, the problem and its solution are formally defined as follows. The sample set of the known blood glucose sequence is S = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i ∈ R^2 is a point in the blood glucose measurements and y_i ∈ {c_1, c_2} is the class of the sample. For a new blood glucose sample x, its class y can be obtained with formula (2):

y = argmax_{c_j} Σ_{x_i ∈ N_K(x)} f(y_i = c_j), j = 1, 2    (2)

where N_K(x) denotes the set of the K samples closest to the blood glucose sample x, and f is the indicator function on y_i, equal to 1 when y_i = c_j and 0 otherwise.
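The majority vote over N_K(x) can be illustrated with a brute-force sketch. Euclidean distance, the function name `knn_predict`, and the toy data are assumptions for illustration only.

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_new, k=3):
    """Return the majority class among the k training samples nearest to x_new."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]                 # indices of N_K(x)
    votes = Counter(y[i] for i in nearest)         # sum of the indicator per class
    return votes.most_common(1)[0][0]

# Toy two-class data; values are illustrative only.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
y = ["normal", "normal", "normal", "pisa", "pisa"]
```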

Using a supervised binary classification algorithm for anomaly detection has three problems: 1) the ratio of abnormal samples to normal samples is extremely unbalanced, which skews the classification results; 2) abnormal samples are of many kinds, so grouping unknown anomalies with known anomalies in one class is not interpretable, and simple binary classification cannot meet the need; 3) data labeling is too costly. Therefore, a semi-supervised model that needs only a small amount of prior knowledge is better suited to the application.

3.2 Semi-KNN model

In this embodiment, a semi-supervised anomaly detection model is better suited to identifying PISA faults. The semi-supervised model builds on the base model by replacing the label input with the partial PISA constraint set obtained from prior knowledge. The data objects are the preprocessed CGM blood glucose sequence data.

3.2.1 Traverse all data objects in the training data set and build an offline K-dimensional binary search tree

A k-dimensional binary search tree (K-d tree, or K-D tree) is constructed by fitting the samples in the training data set. The construction proceeds as follows. Assume simple two-dimensional data: {(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)}. First, the variance of the data along x and along y is calculated; the variance along x is larger, so the x axis is chosen as the splitting dimension. Second, the x values 2, 5, 9, 4, 8, 7 are sorted and 7 is selected as the median, so the splitting hyperplane passes through (7,2) and is perpendicular to the x axis. Then the left and right subspaces are determined: the splitting hyperplane divides the whole space into two parts, as shown in Figure 2(a). The left subspace contains 3 nodes, {(2,3), (4,7), (5,4)}, and the right subspace contains 2 nodes, {(8,1), (9,6)}. The recursion continues until each subspace contains only one data point, which produces the final K-d tree shown in Figure 2(b).
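The construction just described can be sketched recursively. The sketch follows the worked example (split on the axis with the larger variance, take the upper median as the node); the dictionary layout and the name `build_kdtree` are illustrative assumptions.

```python
import numpy as np

def build_kdtree(points):
    """Recursively build a K-d tree; returns nested dicts or None for an empty set."""
    if len(points) == 0:
        return None
    pts = np.asarray(points, dtype=float)
    axis = int(np.argmax(pts.var(axis=0)))           # axis with the largest variance
    pts = pts[np.argsort(pts[:, axis], kind="stable")]
    mid = len(pts) // 2                              # upper median, as in the example
    return {
        "point": tuple(pts[mid]),
        "axis": axis,
        "left": build_kdtree(pts[:mid]),
        "right": build_kdtree(pts[mid + 1:]),
    }

tree = build_kdtree([(2, 3), (4, 7), (5, 4), (7, 2), (8, 1), (9, 6)])
```

For the six example points, the root is (7,2) split on x, its left child is (5,4) split on y, and its right child is (9,6).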

3.2.2 Traverse the PISA constraint set and determine the PISA anomaly threshold

The training data set contains PISA event data induced by experimental compression. Traversing the PISA constraint set therefore separates the PISA anomalies in the training data set from other normal physiological events and yields the anomaly threshold σ of the semi-supervised Semi-KNN model. The boundary values of the anomaly threshold are σ1 and σ2, written σ = [σ1, σ2].

The average distance between each PISA event and the other, normal physiological events is calculated. Euclidean distance is the usual choice, but to suit blood glucose sequence data and better express the shape similarity of time series, the DTW similarity measure is used here, which warps the sequences along the time axis to align them better. Once all the average distances are obtained, the upper quartile Q3 and lower quartile Q1 of the distance set are calculated, and the anomaly thresholds of the model are σ1 = Q3 + 1.5(Q3 - Q1) and σ2 = Q1 - 1.5(Q3 - Q1). A sample whose distance to the normal samples satisfies dist > σ1 or dist < σ2 is considered abnormal.

That is, if the distance between the blood glucose data under test and the samples in an ML relation in the PISA constraints satisfies dist < σ2, the data is determined to be an abnormal sample; if the distance between the blood glucose data under test and the samples in a CL relation in the PISA constraints satisfies dist > σ1, the data is determined to be an abnormal sample.
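The quartile rule for σ1 and σ2 can be written in a few lines. The sketch assumes the average distances have already been computed and uses NumPy's default linear interpolation for the quartiles, which the source does not specify; the function names are hypothetical.

```python
import numpy as np

def iqr_thresholds(mean_distances):
    """Return (sigma1, sigma2) from the quartiles of the average distances."""
    q1, q3 = np.percentile(mean_distances, [25, 75])
    iqr = q3 - q1
    sigma1 = q3 + 1.5 * iqr   # upper boundary
    sigma2 = q1 - 1.5 * iqr   # lower boundary
    return float(sigma1), float(sigma2)

def is_abnormal(dist, sigma1, sigma2):
    """A distance outside [sigma2, sigma1] marks the sample as abnormal."""
    return dist > sigma1 or dist < sigma2

sigma1, sigma2 = iqr_thresholds([1.0, 2.0, 3.0, 4.0, 5.0])
```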

4. Use phase: obtaining new blood glucose data and performing constraint propagation

Once the semi-supervised Semi-KNN model is trained, blood glucose data collected in real time can be checked. When a newly arrived blood glucose sequence contains PISA fault events (i.e., non-PISA events), they can be identified accurately.

4.1 Blood glucose data under test

Continuous blood glucose values covering at least 45 minutes, that is, 9 blood glucose measurement points, are obtained for analysis and preprocessed: they are filtered, and a sliding window is used to remove isolated noise points and fill in missing values, giving the blood glucose sequence under test.

4.2 Constraint propagation

The constraint propagation algorithm is run on the blood glucose sequence under test to expand the PISA constraint set of the original training data set, which assists the decision when the semi-supervised Semi-KNN model is used and makes the result more accurate.

The procedure is as follows:

4.2.1 Traverse the PISA constraint set of the original training data set and calculate the SBD distance

Given a sequence subset D = {q_1, q_2, …, q_m}, any q_i in D regards the other sequences as its nearest neighbors, meaning that the distance from the current sequence to every other subsequence q_l is less than a threshold λ, which is usually set manually. The pairwise constraints then have the following properties:

1) For any q ∈ D, ML(q, r) can be generated, where r ∈ D;

2) Given ML(p, q) and ML(q, r), ML(p, r) can be obtained;

3) Given ML(p, q), ML(c, p) and ML(p, r) can be generated, where c ∈ D and r ∈ D;

4) Given CL(p, q), CL(c, p) and CL(p, r) can be generated, where c ∈ D and r ∈ D.

The distance measure chosen in this embodiment is the SBD distance. SBD is a shape similarity measure based on cross-correlation. It is efficient and parameter-free, which DTW cannot match: DTW is highly accurate but computationally expensive. The accuracy of SBD is close to that of DTW, so SBD is well suited to measuring the similarity between CGM curves and makes online similarity calculation practical.

4.2.2 Perform constraint propagation

When the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence under test and a PISA event B in the training data set is less than the threshold λ, that is, f_SBD(A, B) < λ, a constraint relation ML(A, B) is determined, and the constraint set is updated according to the propagation properties above.

When the SBD distance between the blood glucose data A of a sequence in the blood glucose sequence under test and the CL constraint relation of a PISA event B in the training data set is less than the threshold λ, that is, f_SBD(A, B) < λ, a constraint relation CL(A, B) is determined, and the constraint set is likewise updated according to the propagation properties above.
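One way to read the propagation step is sketched below. It is an interpretation, not the patent's algorithm: a new subsequence A that lies within λ of a known PISA event B receives ML(A, B), inherits B's ML partners by transitivity (property 2), and, as an additional assumed rule, inherits B's CL partners. The distance function is passed in so that SBD or any other measure can be used; all names are hypothetical.

```python
def propagate(new_id, new_seq, pisa_events, must_link, cannot_link, dist, lam):
    """Extend the constraint sets with links for one new subsequence.

    pisa_events maps an id to a known PISA subsequence, dist is a
    sequence distance (SBD in the source), lam is the threshold lambda.
    """
    ml, cl = set(must_link), set(cannot_link)
    for b_id, b_seq in pisa_events.items():
        if dist(new_seq, b_seq) < lam:
            ml.add((new_id, b_id))
            # ML(A, B) and ML(B, C) give ML(A, C).
            for p, q in must_link:
                if b_id in (p, q):
                    ml.add((new_id, q if p == b_id else p))
            # Assumed rule: ML(A, B) and CL(B, C) give CL(A, C).
            for p, q in cannot_link:
                if b_id in (p, q):
                    cl.add((new_id, q if p == b_id else p))
    return ml, cl
```

The function returns new sets and leaves the original training constraints untouched, so the expanded set can be discarded after each test sequence if desired.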

5. Use phase: detecting anomalies with the semi-supervised Semi-KNN model and determining the anomaly level

After step 4, the inputs to the semi-supervised Semi-KNN model are the blood glucose data of each sequence in the blood glucose sequence under test and the constraint relations that expand the PISA constraint set. The semi-supervised Semi-KNN model established in step 3.2.2 is used for anomaly detection, and the anomaly level is determined from the constraint relations. The steps are as follows:

5.1 Search for the k nearest neighbors and calculate the average distance

The blood glucose data of each sequence in the blood glucose sequence under test (that is, the new blood glucose data) is input into the semi-supervised Semi-KNN model. Using the k-dimensional binary search tree built in step 3.2.1, the k data points closest to the data of each sequence can be found. For example, take the K-d tree built from the sample data in step 3.2.1 and assume the blood glucose data of a sequence is (2.1, 3.1), as shown in Figure 3. First, a binary search finds the approximate nearest neighbor (2,3), at a distance of 0.1414. The search then backtracks to the parent node (5,4): the circle centered at (2.1, 3.1) with radius 0.1414 does not intersect the hyperplane y = 4, so the right subspace need not be searched. It then backtracks to the parent node (7,2): the circle does not intersect the hyperplane x = 7 either, so the right subspace of (7,2) need not be searched. Backtracking is now complete, and the nearest sample is (2,3). Repeating this k times gives the k nearest sample points, called the k nearest neighbors. Once the k nearest neighbors are found, their average distance to the blood glucose data of the sequence is calculated and used as the anomaly score. When the score satisfies the threshold condition of step 3.2.2, the sample point is regarded as a PISA fault event.

Note that, during the iteration, when a newly arrived sample has a constraint relation with PISA, its k nearest neighbors must not include samples in a CL relation with it. That is, if a sample in a CL relation is close enough to the new sample to qualify as one of its k nearest neighbors, that sample is discarded and the iteration continues. Figure 4 illustrates how the constraint relations guide the iterative KNN anomaly detection process.
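The constrained neighbor search and the resulting score can be sketched as follows. For brevity the sketch uses a brute-force scan with Euclidean distance instead of the K-d tree, which is a substitution; `cl_ids` (indices of training samples in a CL relation with the new sample) and the name `knn_score` are hypothetical.

```python
import numpy as np

def knn_score(x_new, train, k, cl_ids=frozenset()):
    """Mean distance from x_new to its k nearest training samples,
    skipping any sample that is in a CL relation with x_new."""
    train = np.asarray(train, dtype=float)
    dist = np.linalg.norm(train - np.asarray(x_new, dtype=float), axis=1)
    ranked = [int(i) for i in np.argsort(dist)]
    neighbours = [i for i in ranked if i not in cl_ids][:k]
    return float(dist[neighbours].mean()), neighbours

train = [(2, 3), (4, 7), (5, 4), (7, 2), (8, 1), (9, 6)]
score, neighbours = knn_score((2.1, 3.1), train, k=2)
```

The returned score would then be compared with the thresholds σ1 and σ2 from step 3.2.2.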

5.2 Output the anomaly level according to the constraint relations

While the above step outputs the anomaly detection result, one can also count whether the k nearest neighbors include PISA constraint relations. That is, once the distance of the blood glucose data of each sequence in the semi-supervised Semi-KNN model is obtained, it is not only compared with the anomaly threshold; whether a constraint relation exists, i.e., the result of the constraint propagation stage described above, is also taken into account.

Specifically: when no constraint relation is included, the anomaly level is 1; when ML relations from the expanded PISA constraint set are included, a higher anomaly level 2, 3, …, n is assigned according to their number. The more ML constraints from the expanded PISA constraint set are present, the higher the anomaly level.
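A minimal sketch of the level assignment is given below. The source only states that the level grows with the number of ML relations; mapping each ML neighbor to one additional level is an assumption, as are the names `anomaly_level` and `ml_ids`.

```python
def anomaly_level(neighbour_ids, ml_ids):
    """Level 1 when no neighbour is ML-linked; each ML-linked neighbour
    raises the level by one (assumed mapping)."""
    return 1 + sum(1 for i in neighbour_ids if i in ml_ids)
```

For example, a detected sample whose three nearest neighbors include two ML-linked PISA events would be reported at level 3 under this mapping.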

The DTW (Dynamic Time Warping) algorithm mentioned above measures the similarity of two time series by stretching or compressing them so that they align as closely as possible. In most cases, two sequences have very similar overall shapes, but the shapes are not aligned along the x axis. Before comparing similarity, one (or both) of the sequences must be warped along the time axis to achieve better alignment. DTW is an effective way to perform this warping.
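For reference, the classic dynamic-programming form of DTW is sketched below. It uses the absolute difference as the local cost and no warping-window constraint; both choices are assumptions, since the source does not state which DTW variant is used.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

The quadratic cost of filling the table is the computational expense that motivates using SBD for the online step.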

The SBD algorithm is a shape similarity measure based on cross-correlation. It is efficient and parameter-free, which DTW cannot match: DTW is highly accurate but computationally expensive. The accuracy of SBD is close to that of DTW, so SBD is well suited to measuring the similarity between CGM curves and makes online similarity calculation practical.

Cross-correlation computes the sliding inner product between two time series and is inherently robust to phase shift. For two given time series x = (x_1, x_2, …, x_m) and y = (y_1, y_2, …, y_m) and a given phase shift s, the inner product of the two curves is as follows:

The normalized cross-correlation NCC and the distance measure SBD can be calculated as follows:
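Since the formulas are not reproduced in the text, the sketch below assumes the commonly used definition of shape-based distance: NCC is the maximum over all shifts s of the sliding inner product divided by the product of the two Euclidean norms, and SBD = 1 - NCC. Returning 1.0 for an all-zero sequence is a further assumption; the name `sbd` is illustrative.

```python
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 - max_s CC_s(x, y) / (||x|| * ||y||)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cc = np.correlate(x, y, mode="full")        # inner product at every shift s
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if denom == 0:
        return 1.0                              # assumed convention for zero input
    return float(1.0 - cc.max() / denom)
```

Under this definition the distance lies between 0 and 2, and sequences that differ only by a time shift or a positive scale factor have a distance of 0.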

This embodiment tested the above similarity measures experimentally. The results show that the SBD similarity measure is robust to noise: it effectively separates the waveform of PISA fault events from normal sequences while avoiding the influence of noise differences between other sequences. The DTW similarity measure is sensitive to shape features and, after the distance output is normalized, amplifies small differences in shape. Euclidean distance is sensitive to the amplitude of the blood glucose curve, i.e., it fully reflects absolute distance differences in the original sequence distribution.

Embodiment 3

This embodiment also provides an electronic device comprising a memory and a processor. The processor executes a computer program stored in the memory to carry out the steps of the PISA fault identification method based on the semi-supervised Semi-KNN model described in either of Embodiments 1 and 2 above.

It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of components or steps not listed in a claim. The word "a" or "an" preceding a component does not exclude the presence of a plurality of such components. The invention may be implemented by means of hardware comprising several distinct components and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The words first, second, third, and so on are used for convenience of expression only and do not indicate any order. They may be understood as part of the component names.

In addition, in this specification, the terms "one embodiment", "some embodiments", "embodiment", "example", "specific example", and "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. Provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and their features.

Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn the basic inventive concept. The claims should therefore be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.

Claims

1. A PISA fault identification method based on a semi-supervised Semi-KNN model, characterized by comprising:

S10. Obtain the blood glucose information to be measured within the preset time period, and preprocess the blood glucose information to be measured to obtain the preprocessed blood glucose information to be measured;

S20. Based on a pre-established pressure-induced sensor attenuation (PISA) constraint set and the preprocessed blood glucose information to be measured, use a similarity measurement processing method to obtain the constraint relationship to which the blood glucose information to be measured belongs;

The PISA constraint set is a set with ML constraints and CL constraints constructed based on prior knowledge during the training phase of the semi-supervised Semi-KNN model. Each element in the set is information on the first-order difference characteristics of the blood glucose subsequence;

The S20 includes: when the SBD distance between the blood glucose data A of each sequence in the blood glucose sequence to be measured and a PISA event B in the PISA constraint set is less than the threshold λ, that is, f_SBD(A, B) < λ, determine a constraint relationship ML(A, B) and update the PISA constraint set; λ is a preset value greater than 0; f_SBD represents the function that calculates the SBD distance between two sequences;

When the SBD distance between the blood glucose data A of each sequence in the blood glucose sequence to be measured and the CL constraint relationship of a PISA event B in the PISA constraint set is less than the threshold λ, that is, f_SBD(A, B) < λ, determine a constraint relationship CL(A, B) and update the PISA constraint set;

Traverse each sequence in the blood glucose sequence to be measured, and use the updated PISA constraint set as the constraint relationship to which the blood glucose information to be measured belongs;

S30. Input the preprocessed blood glucose information to be measured and the constraint relationship into the pre-trained semi-supervised Semi-KNN model, and the semi-supervised Semi-KNN model outputs the classification result of the blood glucose information to be measured;

The semi-supervised Semi-KNN model is a semi-supervised model for identifying abnormalities in blood glucose information, obtained by training the KNN model using a training data set and the PISA constraint set, and the training data set includes blood glucose data processed by first-order differencing;

The S30 includes: based on the K-D tree and the blood glucose sequence to be measured, obtain the nearest K data points to each data in the blood glucose sequence to be measured in a loop iterative manner, and obtain the K-D tree of the use stage,

Based on the K-D tree of the usage stage, traverse the constraint relationship and obtain the classification results of the PISA abnormal information of the blood glucose sequence to be measured;

The DTW similarity measure function is used to calculate the actual distance between each PISA event and other events in the constraint relationship, and the actual distance is compared with the abnormal threshold σ = [σ1, σ2] to obtain the classification results of PISA events and non-PISA events; σ is the abnormal threshold of the semi-supervised Semi-KNN model, and the boundary thresholds of the abnormal threshold σ are σ1 and σ2.

2. The method according to claim 1, characterized in that before said S10, the method further includes:

S01. Obtain multiple historical blood glucose data by means of the continuous glucose monitoring system CGM, and preprocess each historical blood glucose data to obtain a blood glucose sequence; each blood glucose sequence includes blood glucose data with PISA timestamp labels and blood glucose data with non-PISA timestamp labels;

S02. Divide each blood glucose sequence into multiple subsequences, and perform first-order difference calculation on each subsequence to obtain a training data set;

S03. Based on prior knowledge and training data with PISA timestamp labels in the training data set, form rules according to semi-supervised constraints and generate a PISA constraint set;

S04. Use the training data set and the PISA constraint set to train the semi-supervised Semi-KNN model to obtain the trained semi-supervised Semi-KNN model;

The semi-supervised Semi-KNN model is an improved KNN model and is constructed in a semi-supervised manner.

3. The method according to claim 2, characterized in that said S04 includes:

S04-1. Traverse all subsequences of the training data set, construct an offline K-dimensional search binary tree, and obtain a K-D tree;

S04-2. Based on the K-D tree, traverse the PISA constraint set to obtain the abnormal threshold σ of the semi-supervised Semi-KNN model. The boundary thresholds of the abnormal threshold are σ1 and σ2, expressed as σ = [σ1, σ2];

Among them, the DTW similarity measure function is used to calculate the average distance between each PISA event and other events in the PISA constraint set to obtain the distance set;

Then the abnormal threshold σ = [σ1, σ2] is obtained according to the following formula (1);

σ1＝Q3+1.5(Q3-Q1), formula (1)

σ2=Q1-1.5(Q3-Q1),

If the distance between the blood glucose data to be measured and the samples of the ML relationship in the PISA constraint satisfies dist < σ2, it is determined to be an abnormal sample; if the distance between the blood glucose data to be measured and the samples of the CL relationship in the PISA constraint satisfies dist > σ1, it is determined to be an abnormal sample;

Q3 is the upper quartile in the distance set, and Q1 is the lower quartile in the distance set.

4. The method according to claim 2, characterized in that said S02 includes:

Sliding window processing is performed on each blood glucose sequence. In the blood glucose sequence X = {x_1, x_2, …, x_n}, a sliding window of size w forms several subsequences q_i = {x_i, x_{i+1}, …, x_{i+k}},

A sequence subset is D = {q_1, q_2, …, q_m}. For each subsequence q_i, the first-order difference calculation is performed according to formula (2). n represents the total length of the blood glucose sequence, i is any value from 1 to n-w and denotes the i-th sequence, and q_i is the i-th sequence in the sequence subset;

h is the change amount of the first-order difference formula, and the value of h is 0.8-1.2;

After calculating the first-order difference for all subsequences, the first-order difference value of each subsequence is used as the training data set.

5. The method according to claim 1, characterized in that said S10 includes:

Use CGM to obtain blood glucose information to be measured for 30-45 minutes or more;

Filtering is performed, and the blood glucose information to be measured is preprocessed through a sliding window method to remove isolated noise points in the blood glucose information to be measured and to fill in missing values, thereby obtaining the blood glucose sequence to be measured of the blood glucose information to be measured.

6. The method according to claim 1, characterized in that, after comparing the actual distance with the abnormal threshold, the amount of data belonging to the ML constraint in the constraint relationship is determined, and the abnormal level value in the PISA event is determined based on the amount of data.

7. An electronic device, characterized in that it includes a memory and a processor, a computer program is stored in the memory, and the processor executes the computer program stored in the memory to perform the steps of the PISA fault identification method based on the semi-supervised Semi-KNN model according to any one of claims 1 to 6.

8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the PISA fault identification method based on the semi-supervised Semi-KNN model according to any one of claims 1 to 6 are implemented.