JP2022512402A

JP2022512402A - Optimization of glycemic datasets for improved hypoglycemia prediction based on the incorporation of machine learning implementations

Info

Publication number: JP2022512402A
Application number: JP2021533545A
Authority: JP
Inventors: Anuar Imanbayev
Original assignee: Novo Nordisk A/S
Priority date: 2018-12-14
Filing date: 2019-12-11
Publication date: 2022-02-03
Also published as: US20220020497A1; WO2020120571A1; CN113168917A; EP3895179A1

Abstract

The present invention relates to a method for dataset expansion for improved hypoglycemia prediction based on classifier ingestion. The method comprises providing a raw dataset for a subject, the dataset comprising a plurality of BG values acquired at a given sampling rate and timestamps associated with those values spanning a plurality of days N, and performing data conversion by temporal binning in a rolling scheme, with evaluation block values (eHH) as input X, to create corresponding predicted values (pHH) as output Y. X is created as a sliding window containing the BG values for a given past period T_p, and Y is created as an indicator I indicating whether the BG value at a given future time T_f is below a given threshold indicative of a hypoglycemic condition. [Selected drawing] FIG. 13

Description

The present disclosure relates generally to systems and methods for assisting patients and healthcare practitioners in managing insulin treatment for diabetes. In particular aspects, the invention relates to methods for higher data resolution optimized for ingestion by machine learning (ML) implementations.

Diabetes mellitus (DM) is impaired insulin secretion and variable degrees of peripheral insulin resistance leading to hyperglycemia. Type 2 diabetes mellitus is characterized by progressive disruption of normal physiologic insulin secretion. In healthy individuals, basal insulin secretion by pancreatic β cells occurs continuously to maintain steady glucose levels for extended periods between meals. In healthy individuals, insulin is also released rapidly in an initial first-phase spike in response to a meal, followed by prolonged insulin secretion that returns to basal levels after 2-3 hours. Years of poorly controlled hyperglycemia can lead to multiple health complications. Diabetes mellitus is one of the major causes of premature morbidity and mortality throughout the world.

Effective control of blood/plasma glucose (BG) can prevent or delay many of these complications, but may not reverse them once they are established. Hence, achieving good glycemic control in efforts to prevent diabetic complications is the primary goal in the treatment of type 1 and type 2 diabetes. In particular, frequent insulin dose titration is key to helping stabilize the patient's blood glucose levels (Bergenstal et al., "Can a Tool that Automates Insulin Titration be a Key to Diabetes Management?" Diabetes Tech. and Thera. 2012; 14(8) 675-682). Smart titrators that use adjustable step sizes, physiological parameter estimation, and predefined fasting blood glucose target values have been developed to administer insulin medicament treatment regimens. The optimal way to initiate and titrate long-acting basal insulin is still being determined. However, evidence suggests that many patients often do not receive insulin doses titrated sufficiently to reach target levels of glucose control, remaining on suboptimal doses and failing to reach treatment goals (Holman et al., "10-year follow-up of intensive glucose control in type 2 diabetes," N. Engl. J. Med. 2008; 359: 1577-1589).

One of the key problems with insulin regimens is the lack of patient autonomy and empowerment. Patients often have to visit a clinic to have a new titration calculated. When the clinic has to titrate the patient's insulin dose, there is a natural limit on how often the titrated dose can be changed. Self-titration regimens can facilitate patient empowerment and allow patients to become more involved in their treatment, which may result in improved glycemic control (Khunti et al., "Self-titration of insulin in the management of people with type 2 diabetes: a practical solution to improve management in primary care," Diabetes, Obes., and Metabol. 2012; 15(8) 690-700). Patients who take an active role in managing their diabetes and titrating their insulin may take responsibility for their own self-care, strongly believe that their actions can influence their disease, and thereby achieve better treatment outcomes (Norris et al., "Self-management education for adults with type 2 diabetes: a meta-analysis on the effect of glycemic control," Diabetes Care 2002; 25: 1159-71; Kulzer et al., "Effects of self-management training in type 2 diabetes: a randomized, prospective trial," Diabet. Med. 2007; 24: 415-23; Anderson et al., "Patient empowerment: results of a randomized controlled trial," Diabetes Care 1995; 18: 943-9). Furthermore, when patients manage their own titration, the frequency of titration increases, which increases the likelihood that they will achieve the desired blood glucose levels.

However, with a more aggressive titration approach the risk of hypoglycemic events (hereinafter "hypoglycemia") is higher, and the risk increases further for titration regimens based on multiple daily injections (MDI). In response, several solutions for short-term hypoglycemia prediction (STHP) have been proposed, for example: Kovatchev et al. (TypeZero & University of Virginia group), "Evaluation of a New Measure of Blood Glucose Variability in Diabetes," Diabetes Care, Vol. 29(11), November 2006; Sparacino et al. (Cobelli Lab in University of Padova), "Glucose Concentration can be Predicted Ahead in Time From Continuous Glucose Monitoring Sensor Time-Series," IEEE Transactions on Biomedical Engineering, Vol. 54(5), May 2007; Franc et al. (Voluntis with Sanofi), "Real-life application and validation of flexible intensive insulin-therapy algorithms in type 1 diabetes patients," Diabetes Metab. December 2009, 35(6): 463-8; and Sudharsan et al. (WellDoc) (LTHP 24-hours ahead literature comparison), "Hypoglycemia Prediction Using Machine Learning Models for Patients with Type 2 Diabetes," Journal of Diabetes Science and Technology 2015, Vol. 9(1) 86-90.

To address this issue, US2008/0154513 discloses methods, systems, and computer program products related to maintaining optimal control of diabetes, directed to predicting patterns of hypoglycemia, hyperglycemia, increased glucose variability, and insufficient or excessive testing over an upcoming period, based on blood glucose readings collected by a self-monitoring blood glucose (SMBG) device. The method for identifying and/or predicting patterns of hyperglycemia of a user comprises acquiring a plurality of SMBG data points, classifying the SMBG data points within periods of time having predetermined durations, evaluating the glucose values in each period, and indicating the risk of hyperglycemia for a subsequent period based on that evaluation. The evaluation may comprise determining individual deviations towards hyperglycemia based on the glucose values, determining a composite probability for each period based on the individual deviations and absolute deviations, and comparing the composite probability for each period with a preset threshold. The periods may be obtained by dividing the 24-hour day into time bins of predetermined duration.

To address the above-mentioned problems and better mitigate the risk of hypoglycemia, it is an object of the present invention to provide methods and systems that improve the ability to predict future hypoglycemia and thereby to lower the currently recommended dose, enabling a more accurate titration regimen and thereby improved treatment of type 2 diabetes. A particular object of the invention is to provide a method for dataset optimization that enables improved hypoglycemia prediction based on ingestion by classifiers and machine learning algorithms. Such a method should use a transparent and constrained approach so that it is better suited to regulatory approval for use in a dose guidance system.

US2008/0154513

"Can a Tool that Automates Insulin Titration be a Key to Diabetes Management?"
"10-year follow-up of intensive glucose control in type 2 diabetes"
"Self-titration of insulin in the management of people with type 2 diabetes: a practical solution to improve management in primary care"
"Self-management education for adults with type 2 diabetes: a meta-analysis on the effect of glycemic control"
"Effects of self-management training in type 2 diabetes: a randomized, prospective trial"
"Patient empowerment: results of a randomized controlled trial"
"Evaluation of a New Measure of Blood Glucose Variability in Diabetes"
"Glucose Concentration can be Predicted Ahead in Time From Continuous Glucose Monitoring Sensor Time-Series"
"Real-life application and validation of flexible intensive insulin-therapy algorithms in type 1 diabetes patients"
"Hypoglycemia Prediction Using Machine Learning Models for Patients with Type 2 Diabetes"

Means for Solving the Problems
In the disclosure of the present invention, embodiments and aspects are described that address one or more of the above objects, or that address objects apparent from the disclosure below as well as from the description of the exemplary embodiments.

A first aspect of the invention provides a method for dataset optimization for improved hypoglycemia prediction based on classifier ingestion. The method comprises providing a raw dataset for a subject, the dataset comprising a plurality of BG values acquired at a given sampling rate and timestamps associated with those values spanning a plurality of days N, and performing data conversion by temporal binning in a rolling scheme, with evaluation block values (eHH) as input X, to create corresponding predicted values (pHH) as output Y. X is created as a sliding window containing the BG values for a given past period T_p, and Y is created as an indicator I indicating whether the BG value at a given future time T_f is below a given threshold indicative of a hypoglycemic condition.

In general, a predictive model is determined by the data on which it is trained. With the above method, the same amount of data can be used in a more efficient and better way that fits, and is adapted accordingly to, machine learning algorithms such as a random forest (RF) classifier.

In contrast, previous attempts directed at predicting patterns of hypoglycemia, such as that disclosed in US2008/0154513, have relied on simple temporal binning of BG data followed by conventional mathematical analysis of the organized data.

The data conversion may be performed over at least two different past periods T_p. T_f may correspond to T_p; for example, a 15-minute predicted value is based on 15 minutes of BG values.
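The X/Y construction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the BG series is assumed to be already resampled to an even 5-minute grid, the past window T_p is taken as T_p/5 consecutive samples ending at the current time, and the function name `make_windows` and the default threshold of 3.9 mmol/L are illustrative choices.

```python
from typing import List, Tuple


def make_windows(bg: List[float], sample_min: int = 5, past_min: int = 15,
                 future_min: int = 15, threshold: float = 3.9
                 ) -> Tuple[List[List[float]], List[int]]:
    """Build rolling (X, Y) pairs from an evenly sampled BG series.

    X: the BG values covering the past period T_p (past_min minutes),
       ending at the current sample t.
    Y: 1 if the BG value at t + T_f (future_min minutes) is below
       `threshold` (assumed 3.9 mmol/L), else 0.
    The window advances by one sample at a time (rolling scheme).
    """
    p = past_min // sample_min    # number of samples in the past window
    f = future_min // sample_min  # samples between window end and target
    X, Y = [], []
    for t in range(p - 1, len(bg) - f):
        X.append(bg[t - p + 1: t + 1])
        Y.append(1 if bg[t + f] < threshold else 0)
    return X, Y
```

Calling `make_windows(bg, past_min=h, future_min=h)` for h = 15, 30 and 60 corresponds to the case in which T_f equals T_p.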

In an exemplary embodiment, the data conversion step is preceded by a step of performing data expansion by temporal binning of daily BG values, in a rolling scheme, into evaluation blocks of M days, where M is 2 or more and less than the number of days N.

Such data expansion is relevant when the acquired raw dataset is based on an M-day insulin titration regimen, for example 3 days on the same insulin dose before the dose is changed; such regimens are typically used for titrating basal insulin, as indicated in the instructions for use of a given basal insulin. For a dataset based on bolus insulin, M = 1 would be appropriate. In fact, with M = 1 no actual expansion takes place.
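The effect of the daily rolling scheme can be made concrete by counting evaluation blocks. The sketch below assumes days are represented as integer indices and uses hypothetical function names: over N days, a rolling scheme yields N - M + 1 blocks, whereas a sequential scheme yields floor(N / M).

```python
from typing import List


def rolling_blocks(n_days: int, m: int) -> List[List[int]]:
    """M-day evaluation blocks that advance by one day (rolling scheme)."""
    if not 1 <= m <= n_days:
        raise ValueError("need 1 <= M <= N")
    return [list(range(s, s + m)) for s in range(n_days - m + 1)]


def sequential_blocks(n_days: int, m: int) -> List[List[int]]:
    """Non-overlapping M-day blocks (sequential scheme); a partial tail is dropped."""
    return [list(range(s, s + m)) for s in range(0, n_days - m + 1, m)]


print(len(rolling_blocks(9, 3)))     # 7 blocks: days 0-2, 1-3, ..., 6-8
print(len(sequential_blocks(9, 3)))  # 3 blocks: days 0-2, 3-5, 6-8
```

With M = 1 both functions return the same single-day blocks, which matches the statement that no actual expansion occurs in that case.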

In an exemplary embodiment, the step of providing a raw dataset is followed by a step of performing data preparation, comprising resampling to a nominal sampling rate and creating interpolated BG values to replace missing BG values.
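A minimal, dependency-free sketch of this preparation step follows. It assumes timestamps in minutes and a nominal 5-minute CGM sampling rate, and fills gaps with a natural cubic spline, in line with the spline interpolation solution mentioned for missing-data handling in this disclosure. The spline type, the snapping rule (nearest grid slot, later reading wins) and the function names are assumptions; the disclosure does not specify them.

```python
from typing import Callable, Dict, List, Tuple


def natural_cubic_spline(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys); xs strictly increasing, len >= 2."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients c
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; natural boundary conditions give c[0] = c[n] = 0
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x: float) -> float:
        j = 0
        while j < n - 1 and x > xs[j + 1]:  # find the segment containing x
            j += 1
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def resample_to_grid(readings: List[Tuple[float, float]],
                     step: float = 5.0) -> List[Tuple[float, float]]:
    """Snap (minute, BG) readings to a nominal grid and fill missing slots by spline."""
    slots: Dict[float, float] = {}
    for t, bg in sorted(readings):
        slots[round(t / step) * step] = bg  # nearest slot; a later reading wins
    xs = sorted(slots)
    spline = natural_cubic_spline(xs, [slots[x] for x in xs])
    grid, x = [], xs[0]
    while x <= xs[-1]:
        grid.append((x, slots[x] if x in slots else spline(x)))
        x += step
    return grid
```

In practice, a library routine such as SciPy's `CubicSpline` with `bc_type="natural"` would normally replace the hand-written spline; it is written out here only to keep the sketch self-contained.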

A further aspect of the invention provides a method for training a classifier, comprising providing a dataset optimized as described above, ingesting the optimized dataset into the classifier, and training the classifier based on the ingested dataset. The classifier may be a random forest classifier.

A further aspect of the invention provides a method for predicting future BG values, comprising obtaining a series of BG value assessments from a subject, ingesting the series into a classifier trained as described above, and providing a predicted BG value. The dataset on which the classifier was trained may have been obtained from the same subject as the series of BG value assessments. The series of BG value assessments may be obtained by continuous glucose monitoring (CGM), for example generating a BG value every 5 minutes.
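The training and prediction aspects can be illustrated with scikit-learn's `RandomForestClassifier`. This is a hedged sketch: the synthetic data generator, the three-sample window, the 3.9 mmol/L threshold and the hyperparameters are assumptions made for illustration, not values from the disclosure. The classifier here outputs the hypoglycemia indicator Y and its probability rather than a BG value.

```python
import random

from sklearn.ensemble import RandomForestClassifier

# Synthetic, purely illustrative data: each row of X is a window of three
# 5-minute BG values (mmol/L); y is 1 when the value 15 minutes after the
# window falls below 3.9 mmol/L. Real X/Y would come from the optimized dataset.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    start = rng.uniform(3.5, 9.0)
    slope = rng.uniform(-0.4, 0.4)        # mmol/L per 5 minutes
    window = [start + slope * k for k in range(3)]
    future = window[-1] + slope * 3       # value 15 minutes after the window
    X.append(window)
    y.append(1 if future < 3.9 else 0)

# Training step: fit the random forest on the ingested windows
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Prediction step: score the latest CGM window of a (hypothetical) subject
latest_window = [5.0, 4.6, 4.2]
risk = clf.predict_proba([latest_window])[0][1]
print(f"Estimated probability of hypoglycemia in 15 min: {risk:.2f}")
```

A random forest is used here because the disclosure names it as one possible classifier; any scikit-learn classifier with `fit` and `predict_proba` could be substituted in the same sketch.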

A still further aspect of the invention provides a computer processing system for performing temporal optimization of a dataset from a subject. The computer system comprises one or more processors and a memory, the memory comprising instructions that, when executed by the one or more processors, perform the methods defined above in accordance with the different aspects of the invention.

In certain exemplary embodiments, temporal optimization and expansion of the data, using the same amount of data but fitting it in a more expanded and smarter way, is provided by performing the following steps:
(1) Handling of missing data: 5-minute resampling with a spline interpolation solution. The data size increases according to the missing data that are filled in, meeting the data-quality processing requirements of the data preparation performed by part of the software code.
(2) Evaluation horizon history (eHH) by temporal binning in a rolling scheme: 3-day block binning with a temporally optimized daily rolling scheme, as opposed to a standard sequential scheme, for binning a series of CGM measurements nested within the clinically derived interval of a 3-day study block, i.e., a 3-day evaluation horizon history (eHH).
(3) Prediction horizon history (pHH) of hypoglycemia by temporal binning in a rolling scheme: a software program that repeatedly predicts hypoglycemia at prediction horizons (PH) of 15, 30, and 60 minutes ahead, i.e., over a future time interval, based on the corresponding preceding retrospective time interval, i.e., a prediction horizon history (pHH) of 15, 30, and 60 minutes before, respectively. At each step, every 5 minutes, a pHH = PH prediction is made in a rolling scheme rather than a sequential scheme.

Taken together, these three steps increase the size and depth of the original unprocessed BG dataset. The processed dataset transformed by the three-step technique therefore achieves, directly and quickly in the ML classifier format, not only a significantly larger size but also greater depth and operational ingestibility. An unprocessed or raw dataset cannot be ingested into, or fed to, the ML classifier format as easily, immediately, or efficiently.

Taken together, spline interpolation of missing data combined with rolling-scheme temporal binning of the evaluation horizon history and prediction horizon history intervals optimizes the CGM-resolution data, providing more accurate prediction of hypoglycemia with high sensitivity (correct prediction of hypoglycemic events) and high specificity (correct prediction of non-hypoglycemic events).
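Sensitivity and specificity, as used above, can be computed from the indicator labels Y and the classifier's predictions. The following is a small sketch with a hypothetical helper name, treating label 1 as a hypoglycemic event.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Label 1 marks a hypoglycemic event, label 0 a non-hypoglycemic one.
    Returns NaN for a ratio whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# 2 of 3 events caught, 4 of 5 non-events correctly cleared
print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 0],
                              [1, 1, 0, 0, 0, 0, 1, 0]))
```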

Embodiments of the present invention are described below with reference to the drawings, which show:
An exemplary data preparation module according to an embodiment of the present disclosure.
An exemplary data conversion module according to an embodiment of the present disclosure.
An exemplary pointer lookup table according to an embodiment of the present disclosure.
An exemplary temporal binning optimization according to an embodiment of the present disclosure.
Exemplary hypoglycemia determination modules for different pHH values according to an embodiment of the present disclosure (three figures).
Storage of exemplary training results for subsequent ML processing according to an embodiment of the present disclosure.
An exemplary implementation of a random forest (RF) classifier according to an embodiment of the present disclosure (two figures).
Results of the RF classifier according to an embodiment of the present disclosure (two figures).
Results of the RF classifier compared with results from the literature (two figures).
An example according to an embodiment of the present disclosure, shown collectively across thirteen figures.

In the figures, like structures are mainly identified by like reference numerals.

The present disclosure relies on the acquisition of a set of training and test data containing information relating to at least one subject. The dataset(s) comprise at least: a plurality of glucose measurements of the subject taken over a time course to establish a blood glucose history, with, for each respective glucose measurement, a corresponding glucose timestamp representing when in the time course the measurement was made; and one or more basal insulin injection histories, each comprising a plurality of injections during all or a portion of the time course and, for each respective injection, the amount of the corresponding dosing event and a dosing event timestamp representing when in the time course the injection event occurred.
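One way to hold such a dataset in memory is sketched below. The class and field names, the mmol/L unit and the `glucose_between` helper are hypothetical; they simply mirror the listed items (glucose measurements with timestamps, and basal insulin injections with amounts and timestamps).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class GlucoseMeasurement:
    timestamp: datetime  # when in the time course the measurement was made
    value: float         # BG value (unit assumed to be mmol/L)


@dataclass
class InsulinInjection:
    timestamp: datetime  # when in the time course the dosing event occurred
    units: float         # amount of basal insulin administered


@dataclass
class SubjectDataset:
    subject_id: str
    glucose_history: List[GlucoseMeasurement] = field(default_factory=list)
    injection_history: List[InsulinInjection] = field(default_factory=list)

    def glucose_between(self, start: datetime,
                        end: datetime) -> List[GlucoseMeasurement]:
        """Measurements with start <= timestamp < end, in time order."""
        return sorted((m for m in self.glucose_history
                       if start <= m.timestamp < end),
                      key=lambda m: m.timestamp)
```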

STHP classifier: data preparation and data transformation for the STHP classifier
To predict or detect adverse events of hypoglycemia or hypoglycemic levels in the short term, at prediction horizons (PH) from 15 minutes up to 60 minutes ahead, current, experimental, and future machine learning methodologies require optimization and adaptation in order to fully ingest, adopt, and exploit different temporal resolutions, ranging from self-monitoring of blood glucose (SMBG) at one or two points per day to flash glucose monitors (FGM) at 15-minute intervals or continuous glucose monitors (CGM) at 5-minute intervals.

In general, a predictive model is determined by the data on which it is trained. Improving data quality, or using the same amount of data more efficiently, is therefore of the greatest importance and value. The present solution aims not only to exploit more data through the higher temporal resolution of CGM, but also to use that data in a smarter and better way that fits, and is adapted accordingly to, machine learning algorithms such as a random forest (RF) classifier. For example, in the span from 12:00 pm to 3:00 pm, only three intervals can be obtained at the low resolution of SMBG with hourly reporting by Dexcom, whereas with the high resolution of CGM and full data optimization, 25 intervals can be obtained and fed to an ML model such as an RF classifier.

The current configuration, or use, of CGM data for the random forest classifier algorithm is as follows. For example, to predict hypoglycemia over the next 60 minutes (PH = 60 minutes ahead), the past 60 minutes are used as the input prediction horizon history (pHH), which nevertheless remains confined within the past 3-day block of the evaluation horizon history (eHH). With only SMBG data and no CGM data, the temporal shift occurs every hour.

For example, with SMBG data, a 3-hour span from 12:00 pm to 3:00 pm contains only three temporal data intervals: 1) a first interval from 12:00 pm to 1:00 pm, 2) a second interval from 1:00 pm to 2:00 pm, and 3) a third interval from 2:00 pm to 3:00 pm.

This makes sense within the constraints of other measurement schemes, such as SMBG or other devices, but not for CGM. Such a lower-resolution scheme cannot optimize and fully exploit the higher-resolution data from CGM.

The temporal optimization for CGM fits 25 temporal data intervals, one every 5 minutes, into the same 3-hour span, as constrained by the CGM sampling rate.

12:00 pm to 3:00 pm:
SMBG, low resolution (Dexcom reports hourly): 3 intervals
CGM, high resolution (fully optimized): 25 intervals
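The figures of 3 and 25 follow from counting 60-minute windows inside the 180-minute span: stepping by a full window length gives 180 / 60 = 3 windows, while advancing the window by one 5-minute CGM sample gives (180 - 60) / 5 + 1 = 25 windows. The small helper below, with a hypothetical name, reproduces that arithmetic.

```python
def count_windows(span_min: int, width_min: int, step_min: int) -> int:
    """Number of windows of `width_min` minutes that fit in `span_min`,
    when consecutive windows start `step_min` minutes apart."""
    if width_min > span_min:
        return 0
    return (span_min - width_min) // step_min + 1


# 12:00 pm to 3:00 pm is a 180-minute span; each pHH window is 60 minutes.
print(count_windows(180, 60, 60))  # sequential hourly scheme: 3
print(count_windows(180, 60, 5))   # rolling 5-minute scheme: 25
```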

In summary, the above is an optimization of the temporal binning of the prediction horizon history (pHH). With this temporal data optimization and adaptation, 25 temporal data intervals, instead of only 3, are prepared for the machine learning random forest classifier, thereby increasing data availability and the number of training use cases.

Admittedly, this full utilization of CGM data could be seen merely as the next logical step. The real improvement lies in applying the higher-resolution CGM data to machine learning algorithms, and not all algorithms handle that resolution well. For example, a time-series ARIMA model does not work well with the very large seasonal parameter (288 points per day) needed to capture day-to-day variation. This holds even when the data clearly contain a strong daily seasonal component, as captured by other functions such as seasonal_decompose from the statsmodels package.

Without these CGM data optimization and fitting methods and functions, machine learning algorithms such as random forest classifiers are not trained sufficiently and do not fit well. They also cannot represent the data for which predictions are being made.

The medical and scientific rationale for using the intermediate 5-minute intervals is as follows. As long as the assumptions of temporal linearity, ordering, and minimum data quality hold, each 15-, 30-, or 60-minute interval is projected only forward in time and follows linearly in 5-minute increments. It therefore makes no difference whether the 12:00 pm to 1:00 pm window or the 12:05 pm to 1:05 pm window is used, apart from any new data trend captured in the new window.

For example, SMBG resolution provides a single point per hour rather than CGM's single point every 5 minutes. At this resolution, if the 12:00 pm to 1:00 pm interval is missing, there is no way to fill in that data except by high-risk extrapolation. At CGM resolution, the 12:00 pm to 1:00 pm interval may be missing while 12:05 pm to 1:05 pm is available. In that case, this one-hour window, shifted by 5 minutes, becomes the accepted data.

At SMBG resolution, if the 1:00 pm to 2:00 pm interval is missing, it can be interpolated between the 12:00 pm to 1:00 pm interval and the 2:00 pm to 3:00 pm interval. This carries some risk, but less than extrapolation. If the 2:00 pm to 3:00 pm interval is missing, the situation is the same as when the 12:00 pm to 1:00 pm interval is missing, and extrapolation is required to fill in the missing data. In short, missing intervals at the edges require extrapolation, whereas missing intervals between observed ones require interpolation. Both carry risk, but interpolation is less risky than extrapolation.
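The edge-versus-interior distinction above can be stated as a small rule. A gap with no observed interval on one side can only be extrapolated, while a gap with observed intervals on both sides can be interpolated. The sketch below only illustrates that rule; the function name and the returned labels are hypothetical.

```python
def gap_fill_method(observed):
    """For each missing interval in a sequence of hourly intervals, report how it
    could be filled: 'interpolate' if observed intervals exist on both sides,
    otherwise 'extrapolate' (an edge case). `observed` is a list of booleans."""
    methods = {}
    for i, ok in enumerate(observed):
        if ok:
            continue
        has_before = any(observed[:i])
        has_after = any(observed[i + 1:])
        methods[i] = "interpolate" if has_before and has_after else "extrapolate"
    return methods


# Intervals: 12-1 pm, 1-2 pm, 2-3 pm
print(gap_fill_method([True, False, True]))  # {1: 'interpolate'}
print(gap_fill_method([True, True, False]))  # {2: 'extrapolate'}
print(gap_fill_method([False, True, True]))  # {0: 'extrapolate'}
```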

The CGM data optimization step removes the need for this interpolation and extrapolation. It exploits the higher resolution and, within medical constraints, relies instead on other one-hour windows shifted in 5-minute steps. For example, if more than 20 minutes are missing, it is not advisable to replace the 12:00 pm to 1:00 pm window with, say, the 12:25 pm to 1:25 pm window. In that case every window starting between 12:00 pm and 12:25 pm is missing, which is essentially five missing windows. Otherwise, from a medical, scientific, and physiological standpoint, windows within 20 or 45 minutes of each other can be substituted for, averaged with, or interpolated from one another. Fitting functions can therefore be written that reliably fit machine learning algorithms such as random forest classifiers, even when data are missing, incomplete, or corrupted. This works as long as certain thresholds of data quality and linearity are met. SMBG and other methodologies and devices have lower temporal data resolution and face very strict, demanding thresholds. CGM, with its higher temporal data resolution, has a much looser threshold.
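One way to express the substitution rule above is a forward search in 5-minute steps for the nearest fully observed one-hour window, rejecting any substitute shifted by more than 20 minutes. This is a hedged sketch under those assumptions. The 20-minute limit comes from the example in the text. The function name `substitute_window` and the representation of the data as one value per 5-minute slot (None for missing) are hypothetical.

```python
def substitute_window(slots, start, window_slots=12, step_minutes=5, max_shift_minutes=20):
    """Return (shift_minutes, values) for the first fully observed one-hour window
    starting at `start` or up to `max_shift_minutes` later, or None if none exists.

    `slots` holds one CGM value per 5-minute slot, with None marking missing data.
    """
    for k in range(max_shift_minutes // step_minutes + 1):
        s = start + k
        window = slots[s:s + window_slots]
        if len(window) == window_slots and all(v is not None for v in window):
            return k * step_minutes, window
    return None


# Slots from 12:00 pm onward; only the 12:00 pm reading is missing,
# so the 12:05 pm to 1:05 pm window is accepted instead.
slots = [None] + [100 + i for i in range(24)]
shift, values = substitute_window(slots, start=0)
print(shift, values[0])  # 5 100

# Slots 12:00 pm to 12:20 pm are missing: the first full window would start
# at 12:25 pm, which exceeds the 20-minute limit.
gappy = [None] * 5 + [100 + i for i in range(24)]
print(substitute_window(gappy, start=0))  # None
```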

Another way to think about this concerns data quality. CGM optimization uses everything possible, subject to the linearity constraint. It leaves room for missing or corrupted data while still giving machine learning algorithms such as random forest classifiers enough data to generate predictions. With only three SMBG intervals, the loss of even one interval causes the random forest classifier to break down, and it cannot give a prediction for the next hour.

An exemplary embodiment of the data preparation module in the Jupyter Notebook code is described below. See FIG. 1.

The data preparation module uses the "convertToTS" and "removeNaNdays" functions. The "removeNaNdays" function itself uses "pointerTable", the output lookup table of another function covered in the data conversion module steps. Finally, the "interpolateList" function is used. See FIG. 1.

More specifically, the following is done:
1. The subject's CGM data are read in. The subject's CGM data are a tabular data frame object.
2. If labels are available, any "SMPG" or other data labels are removed from the subject's CGM data, leaving only data labeled "CGM".
3. The "convertToTS" function converts the subject's CGM data (usually tabular) into a time-series object for further data preparation.
4. The data are further prepared with the native pandas time-series resampling function, using the mean. It resamples the subject's CGM time-series object, which contains only days with at least some CGM data, into "5-T" (5-minute) bins. When no data are missing, this step yields the same data set, but neatly aligned for data analysis. For example, a reading of 85 mg/dL at 12:01:43 pm becomes 85 mg/dL at 12:00 pm, and a reading of 92 mg/dL at 12:06:21 pm becomes 92 mg/dL at 12:05 pm. When data are missing, this resampling step substantially expands the original raw data set into a larger processed data set. It also creates new missing values (NaN) that must be converted into actual values in subsequent steps. First, however, any completely NaN days must be removed. In clinical studies, completely NaN days essentially correspond to the period between the baseline and follow-up visits. The baseline and follow-up timestamps are both in one data object. As a result, the resampling step unfortunately adds unnecessary, unobserved NaN days that must be removed programmatically. This is done in the next step.
5. The "removeNaNdays" function is applied.
Input: the subject's CGM data as a [TimeSeries] object.
Processing: scans for and removes completely missing (NaN) days.
Rationale: interpolating entire days between observed days is itself risky. Interpolating CGM values within the same day carries much lower risk, and that is the next and final step of data preparation.
Output: the subject's CGM data as a [List] object. The [TimeSeries] object type is no longer used.
This function uses the "pointerTable" function described in the data conversion module steps.
6. The "interpolateList" function finally applies advanced spline interpolation to this cleaned, processed list of CGM values. This fills in any NaN or missing data within days on which at least some CGM data are available.
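The preparation steps above can be approximated with pandas. This sketch rests on several assumptions and is not the STHP code. The column names ("label", "timestamp", "value"), the function name `prepare_cgm`, and the toy readings are hypothetical. "5min" is used as the pandas alias for the 5-T bins. Linear interpolation with pandas `Series.interpolate` stands in for the spline interpolation of step 6; a spline in pandas would additionally require SciPy.

```python
import pandas as pd


def prepare_cgm(df):
    """Simplified version of steps 2-6: keep CGM rows, build a time series,
    resample to 5-minute bins, drop all-NaN days, then fill gaps within each day."""
    # Step 2: keep only rows labelled "CGM" (drops e.g. "SMPG" rows).
    cgm = df[df["label"] == "CGM"]
    # Step 3: convert the table into a time series indexed by timestamp.
    index = pd.DatetimeIndex(pd.to_datetime(cgm["timestamp"]))
    ts = pd.Series(cgm["value"].to_numpy(dtype=float), index=index)
    # Step 4: average readings into 5-minute bins; empty bins become NaN.
    ts = ts.resample("5min").mean()
    # Step 5: drop calendar days on which every bin is NaN.
    ts = ts.groupby(ts.index.date).filter(lambda day: day.notna().any())
    # Step 6: fill remaining NaNs within each day only (linear, standing in for a spline).
    return ts.groupby(ts.index.date).transform(
        lambda day: day.interpolate(limit_direction="both")
    )


raw = pd.DataFrame({
    "label": ["CGM", "CGM", "SMPG", "CGM", "CGM"],
    "timestamp": ["2019-01-01 12:01:43", "2019-01-01 12:06:21",
                  "2019-01-01 12:07:00", "2019-01-01 12:16:00",
                  "2019-01-03 08:00:00"],
    "value": [85, 92, 140, 98, 120],
})
prepared = prepare_cgm(raw)
print(prepared.loc["2019-01-01 12:00":"2019-01-01 12:15"].tolist())  # [85.0, 92.0, 95.0, 98.0]
cgm_list = prepared.tolist()  # plain list form, as in the step 5 output
```

In this toy data, January 2 contains only NaN bins after resampling, so the whole day is dropped instead of being interpolated across. The 12:10 pm gap on January 1 is filled from its neighbours within the same day.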

Next, the data conversion module uses the output of the "pointerTable" function, a lookup table of the 288 CGM points per day. See FIG. 2.

More specifically, the following is done:
1. The "pointerTable" function creates, only once, a lookup table that cross-references the 288 daily CGM points as IDs.
2. Using the "pointerTable" function, each value in the list of CGM values is assigned one of the 288 cross-referenced IDs. The ID identifies the time of day (timestamp) to which that value belongs.

CGM pointer table lookup submodule
From a medical and scientific standpoint, it is important to know whether a CGM data point belongs to the morning (am) or the afternoon and evening (pm). It matters most whether the point falls in the nighttime or early-morning periods, which are used to determine and confirm fasting plasma glucose (FPG). A pointer lookup table for a single day was therefore devised to retain this information while holding only a list object of CGM values, with no time-series object. It cross-references the 288 IDs of a typical CGM day.

Using the 288 IDs of the pointer table for a typical CGM day, the timestamp component can be stripped, leaving only a list of CGM values. This list of CGM values can then be fed into and ingested by the ML classifier. Unfortunately, the time-series object itself cannot be fed to the ML classifier, hence the need for cross-referencing with the 288-point CGM ID table.

A pointer lookup table of the 288 five-minute CGM steps in each day is created to retain time-of-day information. For example, id = 10 of the day's 288 CGM points corresponds to 0:50 am, that is, 12:50 am. See FIG. 3.

In the top (left) panel, pointer table id = 9 corresponds to the actual time point 12:45 am. In the bottom (right) panel, pointer table id = 287 corresponds to 23:55, that is, 11:55 pm.
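The ID-to-time mapping described above can be sketched as a dictionary from pointer id to time of day. The function name `pointer_table` and the dictionary representation are assumptions; only the 288 five-minute steps and the example ids come from the text.

```python
from datetime import time


def pointer_table(points_per_day=288, step_minutes=5):
    """Map each pointer id (0..points_per_day-1) to the time of day it represents."""
    return {
        i: time(hour=(i * step_minutes) // 60, minute=(i * step_minutes) % 60)
        for i in range(points_per_day)
    }


table = pointer_table()
print(table[9], table[10], table[287])  # 00:45:00 00:50:00 23:55:00
```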

With such a pointer lookup table, a list of CGM values can be iterated over to determine the time of day each value refers to, without any timestamp data. The list may span several days, for example 14 to 16 days. Because pointer index 0 corresponds to the start of a new day at 12:00 am, the long list of CGM values can also be split into daily chunks.

Because pointer ID = 0 marks a new day, the running list of CGM values can stop adding to the previous day's standalone list and start a new standalone list for the next day. In addition, the algorithm adds only full days containing all 288 points; days with fewer than 288 points are not added. In most clinical or real-world studies of users or patients, the first and last day (or days) usually have fewer than 288 points. Extrapolating, interpolating, or filling in missing data at these edge cases is difficult, so such data are best not used. Finally, the algorithm also handles the end case. Otherwise, as confirmed in testing, the final day would not be appended properly. As a result, the full list of CGM values is now binned into daily chunks or blocks.
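The day-splitting logic can be sketched as follows. Pointer id 0 starts a new day, only complete 288-point days are kept, and a final check after the loop handles the end case. The function name `split_into_days` and the toy data are hypothetical. The sketch assumes that each value's pointer id has already been assigned.

```python
POINTS_PER_DAY = 288


def split_into_days(values, pointer_ids, points_per_day=POINTS_PER_DAY):
    """Split a flat list of CGM values into daily chunks.

    A pointer id of 0 (12:00 am) starts a new day. Only complete days with
    exactly `points_per_day` values are kept; partial first/last days are dropped.
    """
    days, current = [], []
    for value, pid in zip(values, pointer_ids):
        if pid == 0 and current:
            if len(current) == points_per_day:
                days.append(current)
            current = []
        current.append(value)
    # End case: without this check the final complete day would never be appended.
    if len(current) == points_per_day:
        days.append(current)
    return days


# Toy data: a partial first day starting at 12:50 am (id 10), two full days,
# and a partial last day ending at 11:55 am (id 143).
ids = list(range(10, 288)) + list(range(288)) * 2 + list(range(144))
vals = list(range(len(ids)))
days = split_into_days(vals, ids)
print(len(days), [len(d) for d in days])  # 2 [288, 288]
```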

Accordingly, pointerTable is called in only two places in the STHP classifier code base:
1. The "removeNaNdays" function, to identify and flag completely missing (NaN) days for subsequent removal.
2. The processing loop (for statement) of the data conversion module, whose main task is to build the 3-day evaluation limit history (eHH) blocks from single-day blocks.
Input: a clean list of CGM values.
Processing: cross-referencing with the output of the "pointerTable" function.
Output: the CGM values, first binned into daily lists (288 points per day, i.e., daily chunks).

The following describes the data optimization module. It exploits the higher temporal resolution of CGM through temporal binning with a rolling scheme, fitted for ingestion by the machine learning random forest classifier.

Evaluation limit history (eHH): optimization of temporal binning. See FIG. 4.
Input: daily lists of CGM values, not yet binned into 3-day chunks or blocks.
Processing: the first step of temporal binning in the rolling scheme.
Output: the daily chunks, binned into 3-day chunks or blocks.
Rationale: binning into daily and 3-day chunks is based on medical and scientific considerations and on guidelines for the patient's physiological adjustment period. It also keeps the model-training input fed to the random forest classifier manageable.
1. The main loop converts the daily history chunks into 3-day limit history (HH) chunks.
2. The "reduce" function from the "functools" package flattens each resulting list of lists into a single running list. Here, however, each list represents a clinically required 3-day observation or evaluation period rather than a single day.
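The grouping and flattening steps can be sketched with `functools.reduce`. The sketch assumes that blocks advance one day at a time, following the rolling scheme described in this module, and that `operator.add` is the reducing function. The function name `rolling_blocks` and the toy days are hypothetical.

```python
import operator
from functools import reduce


def rolling_blocks(days, block_days=3, stride_days=1):
    """Group daily CGM chunks into blocks of `block_days` consecutive days and
    flatten each block into a single running list with functools.reduce."""
    blocks = []
    for start in range(0, len(days) - block_days + 1, stride_days):
        block = days[start:start + block_days]      # list of daily lists
        blocks.append(reduce(operator.add, block))  # one flat list per block
    return blocks


days = [[d] * 288 for d in range(12)]  # 12 toy days of 288 points each
ehh = rolling_blocks(days)
print(len(ehh), len(ehh[0]))  # 10 864
```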

So far, the CGM data have had only one substantial opportunity to grow: the 5-minute resampling function. The interpolation function only fills in the missing NaNs that the 5-minute resampling step has already added, so it cannot grow or expand the data. Likewise, binning into daily chunks simply reflects the number of days available in the subject's CGM data, and no overall data expansion occurs in that step. To reiterate, the first substantial opportunity to grow the data set was the 5T (5-minute) resampling step.

However, the step of binning into 3-day blocks offers a second substantial opportunity to grow and expand the CGM data.

Typical binning into 3-day blocks across 12 days of available data: four intervals are obtained.

The typical scheme above makes sense for data from SMBG or other devices, which require substantial recalibration and computation between each 3-day study block. It makes little sense for CGM data, however, which require only one or two calibrations within the same day and allow computations to be run daily. There is therefore no reason to skip 3-day blocks such as days 2 to 4. The medical, scientific, and data science assumptions also hold for the full data optimization of this rolling scheme with the higher temporal resolution data of CGM. These assumptions do not hold for data from SMBG and other devices, so the typical scheme is used there. However, the typical scheme is suboptimal for CGM implementations, particularly for ingestion by ML classifiers. The rolling scheme is therefore further refined and fitted in detail so that ML methods can readily adopt it. These methods range from random forest (RF) to support vector machines (SVM) and k-nearest neighbors (KNN).

Accordingly, the following optimized method, which collects more data, is provided for binning 3-day blocks.

Optimized binning into 3-day blocks across 12 days of available data:

This optimized scheme yields ten intervals. In general, n available days yield n − 3 + 1 = n − 2 blocks.
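The two schemes can be compared by listing the day ranges each produces. The sketch assumes that the typical scheme uses non-overlapping blocks (a stride of three days) and that the optimized scheme uses a stride of one day. The helper name `block_day_ranges` is hypothetical.

```python
def block_day_ranges(n_days, block_days=3, stride_days=1):
    """Return 1-based (first_day, last_day) pairs for each block of consecutive days."""
    return [(s + 1, s + block_days) for s in range(0, n_days - block_days + 1, stride_days)]


typical = block_day_ranges(12, stride_days=3)  # non-overlapping blocks
rolling = block_day_ranges(12, stride_days=1)  # blocks advance one day at a time
print(len(typical), typical)                   # 4 [(1, 3), (4, 6), (7, 9), (10, 12)]
print(len(rolling), rolling[:3], rolling[-1])  # 10 [(1, 3), (2, 4), (3, 5)] (10, 12)
```

The rolling scheme recovers blocks such as days 2 to 4 that the typical scheme skips.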
Optimization of temporal binning of the hypoglycemia prediction limit history (pHH):
Input: 3-day chunks or blocks of the evaluation limit history (eHH).
Rationale: this setup avoids temporal pitfalls and errors bleeding into the next 3-day clinical evaluation period. It also packages the data neatly for ML analysis.
Processing: the second step of temporal binning in the rolling scheme.
Output: the prediction limit histories (pHH) are nested within each 3-day eHH chunk or block. This is essential for setting up the boundaries that delineate the data for machine learning (ML). The same boundaries can also respect the patient's physiological adjustment or alignment. This second innovative step is the third substantial opportunity to grow the input data. Thus, in three substantial steps, the original raw input data are grown or expanded into processed, cleaned input data. These data are ready for ingestion in ML classifier format and for model building, training, and testing.
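Each pHH window must stay inside its 3-day eHH block, so the number of X-y pairs a block contributes follows directly from the window length. This sketch assumes, consistent with the simple numerical example in this document, that pHH = PH means k = PH / 5 input points followed by a target k steps ahead, a window of 2k consecutive points. It also assumes that a full block holds 3 × 288 = 864 points. The function name `pairs_per_block` is hypothetical.

```python
def pairs_per_block(block_points, ph_minutes, step_minutes=5):
    """Number of (X, y) pairs from one eHH block when pHH = PH.

    Each window spans 2k consecutive points (k inputs plus k steps up to the
    target), and windows never cross the block boundary.
    """
    k = ph_minutes // step_minutes
    return max(block_points - 2 * k + 1, 0)


for ph in (15, 30, 60):
    print(ph, pairs_per_block(3 * 288, ph))
# 15 859
# 30 853
# 60 841
```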

For pHH = PH = 60 minutes, see FIG. 5.

In this final data optimization step, the pHH inputs of the ML classifier are built as a module separate from the data preparation, conversion, and fitting done when the eHH blocks are created. Only this hypoglycemia determination changes between implementations, from pHH = PH = 15 to pHH = PH = 30 to pHH = PH = 60 minutes.

See FIG. 6 for pHH = PH = 30 minutes and FIG. 7 for pHH = PH = 15 minutes.

So far, the exemplary embodiment has covered the computations that convert raw, unprocessed CGM data into data that have been expanded three times, temporally optimized, cleaned, and processed. The resulting data can be ingested by ML and, in turn, fed into a random forest (RF) classifier model.

The final data section focuses on generating and saving the training and test X-y sets (see FIG. 8). Both the independent variables (X) and the dependent variable (y) are saved together with their respective parts of the train-test split data set. The final data sets for this particular pHH = PH = 60 minutes are then verified in the test code section.
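Generating and saving the X-y sets could look like the sketch below. The following choices are assumptions not stated in the text: a random 75/25 shuffled split, and a single NumPy `.npz` archive, written here to an in-memory buffer instead of a file. For time-series data a chronological split may be preferable. The function name `split_and_save` is hypothetical.

```python
import io

import numpy as np


def split_and_save(X, y, test_fraction=0.25, seed=0):
    """Shuffle X-y pairs, split them into training and test sets, and save all
    four arrays together in one NumPy .npz archive (here an in-memory buffer)."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    buffer = io.BytesIO()
    np.savez(buffer, X_train=X[train_idx], X_test=X[test_idx],
             y_train=y[train_idx], y_test=y[test_idx])
    buffer.seek(0)
    return buffer


X = [[158, 335, 146], [335, 146, 371], [146, 371, 104], [371, 104, 170]]
y = [0, 0, 0, 0]
saved = np.load(split_and_save(X, y))
print(saved["X_train"].shape, saved["X_test"].shape)  # (3, 3) (1, 3)
```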

After these final data sets have been saved, the actual STHP RF classifier model can be run and built with these final data inputs.

Simple numerical example
The data processing steps described above are illustrated below with a simple numerical example. The values were generated randomly for this purpose and are not based on real data. Key: index: day: the 12 CGM values for that day, in mg/dL. With only 12 CGM points per day in this simplified example, only a pHH of 15 or 30 minutes ahead is possible. Below, the calculations are performed mainly for a 15-minute pHH.
0: Day 1: [158, 335, 146, 371, 104, 170, 109, 290, 127, 151, 231, 376]
1: Day 2: [342, 201, 174, 100, 253, 36, 134, 270, 225, 117, 202, 356]
2: Day 3: [240, 172, 320, 174, 57, 215, 225, 163, 246, 235, 159, 36]
3: Day 4: [248, 342, 52, 388, 309, 219, 243, 275, 166, 107, 191, 288]
4: Day 5: [279, 74, 146, 276, 284, 334, 201, 185, 187, 151, 242, 114]
5: Day 6: [215, 289, 338, 282, 331, 282, 21, 152, 270, 83, 57, 114]

pHH = PH = 15: sliding windows of six points each.
Input: eHH of block 1:
0: Day 1: [158, 335, 146, 371, 104, 170, 109, 290, 127, 151, 231, 376]
Sliding_Window1 = [158, 335, 146, 371, 104, 170]
X1 = [158, 335, 146] ~ the past three CGM points (previous 15 minutes)
Y1 = 0 ~ 170 > 70: 170 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X1 is therefore appended to X (the inputs, i.e., past CGM BG values), and Y1 is appended to Y (the outputs of the binary hypoglycemia/no-hypoglycemia classifier, on/off).
Sliding_Window2 = [335, 146, 371, 104, 170, 109]
X2 = [335, 146, 371] ~ the past three CGM points (previous 15 minutes)
Y2 = 0 ~ 109 > 70: 109 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371]]  ~ X[1]
Y = [0, 0]  ~ Y[0], Y[1]
Sliding_Window3 = [146, 371, 104, 170, 109, 290]
X3 = [146, 371, 104] ~ the past three CGM points (previous 15 minutes)
Y3 = 0 ~ 290 > 70: 290 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371],  ~ X[1]
     [146, 371, 104]]  ~ X[2]
Y = [0, 0, 0]  ~ Y[0], Y[1], Y[2]
Sliding_Window4 = [371, 104, 170, 109, 290, 127]
X4 = [371, 104, 170] ~ the past three CGM points (previous 15 minutes)
Y4 = 0 ~ 127 > 70: 127 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371],  ~ X[1]
     [146, 371, 104],  ~ X[2]
     [371, 104, 170]]  ~ X[3]
Y = [0, 0, 0, 0]  ~ Y[0], Y[1], Y[2], Y[3]
Sliding_Window5 = [104, 170, 109, 290, 127, 151]
X5 = [104, 170, 109] ~ the past three CGM points (previous 15 minutes)
Y5 = 0 ~ 151 > 70: 151 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371],  ~ X[1]
     [146, 371, 104],  ~ X[2]
     [371, 104, 170],  ~ X[3]
     [104, 170, 109]]  ~ X[4]
Y = [0, 0, 0, 0, 0]  ~ Y[0], Y[1], Y[2], Y[3], Y[4]
Sliding_Window6 = [170, 109, 290, 127, 151, 231]
X6 = [170, 109, 290] ~ the past three CGM points (previous 15 minutes)
Y6 = 0 ~ 231 > 70: 231 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371],  ~ X[1]
     [146, 371, 104],  ~ X[2]
     [371, 104, 170],  ~ X[3]
     [104, 170, 109],  ~ X[4]
     [170, 109, 290]]  ~ X[5]
Y = [0, 0, 0, 0, 0, 0]  ~ Y[0], Y[1], Y[2], Y[3], Y[4], Y[5]
Sliding_Window7 = [109, 290, 127, 151, 231, 376]
X7 = [109, 290, 127] ~ the past three CGM points (previous 15 minutes)
Y7 = 0 ~ 376 > 70: 376 mg/dL is above the 70 mg/dL hypoglycemia threshold, so no hypoglycemia
X and Y so far are as follows.
X = [[158, 335, 146],  ~ X[0]
     [335, 146, 371],  ~ X[1]
     [146, 371, 104],  ~ X[2]
     [371, 104, 170],  ~ X[3]
     [104, 170, 109],  ~ X[4]
     [170, 109, 290],  ~ X[5]
     [109, 290, 127]]  ~ X[6]
Y = [0, 0, 0, 0, 0, 0, 0]  ~ Y[0], Y[1], Y[2], Y[3], Y[4], Y[5], Y[6]

In summary, for day 1 of eHH block 1 alone, seven pHH = PH = 15 inputs (X) with corresponding outputs (Y) were created.
For the remaining days of eHH block 1, the values are calculated in the same way.
1: Day 2: [342, 201, 174, 100, 253, 36, 134, 270, 225, 117, 202, 356]
2: Day 3: [240, 172, 320, 174, 57, 215, 225, 163, 246, 235, 159, 36]
The following is an example illustrating the calculations that result in the detection of hypoglycemia.
pHH = PH = 15
1: Day 2: [342, 201, 174, 100, 253, 36, 134, 270, 225, 117, 202, 356]
Day2_Sliding_Window1 = [342, 201, 174, 100, 253, 36]
Day2_X1 = [342, 201, 174] ~ the past three CGM points (previous 15 minutes)
Day2_Y1 = 1 ~ 36 < 70: 36 mg/dL is below the 70 mg/dL hypoglycemia threshold, so hypoglycemia
X and Y so far are as follows.
X = [[342, 201, 174]]
Y = [1]
pHH = PH = 30
Day 3: [240, 172, 320, 174, 57, 215, 225, 163, 246, 235, 159, 36]
Day3_Sliding_Window1 = [240, 172, 320, 174, 57, 215, 225, 163, 246, 235, 159, 36]
Day3_X1 = [240, 172, 320, 174, 57, 215] (the past 6 CGM points covering the preceding 30 minutes)
Day3_Y1 = 1 (36 < 70, i.e., 36 mg/dL is below the 70 mg/dL hypoglycemia threshold, so hypoglycemia)
The X and Y up to this point are as follows.
X = [[240, 172, 320, 174, 57, 215]]
Y = [1]
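The sliding-window construction in the worked examples above can be sketched in Python as follows. This is a minimal sketch, not the patent's own code: it assumes a fixed 5-minute CGM sampling interval (implied by 3 points spanning 15 minutes), a strict `< 70` mg/dL threshold, and a past window Tp equal to the prediction limit PH, as in the examples. The helper name `build_xy` is hypothetical.

```python
from typing import List, Tuple

SAMPLE_MINUTES = 5   # assumed CGM sampling interval (minutes)
HYPO_THRESHOLD = 70  # mg/dL hypoglycemia threshold used in the examples


def build_xy(day_bg: List[int], ph_minutes: int) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical helper: slide a window of 2*n points over one day's BG values.

    n = ph_minutes / SAMPLE_MINUTES. The first n points of each window form the
    input X; the last point (n samples, i.e. PH minutes, after the last X point)
    gives the label Y = 1 if it is below the threshold, else 0.
    """
    n = ph_minutes // SAMPLE_MINUTES
    window = 2 * n
    X, Y = [], []
    for start in range(len(day_bg) - window + 1):
        sliding_window = day_bg[start:start + window]
        X.append(sliding_window[:n])
        Y.append(1 if sliding_window[-1] < HYPO_THRESHOLD else 0)
    return X, Y


day2 = [342, 201, 174, 100, 253, 36, 134, 270, 225, 117, 202, 356]
day3 = [240, 172, 320, 174, 57, 215, 225, 163, 246, 235, 159, 36]

X2, Y2 = build_xy(day2, ph_minutes=15)
print(X2[0], Y2[0])  # [342, 201, 174] 1
print(len(X2))       # 7
X3, Y3 = build_xy(day3, ph_minutes=30)
print(X3, Y3)        # [[240, 172, 320, 174, 57, 215]] [1]
```

With 12 samples per day, PH = 15 yields seven windows and PH = 30 a single window, matching the counts in the examples.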

Implementation of a random forest (RF) classifier. See FIG. 9.
Running 500 decision trees (the n_estimators parameter) in the random forest classifier is demanding; most implementations use 100 to 300 decision trees. It was nevertheless considered reasonable to increase the number of trees from the more standard 100 or 300 to 500, so that the simpler, more explainable decision-tree-based RF classifier could compete in performance with the state-of-the-art, complex, but harder-to-explain neural networks (ANN, CNN, etc.) used in competitors' hypoglycemia prediction algorithms, such as those of WellDoc and UVA. Further fine-tuning of the number of trees and other such parameters will require further research and development for tolerance testing and, to avoid out-of-memory problems on local machines and local host servers, a move to distributed parallel computing using Hadoop, MapReduce, Spark on Amazon Web Services, and other such services.
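For illustration only, a 500-tree random forest of the kind described above could be configured with scikit-learn's `RandomForestClassifier`, whose `n_estimators` parameter sets the number of trees. The synthetic windows below are placeholders, not the NN1218-3853 trial data, and the other settings (`n_jobs`, `random_state`) are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 windows of 12 CGM values (mg/dL); not real trial data.
X = rng.uniform(40, 400, size=(200, 12))
# Toy label: 1 if the last value in the window is below 70 mg/dL.
y = (X[:, -1] < 70).astype(int)

# 500 trees, as in the text; n_jobs=-1 uses all local CPU cores.
clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(len(clf.estimators_))  # 500
```

Scaling this beyond a single machine (Hadoop, Spark) would need a different training setup and is outside this sketch.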

The data need to be robust enough to accommodate such demanding parameters. Raw data supplied as-is cannot be run through a random forest classifier with this many decision trees. The innovative data preparation, transformation, fitting, and especially optimization steps, which use temporal binning of rolling schemes into the evaluation limit history and prediction limit history (eHH, pHH), are therefore critical to this classification solution. Without them, a regression-based solution would have been more appropriate, although regression is also more susceptible to poor data quality. The classification-based solution is far more robust and tolerant of low-quality data, mainly because of the data expansion and temporal optimization presented in this disclosure.

As shown in FIG. 10, the resulting model can also be saved with the joblib API, which efficiently serializes Python objects containing NumPy arrays and makes it easy to test different compression formats. The XZ, LZMA, and especially BZ2 formats consistently achieve better compression (smaller file size in MB) than the Z and GZ formats and, in particular, the suboptimal SAV format.
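A sketch of such a compression comparison, assuming `joblib.dump`, which selects the compressor from the file extension (`.z`, `.gz`, `.bz2`, `.xz`, `.lzma`). A `.sav` name is not one of these extensions, so joblib writes that file as an uncompressed pickle. The file name `model_PH60.*` and the small placeholder model are hypothetical; actual sizes depend on the model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder model trained on synthetic windows (not the trial data).
rng = np.random.default_rng(0)
X = rng.uniform(40, 400, size=(300, 6))
y = (X[:, -1] < 70).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    # joblib selects the compressor from the file extension
    # (.z, .gz, .bz2, .xz, .lzma); ".sav" is not one of them,
    # so that file is written as an uncompressed pickle.
    for ext in ("sav", "pkl.z", "pkl.gz", "pkl.bz2", "pkl.xz", "pkl.lzma"):
        path = os.path.join(tmp, f"model_PH60.{ext}")
        joblib.dump(model, path)
        sizes[ext] = os.path.getsize(path)
        # Round-trip check: the reloaded model predicts identically.
        assert (joblib.load(path).predict(X) == model.predict(X)).all()

for ext, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{ext:9s} {size / 1e6:.3f} MB")
```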

To summarize the above disclosure, "temporal binning of rolling schemes" allows the same amount of past historical or retrospective data to be used in a more extensive, smarter, and better-fitted way, effectively augmenting and enlarging the original raw, unprocessed dataset.
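As a rough illustration of this augmentation, the worked examples suggest the following count. Assuming (as those examples imply, though the text does not state it as a formula) a fixed sampling interval Δt, a past window of n = PH/Δt samples, and a label taken n samples after the last input sample, a day of L samples yields K (X, Y) pairs:

```latex
n = \frac{PH}{\Delta t}, \qquad K = L - 2n + 1
```

With L = 12 and Δt = 5 min, PH = 15 min gives n = 3 and K = 7, and PH = 30 min gives n = 6 and K = 1, matching the seven day-1 windows for PH = 15 and the single Day 3 window for PH = 30.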

In particular, with the evaluation limit history and prediction limit history (eHH, pHH) constructed in the "temporal binning of rolling schemes" step, the already-expanded dataset is further maximized and primed to supply still more usable intervals of data, which are converted and fed into ML classification methods such as random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN).
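A minimal comparison harness for the three classifier families named above might look like the following, using scikit-learn's `RandomForestClassifier`, `SVC`, and `KNeighborsClassifier`. The placeholder data, the 75/25 split, and the feature standardization for the distance-based SVM and KNN models are assumptions; this sketch does not reproduce the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder windows and noisy toy labels (not real CGM data).
rng = np.random.default_rng(1)
X = rng.uniform(40, 400, size=(600, 6))
y = (X[:, -1] + rng.normal(0, 20, size=600) < 100).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    # SVM and KNN are distance-based, so features are standardized first.
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: accuracy = {scores[name]:.3f}")
```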

For LTHP with PH = 1 day (24 hours), RF achieved 91% accuracy, 90.9% sensitivity, and 91.9% specificity, whereas SVM and KNN performed worse: SVM reached 86% accuracy, 71.4% sensitivity, and 77.4% specificity, and KNN reached 86% accuracy, 73.2% sensitivity, and 81.7% specificity. The raw CGM data were provided by Novo Nordisk clinical trial NN1218-3853.
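For reference, these metrics are normally computed from the confusion-matrix counts as follows, taking hypoglycemia (Y = 1) as the positive class. The text does not spell the formulas out, so they are given here as the standard definitions:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```

Sensitivity is thus the fraction of true hypoglycemic events that are detected, and specificity the fraction of non-hypoglycemic windows that are correctly left unflagged.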

Based on these LTHP results, only an RF implementation was used for the STHP ML classifier solution in this example (named "Lombardi" in the figures). At pHH = PH = 30 minutes, the STHP RF implementation achieved 98% accuracy, 93.59% sensitivity, and 99.75% specificity.

The STHP RF results for PH15, PH30, and PH60 are shown in FIG. 11. FIG. 12 shows the results of the STHP RF classifier for PH15, PH30, PH45, PH60, and PH75, compared with results from the following published literature.

Daskalaki et al. "Real-Time Adaptive Models for the Personalized Prediction of Glycemic Profile in Type 1 Diabetes Patients." Diabetes Technology & Therapeutics Vol. 14(2), 2012.
Rationale: From the academic literature, the paper by Daskalaki et al. was used as a comparison for the short-term hypoglycemia predictor (STHP) classifier at prediction limits (PH) of 30 and 45 minutes.

Pappada et al. "Neural Network-Based Real-Time Prediction of Glucose in Patients with Insulin-Dependent Diabetes." Diabetes Technology & Therapeutics Vol. 13(2), 2011.
Rationale: From the academic literature, the paper by Pappada et al. was used as a comparison for the short-term hypoglycemia predictor (STHP) classifier at a prediction limit (PH) of 75 minutes.

In FIGS. 13 and 14, the STHP RF classifier results for PH45 and PH75, respectively, are compared with the literature results. As shown, the accuracy, sensitivity, and specificity achieved at all prediction limits (15, 30, 45, 60, and 75 minutes) are competitive with, or better than, those reported in the comparison literature from industry and academic sources.

Next, an example (WE) for pHH = PH = 60 minutes, i.e., the 60-minute STHP RF classifier, is described. The example covers the test code that achieved the competitive results above, loading the following five files for testing and validation:
1. The STHP RF classifier model file itself: "_PH60.pkl.bz2" suffix
2. Final data of the test subset of the independent variable X: "_Xtest.npy" suffix
3. Final data of the test subset of the dependent variable y: "_ytest.npy" suffix

The following verification test metrics can be computed from the three files above alone: raw accuracy; confusion-matrix calculations such as sensitivity and specificity, as well as the confusion-matrix graphic itself; and a classification report. See FIG. 15.
4. Final data of all independent variables: "_X.npy" suffix
5. Final data of all dependent variables: "_y.npy" suffix

These two files are needed only for the cross-validated accuracy calculation. See FIG. 16.
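A minimal sketch of the cross-validated accuracy calculation, assuming scikit-learn's `cross_val_score` with 5-fold cross-validation (the fold count behind FIG. 16 is not stated) and hypothetical file names `WE_X.npy` / `WE_y.npy` standing in for the `_X.npy` / `_y.npy` files, whose prefixes are not given. Placeholder arrays are saved first so the snippet runs on its own, and `n_estimators=100` is chosen only for speed.

```python
import os
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder arrays standing in for the "_X.npy" / "_y.npy" files.
    rng = np.random.default_rng(2)
    X_all = rng.uniform(40, 400, size=(300, 12))
    y_all = (X_all[:, -1] < 70).astype(int)
    np.save(os.path.join(tmp, "WE_X.npy"), X_all)
    np.save(os.path.join(tmp, "WE_y.npy"), y_all)

    # Load the full independent and dependent variables into memory.
    X = np.load(os.path.join(tmp, "WE_X.npy"))
    y = np.load(os.path.join(tmp, "WE_y.npy"))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# For a classifier, cv=5 uses stratified 5-fold splits.
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```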

All of these can be combined into a summary report. For final data inputs #1 to #3, the WE validation test metric results for PH = 60 minutes are presented as follows.
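A sketch of how the three files might be combined into such a summary report. It assumes the model file was written with `joblib.dump` (consistent with the `.pkl.bz2` suffix) and the arrays with `numpy.save`. The `WE_` prefix and the helper `summary_report` are hypothetical, since the text gives only the file suffixes. Placeholder files are generated first so the snippet runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def summary_report(model_path, xtest_path, ytest_path):
    """Hypothetical helper: evaluate a saved classifier on saved test arrays."""
    model = joblib.load(model_path)  # e.g. a "*_PH60.pkl.bz2" file
    X_test = np.load(xtest_path)     # "*_Xtest.npy"
    y_test = np.load(ytest_path)     # "*_ytest.npy"
    y_pred = model.predict(X_test)

    # For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "TN": tn, "FN": fn, "FP": fp, "TP": tp,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "report": classification_report(
            y_test, y_pred, labels=[0, 1],
            target_names=["no hypo", "hypo"], zero_division=0),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder model and test split (not the WE data).
    rng = np.random.default_rng(3)
    X = rng.uniform(40, 400, size=(400, 12))
    y = (X[:, -1] < 70).astype(int)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:300], y[:300])
    paths = [os.path.join(tmp, name)
             for name in ("WE_PH60.pkl.bz2", "WE_Xtest.npy", "WE_ytest.npy")]
    joblib.dump(model, paths[0])  # .bz2 extension -> bz2 compression
    np.save(paths[1], X[300:])
    np.save(paths[2], y[300:])
    result = summary_report(*paths)

print(f"accuracy={result['accuracy']:.3f} "
      f"sensitivity={result['sensitivity']:.3f} specificity={result['specificity']:.3f}")
print(result["report"])
```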

Confusion matrix table: see FIG. 17.
Confusion matrix table calculation: TN, FN, FP, TP; see FIG. 18.
Confusion matrix table calculation: sensitivity; see FIG. 19.
Confusion matrix table calculation: specificity; see FIG. 20.
Confusion matrix table calculation: sensitivity and specificity string report output; see FIG. 21.
Classification report: precision, recall, F1 score, and support; see FIG. 22.
For final data inputs #4 and #5: WE validation test metric results: PH = 60 minutes: summary report: accuracy, cross-validated accuracy, sensitivity, specificity, hypoglycemia matrix (TN, FN, TP, FP); see FIG. 23.
Confusion matrix function: see FIG. 24.
Confusion matrix function: output (1/3); see FIG. 25.
Confusion matrix function: output (2/3), without normalization; see FIG. 26.
Confusion matrix function: output (3/3), with normalization; see FIG. 27.
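FIGS. 24 to 27 describe a confusion matrix function with output both without and with normalization. A text-only sketch of such a function follows. It assumes row-wise normalization (each row divided by the number of true samples in that class, scikit-learn's `normalize="true"`), which the text does not specify. The name `confusion_matrix_outputs` is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def confusion_matrix_outputs(y_true, y_pred, labels=(0, 1)):
    """Return the raw and row-normalized confusion matrices.

    Rows are true classes and columns are predicted classes, so for labels
    (0, 1) the raw matrix is [[TN, FP], [FN, TP]].
    """
    raw = confusion_matrix(y_true, y_pred, labels=list(labels))
    normalized = confusion_matrix(y_true, y_pred, labels=list(labels), normalize="true")
    return raw, normalized


y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]
raw, norm = confusion_matrix_outputs(y_true, y_pred)
print("Without normalization:\n", raw)              # [[5 1] [1 3]]
print("With normalization:\n", np.round(norm, 2))
```

With row normalization, the bottom-right entry (3/4) equals the sensitivity and the top-left entry (5/6) equals the specificity.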

Cited References and Alternative Embodiments
All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication, patent, or patent application were specifically and individually indicated to be incorporated by reference in its entirety for all purposes.

All headings and subheadings are used herein for convenience only and should by no means be construed as limiting the invention.

The use of any and all examples, or exemplary language (e.g., "such as"), provided herein is intended merely to better illuminate the invention and does not limit the scope of the invention unless otherwise stated. No language in the specification should be construed as indicating that any element outside the scope of the claims is essential to the practice of the invention.

The citation and incorporation of patent documents herein are for convenience only and do not reflect any view as to the validity, patentability, and/or enforceability of such patent documents.

The present invention may be implemented as a computer program product comprising a computer program mechanism embedded in a non-transitory computer-readable storage medium. For example, the computer program product may include the program modules shown in any combination of FIGS. 1 and 2 and/or depicted in FIG. 4. These program modules can be stored on a CD-ROM, DVD, magnetic disk storage product, USB key, or any other non-transitory computer-readable data or program storage product.

Many modifications and variations of the present invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. The invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

1. A method for dataset optimization for improved hypoglycemia prediction based on incorporation into a classifier, the method comprising:
- a step of providing a raw dataset for a subject, wherein the dataset comprises a plurality of BG values acquired at a given sampling rate and time stamps associated with those values over a plurality of days N; and
- a step of performing data conversion by temporal binning of a rolling scheme, with evaluation block values (eHH) as input X, to create corresponding prediction values (pHH) as output Y,
- wherein X is created as a sliding window containing the BG values for a given past period Tp, and
- Y is created as an indicator I indicating whether the BG value at a given future time Tf is below a given threshold indicating a hypoglycemic state.

2. The method for dataset optimization according to claim 1, wherein the data conversion step
- is performed after a step of performing data expansion by temporal binning of a rolling scheme of daily BG values into evaluation blocks of M days (M ≥ 2, M < N).

3. The method for dataset optimization according to claim 2, wherein the raw dataset is obtained based on an M-day insulin titration regimen.

4. The method for dataset optimization according to any one of claims 1 to 3, wherein the step of providing a raw dataset
- is performed before a step of performing data preparation comprising resampling to a nominal sampling rate and creating interpolated BG values to replace missing BG values.

5. The method for dataset optimization according to any one of claims 1 to 4, wherein the data conversion is performed over at least two different past time periods Tp.

6. The method for dataset optimization according to claim 5, wherein Tf corresponds to Tp.

7. A method for training a classifier, comprising:
- a step of providing a dataset optimized as defined in any one of claims 1 to 6;
- a step of importing the optimized dataset into the classifier; and
- a step of training the classifier based on the imported dataset.

8. The method for training a classifier according to claim 7, wherein the classifier is a random forest classifier.

9. A method for predicting future BG values, comprising:
- a step of acquiring a series of BG value assessments from a subject;
- a step of inputting the series of BG value assessments into a classifier trained as defined in claim 7 or 8; and
- a step of providing a predicted BG value.

10. The method for predicting future BG values according to claim 9, wherein the series of BG value assessments is obtained by continuous glucose monitoring (CGM).

11. A computer processing system for performing temporal optimization of a dataset from a subject, the computer processing system comprising one or more processors and a memory, the memory comprising instructions that, when executed by the one or more processors, implement the method defined in any one of claims 1 to 9.