JP7404581B1

JP7404581B1 - Chronic nephropathy subtype mining system based on self-supervised graph clustering

Info

Publication number: JP7404581B1
Application number: JP2023092731A
Authority: JP
Inventors: ▲勁▼松李; ▲勝▼▲強▼ 池; ▲銘▼▲鴻▼ 徐; 雪瑶李; 雨田; 天舒周
Original assignee: 之江実験室
Priority date: 2022-08-16
Filing date: 2023-06-05
Publication date: 2023-12-25
Anticipated expiration: 2043-06-05
Also published as: CN115083616A; JP2024027086A; CN115083616B

Abstract

[Problem] To provide a chronic nephropathy subtype mining system based on self-supervised graph clustering. [Solution] The system comprises: a data collection module for collecting structured data from chronic nephropathy medical records; a data extraction and preprocessing module for extracting and preprocessing the structured data to obtain an entity set and a visit set; a chronic nephropathy subtype mining module for constructing a chronic nephropathy subtype mining model from the entity set and the visit set; a chronic nephropathy phenotype subtype evaluation module for evaluating the chronic nephropathy subtype mining model; and a chronic nephropathy subtype prediction module for performing prediction on a patient's structured data. The invention solves the problem that process mining methods cannot handle the coexistence of multi-granularity information in longitudinal electronic medical record data, such as event information within a single visit and event information across multiple visits. [Selected drawing] Figure 1

Description

The present invention relates to the technical field of medical and health information, and in particular to a chronic nephropathy subtype mining system based on self-supervised graph clustering.

Chronic nephropathy is a major public health problem that affects 10% of the population of China. Under clinical guidelines, chronic nephropathy is staged by the patient's estimated glomerular filtration rate (eGFR) and urinary albumin-creatinine ratio (UACR). Although eGFR and UACR can be used for screening and monitoring of chronic nephropathy, they alone cannot capture the differences in disease phenotype between individual patients. Chronic nephropathy is a highly heterogeneous disease that is closely associated with systemic diseases and conditions such as diabetes, hypertension, autoimmune disease, genetic predisposition, and congenital abnormalities. There are clear differences between individuals with chronic nephropathy, and these differences can be described by disease phenotypes such as laboratory tests, medical history, medication history, and social factors. Because patients differ in their initial phenotypes, their treatment courses and complications also vary widely. A sound phenotypic classification of chronic nephropathy should distinguish patient subgroups and reveal the disease characteristics and underlying pathology of each subgroup, thereby contributing to a better understanding of the different mechanisms of disease deterioration and progression.

Conventional methods for classifying chronic nephropathy subtypes are mainly cluster analyses based on patients' initial static phenotype data. Such methods mainly use multidimensional data collected at study entry, such as patient demographics, biomarkers, and clinical characteristics, and mine phenotypic classifications of chronic nephropathy patients with common clustering algorithms such as hierarchical clustering and consensus clustering. However, because chronic nephropathy has a long disease course and many complications, treatment courses differ greatly between individual patients. Treatment course data may implicitly contain important information that distinguishes different phenotypes of chronic nephropathy patients. From the patient treatment course data collected and stored in an electronic medical record system, event information such as the surgeries, examinations, tests, and drug treatments given to a particular patient can be extracted, together with the times at which these events occurred.
Clustering patients' treatment course data to study disease phenotype patterns is of great significance for identifying and studying the characteristics of different patient subgroups. The methods commonly used to mine disease treatment course data are as follows. (1) Process mining: information is extracted from the event logs generated during patient treatment and arranged in chronological order to form treatment event sequences; different patterns in the treatment event sequences are then mined as different treatment courses of the disease, and the patients' disease phenotypes are classified accordingly. This approach makes poor use of co-occurrence information between events and cannot handle the association and ordering relationships between events across multiple visits in longitudinal electronic medical records. The mined treatment courses are complex, with low representativeness and coverage. (2) Tensor decomposition: information along three dimensions (patient, time, and phenotype) is combined into a third-order tensor, and latent phenotypic classifications of patients are mined by decomposing that tensor. This approach considers only changes in disease phenotype between consecutive visits and cannot handle phenotype change information over a long-term treatment course.

Therefore, a chronic nephropathy subtype mining system based on self-supervised graph clustering is provided to solve the above technical problems.

To solve the above technical problems, the present invention provides a chronic nephropathy subtype mining system based on self-supervised graph clustering.

The technical solution adopted by the present invention is as follows.

A chronic nephropathy subtype mining system based on self-supervised graph clustering, comprising:
a data collection module used to collect structured data from chronic nephropathy medical records;
a data extraction and preprocessing module used to extract and preprocess the structured data to obtain an entity set and a visit set;
a chronic nephropathy subtype mining module used to construct a chronic nephropathy subtype mining model from the entity set and the visit set;
a chronic nephropathy phenotype subtype evaluation module used to evaluate the chronic nephropathy subtype mining model; and
a chronic nephropathy subtype prediction module used to perform prediction on a patient's structured data.

Further, the structured data includes basic patient information, visit records, and diagnosis, laboratory test, medical examination, surgery and/or medication data within the observation window.

Further, the data extraction and preprocessing module is specifically configured to: preprocess the data set and extract the structured data in the chronic nephropathy medical records from the electronic medical record system, including basic patient information, visit records, and the diagnoses, laboratory tests, medical examinations, surgery data and medication data within the observation window; preprocess the extracted structured data; for laboratory test data, consider only abnormal test items as judged against the normal reference range, divide abnormal results into two categories, too low and too high, and retain the name of the abnormal test item and the abnormality category; process medical examination and surgery data with simple natural language processing techniques, retaining the examination site and category and the name of the surgery; for medication data, consider only the use of six classes of drugs, namely antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, nonsteroidal anti-inflammatory drugs, antiplatelet aggregation drugs, and steroids, classify the drugs in the medication data into these six classes, and retain the drug class; obtain a diagnosis set, a medication set, a surgery set, a test set, the number of diagnosis types, the number of medication types, the number of surgery types, the number of test types, and the number of visit records; merge the diagnosis set, medication set, surgery set and test set into the entity set; and form the visit set from the patients' visit records.
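As an illustration of the laboratory-test rule described above (keep only out-of-range items and record whether they are too low or too high), the following Python sketch builds the entity set for one visit. The `LabResult` type, the `name:low` / `name:high` entity strings, and the reference ranges are hypothetical choices made for this example only; they are not taken from the patent and are not clinical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    name: str
    value: float
    ref_low: float   # lower bound of the normal reference range (illustrative)
    ref_high: float  # upper bound of the normal reference range (illustrative)


def lab_entity(result):
    """Return an entity string for an abnormal result, or None if it is in range."""
    if result.value < result.ref_low:
        return f"{result.name}:low"
    if result.value > result.ref_high:
        return f"{result.name}:high"
    return None


def visit_entities(diagnoses, drug_classes, surgeries, labs):
    """Merge the diagnosis, medication, surgery and test items of one visit."""
    tests = {e for e in (lab_entity(r) for r in labs) if e is not None}
    return set(diagnoses) | set(drug_classes) | set(surgeries) | tests


# Hypothetical visit: one diagnosis, one drug class, no surgery, three lab results.
labs = [
    LabResult("eGFR", 45.0, 90.0, 120.0),
    LabResult("potassium", 4.2, 3.5, 5.3),
    LabResult("creatinine", 180.0, 44.0, 133.0),
]
entities = visit_entities(["type 2 diabetes"], ["antihypertensive"], [], labs)
```

In-range results (potassium here) produce no entity, so only abnormal findings enter the entity set.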

Further, the chronic nephropathy subtype mining module specifically comprises:
a visit network construction unit used to construct a visit network from the visit set and the entity set;
an embedded representation construction unit used to construct an entity co-occurrence matrix from the entity set, obtain initial embedded representations of the entity nodes and of the visit nodes from the entity co-occurrence matrix, and form the initial embedded representations of the nodes from the initial embedded representations of the entity nodes and of the visit nodes;
a clustering network construction unit used to construct an adjacency matrix from the relationships between nodes in the visit network, and to train a visit-node clustering network model based on self-supervised graph clustering using the adjacency matrix and the initial embedded representations of the nodes; and
a chronic nephropathy subtype mining model construction unit used to construct the chronic nephropathy subtype mining model from the visit-node clustering network model based on self-supervised graph clustering.

Further, the visit network construction unit is specifically used to:
form a node set from the visit set and the entity set;
construct an edge set from the co-occurrence relationships between nodes in the node set; and
construct the visit network from the node set and the edge set.
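The three construction steps above can be sketched in Python as follows. This is a minimal sketch under one stated assumption: a "co-occurrence" edge is taken to link a visit node to each entity recorded in that visit. The text does not spell out the exact edge definition, and the function name and the `v1`, `v2` identifiers are hypothetical.

```python
import numpy as np


def build_visit_network(visits):
    """visits: dict mapping visit id -> set of entity names.

    Returns (nodes, edges, adjacency). Assumption: an edge links a visit
    node to every entity that occurs in that visit.
    """
    visit_ids = sorted(visits)
    entity_ids = sorted({e for ents in visits.values() for e in ents})
    nodes = visit_ids + entity_ids                      # node set
    index = {n: i for i, n in enumerate(nodes)}
    edges = {(v, e) for v, ents in visits.items() for e in ents}  # edge set
    adj = np.zeros((len(nodes), len(nodes)), dtype=int)
    for v, e in edges:
        adj[index[v], index[e]] = 1
        adj[index[e], index[v]] = 1                     # undirected graph
    return nodes, edges, adj


visits = {"v1": {"eGFR:low", "antihypertensive"}, "v2": {"eGFR:low"}}
nodes, edges, adj = build_visit_network(visits)
```

The symmetric 0/1 matrix returned here is the kind of adjacency matrix that the clustering network construction unit takes as input.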

Further, the embedded representation construction unit is specifically used to:
construct an entity co-occurrence matrix from the entity set;
compute the initial embedded representation of each entity node with the GloVe algorithm based on the entity co-occurrence matrix; and
obtain the initial embedded representation of each visit node by averaging the initial embedded representations of all its adjacent entity nodes, and form the initial embedded representations of the nodes from the initial embedded representations of the visit nodes and of the entity nodes.
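A minimal Python sketch of the first and third steps above: counting entity co-occurrences within visits, and averaging entity vectors to initialise each visit node. The GloVe training step itself is not shown; `entity_emb` below is a hand-written placeholder standing in for GloVe output, and all names are hypothetical.

```python
from itertools import combinations

import numpy as np


def cooccurrence_matrix(visits, entities):
    """Count how often each pair of entities appears in the same visit."""
    index = {e: i for i, e in enumerate(entities)}
    C = np.zeros((len(entities), len(entities)))
    for ents in visits.values():
        for a, b in combinations(sorted(ents), 2):
            C[index[a], index[b]] += 1
            C[index[b], index[a]] += 1
    return C


def visit_embeddings(visits, entities, entity_emb):
    """Initial visit embedding = mean of the embeddings of its entities."""
    index = {e: i for i, e in enumerate(entities)}
    return {
        v: entity_emb[[index[e] for e in ents]].mean(axis=0)
        for v, ents in visits.items()
    }


entities = ["A", "B", "C"]
visits = {"v1": {"A", "B"}, "v2": {"A", "B", "C"}}
C = cooccurrence_matrix(visits, entities)

# Placeholder vectors standing in for GloVe output (assumption, not trained).
entity_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
emb = visit_embeddings(visits, entities, entity_emb)
```

In practice the placeholder matrix would be replaced by vectors fitted to `C`; the averaging step is unchanged.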

Further, the clustering network construction unit is specifically used to:
construct an adjacency matrix from the relationships between nodes in the visit network, and input the adjacency matrix and the initial embedded representations of the nodes into the visit-node clustering network model based on self-supervised graph clustering for graph attention training, to obtain embedded representations of the nodes, including embedded representations of the visit nodes and of the entity nodes;
reconstruct the visit network from the embedded representations of the nodes and compute the reconstruction error of the visit network;
input the embedded representations of the entity nodes into a neural network decoder for training, take the output of the final decoder layer as the reconstructed embedded representations of the entity nodes, and compute the reconstruction error of the entity nodes;
apply a softmax regression operation to the embedded representations of the visit nodes to obtain the probability distribution of the visit nodes, and compute a clustering loss based on that probability distribution; and
construct the overall loss function of the visit-node clustering network model based on self-supervised graph clustering from the reconstruction error of the visit network, the reconstruction error of the entity nodes, and the clustering loss.
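The three loss terms listed above can be sketched in NumPy as follows. Several details are assumptions made for illustration and are not stated in this section: the graph is reconstructed with an inner-product decoder, the clustering target distribution is the sharpened target commonly used in deep embedded clustering, and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def graph_reconstruction_loss(Z, A, eps=1e-9):
    """Cross entropy between A and sigmoid(Z Z^T) (inner-product decoder, assumed)."""
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    return float(-np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps)))


def node_reconstruction_loss(X, X_hat):
    """Mean squared L2 distance between entity embeddings and their reconstruction."""
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))


def clustering_loss(Q, eps=1e-9):
    """KL(P || Q) against a sharpened target P derived from Q (assumed target)."""
    W = Q ** 2 / Q.sum(axis=0)
    P = W / W.sum(axis=1, keepdims=True)
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))


def total_loss(l_graph, l_node, l_cluster, alpha=1.0, beta=1.0):
    """Weighted sum of the three terms; alpha and beta are hypothetical weights."""
    return l_graph + alpha * l_node + beta * l_cluster


rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 3))                       # embeddings of 4 nodes
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]])
Q = softmax(Z[:2])                                # cluster probabilities of 2 visit nodes
loss = total_loss(
    graph_reconstruction_loss(Z, A),
    node_reconstruction_loss(Z[2:], Z[2:] + 0.1),
    clustering_loss(Q),
)
```

Only the forward computation of the losses is shown; in a real model these terms would be minimised jointly by gradient descent.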

Further, the chronic nephropathy subtype mining model construction unit is specifically used to:
take the clustering distribution of the visit nodes obtained by the visit-node clustering network model based on self-supervised graph clustering as the category distribution of the visit nodes, select the category with the highest probability in the category distribution as the category tag of each visit node, and arrange all visit nodes of each patient in chronological order;
decide whether to merge consecutive visit nodes with the same category tag or keep them separate by computing the cosine similarity between their category distributions, and construct an event matrix by arranging the visit nodes; and
search for frequent events to determine nodes, and connect the visit nodes in order to form event processes: starting from the first column of the event matrix, select in each column the events whose frequency of occurrence exceeds a threshold as frequent events; take the frequent events as nodes of the event process, while the remaining events go directly to the end node; take each frequent event as the start node of the next search, extract the corresponding event vectors and assemble them into a new event matrix, remove its first column, and perform the same frequent-event search; extend the event process by connecting the nodes obtained in each search to the start node; and end the iteration when no frequent events remain or the length of the event process reaches the maximum length, thereby obtaining the chronic nephropathy subtype mining model.
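The column-by-column frequent-event search can be read as growing a prefix tree over the patients' event sequences. The Python sketch below follows that reading under stated assumptions: each row of the event matrix is one patient's ordered sequence of category tags, "frequency greater than a threshold" is interpreted as an absolute count greater than `min_count`, and the function names and the `END` marker are hypothetical.

```python
from collections import Counter


def mine_event_process(sequences, min_count, max_len):
    """Grow frequent event paths one column at a time.

    sequences: list of tuples of category tags, one tuple per patient.
    Returns a nested dict mapping each frequent event to its subtree.
    """
    if max_len == 0:
        return {}
    first_column = Counter(seq[0] for seq in sequences if seq)
    tree = {}
    for event, count in first_column.items():
        if count > min_count:
            # Keep the rows that start with this event and drop their first column.
            rest = [seq[1:] for seq in sequences if seq and seq[0] == event]
            tree[event] = mine_event_process(rest, min_count, max_len - 1)
    return tree


def enumerate_processes(tree, prefix=()):
    """List every root-to-leaf path, closing each one with an END marker."""
    if not tree:
        return [prefix + ("END",)]
    paths = []
    for event, subtree in sorted(tree.items()):
        paths.extend(enumerate_processes(subtree, prefix + (event,)))
    return paths


sequences = [("A", "B", "C"), ("A", "B"), ("A", "C"), ("B", "A"), ("A", "B", "C")]
tree = mine_event_process(sequences, min_count=1, max_len=3)
processes = enumerate_processes(tree)
```

With these five toy sequences, only the path A, B, C survives the threshold at every level; infrequent events are dropped, which corresponds to sending them directly to the end node.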

Further, the chronic nephropathy subtype prediction module is specifically used to:
preprocess a patient's structured data and input it into the visit-node clustering network model based on self-supervised graph clustering for prediction, to obtain the probability distribution of the patient's visit nodes;
determine the clustering category of each visit node from the probability distribution of the visit nodes, and construct a visit event sequence; and
input the visit event sequence into the chronic nephropathy subtype mining model, fit it to the nodes of the chronic nephropathy subtype mining model in order to obtain one event process, and determine from the event process which chronic nephropathy subtype the patient belongs to.

The beneficial effects of the present invention are as follows. The present invention provides a chronic nephropathy subtype mining system based on self-supervised graph clustering. First, longitudinal electronic medical record data covering a patient's multiple visits, which contains multidimensional treatment event information such as visits, diagnoses, laboratory tests, medical examinations, surgeries and medication, is built into a visit network. Next, vector representations of the treatment events are obtained from the co-occurrence information of the treatment events. The visit events are clustered by the visit-node clustering network model based on self-supervised graph clustering, and each visit event is tagged. Then, at the visit level, the patients' treatment courses are mined to obtain the different subtypes of the chronic nephropathy phenotype. Finally, a method for evaluating the phenotype subtypes is provided, which uses a set of comprehensive indicators, including patient demographics, medication, complications and survival rate, to assess whether the mined subtypes show clinically interpretable differences.

In particular, event information from each visit, such as diagnoses, laboratory tests, medical examinations, surgeries and medication, is first used to train the visit-node clustering network model based on self-supervised graph clustering, which yields a category tag for each visit. This step aggregates low-level, fine-grained information into high-level, coarse-grained summary information. The category tags of the visits are then used to mine the treatment courses. This solves the problem that process mining methods cannot handle the coexistence of multi-granularity information in longitudinal electronic medical record data, such as event information within a single visit and event information across multiple visits.

Obtaining event vector representations from co-occurrence information and using them in the graph model effectively solves the problem that process mining methods make poor use of event co-occurrence information, and allows cross-sectional and longitudinal electronic medical record data to be used together for thorough feature mining of the disease.

In the self-supervised graph clustering algorithm provided, the information from a patient's multiple visits is fed together into the visit-node clustering network model based on self-supervised graph clustering to train the embedded representations of the nodes, so that phenotype change information over a long-term treatment course can be handled. Supervised learning is then performed separately on the different nodes and relationships in the visit network: a decoder reconstructs the embedded representations of the low-level nodes, the node reconstruction error is computed with the L2 norm, the reconstruction error of the graph relationships is computed with cross entropy, and the clustering error of the visit nodes is computed with KL divergence.

Based on the similarity of the event tag distributions of the visit nodes, similar adjacent events are merged. This optimizes the process mining method, simplifies the mined treatment processes, and improves their representativeness and coverage.
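The merging of similar adjacent events can be illustrated with a short Python sketch: each visit receives the tag of its most probable category, and a visit is folded into the previous one when both carry the same tag and their category distributions have a high cosine similarity. The threshold value of 0.9 and the choice to compare each visit with its immediate predecessor are assumptions made for this example; the function names are hypothetical.

```python
import numpy as np


def cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


def merge_visits(distributions, threshold=0.9):
    """distributions: time-ordered category probability vectors of one patient.

    Returns the sequence of category tags after merging similar neighbours.
    """
    events = []
    prev = None
    for p in distributions:
        p = np.asarray(p, dtype=float)
        tag = int(np.argmax(p))            # category tag = most probable category
        same_tag = prev is not None and events[-1] == tag
        if same_tag and cosine(prev, p) >= threshold:
            prev = p                       # merge into the previous event
            continue
        events.append(tag)                 # keep as a separate event
        prev = p
    return events


dists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],   # same tag, very similar to the first visit: merged
    [0.4, 0.3, 0.3],   # same tag, but less similar: kept separate
    [0.1, 0.8, 0.1],   # different tag: kept separate
]
events = merge_visits(dists)
```

Four visits collapse to three events here, which is the simplification of the mined treatment process that the paragraph describes.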

FIG. 1 is a schematic structural diagram of the chronic nephropathy subtype mining system based on self-supervised graph clustering according to the present invention.
FIG. 2 is a schematic diagram showing the functional process of the chronic nephropathy subtype mining system based on self-supervised graph clustering according to the present invention.
FIG. 3 is a diagram showing a visit network according to an embodiment of the present invention.
FIG. 4 is a diagram showing a co-occurrence matrix according to an embodiment of the present invention.
FIG. 5 is a structural diagram showing the visit-node clustering network model based on self-supervised graph clustering according to an embodiment of the present invention.

The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Referring to FIG. 1, a chronic nephropathy subtype mining system based on self-supervised graph clustering comprises:
a data collection module used to collect structured data from chronic nephropathy medical records;
a data extraction and preprocessing module used to extract and preprocess the structured data to obtain an entity set and a visit set;
a chronic nephropathy subtype mining module used to construct a chronic nephropathy subtype mining model from the entity set and the visit set;
a chronic nephropathy phenotype subtype evaluation module used to evaluate the chronic nephropathy subtype mining model; and
a chronic nephropathy subtype prediction module used to perform prediction on a patient's structured data.

Referring to FIG. 2, the functional process of the chronic nephropathy subtype mining system based on self-supervised graph clustering includes the following steps S1 to S5.

In step S1, the data collection module collects structured data from chronic nephropathy medical records to build a data set, the structured data including basic patient information, visit records, and diagnosis, laboratory test, medical examination, surgery and/or medication data within the observation window.
In step S2, the data extraction and preprocessing module preprocesses the data set to obtain the visit set and the entity set. Specifically, this includes: preprocessing the data set and extracting the structured data in the chronic nephropathy medical records from the electronic medical record system, including basic patient information, visit records, and the diagnoses, laboratory tests, medical examinations, surgery data and medication data within the observation window; preprocessing the extracted structured data; for laboratory test data, considering only abnormal test items as judged against the normal reference range, dividing abnormal results into two categories, too low and too high, and retaining the name of the abnormal test item and the abnormality category; processing medical examination and surgery data with simple natural language processing techniques, retaining the examination site and category and the name of the surgery; for medication data, considering only the use of six classes of drugs, namely antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, nonsteroidal anti-inflammatory drugs, antiplatelet aggregation drugs, and steroids, classifying the drugs in the medication data into these six classes, and retaining the drug class; and obtaining a diagnosis set, a medication set, a surgery set, a test set, the number of diagnosis types, the number of medication types, the number of surgery types, the number of test types, and the number of visit records, merging the diagnosis set, medication set, surgery set and test set into the entity set, and forming the visit set from the patients' visit records.

In step S3, the visit set and the entity set are input into the chronic nephropathy subtype mining module, which constructs the chronic nephropathy subtype mining model.
In step S31, a visit network is constructed from the visit set and the entity set.
In step S311, a node set is formed from the visit set and the entity set.
In step S312, an edge set is constructed from the co-occurrence relationships between nodes in the node set.
In step S313, the visit network is constructed from the node set and the edge set.

In step S32, an entity co-occurrence matrix is constructed from the entity set, initial embedded representations of the entity nodes and of the visit nodes are obtained from the entity co-occurrence matrix, and the initial embedded representations of the nodes are formed from the initial embedded representations of the entity nodes and of the visit nodes.
In step S321, the entity co-occurrence matrix is constructed from the entity set.
In step S322, the initial embedded representation of each entity node is computed with the GloVe algorithm based on the entity co-occurrence matrix.
In step S323, the initial embedded representation of each visit node is obtained by averaging the initial embedded representations of all its adjacent entity nodes, and the initial embedded representations of the nodes are formed from the initial embedded representations of the visit nodes and of the entity nodes.

In step S33, an adjacency matrix is constructed from the relationships between nodes in the visit network, and a visit-node clustering network model based on self-supervised graph clustering is trained with the adjacency matrix and the initial embedded representation of the nodes:
In step S331, the adjacency matrix is constructed from the relationships between nodes in the visit network, and the adjacency matrix and the initial embedded representation of the nodes are input to the visit-node clustering network model for graph attention training, yielding the embedded representation of the nodes, which comprises the embedded representations of the visit nodes and of the entity nodes;
In step S332, the visit network is reconstructed from the embedded representation of the nodes and the reconstruction error of the visit network is calculated;
In step S333, the embedded representation of the entity nodes is input to a neural network decoder for training, the output of the final decoder layer is taken as the reconstructed embedded representation of the entity nodes, and the reconstruction error of the entity nodes is calculated;
In step S334, a softmax regression operation is applied to the embedded representation of the visit nodes to obtain the probability distribution of the visit nodes, and a clustering loss is calculated from that probability distribution;
In step S335, the overall loss function of the visit-node clustering network model is constructed from the reconstruction error of the visit network, the reconstruction error of the entity nodes, and the clustering loss.

In step S34, the chronic nephropathy subtype mining model is constructed from the visit-node clustering network model based on self-supervised graph clustering:

In step S341, the clustering distribution of the visit nodes obtained by the visit-node clustering network model is taken as the category distribution of the visit nodes, the category with the highest probability in the category distribution is selected as the category tag of each visit node, and all visit nodes of each patient are arranged in chronological order;
In step S342, the cosine similarity between the category distributions of consecutive visit nodes with the same category tag is calculated to decide whether to merge the visit nodes or keep them separate, and an event matrix is constructed by arranging the visit nodes;
In step S343, frequent-event decision nodes are searched and visit nodes are connected in order to form event processes. Starting from the first column of the event matrix, the events in the column whose occurrence frequency exceeds a threshold are selected as frequent events and become nodes of the event process, while the remaining events go directly to the end node. Each frequent event then serves as the start node of the next search: the corresponding event vectors are extracted and combined into a new event matrix, the first column is removed, and the same frequent-event search is performed. The nodes found in each search are connected to the start node, extending the event process. The iteration ends when no frequent event is found or the event process reaches its maximum length, which yields the chronic nephropathy subtype mining model.

In step S4, the chronic nephropathy phenotypic subtype evaluation module evaluates the chronic nephropathy subtype mining model.
In step S5, the chronic nephropathy subtype prediction module makes a prediction for a patient's structured data:
In step S51, the patient's structured data is preprocessed and then input to the visit-node clustering network model based on self-supervised graph clustering, which predicts the probability distribution of the patient's visit nodes;
In step S52, the clustering category of each visit node is determined from the probability distribution of the visit nodes, and a visit event sequence is constructed;
In step S53, the visit event sequence is input to the chronic nephropathy subtype mining model and fitted node by node against the nodes of the model to obtain one event process, and the event process determines which chronic nephropathy subtype the patient belongs to.

Embodiment
A chronic nephropathy subtype mining system based on self-supervised graph clustering comprises a data collection module, a data extraction and preprocessing module, a chronic nephropathy subtype mining module, a visit network construction unit, an embedded representation construction unit, a clustering network construction unit, a chronic nephropathy subtype mining model construction unit, a chronic nephropathy phenotypic subtype evaluation module, and a chronic nephropathy subtype prediction module.

The data collection module collects the structured data in chronic nephropathy medical records to construct a data set. The structured data includes basic patient information, visit records, and the diagnoses, laboratory tests, medical examinations, surgical data, and/or medication data within the observation window.
The data extraction and preprocessing module extracts and preprocesses the structured data to obtain a visit set and an entity set. Specifically, it preprocesses the data set: it extracts the structured data of the chronic nephropathy medical records from the electronic medical record system, including basic patient information, visit records, and the diagnoses, laboratory tests, medical examinations, surgical data, and medication data within the observation window, and preprocesses the extracted structured data. For laboratory test data, only test items that are abnormal with respect to the normal reference range are considered; abnormal results are divided into two types, too low and too high, and the name of the abnormal test item and the abnormality category are retained. Medical examination and surgical data are processed with simple natural language processing techniques, retaining the examination site and category and the surgery name. For medication data, only the use of six types of drugs is considered: antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, nonsteroidal anti-inflammatory drugs, antiplatelet aggregation drugs, and steroids; the drugs in the medication data are classified into these six types and the drug category is retained. This yields the diagnosis set, medication set, surgery set, and test set, together with the number of diagnosis types, medication types, surgery types, test types, and visit records. The diagnosis set, medication set, surgery set, and test set are merged into the entity set, and the patient visit records form the visit set.

The chronic nephropathy subtype mining module receives the visit set and the entity set as input and constructs the chronic nephropathy subtype mining model.
The visit network construction unit constructs a visit network from the visit set and the entity set.
First, the visit set and the entity set form a node set.
The visit set is $V = \{V_1, V_2, \dots, V_{N^V}\}$, where $N^V$ is the number of visits. $D$, $M$, $P$, and $L$ are the diagnosis set, medication set, surgery set, and test set, respectively: $D = \{D_1, D_2, \dots, D_{N^D}\}$, $M = \{M_1, M_2, \dots, M_{N^M}\}$, $P = \{P_1, P_2, \dots, P_{N^P}\}$, $L = \{L_1, L_2, \dots, L_{N^L}\}$, where $N^D$, $N^M$, $N^P$, and $N^L$ are the number of diagnosis types, medication types, surgery types, and test types, respectively. $D$, $M$, $P$, and $L$ together form the entity set $S = D \cup M \cup P \cup L = \{S_1, S_2, \dots, S_{N^S}\}$, and the number of entity types is $N^S = N^D + N^M + N^P + N^L$.

The entity set and the visit set together form the node set $N = V \cup S$, and the number of nodes is $N^N = N^V + N^S = N^V + N^D + N^M + N^P + N^L$.
Second, an edge set is constructed from the node co-occurrence relationships in the node set.
The entities that appear in the same visit $V_i$ form an entity subset $S(V_i) = \{S_1^i, S_2^i, \dots, S_j^i\}$, where $j$ is the number of entities in $S(V_i)$ and $S(V_i) \subseteq S$. Each entity subset together with its visit forms a visit link subset $C(V_i) = \{V_i\} \cup S(V_i)$. A visit link subset contains one visit node and all entity nodes of that visit. All nodes in a visit link subset co-occur, so they are connected pairwise to form an edge subset, and all edge subsets together form the edge set $E = \{(a, b) \mid a, b \in C(V_i),\ a \neq b,\ 1 \le i \le N^V\}$.
Third, the visit network $G = (N, E)$ is constructed from the node set and the edge set.

Referring to FIG. 3, in visit $V_1$ the doctor makes two diagnoses, goiter ($D_1$) and thyroid nodule ($D_2$), performs a partial thyroidectomy ($P_1$), and prescribes levothyroxine sodium tablets ($M_1$). Then $C(V_1) = \{V_1, D_1, D_2, P_1, M_1\}$ forms one visit link subset, and these five nodes are connected pairwise in the visit network. In visit $V_4$, the doctor orders a TSH measurement ($L_3$), then diagnoses hypothyroidism ($D_4$) and prescribes levothyroxine sodium tablets ($M_1$). Then $C(V_4) = \{V_4, L_3, D_4, M_1\}$ is also a visit link subset, and these four nodes are connected pairwise in the visit network. Because $M_1$ appears in both $C(V_1)$ and $C(V_4)$, $M_1$ is connected in the visit network to all the other nodes of both visit link subsets.
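The construction of the node set and edge set can be illustrated with a minimal Python sketch that uses the two visits of the FIG. 3 example. The function name `build_visit_network` and the string identifiers are hypothetical and chosen only for illustration; the patent does not prescribe a data structure.

```python
from itertools import combinations

def build_visit_network(visits):
    """Build (nodes, edges) from a dict: visit id -> entity ids seen in that visit.

    Hypothetical helper: every visit link subset C(V_i) = {V_i} U S(V_i)
    is connected pairwise, and each undirected edge is stored once.
    """
    nodes = set()
    edges = set()
    for visit_id, entities in visits.items():
        link_subset = sorted({visit_id, *entities})
        nodes.update(link_subset)
        for a, b in combinations(link_subset, 2):
            edges.add((a, b))
    return nodes, edges

# The two visits described for FIG. 3.
visits = {
    "V1": ["D1", "D2", "P1", "M1"],
    "V4": ["L3", "D4", "M1"],
}
nodes, edges = build_visit_network(visits)

# Neighbours of the shared medication node M1 across both visits.
m1_neighbors = {a if b == "M1" else b for a, b in edges if "M1" in (a, b)}
```

Because `M1` belongs to both link subsets, its neighbour set spans the remaining nodes of both visits, which is the behaviour the example describes.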

The embedded representation construction unit constructs an entity co-occurrence matrix from the entity set, obtains the initial embedded representations of the entity nodes and of the visit nodes from the entity co-occurrence matrix, and combines the two into the initial embedded representation of the nodes.
First, the entity co-occurrence matrix is constructed from the entity set.
Referring to FIG. 4, the entity co-occurrence matrix $X$ built from the entity set $S$ has dimension $N^S \times N^S$. Each row and each column represents one entity of $S$, and $X_{ij}$ is the co-occurrence information of entities $S_i$ and $S_j$. $X_{ij}$ is calculated as
$$X_{ij} = \sum_{k=1}^{N^V} \mathbb{1}\left[S_i \in S(V_k) \wedge S_j \in S(V_k)\right],$$
where the indicator $\mathbb{1}[\cdot]$ equals 1 if entities $S_i$ and $S_j$ both appear in visit $V_k$ and 0 otherwise, and $S(V_k)$ is the entity subset consisting of all entities that appear in visit $V_k$. The entity co-occurrence matrix $X$ is symmetric, so $X_{ij} = X_{ji}$; the diagonal entries, which would describe the co-occurrence of an entity with itself, are set to 0.
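As a small illustration of the counting rule for $X_{ij}$, the sketch below builds the matrix from a list of visits. The function name, the entity ordering, and the third visit are assumptions made up for the example.

```python
def cooccurrence_matrix(entity_ids, visits):
    """Count, for every entity pair, the number of visits containing both.

    entity_ids: ordered list of entity identifiers (rows/columns of X).
    visits: list of entity-id lists, one list per visit.
    The diagonal stays 0, as stated in the description.
    """
    index = {e: i for i, e in enumerate(entity_ids)}
    n = len(entity_ids)
    X = [[0] * n for _ in range(n)]
    for entities in visits:
        present = sorted({index[e] for e in entities})
        for a in present:
            for b in present:
                if a != b:
                    X[a][b] += 1
    return X

entity_ids = ["D1", "D2", "P1", "M1", "L3", "D4"]
visits = [
    ["D1", "D2", "P1", "M1"],   # visit V1 of the FIG. 3 example
    ["L3", "D4", "M1"],         # visit V4 of the FIG. 3 example
    ["D1", "M1"],               # hypothetical extra visit, for illustration only
]
X = cooccurrence_matrix(entity_ids, visits)
```

Here `D1` and `M1` share two visits, so their entry is 2, while `D1` and `L3` never co-occur and keep the value 0.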

Second, the initial embedded representation of each entity node is computed from the entity co-occurrence matrix with the GloVe algorithm.
The relationship between the initial embedded representations of the entity nodes and the entity co-occurrence matrix is expressed as
$$w_i^{T} w_j + b_i + b_j = \log X_{ij},$$
where $w_i$ and $w_j$ are the initial embedded representations of the entity nodes of entities $S_i$ and $S_j$ that are ultimately to be solved for; each is initialized as a random 128-dimensional vector with values between -0.1 and 0.1. The superscript $T$ denotes transposition, and $b_i$ and $b_j$ are the bias terms of the two initial embedded representations, initialized to 0.

Based on this relationship between the entity co-occurrence matrix and the initial embedded representations of the entity nodes, an objective function $J$ is constructed:
$$J = \sum_{i,j=1}^{N^S} f(X_{ij}) \left(w_i^{T} w_j + b_i + b_j - \log X_{ij}\right)^2, \qquad f(x) = \begin{cases} (x/\mathrm{MAX})^{\alpha}, & x < \mathrm{MAX} \\ 1, & \text{otherwise,} \end{cases}$$
where $\mathrm{MAX}$ is the threshold of the co-occurrence information and $\alpha$ is the exponent parameter.
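The weighted least-squares form of the objective can be evaluated with a few lines of plain Python. This is only a sketch of the standard GloVe objective: the default values `x_max=100.0` and `alpha=0.75` are assumptions borrowed from common GloVe practice, not values stated in this document, and the function names are hypothetical.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x); x_max plays the role of the MAX threshold."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_objective(X, W, b, x_max=100.0, alpha=0.75):
    """Evaluate J for co-occurrence matrix X, embeddings W and biases b.

    Pairs with X[i][j] == 0 are skipped, since log(0) is undefined and
    such pairs do not take part in the objective.
    """
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if X[i][j] == 0:
                continue
            dot = sum(wi * wj for wi, wj in zip(W[i], W[j]))
            err = dot + b[i] + b[j] - math.log(X[i][j])
            total += glove_weight(X[i][j], x_max, alpha) * err * err
    return total

# Two entities that co-occur 100 times, with all-zero embeddings and biases.
X = [[0, 100], [100, 0]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
J = glove_objective(X, W, b)
```

In practice the embeddings and biases would be updated by a gradient-based optimizer until `J` stops decreasing; the sketch only evaluates the objective.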

If two entity nodes never appear together, i.e. $X_{ij} = 0$, the pair does not take part in the calculation of the objective function. The objective function is optimized with the AdaDelta gradient descent algorithm until convergence, which yields, for each entity $S_i \in S$, the corresponding initial embedded representation $w_i$ of the entity node.
Third, the initial embedded representation of each visit node is obtained by averaging the initial embedded representations of all its adjacent entity nodes, and the initial embedded representations of the visit nodes and of the entity nodes together form the initial embedded representation of the nodes.
For a visit node $V_i$, the set of all its adjacent entity nodes is $S(V_i) = \{S_1^i, S_2^i, \dots, S_j^i\}$, and the initial embedded representation of $V_i$ is
$$B_{V_i} = \frac{1}{j} \sum_{S_m \in S(V_i)} w_m,$$
where $j$ is the number of entity nodes in $S(V_i)$.
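The averaging step for a visit node can be sketched as follows. The function name and the toy two-dimensional embeddings are hypothetical; the description uses 128-dimensional vectors.

```python
def visit_initial_embedding(entity_embeddings, visit_entities):
    """Average the initial embeddings of the entity nodes adjacent to a visit node.

    entity_embeddings: dict mapping entity id -> list of floats.
    visit_entities: entity ids appearing in the visit (assumed non-empty).
    """
    vectors = [entity_embeddings[e] for e in visit_entities]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy 2-dimensional embeddings, for illustration only.
entity_embeddings = {"D1": [0.1, -0.1], "M1": [0.3, 0.1]}
b_v = visit_initial_embedding(entity_embeddings, ["D1", "M1"])
```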

The initial embedded representation of the nodes is $B = \begin{bmatrix} B_V \\ B_S \end{bmatrix}$, where $B_V$ is the initial embedded representation of the visit nodes and $B_S$ is the initial embedded representation of the entity nodes.

The clustering network construction unit constructs an adjacency matrix from the relationships between nodes in the visit network and trains a visit-node clustering network model based on self-supervised graph clustering with the adjacency matrix and the initial embedded representation of the nodes. Referring to FIG. 5, the visit-node clustering network model consists of three parts: graph attention, an autoencoder, and self-supervision.

First, an adjacency matrix $A$ is constructed from the relationships between nodes in the visit network, and the adjacency matrix $A$ and the initial embedded representation $B$ of the nodes are input to the visit-node clustering network model for $L$ rounds of graph attention training. The embedded representation of the nodes at layer $l$ is $Z^l$, calculated as
$$Z^{l} = \varphi\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{l-1} W^{l}\right),$$
where $\varphi$ is the ReLU activation function and $W^l$ is the graph attention weight of layer $l$. $\tilde{A} = A + I$ is the adjacency matrix used for normalization, $I$ is the identity matrix, and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. After $L$ layers of graph attention training, the embedded representation $Z^L$ of the nodes is obtained. Like the initial embedded representation $B$, $Z^L$ consists of the updated embedded representation $Z_V^L$ of the visit nodes and the embedded representation $Z_S^L$ of the entity nodes: $Z^L = \begin{bmatrix} Z_V^L \\ Z_S^L \end{bmatrix}$.
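A single propagation layer of this kind can be sketched in plain Python. The sketch assumes the common symmetric normalization with self-loops (add the identity, then scale by the inverse square roots of the degrees) followed by ReLU; this exact form and all function names are assumptions, and the layer shown uses fixed propagation weights rather than learned attention coefficients.

```python
import math

def normalize_adjacency(A):
    """Add self-loops, then scale entry (i, j) by 1/sqrt(deg_i * deg_j). ASSUMED form."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def relu_layer(A_norm, Z, W):
    """One layer: ReLU(A_norm @ Z @ W)."""
    out = matmul(matmul(A_norm, Z), W)
    return [[max(0.0, v) for v in row] for row in out]

# Two connected nodes with 1-dimensional features.
A = [[0, 1], [1, 0]]
A_norm = normalize_adjacency(A)
Z0 = [[1.0], [3.0]]
Z1 = relu_layer(A_norm, Z0, [[1.0]])
Z1_neg = relu_layer(A_norm, Z0, [[-1.0]])
```

With self-loops both nodes have degree 2, so every normalized entry is 0.5 and each node receives the average of the two input features; a negative weight is clipped to zero by ReLU.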

Second, the visit network is reconstructed from the embedded representation of the nodes and the reconstruction error of the visit network is calculated.
The reconstructed adjacency matrix $\hat{A}$ is
$$\hat{A} = \sigma\left(Z^{L} (Z^{L})^{T}\right),$$
where $(Z^L)^T$ is the transpose of $Z^L$ and $\sigma$ is the sigmoid activation function.

The reconstruction error $L_{rec\text{-}G}$ of the visit network is then calculated between the adjacency matrix $A$ and the reconstructed adjacency matrix $\hat{A}$ [formula not reproduced in this text].

Third, the embedded representation $Z_S^L$ of the entity nodes is input to a decoder consisting of a $Y$-layer neural network for training. The representation of the nodes at decoder layer $y$ is $H^y$, which is computed from $H^{y-1}$ [formula not reproduced in this text], where $W_d^y$ is the decoder network weight of layer $y$, $b_d^y$ is the bias, and the decoder input is $H^0 = Z_S^L$. The output of the final decoder layer is taken as the reconstructed embedded representation of the entity nodes, $\hat{B}_S = H^Y$, and the reconstruction error $L_{rec\text{-}S}$ of the entity nodes is calculated from it [formula not reproduced in this text].
Fourth, a softmax regression operation is applied to the embedded representation $Z_V^L$ of the visit nodes to obtain the probability distribution of the visit nodes:
$$Z_V^{R} = \mathrm{softmax}\left(Z_V^{L}\right),$$
where $Z_V^R$ has dimension $N^V \times K$ and $K$ is the preset number of cluster centers, i.e. the number of visit node categories. In practice the values 3, 5, and 10 are tried and the number of categories that gives the best result is selected. The entry in row $i$ and column $j$ of $Z_V^R$ is the probability that the $i$-th sample belongs to category $j$.

A clustering loss is then calculated from the probability distribution of the visit nodes.
For the $i$-th visit sample and the $j$-th cluster, the similarity between the data representation $z_i$ and the cluster center $\mu_j$ is measured with the Student's t-distribution. Here $z_i$ is the $i$-th row of $Z_V^R$, $\mu_j$ is a cluster center initialized by the K-means method on the probability distribution $Z_V^R$ of the visit nodes, and $v$ is the degree of freedom of the Student's t-distribution. $q_{ij}$ is calculated as
$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / v\right)^{-\frac{v+1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / v\right)^{-\frac{v+1}{2}}},$$
where $q_{ij}$ is the probability that the $i$-th sample belongs to the $j$-th cluster. $Q = [q_{ij}]$ is the set of clustering distributions of all samples. After the clustering distribution $Q$ is obtained, a target distribution $P$ is calculated. The target distribution $P$ assigns samples with higher confidence, so optimizing the data distribution toward $P$ pulls the data closer to the cluster centers. $P$ and $Q$ both have dimension $N^V \times K$. Each element $p_{ij}$ of the target distribution $P$ is calculated as
$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},$$
where $f_j = \sum_i q_{ij}$. Because every entry of $Q$ is squared in the target distribution $P$, $P$ has higher confidence. The clustering loss is calculated as
$$L_{clu} = \mathrm{KL}(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$
Fifth, the overall loss function of the visit-node clustering network model based on self-supervised graph clustering is constructed from the reconstruction error $L_{rec\text{-}G}$ of the visit network, the reconstruction error $L_{rec\text{-}S}$ of the entity nodes, and the clustering loss $L_{clu}$ [formula not reproduced in this text], where $\gamma$ and $\beta$ are hyperparameters that adjust the importance of the different loss terms; both default to 0.1.
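The self-supervised clustering signal can be illustrated with a short, dependency-free sketch of the three quantities involved: the Student's t soft assignment, the sharpened target distribution, and the KL-divergence loss. The formulas follow the widely used deep embedded clustering formulation, which is assumed here to match the description; the function names, the toy data, and the fixed cluster centers (instead of K-means initialization) are illustrative assumptions.

```python
import math

def soft_assignment(Z, centers, v=1.0):
    """q[i][j]: Student's t similarity of sample i to center j, normalized per row."""
    Q = []
    for z in Z:
        row = []
        for mu in centers:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist2 / v) ** (-(v + 1.0) / 2.0))
        s = sum(row)
        Q.append([r / s for r in row])
    return Q

def target_distribution(Q):
    """p[i][j] proportional to q[i][j]^2 / f_j, with f_j the column sum of Q."""
    f = [sum(col) for col in zip(*Q)]
    P = []
    for q in Q:
        row = [qj * qj / fj for qj, fj in zip(q, f)]
        s = sum(row)
        P.append([r / s for r in row])
    return P

def kl_clustering_loss(P, Q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0)

# Toy data: three samples, two fixed centers (hypothetical values).
Z = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
Q = soft_assignment(Z, centers)
P = target_distribution(Q)
loss = kl_clustering_loss(P, Q)
```

On this toy data the target distribution gives the first sample a higher probability for its nearest center than the soft assignment does, which is the "higher confidence" effect the text attributes to squaring.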

The chronic nephropathy subtype mining model construction unit constructs the chronic nephropathy subtype mining model from the visit-node clustering network model based on self-supervised graph clustering.

First, the clustering distribution $Q$ of the visit nodes obtained by the visit-node clustering network model is taken as the category distribution of the visit nodes, and the category with the highest probability in the category distribution is selected as the category tag of each visit node: the category tag of visit node $V_i$ is $c_i = \arg\max_j q_{ij}$. For a single visit, the recording time of the first medical record is taken as the start time of the visit node and the recording time of the last medical record as its end time, and all visit nodes of each patient are arranged in chronological order.

Second, the cosine similarity between the category distributions of consecutive visit nodes with the same category tag is calculated to decide whether to merge the visit nodes or keep them separate, and an event matrix is constructed by arranging the visit nodes.
For two consecutive visit nodes $V_i$ and $V_j$ with the same category tag, the cosine similarity between their category distributions is calculated as
$$\cos(Q_i, Q_j) = \frac{Q_i \cdot Q_j}{\lVert Q_i \rVert \, \lVert Q_j \rVert},$$
where $Q_i$ and $Q_j$ are the category distributions of events $V_i$ and $V_j$.

Two consecutive visit nodes whose cosine similarity is greater than 0.8 are merged into one visit node, and the category distribution of the merged visit node is computed from the two original distributions [formula not reproduced in this text]; otherwise the two visit nodes are kept separate. When several consecutive visit nodes have the same category tag, the cosine similarity is evaluated pair by pair from front to back in sequence order, and each pair is either merged or kept separate.
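The merge rule can be sketched as a single front-to-back pass over a patient's chronologically ordered category distributions. Two points are assumptions made only for this sketch, since the text does not fix them: the merged node's distribution is taken to be the element-wise mean of the two distributions, and a merged node is compared with the next node using that merged distribution. The function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def argmax(dist):
    """Index of the most probable category (the category tag)."""
    return max(range(len(dist)), key=dist.__getitem__)

def merge_consecutive(distributions, threshold=0.8):
    """Merge neighbouring visits with the same tag and cosine similarity > threshold."""
    merged = []
    for dist in distributions:
        if (merged
                and argmax(merged[-1]) == argmax(dist)
                and cosine(merged[-1], dist) > threshold):
            prev = merged[-1]
            # ASSUMPTION: merged distribution = element-wise mean of the pair.
            merged[-1] = [(p + d) / 2.0 for p, d in zip(prev, dist)]
        else:
            merged.append(list(dist))
    return merged

# Five chronologically ordered visits of one hypothetical patient.
visits = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [1.0, 0.0], [0.51, 0.49]]
events = merge_consecutive(visits)
```

In this example the first two visits merge, the third has a different tag, and the last two share a tag but fall below the 0.8 threshold, so they stay separate.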

Finally, the visit nodes of each patient are arranged into an event vector $h = [c_1, c_2, \dots, c_k]$, where $k$ is the number of visit nodes of the patient with the most visit nodes; for patients with fewer than $k$ nodes, the event vector is padded with 0. The event vectors of all patients are combined into the event matrix $H$:
$$H = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix},$$
where $H$ has dimension $n \times k$ and $n$ is the total number of patients.
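Padding the per-patient event vectors into a rectangular matrix is straightforward; the sketch below assumes that category tags are numbered from 1 so that 0 can serve unambiguously as the padding value (an assumption of this example). The function name is hypothetical.

```python
def build_event_matrix(patient_events):
    """Right-pad every patient's tag sequence with 0 up to the longest sequence."""
    k = max(len(events) for events in patient_events)
    return [list(events) + [0] * (k - len(events)) for events in patient_events]

# Three hypothetical patients with 3, 1 and 2 visit nodes.
H = build_event_matrix([[1, 2, 3], [2], [1, 1]])
```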

Third, frequent-event decision nodes are searched and visit nodes are connected in order to form event processes. Starting from the first column of the event matrix, the events in the column whose occurrence frequency exceeds a threshold are selected as frequent events and become nodes of the event process, while the remaining events go directly to the end node. Each frequent event then serves as the start node of the next search: the corresponding event vectors are extracted and combined into a new event matrix, the first column is removed, and the same frequent-event search is performed. The nodes found in each search are connected to the start node, extending the event process. The iteration ends when no frequent event is found or the event process reaches its maximum length, which yields the chronic nephropathy subtype mining model.
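The column-by-column search can be sketched as a recursion over sub-matrices. The sketch makes two assumptions that the text leaves open: the frequency of an event is measured as its share of the rows of the current (sub-)matrix, and the value 0 is padding rather than an event. The function name and parameters are hypothetical, and only the maximal event processes are returned.

```python
from collections import Counter

def mine_event_processes(H, min_freq, max_len, prefix=()):
    """Return the maximal event processes (tuples of tags) found in event matrix H."""
    if not H or len(prefix) >= max_len:
        return [prefix] if prefix else []
    # Count first-column events, ignoring padding and exhausted rows.
    counts = Counter(row[0] for row in H if row and row[0] != 0)
    frequent = [e for e, c in sorted(counts.items()) if c / len(H) > min_freq]
    if not frequent:
        return [prefix] if prefix else []   # remaining events go to the end node
    processes = []
    for event in frequent:
        # Keep only the rows that start with this event, then drop the first column.
        sub = [row[1:] for row in H if row and row[0] == event]
        processes.extend(mine_event_processes(sub, min_freq, max_len, prefix + (event,)))
    return processes

# Four hypothetical patients, already padded with 0.
H = [[1, 2, 3], [1, 2, 0], [1, 3, 0], [2, 1, 0]]
processes = mine_event_processes(H, min_freq=0.3, max_len=3)
```

In this example event 1 is the only frequent first event; its sub-matrix yields the frequent events 2 and 3, giving the processes `(1, 2, 3)` and `(1, 3)`, while the single patient starting with event 2 falls below the threshold.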

The chronic nephropathy phenotypic subtype evaluation module evaluates the chronic nephropathy subtype mining model.
It compares patients of different phenotypic subtypes and tests whether the features of the mined subtypes differ statistically, in order to evaluate whether the disease subtypes obtained by the phenotypic subtype mining method are clinically meaningful. The specific evaluation scheme is as follows.

異なる表現型亜型患者の性別、年齢、糸球体濾過率などの指標を計算して、統計的試験方法によって異なる表現型亜型患者の臨床症状に差異があるかどうかを判断する。 Calculate indicators such as gender, age, and glomerular filtration rate of patients with different phenotypic subtypes to determine whether there are differences in clinical symptoms of patients with different phenotypic subtypes by statistical testing methods.

異なる亜型患者の遺伝子組み換えヒトエリスロポエチン、メトホルミン、カンデサルタン、プラバスタチン使用量などの重要な服薬データに差異があるかどうかを統計して、統計的試験方法によって分析する。 We will statistically analyze whether there are any differences in important medication data such as the usage of recombinant human erythropoietin, metformin, candesartan, and pravastatin among patients with different subtypes, and use statistical testing methods.

心臓衰弱、冠状動脈性心臓病、高血圧、糖尿病、高脂血症を含む各種類の亜型患者の様々な併発症の発病人数を統計し、各併発症の割合を計算し、異なる亜型における併発症の割合に差異があるかどうかを試験する。 We calculated the number of patients with various comorbidities in each type of subtype, including cardiac weakness, coronary heart disease, hypertension, diabetes, and hyperlipidemia, calculated the proportion of each comorbidity, and calculated the prevalence of each comorbidity in patients with different subtypes. To test whether there are differences in comorbidity rates.

各亜型総人数及び異なる時点での生存人数を統計し、異なる亜型患者の生存率を比較する。異なる亜型患者が時間の変化につれて変化する生存率の差異を観察し、Ｌｏｇ－ｒａｎｋ試験によって分析する。 The total number of patients of each subtype and the number of survivors at different time points will be compiled to compare the survival rates of patients with different subtypes. Differences in survival rates of different patient subtypes over time are observed and analyzed by Log-rank tests.

If 50% or more of the features differ significantly between the patient groups of different subtypes, the mined subtypes are considered to have good clinical utility.
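The evaluation scheme above can be sketched in a few lines. This is only an illustration under stated assumptions: it compares two subtypes on one comorbidity with a Pearson chi-square test on a 2x2 table, and it applies the 50% rule to a set of p-values. The function names, the significance level `alpha=0.05`, and the restriction to two subtypes are assumptions made here and are not specified in the description.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Rows are two subtypes, columns are patients with / without a comorbidity:
        subtype A: a with, b without
        subtype B: c with, d without
    Returns (chi2 statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must contain at least one patient")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def has_clinical_value(p_values, alpha=0.05, min_fraction=0.5):
    """Apply the 50% rule: True if at least `min_fraction` of the tested
    features differ significantly (p < alpha). `alpha` is an assumed level."""
    if not p_values:
        raise ValueError("no features were tested")
    significant = sum(1 for p in p_values.values() if p < alpha)
    return significant / len(p_values) >= min_fraction
```

For example, `chi_square_2x2(30, 10, 10, 30)` gives a statistic of 20.0 with a very small p-value, and a dictionary of per-feature p-values can then be passed to `has_clinical_value`. With more than two subtypes, or for survival curves, a general contingency-table test or a log-rank test from a statistics library would take the place of this helper.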

The chronic nephropathy subtype prediction module makes predictions on a patient's structured data. Specifically, the module:
preprocesses the patient's structured data and inputs it into the clustering network model of consultation nodes based on self-supervised graph clustering to obtain the probability distribution of the patient's consultation nodes;
determines the clustering category of each consultation node from that probability distribution and constructs a consultation event sequence; and
inputs the consultation event sequence into the chronic nephropathy subtype mining model, fits the nodes of the model in order to obtain one event process, and determines from that event process which chronic nephropathy subtype the patient belongs to.
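The last two prediction steps can be illustrated as follows. This is a minimal sketch, not the patented implementation: it assumes each consultation is given as a `(timestamp, probability_distribution)` pair already produced by the clustering network, and it assumes the mined model is stored as a dictionary that maps an event-process prefix to the set of events allowed next. Both data layouts and both function names are hypothetical.

```python
def build_event_sequence(visits):
    """Order consultations by time and label each with its most probable cluster.

    visits: list of (timestamp, probabilities) pairs, where probabilities is
    the cluster distribution predicted for that consultation node.
    """
    ordered = sorted(visits, key=lambda visit: visit[0])
    return [max(range(len(probs)), key=probs.__getitem__) for _, probs in ordered]


def fit_event_process(sequence, model):
    """Walk the mined model node by node, following the patient's events.

    model: dict mapping a prefix tuple to the set of events that may follow it
    (an assumed storage format). The walk stops at the first event that the
    model does not contain at the current position.
    """
    process = ()
    for event in sequence:
        if event not in model.get(process, ()):
            break
        process += (event,)
    return process


visits = [(2, [0.1, 0.9]), (1, [0.7, 0.3]), (3, [0.2, 0.8])]
model = {(): {0}, (0,): {1}, (0, 1): {0}}
sequence = build_event_sequence(visits)
matched_process = fit_event_process(sequence, model)
```

The matched event process would then be looked up against the event processes that define each mined subtype.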

The present invention provides a clustering network model of consultation nodes based on self-supervised graph clustering. A decoder for reconstructing the embedded representations of nodes is added to graph attention training, and a self-supervised loss is added for training the clustering model. The clustering network model aggregates low-level, fine-grained information on chronic nephropathy patients into high-level, coarse-grained summary information, which is then used for mining the care process. This solves the problem that process mining cannot handle the coexistence of multi-granularity information in longitudinal electronic medical record data, such as event information within a single visit and event information across multiple visits. Based on the self-supervised graph clustering method, the multidimensional clinical information within a patient's single visit and the sequence information across multiple visits are fully integrated, and features are mined thoroughly from the electronic medical record data along both the cross-sectional and the longitudinal dimension. Similar adjacent events are merged according to the similarity of the event tag distributions of the consultation nodes, which optimizes the process mining method, simplifies the mined care processes, and improves their representativeness and coverage.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art can make various modifications and changes to the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

A chronic nephropathy subtype mining system based on self-supervised graph clustering, comprising a data collection module, a data extraction and preprocessing module, a chronic nephropathy subtype mining module, a chronic nephropathy phenotype subtype evaluation module, and a chronic nephropathy subtype prediction module,
The data collection module is used to collect structured data in chronic nephropathy medical records,
The data extraction and preprocessing module is used to extract and preprocess the structured data in the chronic nephropathy medical records to obtain an entity set and a consultation set,
The chronic nephropathy subtype mining module is used to construct a chronic nephropathy subtype mining model from the entity set and the consultation set,
The chronic nephropathy phenotype subtype evaluation module is used to evaluate the chronic nephropathy subtype mining model,
The chronic nephropathy subtype prediction module is used to make predictions on a patient's structured data,
Specifically, the data extraction and preprocessing module is used to: extract, from the electronic medical record system, the structured data in the chronic nephropathy medical records, including basic patient information, consultation records, diagnoses within the observation window period, laboratory tests, medical examinations, surgery data, and medication data; preprocess the extracted structured data; for laboratory test data, consider only the test items that are abnormal according to the normal reference range, divide abnormal results into two types, too low and too high, and retain the name and the abnormality category of each abnormal test item; process medical examination and surgery data with simple natural language processing techniques, retaining the examination site and category and the name of the surgery; for medication data, consider only the use of six types of drugs, namely antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, steroidal anti-inflammatory drugs, antiplatelet aggregation drugs, and steroids, classify the medication data into these six types, and retain the drug categories; obtain a diagnosis set, a medication set, a surgery set, a test set, the number of diagnosis types, the number of medication types, the number of surgery types, the number of test types, and the number of consultation records; merge the diagnosis set, medication set, surgery set, and test set into the entity set; and form the consultation set from the patients' consultation records,
The chronic nephropathy subtype mining module specifically includes a consultation network construction unit, an embedded representation construction unit, a clustering network construction unit, and a chronic nephropathy subtype mining model construction unit,
The consultation network construction unit is used to construct a consultation network from the consultation set and the entity set,
The embedded representation construction unit is used to construct an entity co-occurrence matrix from the entity set, obtain the initial embedded representations of the entity nodes and of the consultation nodes from the entity co-occurrence matrix, and form the initial embedded representations of the nodes from the initial embedded representations of the entity nodes and of the consultation nodes,
The clustering network construction unit is used to construct an adjacency matrix according to the relationships between nodes in the consultation network, and to train a clustering network model of consultation nodes based on self-supervised graph clustering using the adjacency matrix and the initial embedded representations of the nodes,
The chronic nephropathy subtype mining model construction unit is used to construct the chronic nephropathy subtype mining model using the clustering network model of consultation nodes based on self-supervised graph clustering,
Specifically, the chronic nephropathy subtype mining model construction unit is used to:
Take the clustering distribution of the consultation nodes obtained by the clustering network model of consultation nodes based on self-supervised graph clustering as the category distribution of the consultation nodes, select the category with the highest probability in the category distribution as the category tag of each consultation node, and arrange all consultation nodes of each patient in chronological order;
Decide whether to merge consecutive consultation nodes that share a category tag or to keep them separate by calculating the cosine similarity between their category distributions, and construct an event matrix by arranging the consultation nodes;
Search for frequent events to determine nodes and connect the consultation nodes in order to form event processes, wherein, starting from the first column of the event matrix, the events in the column whose frequency is greater than a threshold are selected as frequent events and become nodes in the event process, the remaining events go directly to the end node, each frequent event serves as the starting node of the next search, the event vectors corresponding to it are extracted and combined into a new event matrix, the first column is removed, the same frequent-event search is performed on the new matrix, and the node found by each search is connected to its starting node to extend the event process, the iteration terminating when no frequent event remains or when the length of the event process exceeds the maximum length of the event process, thereby obtaining the chronic nephropathy subtype mining model,
Specifically, the chronic nephropathy subtype prediction module is used to:
Preprocess the patient's structured data and input it into the clustering network model of consultation nodes based on self-supervised graph clustering to obtain the probability distribution of the patient's consultation nodes;
Determine the clustering category of each consultation node from the probability distribution of the consultation nodes, and construct a consultation event sequence from the clustering categories;
Input the consultation event sequence into the chronic nephropathy subtype mining model, fit the nodes of the chronic nephropathy subtype mining model in order to obtain one event process, and determine from the event process which chronic nephropathy subtype the patient belongs to.
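The merging step and the frequent-event search recited in the claim above can be illustrated with the following sketch. It is written under several assumptions that the claim does not fix: the merge threshold `0.9`, the use of an absolute count as the event "frequency", the dictionary used to store the resulting event processes, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_consultations(distributions, threshold=0.9):
    """Turn one patient's chronologically ordered category distributions into events.

    A consultation is merged into the previous one when both have the same
    category tag (argmax) and their distributions are similar enough.
    """
    events, previous = [], None
    for dist in distributions:
        tag = max(range(len(dist)), key=dist.__getitem__)
        same_tag = bool(events) and events[-1] == tag
        if not (same_tag and cosine(previous, dist) >= threshold):
            events.append(tag)
        previous = dist
    return events


def mine_event_processes(sequences, min_count=2, max_length=5, prefix=()):
    """Column-by-column frequent-event search over per-patient event sequences.

    Returns a dict mapping each event-process prefix to the set of frequent
    events that extend it; rows whose next event is not frequent end there.
    """
    model = {}
    if len(prefix) >= max_length:
        return model
    counts = Counter(seq[0] for seq in sequences if seq)
    frequent = {event for event, count in counts.items() if count > min_count}
    if not frequent:
        return model
    model[prefix] = frequent
    for event in frequent:
        # Keep only rows that start with this event and drop the first column.
        rest = [seq[1:] for seq in sequences if seq and seq[0] == event]
        model.update(mine_event_processes(rest, min_count, max_length, prefix + (event,)))
    return model
```

Each key of the returned dictionary is one partial event process, so the longest keys together with their frequent next events describe the mined care pathways.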

The chronic nephropathy subtype mining system based on self-supervised graph clustering according to claim 1, wherein the consultation network construction unit is specifically used to:
form a node set from the consultation set and the entity set;
construct an edge set based on the co-occurrence relationships between nodes in the node set; and
construct the consultation network from the node set and the edge set.
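A minimal sketch of such a network construction follows. It assumes that each consultation is given as a list of the entities recorded in it and that an edge joins a consultation node to every entity that occurs in it; the claim only says edges follow co-occurrence, so this edge rule, the input layout, and the function name are assumptions.

```python
def build_consultation_network(consultations):
    """Build node list, edge set, and adjacency matrix of a consultation network.

    consultations: dict mapping a consultation id to the entities (diagnoses,
    drugs, surgeries, tests) recorded in that consultation.
    """
    entities = sorted({e for recorded in consultations.values() for e in recorded})
    nodes = [("consultation", c) for c in consultations]
    nodes += [("entity", e) for e in entities]
    index = {node: i for i, node in enumerate(nodes)}

    # One undirected edge per (consultation, entity) co-occurrence.
    edges = set()
    for c, recorded in consultations.items():
        for e in recorded:
            edges.add((index[("consultation", c)], index[("entity", e)]))

    adjacency = [[0] * len(nodes) for _ in nodes]
    for i, j in edges:
        adjacency[i][j] = adjacency[j][i] = 1
    return nodes, edges, adjacency


nodes, edges, adjacency = build_consultation_network(
    {"v1": ["hypertension", "metformin"], "v2": ["metformin"]}
)
```

The symmetric adjacency matrix produced here is the kind of input that a graph attention model would consume together with the initial node embeddings.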

The chronic nephropathy subtype mining system based on self-supervised graph clustering according to claim 1, wherein the embedded representation construction unit is specifically used to:
construct an entity co-occurrence matrix from the entity set;
calculate the initial embedded representation of each entity node with the GloVe algorithm based on the entity co-occurrence matrix; and
obtain the initial embedded representation of each consultation node by averaging the initial embedded representations of all of its neighboring entity nodes, and form the initial embedded representations of the nodes from the initial embedded representations of the consultation nodes and of the entity nodes.
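The first and last steps of this claim can be sketched as below. The GloVe training itself is not shown; the example assumes entity vectors are already available and only illustrates how co-occurrences might be counted and how a consultation node's initial vector is the mean of its neighboring entity vectors. The function names and input layouts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(consultations):
    """Count how often two entities appear in the same consultation.

    consultations: iterable of entity lists, one list per consultation.
    Keys are alphabetically ordered entity pairs.
    """
    counts = Counter()
    for recorded in consultations:
        for a, b in combinations(sorted(set(recorded)), 2):
            counts[(a, b)] += 1
    return counts


def consultation_embedding(neighbor_entities, entity_vectors):
    """Average the vectors of all entity nodes adjacent to one consultation node."""
    vectors = [entity_vectors[e] for e in neighbor_entities]
    if not vectors:
        raise ValueError("a consultation node needs at least one neighboring entity")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```

For instance, averaging the vectors `[1.0, 0.0]` and `[3.0, 2.0]` yields `[2.0, 1.0]` as the initial embedding of a consultation that contains those two entities.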

The chronic nephropathy subtype mining system based on self-supervised graph clustering according to claim 1, wherein the clustering network construction unit is specifically used to:
construct an adjacency matrix according to the relationships between nodes in the consultation network, and input the adjacency matrix and the initial embedded representations of the nodes into the clustering network model of consultation nodes based on self-supervised graph clustering for graph attention training, to obtain the embedded representations of the nodes, including the embedded representations of the consultation nodes and of the entity nodes;
reconstruct the consultation network from the embedded representations of the nodes and calculate the reconstruction error of the consultation network;
input the embedded representations of the entity nodes into a neural network decoder for training, take the output of the final layer of the decoder as the reconstructed embedded representations of the entity nodes, and calculate the reconstruction error of the entity nodes;
apply a softmax regression operation to the embedded representations of the consultation nodes to obtain the probability distribution of the consultation nodes, and calculate a clustering loss from that probability distribution; and
construct the overall loss function of the clustering network model of consultation nodes based on self-supervised graph clustering from the reconstruction error of the consultation network, the reconstruction error of the entity nodes, and the clustering loss.
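How the three terms might be combined is sketched below. The claim does not state the form of each term or how they are weighted, so this example makes explicit assumptions: the entity reconstruction error is taken as a mean squared error, the overall loss is a weighted sum, and the weights `w_entity` and `w_cluster` are hypothetical parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax turning an embedding into a cluster distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_squared_error(original, reconstructed):
    """Assumed form of the entity reconstruction error for one entity node."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def overall_loss(network_error, entity_error, clustering_loss,
                 w_entity=1.0, w_cluster=1.0):
    """Assumed weighted sum of the three loss terms named in the claim."""
    return network_error + w_entity * entity_error + w_cluster * clustering_loss
```

In an actual training loop these quantities would be tensors in a deep learning framework so that gradients flow through the graph attention encoder and the decoder; plain floats are used here only to show the arithmetic.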