JP7479317B2

JP7479317B2 - Information processing device, method and program

Info

Publication number: JP7479317B2
Application number: JP2021041978A
Authority: JP
Inventors: 雅俊永田; 厚志清水; 翔平小巻
Original assignee: KDDI Corp; Iwate Medical University
Current assignee: KDDI Corp; Iwate Medical University
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2024-05-08
Anticipated expiration: 2041-03-16
Also published as: JP2022142021A

Description

The present invention relates to an information processing device, method, and program for processing health-related information.

As conventional technology, Patent Document 1 describes calculating and displaying a predicted healthy life expectancy for each individual from health checkup results, and states that making health status quantitatively graspable raises motivation for disease prevention and health management. In addition, applying recent machine learning technology, a method for predicting health age from health checkup data and physical condition information (Patent Document 2) and a program for analyzing the items that affect it (Patent Document 3) have been proposed.

Non-Patent Documents 1 and 2 report techniques that can estimate biological age (or epigenetic age) using information on the epigenome (DNA methylation), which controls gene function.

Patent Document 1: JP 2007-287184 A
Patent Document 2: JP 2019-145057 A
Patent Document 3: JP 2020-004126 A

Non-Patent Document 1: Field, Adam E., Neil A. Robertson, Tina Wang, Aaron Havas, Trey Ideker, and Peter D. Adams. "DNA methylation clocks in aging: categories, causes, and consequences." Molecular Cell 71, no. 6 (2018): 882-895.
Non-Patent Document 2: Horvath, Steve, and Kenneth Raj. "DNA methylation-based biomarkers and the epigenetic clock theory of ageing." Nature Reviews Genetics 19, no. 6 (2018): 371.

However, such health programs presuppose the use of health checkup results such as blood pressure and blood sugar levels. They therefore have the problem that it is difficult to cope when some items could not be measured, for example because the content of checkups differs or because checkup results are missing due to incomplete questionnaires or omitted tests. In addition, even when biological age is estimated using information on the epigenome, there is the problem that it is not possible to identify which specific lifestyle habits are having an effect.

In view of the above problems with the conventional technology, an object of the present invention is to provide an information processing device, method, and program that can obtain health-related information by efficiently utilizing biometric information and lifestyle habit information.

To achieve the above object, the present invention is an information processing device whose first feature is that it refers to a database that associates, for each of a plurality of samples, the actual age, the evaluation value of each item of biometric information in which lifestyle habits are reflected, and the actual value of each lifestyle habit item, and executes: a first process of calculating a biological age by applying a biological age prediction model to the evaluation values of each sample; and a second process of calculating, with reference to the database, the degree of influence of each lifestyle habit item on the biological age.

The second feature is that the information processing device further executes: a third process of referring to the database and calculating, for each fixed range of actual age, a model value of the evaluation values of the biometric information in that range from the samples whose biological age is determined to be young; and a fourth process of comparing, for a specified sample, the evaluation values of that sample with the model values of the evaluation values of the biometric information corresponding to the actual age of that sample, and thereby outputting the lifestyle habit items determined to be effective in making the biological age of that sample younger. The invention is also characterized as a method and a program corresponding to the information processing device.

According to the first feature, the biological age of each sample is calculated from the actual values of lifestyle habits, and the impact of lifestyle habits on health can be estimated by taking this biological age and the biometric information into consideration. Health-related information can therefore be obtained by efficiently utilizing biometric information and lifestyle habit information via biological age. According to the second feature, it further becomes possible to output the lifestyle habit items determined to be effective in making the biological age younger.

FIG. 1 is a schematic diagram for explaining the epigenome.
FIG. 2 is a schematic diagram for explaining the epigenome.
FIG. 3 is a schematic diagram for explaining the DNA methylation rate.
FIG. 4 is a functional block diagram of an information processing device according to an embodiment.
FIG. 5 is a flowchart of the operation of an information processing device according to an embodiment.
FIG. 6 is a diagram showing schematic examples of data handled by each unit of the information processing device.
FIG. 7 is a diagram showing a schematic example of registered information in the constructed database.
FIG. 8 is an example of coefficients of a prediction model.
FIG. 9 is a diagram showing an example of recording model values in the database.
FIG. 10 is a diagram showing example data for explaining the processing of the lifestyle habit effect analysis unit and the lifestyle habit advice generation unit (the processing of steps S3 and S4).
FIG. 11 is a diagram for explaining the processing of the lifestyle habit effect analysis unit, using an example shared with FIG. 10 and others.
FIG. 12 is a diagram showing the hardware configuration of a general computer.

Before describing each embodiment of the present invention, the epigenome (DNA methylation) and the DNA methylation rate, which are known medical matters, are briefly described with reference to FIGS. 1 to 3. As shown in the upper part of FIG. 1, the genome, which is the base sequence of DNA (the sequence of the four organic bases adenine (A), guanine (G), thymine (T), and cytosine (C)), is a genetic characteristic that each individual is born with, and can be analyzed as an immutable, innate characteristic. In contrast, as shown in the lower part of FIG. 1 and in FIG. 2, the epigenome, which controls the function of DNA, can be analyzed as a variable, acquired characteristic that changes depending on environment and time.

While the DNA (genome) is unchanged in most cells of an individual, the epigenome, which changes with developmental stage, aging, and the external environment, is known to be responsible for cellular diversity and to determine cell fate. For example, iPS cells (induced pluripotent stem cells) are created by resetting the epigenome, and aging is associated with loss, disturbance, or disordering of epigenomic information.

Cytosine (C) in DNA can be methylated, and this methylation is involved in gene control (switching). It is particularly common where C and G are consecutive, and such a position that can be methylated is called a CG site or CpG site. Humans have about 26 million CpG sites. In the schematic examples of FIGS. 2 and 3, each CpG site is denoted CpG_i (i=1,2,…,n,…), with the subscript i serving as an identifier.

As described above, the epigenome, including DNA methylation, is known to act as a gene switch. In a hypermethylated state the DNA is folded and the gene cannot be read, whereas in a hypomethylated state the DNA is accessible and the gene can be read.

As described above, the epigenome (DNA methylation) contains information that monitors the physical environment of each individual. As shown in the schematic example of FIG. 3, the state of DNA methylation can differ for each individual cell (and for the organ that the cell constitutes), but it is possible to obtain the methylation rate of each site across a large number of DNA molecules. This methylation rate is a value in which the physical environment of the individual is reflected in some form. In the example of FIG. 3, the left side shows four DNA molecules from four cells with two CpG sites, CpG₁ and CpG₂, each either methylated or not, and the right side shows the methylation rate calculated for each site: CpG₁ is methylated in 4 of 4 (100%), and CpG₂ in 1 of 4 (25%).
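The FIG. 3 arithmetic can be written out as a short sketch. The dictionary layout and the function name `methylation_rates` are illustrative assumptions and do not appear in the patent; only the counts (4 of 4 and 1 of 4) come from the text.

```python
from typing import Dict, List


def methylation_rates(cells: List[Dict[str, bool]]) -> Dict[str, float]:
    """Return, per CpG site, the percentage of DNA copies that are methylated."""
    sites = cells[0].keys()
    return {
        site: 100.0 * sum(cell[site] for cell in cells) / len(cells)
        for site in sites
    }


# Four DNA copies from four cells, as in the FIG. 3 example:
# CpG1 is methylated in all four copies, CpG2 in only one.
cells = [
    {"CpG1": True, "CpG2": True},
    {"CpG1": True, "CpG2": False},
    {"CpG1": True, "CpG2": False},
    {"CpG1": True, "CpG2": False},
]
rates = methylation_rates(cells)
print(rates)  # {'CpG1': 100.0, 'CpG2': 25.0}
```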

FIG. 4 is a functional block diagram of an information processing device according to an embodiment. The information processing device 100 includes an epigenome dataset input unit 11, an epigenome data analysis unit 12, a lifestyle habit data input unit 13, a correspondence storage unit 14, a database 15, a biological age calculation unit 21, a lifestyle habit effect analysis unit 22, and a lifestyle habit advice generation unit 23.

Data D1 to D5, attached to some of the arrows between the functional blocks in FIG. 4, are data input and output between those blocks. Items bearing the same reference signs in FIGS. 6, 8, 10, and others described later are examples of data D1 to D5.

FIG. 5 is a flowchart of the operation of the information processing device 100 according to an embodiment; as illustrated, the flowchart consists of steps S1 to S4. In overview, step S1 is the process of constructing the database 15, in which the epigenome dataset input unit 11, the epigenome data analysis unit 12, the lifestyle habit data input unit 13, and the correspondence storage unit 14 store information on a large number of samples in the database 15. Steps S2 to S4 are the process in which the biological age calculation unit 21, the lifestyle habit effect analysis unit 22, and the lifestyle habit advice generation unit 23 analyze the database 15 thus constructed, in this order, and extract various lifestyle-related information from it.

The processing of each unit of the information processing device 100 in FIG. 4 is described in detail below, following the steps of the flowchart in FIG. 5. FIG. 6 shows schematic examples of the data handled by each unit of the information processing device 100, divided into data examples D1 to D5, and is referred to as appropriate in the following description.

<Step S1> In step S1, information on a large number of samples (a large number of subjects from whom health-related information, namely genome-related information and lifestyle information, is obtained) is acquired to construct the database 15, and the process then proceeds to step S2. Step S1 can be organized as the following steps 11 to 14. Steps 11 and 12 on the one hand and step 13 on the other can be performed in either order or in parallel. Step 14 can be performed once steps 11, 12, and 13 have been completed.

(Step 11) The epigenome dataset input unit 11 acquires the genome-related information of each sample and outputs it to the epigenome data analysis unit 12.

The genome-related information (a schematic example is shown as data example D1 in FIG. 6) is in FASTQ format, which is a common output format of next-generation sequencers, in BAM format already mapped to a reference sequence, or in an equivalent format. It includes user information comprising a user ID, age (calendar age, chronological age, or actual age, as distinguished from the "biological age" described later), sex, and race. The epigenome dataset input unit 11 receives these data, checks the format, and outputs them to the epigenome data analysis unit 12.

When the input data are in FASTQ format, a check based on the quality values contained in the data is performed to exclude redundant sequence information and low-accuracy sequencing results, the sequence reads are trimmed, and the reads are then mapped to the human genome reference sequence, yielding a BAM file.
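The quality check and read trimming can be illustrated with a minimal sketch. It assumes Phred+33 quality encoding, a minimum quality of 20, and a minimum read length of 4; none of these values is given in the patent, and the function names are hypothetical. In practice dedicated tools would be used, and the mapping to the reference sequence is not shown.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from complete 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        sequence = next(it).strip()
        next(it)  # skip the '+' separator line
        quality = next(it).strip()
        yield header.strip(), sequence, quality


def trim_and_filter(reads: Iterable[Read], min_q: int = 20,
                    min_len: int = 4, offset: int = 33) -> List[Read]:
    """Trim low-quality bases from the read end and drop reads left too short."""
    kept = []
    for header, sequence, quality in reads:
        scores = [ord(ch) - offset for ch in quality]  # Phred score per base
        end = len(scores)
        while end > 0 and scores[end - 1] < min_q:
            end -= 1
        if end >= min_len:
            kept.append((header, sequence[:end], quality[:end]))
    return kept


fastq_lines = [
    "@read1", "ACGTACGT", "+", "IIIIII##",  # last two bases are low quality
    "@read2", "ACGT", "+", "####",          # every base is low quality
]
filtered = trim_and_filter(parse_fastq(fastq_lines))
print(filtered)  # [('@read1', 'ACGTAC', 'IIIIII')]
```

The parser assumes well-formed input in which every record has exactly four lines; truncated files would need explicit error handling.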

(Step 12) The epigenome data analysis unit 12 analyzes the genome-related information of each sample obtained in step 11, calculates the modification level at each epigenome site, and outputs it to the correspondence storage unit 14. Data example D2 in FIG. 6 shows a modification level of 40% calculated for one sample at site CpG₃₄ (chr12: the address of bases 3428031-3428032 on chromosome 12).

The epigenetic modification level of each region can be calculated together with a significance probability obtained by a statistical test. Tools for the series of processes in steps 11 and 12 are publicly available, so manual adjustment may also be performed. The result is data on the level of epigenetic DNA modification, such as DNA methylation, in the genome sequence of each subject (as shown schematically in FIG. 3 above).

(Step 13) The lifestyle habit data input unit 13 acquires lifestyle habit data for each sample (the same samples for which the epigenome dataset input unit 11 received genome-related information) and outputs the data to the correspondence storage unit 14.

The lifestyle habit data can be acquired as predefined lifestyle-related items together with their actual values (given as quantitative or qualitative values). For example, the data may include questionnaire data on lifestyle habits based on a general medical interview (for example, data on smoking, diet, exercise, and mental stress, as also shown in the lower part of FIG. 1). The data may also include quantitative data acquired via a smartphone or wearable device. For diet, for example, the number of units for each category such as staple foods and fruit can be obtained using a food exchange table in which 1 unit corresponds to 80 kcal. For exercise, the exercise frequency, intensity, and duration over a fixed period such as one week can be obtained.

To accommodate multiple formats for each lifestyle habit item, the lifestyle habit data input unit 13 may check the data so that each item and its actual value (expressed in a quantity unit or the like) are registered in a unified format. (That is, when actual values in different formats are acquired for a common item across samples, they are converted to a common format according to a predetermined rule before being output from the lifestyle habit data input unit 13 to the correspondence storage unit 14.)
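One way to picture the conversion to a common format is the sketch below. The only value taken from the text is the 80 kcal food-exchange unit; the record layout, the unit labels, the choice of minutes per week as the common exercise unit, and the function name `to_common_format` are all assumptions made for illustration.

```python
KCAL_PER_UNIT = 80.0  # from the text: 1 food-exchange unit corresponds to 80 kcal


def to_common_format(item: str, value: float, unit: str) -> dict:
    """Convert one lifestyle record to a (hypothetical) common format.

    Diet values are normalized to food-exchange units and exercise
    durations to minutes per week.
    """
    if unit == "unit":  # already in food-exchange units
        return {"item": item, "value": float(value), "unit": "unit"}
    if unit == "kcal":
        return {"item": item, "value": value / KCAL_PER_UNIT, "unit": "unit"}
    if unit == "min/week":  # already in the common exercise unit
        return {"item": item, "value": float(value), "unit": "min/week"}
    if unit == "h/week":
        return {"item": item, "value": value * 60.0, "unit": "min/week"}
    raise ValueError(f"unsupported unit: {unit}")


diet = to_common_format("Diet_1", 240, "kcal")
exercise = to_common_format("Exer_1", 2, "h/week")
print(diet)      # {'item': 'Diet_1', 'value': 3.0, 'unit': 'unit'}
print(exercise)  # {'item': 'Exer_1', 'value': 120.0, 'unit': 'min/week'}
```

Rejecting unknown units with an error, rather than passing them through, keeps mixed-format values from silently entering the database.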

In FIG. 6, data examples D3 and D4 show schematic examples of the original data and the output data of lifestyle habit data acquired in this way, including data on diet and exercise. Original data D3 is an example of input to the lifestyle habit data input unit 13, and output data D4 is an example of the data output from the lifestyle habit data input unit 13 to the correspondence storage unit 14.

(Step 14) The correspondence storage unit 14 links the DNA modification level data and the lifestyle habit data obtained in steps 12 and 13 to the corresponding sample ID and its user information (at least the age, among age, sex, and so on), and stores them in the database 15, thereby constructing the database 15.
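The linking performed in step 14 can be sketched as a join on the sample ID. The in-memory dictionaries, the field names (which follow the `Age`, `CpG_i`, `Diet_i`, and `Exer_i` labels used for FIG. 7), the numeric values, and the policy of skipping samples that lack either data source are assumptions for illustration only.

```python
from typing import Dict


def build_database(users: Dict[str, dict],
                   modification_levels: Dict[str, dict],
                   lifestyle: Dict[str, dict]) -> Dict[str, dict]:
    """Join user info, modification levels and lifestyle data by sample ID."""
    database = {}
    for sample_id, info in users.items():
        if sample_id not in modification_levels or sample_id not in lifestyle:
            continue  # assumed policy: keep only samples with both data sources
        database[sample_id] = {
            "Age": info["Age"],
            **modification_levels[sample_id],
            **lifestyle[sample_id],
        }
    return database


# Made-up example values.
users = {"User001": {"Age": 45}, "User002": {"Age": 52}}
modification_levels = {"User001": {"CpG_1": 0.40, "CpG_2": 0.75}}
lifestyle = {"User001": {"Diet_1": 3.0, "Exer_1": 120.0},
             "User002": {"Diet_1": 5.0, "Exer_1": 30.0}}

db = build_database(users, modification_levels, lifestyle)
print(db)
```

Here `User002` is left out because no modification levels were supplied for it.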

FIG. 7 shows, in table form, a schematic example of the information registered in the database 15 constructed in step S1 as described above. Part P1 of FIG. 7 shows an example table (representing the registered content of the database 15) in which the following are accumulated for each user ID (User001, User002, User003, and so on): the age (Age) of the user of each sample; the methylation rate at each DNA methylation site (each CpG region CpG_i (i=1,2,…)) as DNA modification level (epigenetic modification rate) data; and lifestyle habit data (diet items Diet_i (i=1,2,…) and exercise items Exer_i (i=1,2,…)).

FIG. 7 also shows, as part P2, that the database 15 receives the biological age (B_Age) calculated by the biological age prediction model and stores it linked to the user ID. Part P2 represents information that is added to the database 15 in step S2, described later. (Part P1 of FIG. 7 represents the information in the database 15 as constructed in step S1.)

<Step S2> In step S2, the biological age calculation unit 21 performs the following steps 21, 22, and 23, and the process then proceeds to step S3.

(Step 21) Referring to the information registered in the database 15 constructed in step S1, the biological age of each registered sample is calculated by applying a specified biological age prediction model.

By analyzing the modification rates at specific sites of the epigenome, a biological age prediction model can calculate biological age as an index that represents the functional capacity and degree of aging of biological tissue better than chronological age does. A model for calculating biological age is generally trained as a multiple linear regression model, or as a regression model with a regularization term added to suppress overfitting (such as Elastic Net), with the biological age obtained from various aging indices as the objective variable and epigenetic modification rate data (in general, data on the modification level of modifiable sites) as the explanatory variables. The epigenetic modification rate data may be the level of DNA methylation or of DNA hydroxymethylation. (Likewise, the epigenome data analysis unit 12 in step S1 can calculate any type of epigenetic modification rate, not only the DNA methylation rate.) For example, when the DNA methylation rate is used as the epigenetic modification rate data, the biological age F can be calculated from multiple DNA methylation sites (CpG regions) that correlate with age (biological age), using a model such as formula (1):

$$F = b_0 + \sum_{k=1}^{n} b_k \, \mathrm{CpG}_k \qquad (1)$$

In formula (1), b_k (k=0,1,…,n) are the coefficients of the prediction model, and CpG_k (k=1,2,…,n) is the methylation rate of the k-th CpG region in the prediction model. If only some of the CpG regions k=1,2,…,n, rather than all of them, correlate with biological age in the prediction model, the coefficient b_k for the methylation rate CpG_k of each uncorrelated CpG region is simply set to b_k=0. More generally, when an arbitrary prediction model is used, the methylation rate may be replaced by the modification level of the modifiable site.

As an example, suppose the target CpG regions and their coefficients are specified for a model in which CpG regions 2, 4, 16, and 55 are meaningful. The general formula (1) then takes the specific form of formula (2), with each coefficient b_k (k=0,2,4,16,55) given, for example, as in data example D5 of FIG. 8 (and FIG. 6):

$$F = b_0 + b_2 \, \mathrm{CpG}_2 + b_4 \, \mathrm{CpG}_4 + b_{16} \, \mathrm{CpG}_{16} + b_{55} \, \mathrm{CpG}_{55} \qquad (2)$$

(Data example D5 in FIG. 6 shows, in addition to the coefficients of the prediction model, a schematic example of calculating biological age with the model; this calculation is performed by the biological age calculation unit 21.)
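A linear model restricted to CpG regions 2, 4, 16, and 55 can be evaluated as in the sketch below. The coefficient and methylation values are made up for illustration (the actual values of data example D5 are in the figures and are not reproduced here), and the function name `biological_age` is hypothetical. Key 0 of the coefficient dictionary holds the intercept b_0.

```python
from typing import Dict


def biological_age(coeffs: Dict[int, float], methylation: Dict[int, float]) -> float:
    """Evaluate F = b_0 + sum of b_k * CpG_k over the regions listed in coeffs."""
    intercept = coeffs[0]
    return intercept + sum(
        b_k * methylation[k] for k, b_k in coeffs.items() if k != 0
    )


# Hypothetical coefficients b_0, b_2, b_4, b_16, b_55 (not the patent's values).
coeffs = {0: 30.0, 2: 10.0, 4: -4.0, 16: 20.0, 55: -8.0}
# Hypothetical methylation rates for one sample, as fractions in [0, 1].
methylation = {2: 0.5, 4: 0.25, 16: 0.75, 55: 0.25}

f = biological_age(coeffs, methylation)
print(f)  # 47.0  (= 30 + 5 - 1 + 15 - 2)
```

Regions absent from `coeffs` contribute nothing, which matches setting b_k = 0 for regions that do not correlate with biological age.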

Such a model reveals which CpG regions affect biological age. Various biological age prediction models using epigenomic information have been proposed, and any of them may be specified in advance.

(Step 22) Through step 21 above, the (actual) age n=n(ID) and the biological age F=F(ID) of each sample in the database 15 are obtained, linked to the sample's ID as follows:
(ID, age n, biological age F)

In step 22, this information (ID, n, F) for each sample is used to determine the "BY groups" described next, and a representative value of the epigenetic modification rate is calculated for each BY group.

Within a range of an actual age plus or minus a years (for example, a=1), the t% of samples whose biological age is lowest relative to chronological age are treated as the biologically young group and called a BY group. For example, the BY group for an actual age of 45 ± a years is written BY@45. The value of t is specified in advance. It may be set, for example, to the probability Pr(n≦μ-σ) of falling at or below μ-σ, the value one standard deviation σ below the mean μ, when age n follows a normal distribution (approximately 15.9%).

From this definition, the BY group "BY@45", for example, is the set of samples that satisfy the following three conditions.
BY@45 = { ID | "45-a≦n(ID)≦45+a" and "F(ID)<n(ID)" and "CDF(F(ID))≦t%" }

The first condition, "45-a≦n(ID)≦45+a", means that the actual age n(ID) is within 45±a years. The second condition, "F(ID)<n(ID)", means that the biological age F(ID) is smaller (younger) than the actual age n(ID). (The scale of biological age is defined in advance, by the model itself, so that a value younger than the actual age indicates good health.) Both actual age and biological age may be given as integers or as real numbers.

The third condition, "CDF(F(ID))≦t%", means that, within the "healthy group" around that actual age (samples whose actual age is within 45±a years and for which F<n, that is, whose biological age is younger than their actual age), the value of the cumulative distribution function CDF of F (biological age) is at most t%. (In other words, CDF(F(ID)) indicates within what percentage, counted from the young side, the biological age F(ID) of the sample with that ID falls inside the "healthy group".)

The BY group "BY@45" is thus an example of extracting, from the set of samples whose actual age is around 45 (equal to 45, or determined to be close to 45; that is, the set satisfying the first condition), the subset determined to be in exemplary health in view of biological age. Here both the second and third conditions are imposed as the criteria for exemplary health. As a variation, the BY group "BY@45" may be obtained by imposing only one of the second and third conditions.
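The three conditions can be combined as in the following sketch. It interprets CDF as the empirical cumulative distribution of F within the healthy group (the share of healthy-group samples whose biological age is at or below that of the sample), which is one reasonable reading of the text rather than a definition the patent spells out. The record fields, the sample values, and the default t=50 are illustrative assumptions.

```python
from typing import Dict, List


def by_group(samples: List[Dict], center: float,
             a: float = 1.0, t: float = 50.0) -> List[str]:
    """Return IDs in BY@center under conditions 1 to 3."""
    # Conditions 1 and 2: actual age within center±a, and F < n.
    healthy = [
        s for s in samples
        if center - a <= s["age"] <= center + a and s["bio_age"] < s["age"]
    ]
    selected = []
    for s in healthy:
        # Condition 3: empirical CDF of F within the healthy group, in percent.
        at_or_below = sum(1 for o in healthy if o["bio_age"] <= s["bio_age"])
        cdf = 100.0 * at_or_below / len(healthy)
        if cdf <= t:
            selected.append(s["id"])
    return selected


samples = [
    {"id": "U1", "age": 45, "bio_age": 40},
    {"id": "U2", "age": 44, "bio_age": 43},
    {"id": "U3", "age": 46, "bio_age": 42},
    {"id": "U4", "age": 45, "bio_age": 48},  # fails condition 2 (F >= n)
    {"id": "U5", "age": 50, "bio_age": 40},  # fails condition 1 (age out of range)
    {"id": "U6", "age": 45, "bio_age": 44},
]
group = by_group(samples, center=45, a=1, t=50)
print(group)  # ['U1', 'U3']
```

The healthy group here is U1, U2, U3, and U6, with empirical CDF values of 25%, 75%, 50%, and 100% respectively, so only U1 and U3 meet the 50% threshold.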

In step 22, the BY group "BY@n" defined in this way is determined for various values of the assumed actual age n (for example, n=44, 45, 46), and a representative value of the epigenetic modification rate is calculated for each BY group "BY@n". Any statistical measure, such as the mean or the mode, may be used as the representative value. Because the BY group "BY@n" is, as described above, the group determined to be in exemplary health among samples around actual age n, the epigenetic modification rate calculated as its representative value can be regarded as reflecting exemplary health around actual age n.
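Computing a representative value per BY group can be sketched as below, using the mean as the statistic (the text permits any statistical measure). The database layout, the site names, the values, and the function name `representative_values` are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List


def representative_values(database: Dict[str, Dict[str, float]],
                          group_ids: List[str],
                          sites: List[str]) -> Dict[str, float]:
    """Mean modification rate per site over the members of one BY group."""
    return {
        site: mean(database[sample_id][site] for sample_id in group_ids)
        for site in sites
    }


# Made-up modification rates for the members of a hypothetical BY@45 group.
database = {
    "U1": {"CpG_2": 0.5, "CpG_4": 0.25},
    "U3": {"CpG_2": 0.75, "CpG_4": 0.25},
}
model_values = representative_values(database, ["U1", "U3"], ["CpG_2", "CpG_4"])
print(model_values)  # {'CpG_2': 0.625, 'CpG_4': 0.25}
```

`statistics.mean` raises `StatisticsError` for an empty group, so age ranges with no BY members would need separate handling.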

(Step 23) The biological age calculated for each sample in step 21 and the epigenetic modification rate (representative value) calculated for each BY group "BY@n" in step 22 are recorded and stored in the database 15.

The biological age may be recorded in the database 15 by linking it to the information (ID, personal information such as actual age, epigenetic modification rate, lifestyle) constructed in step S1, giving (ID, personal information such as actual age, epigenetic modification rate, lifestyle, biological age). Fig. 7, mentioned above, is an example of such a record: the biological age (B_Age) of part P2 is recorded in addition to the information of part P1 (actual age (Age), epigenetic modification rate, lifestyle).
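One way to picture this record is sketched below in Python. The class `SampleRecord`, its field names, and the helper `with_biological_age` are hypothetical, and the lifestyle values and the biological age of 47.5 are placeholders rather than data from Fig. 7.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional

@dataclass(frozen=True)
class SampleRecord:
    sample_id: str
    age: int                              # actual age ("Age", part P1)
    modification_rates: Dict[str, float]  # epigenetic modification rates (part P1)
    lifestyle: Dict[str, float]           # lifestyle items such as Diet_i, Exer_i (part P1)
    biological_age: Optional[float] = None  # "B_Age" (part P2), unknown until step 21

def with_biological_age(record, biological_age):
    """Return a copy of the step S1 record with the biological age linked to it."""
    return replace(record, biological_age=biological_age)

record = SampleRecord(
    sample_id="User011",
    age=45,
    modification_rates={"CpG2": 0.6, "CpG4": 0.4, "CpG16": 0.0, "CpG55": 0.4},
    lifestyle={"Diet1": 3.0, "Exer1": 1.0},  # placeholder values
)
updated = with_biological_age(record, 47.5)  # placeholder biological age
```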

The epigenetic modification rate (representative value) calculated for each BY group "BY@n" may also be recorded separately in the database 15. Fig. 9 shows an example of such a record for the BY groups "BY@n" with actual ages n = 44, 45, 46 (years) and so on, when the biological age model of formula (2) above is used (a model in which the methylation rates of specific CpG regions correlate with biological age).

<Step S3> In step S3, the lifestyle influence analysis unit 22 refers to the information (ID, actual age, DNA modification level, lifestyle, biological age) of all samples in the database 15. Among the lifestyle items (such as the diet items Diet_i (i = 1, 2, ...) and the exercise items Exer_i (i = 1, 2, ...) illustrated in Fig. 7), it takes those treated as correlated in the model used by the biological age calculation unit 21, calculates their degree of influence on biological age, outputs this degree of influence to the lifestyle advice generation unit 23, and then proceeds to step S4.

That is, lifestyle habits such as exercise and diet affect multiple epigenetic modifications, but the affected regions differ depending on the lifestyle item and on its frequency and intensity. For example, when DNA methylation is considered as the epigenetic modification, the influence of each lifestyle item differs from one CpG region to another, as shown schematically in Fig. 2 above, and some of the CpG regions used in the biological age prediction model are also affected.

The lifestyle influence analysis unit 22 can calculate this degree of influence from the epigenetic modification rates of all samples in the database 15 (the modification rates used in the biological age prediction model) and the lifestyle data. Using an existing statistical technique, it is obtained by formula (3) below as the partial correlation r_Bk, from which the influence of the other lifestyle items has been removed. Calculating a partial correlation removes confounding even when several different lifestyle items affect the CpG_k of interest. (Note that when the independent variables are too strongly correlated, the estimate of the partial correlation coefficient can become unstable because of multicollinearity. This can happen, for example, when very similar lifestyle items are present, such as carbohydrate intake and calorie intake. To prevent such instability, a rule may be set in advance, for instance to treat several very similar items as a single lifestyle item or to use only one of them, before the lifestyle data input unit 13 accepts data input.)

Here, S_ij is the (i, j) cofactor of the variance-covariance matrix S of lifestyle item B and the epigenetic modification rates, that is, the determinant of the matrix obtained by deleting row i and column j of S, multiplied by (-1)^(i+j). It can be calculated from the (lifestyle item, epigenetic modification rate) information of all samples recorded in the database 15. The variance-covariance matrix S used to calculate the cofactors S_ij is given by formula (3A) below. In the subscripts of its elements, x is the DNA methylation rate of the CpG_k of interest for the partial correlation r_Bk, y is the lifestyle item of interest (the lifestyle item B corresponding to r_Bk), and z (z = z₁, z₂, ..., z_p) denotes each of the p lifestyle items other than item B.
That is, the cofactor S₁₂ in the numerator of formula (3) is calculated from the determinant obtained by deleting row 1 and column 2 of the matrix S (the elements in the same row or column as S_xy). Likewise, the cofactors S₁₁ and S₂₂ in the denominator are calculated from the determinants obtained by deleting row 1 and column 1 (the elements in the same row or column as S_xx) and row 2 and column 2 (the elements in the same row or column as S_yy), respectively. (In other words, the (i, j) component of the cofactor matrix S~ of the matrix S in formula (3A) is the S_ij used in formula (3). Because S is symmetric, its cofactor matrix S~ is also symmetric, so it makes no difference whether it is transposed.)
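Formulas (3) and (3A) are not reproduced in this text. The following LaTeX is a reconstruction from the definitions above, assuming the standard cofactor form of the partial correlation coefficient (including its leading minus sign); it should be checked against the published drawings. Note that doubly lettered subscripts (S_xy) denote elements of S, while numeric subscripts (S₁₂) denote cofactors.

```latex
\begin{equation}
  r_{Bk} = \frac{-S_{12}}{\sqrt{S_{11}\,S_{22}}} \tag{3}
\end{equation}

\begin{equation}
  S =
  \begin{pmatrix}
    S_{xx}     & S_{xy}     & S_{xz_1}     & \cdots & S_{xz_p}     \\
    S_{yx}     & S_{yy}     & S_{yz_1}     & \cdots & S_{yz_p}     \\
    S_{z_1 x}  & S_{z_1 y}  & S_{z_1 z_1}  & \cdots & S_{z_1 z_p}  \\
    \vdots     & \vdots     & \vdots       & \ddots & \vdots       \\
    S_{z_p x}  & S_{z_p y}  & S_{z_p z_1}  & \cdots & S_{z_p z_p}
  \end{pmatrix} \tag{3A}
\end{equation}
```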

Fig. 10 shows, as data example D6, partial correlations r_Bk calculated in this way for the biological age prediction model of formula (2) and data example D5 (also shown in Fig. 8), that is, an example in which CpG regions 2, 4, 16, and 55 are the targets of DNA methylation as the epigenetic modification. Fig. 10 is a diagram of example data for explaining the processing of the lifestyle influence analysis unit 22 and the lifestyle advice generation unit 23 (steps S3 and S4), and is referred to below as appropriate.
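The cofactor-based partial correlation described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the implementation of unit 22: the function names are hypothetical, the sample data are invented, the sign convention is the standard one (minus the (1,2) cofactor over the root of the (1,1) and (2,2) cofactors), and the Laplace-expansion determinant is only practical for a small number of lifestyle items.

```python
from math import sqrt

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def cofactor(m, i, j):
    """(i, j) cofactor, 0-based: signed determinant with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
    return (-1) ** (i + j) * det(minor)

def covariance_matrix(columns):
    """Sample variance-covariance matrix of equally long data columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]

def partial_correlation(x, y, others):
    """Correlation of x (methylation rate) and y (item B), controlling for `others`."""
    s = covariance_matrix([x, y, *others])  # order x, y, z_1..z_p as in the text
    return -cofactor(s, 0, 1) / sqrt(cofactor(s, 0, 0) * cofactor(s, 1, 1))

# Invented data: x and y are correlated only through the shared item z.
z = [1.0, 1.0, -1.0, -1.0]
x = [2.0, 0.0, 0.0, -2.0]
y = [2.0, 0.0, -2.0, 0.0]
r_raw = partial_correlation(x, y, [])       # plain correlation, nothing controlled
r_partial = partial_correlation(x, y, [z])  # correlation after removing z
```

In this invented data set the plain correlation of x and y is 0.5, while the partial correlation controlling for z is 0, which is the confounding-removal effect the text relies on.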

Instead of being calculated as a partial correlation, the degree of influence r_Bk may be calculated by any other method, such as machine learning.

<Step S4> In step S4, for the sample designated as the subject for whom advice information is to be generated (any of the samples recorded in the database 15), the lifestyle advice generation unit 23 compares the epigenetic modification rate and actual age of the subject's sample with the epigenetic modification rate of each BY group "BY@n" recorded in the database 15 as the model value, and also takes into account the degree of influence r_Bk obtained from the lifestyle influence analysis unit 22. It then outputs the lifestyle items judged effective for making the subject's biological age younger as advice information for promoting the subject's health, and the flow of Fig. 5 ends.

Specifically, the lifestyle influence analysis unit 22 can generate the advice information by the following steps 31, 32, 33, and 34.

(Step 31) The degree of influence r_Bk, obtained as a partial correlation coefficient or the like, is multiplied by the coefficient of the corresponding epigenetic region in the biological age prediction model to estimate the degree of influence that each lifestyle item has on biological age through that epigenetic region.

Fig. 10 shows an example of step 31. The degrees of influence r_Bk shown in data D6 are arranged as a matrix with the epigenetic modification rate items k along the rows and the lifestyle items B along the columns, so that r_Bk is the element in row k and column B. This matrix is multiplied, as a Hadamard product, by the column vector b_k whose k-th element is the coefficient of the biological age prediction model shown in data D5 (that is, every element in row k of the r_Bk matrix is multiplied by b_k), giving the matrix of degrees of influence "r_Bk*b_k" shown in data D7.
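The row-wise scaling of step 31 can be sketched as follows. The Diet1 values of r_Bk and the coefficients b_k are read off the worked example for User011 given later in the text; the nested-dictionary layout and the function name `scale_rows` are assumptions, and only the Diet1 column is shown because the other columns of data D6 are not reproduced here.

```python
# Model coefficients b_k (data D5) and partial correlations r_Bk for Diet1 (data D6),
# as quoted in the worked example for User011.
b = {"CpG2": -5.0, "CpG4": 3.0, "CpG16": 2.0, "CpG55": -4.5}
r = {
    "CpG2": {"Diet1": 0.1},
    "CpG4": {"Diet1": -0.1},
    "CpG16": {"Diet1": 0.0},
    "CpG55": {"Diet1": 0.2},
}

def scale_rows(r, b):
    """Multiply every element in row k of the r_Bk matrix by b_k."""
    return {k: {item: r_bk * b[k] for item, r_bk in row.items()}
            for k, row in r.items()}

d7 = scale_rows(r, b)  # corresponds to the "r_Bk*b_k" matrix of data D7
```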

(Step 32) Next, attention is paid to the difference between the subject's epigenetic modification rate and that of the BY group "BY@m" corresponding to the subject's actual age (taken to be m years), i.e., the model value around age m. The influence of each lifestyle item on the epigenetic modification rate is expressed as the differential weight "Dw_k", the absolute value of this difference.

Fig. 11 is a diagram for explaining the processing of the lifestyle influence analysis unit 22, continuing the example of Fig. 10 and others; data D8 shows an example of step 32. Suppose, for example, that subject User011 has an actual age of 45 and that the methylation rates of the CpG regions used for biological age, compared with those of BY@45 in Fig. 9, are as shown in data example D8 of Fig. 11. The methylation rate of each CpG region is then considered able to approach the methylation rate of BY@45 (the model value) through future improvement of User011's lifestyle, and the absolute value of the difference is calculated as the differential weight "Dw_k".

The differential weight differs from subject to subject. For example, relative to the model value 0.9 of Fig. 9 (and Fig. 11), the differential weight of CpG2 in Fig. 11 is 0.3 (= |0.9-0.6|) for User011 (methylation rate 0.6), but 0.2 (= |0.9-0.7|) for User012 (methylation rate 0.7), whose actual age is also 45.
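The differential weight of step 32 is a per-region absolute difference, sketched below. The CpG2 values are the ones quoted above; the function name `differential_weights` and the dictionary layout are assumptions.

```python
def differential_weights(subject_rates, model_rates):
    """Dw_k = |model value - subject value| for each region k of the model."""
    return {k: abs(model_rates[k] - subject_rates[k]) for k in model_rates}

by_45 = {"CpG2": 0.9}    # model value for BY@45
user011 = {"CpG2": 0.6}  # actual age 45
user012 = {"CpG2": 0.7}  # actual age 45

dw_011 = differential_weights(user011, by_45)  # CpG2 weight of about 0.3
dw_012 = differential_weights(user012, by_45)  # CpG2 weight of about 0.2
```

Because the values are binary floating-point numbers, the results are only approximately 0.3 and 0.2, so comparisons should use a tolerance.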

(Step 33) Using the degree of influence "r_Bk*b_k" obtained in step 31 and the differential weight "Dw_k" obtained in step 32, the degree of influence E_B of each lifestyle item B is calculated by summing the product "r_Bk*b_k*Dw_k" over the epigenetic modification rate items k ("Σ_k"), as in formula (4) below. An epigenetic region with a larger differential weight has more room for improvement through lifestyle, which is why E_B is calculated as a weighted sum in this way. K is the total number of epigenetic modification rate items k in the prediction model.
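Formula (4) is not reproduced in this text. The following LaTeX is a reconstruction from the description above; the index range k = 1, ..., K is an assumption about how the sum "Σ_k" is written in the original.

```latex
\begin{equation}
  E_B = \sum_{k=1}^{K} r_{Bk}\, b_k\, Dw_k \tag{4}
\end{equation}
```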

Data example D9 in Fig. 11 is an example in which, for subject User011 of data example D8 in the same figure, the degree of influence E_B is calculated with formula (4) for each lifestyle item B (= Diet1, Diet2, Diet3, Exer1, ...) as the sum over the epigenetic modification rate items k (= CpG₂, CpG₄, CpG₁₆, CpG₅₅) of the model. In data example D9, the degree of influence of lifestyle item Diet1 for subject User011 is calculated as follows:
CpG₂ term = (0.1)*(-5.0)*|0.9-0.6| = -0.15
CpG₄ term = (-0.1)*(3.0)*|0.1-0.4| = -0.09
CpG₁₆ term = (0.0)*(2.0)*|0.0-0.0| = 0.0
CpG₅₅ term = (0.2)*(-4.5)*|0.5-0.4| = -0.09
Summing these terms gives the degree of influence of lifestyle item Diet1 for subject User011: -0.33.
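The Diet1 calculation above can be checked with a short Python sketch. The numbers are those of the worked example; the variable names and the function `influence` are hypothetical, and treating the first value inside each absolute-value bar as the BY@45 model value is an assumption (the result does not depend on it).

```python
# r_Bk for Diet1, model coefficients b_k, BY@45 model values, and User011's values.
r_diet1 = {"CpG2": 0.1, "CpG4": -0.1, "CpG16": 0.0, "CpG55": 0.2}
b = {"CpG2": -5.0, "CpG4": 3.0, "CpG16": 2.0, "CpG55": -4.5}
model = {"CpG2": 0.9, "CpG4": 0.1, "CpG16": 0.0, "CpG55": 0.5}
subject = {"CpG2": 0.6, "CpG4": 0.4, "CpG16": 0.0, "CpG55": 0.4}

def influence(r_b, b, model, subject):
    """E_B: sum over regions k of r_Bk * b_k * Dw_k, with Dw_k = |model - subject|."""
    return sum(r_b[k] * b[k] * abs(model[k] - subject[k]) for k in b)

e_diet1 = influence(r_diet1, b, model, subject)
print(round(e_diet1, 2))  # -0.33
```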

(Step 34) Lifestyle items B whose degree of influence E_B obtained in step 33 is judged to be large on the negative side are output as advice information, as items that contribute to making the subject's biological age younger, i.e., to improving the subject's health. For this output, text or the like in a predetermined format may be prepared in advance according to how strongly negative E_B is and to the lifestyle item B concerned, and the advice information may be output in the form of that text.

For data example D9 of subject User011 in Fig. 11, the sum for Diet2 is -0.915, which shows that Diet2 is the most effective item. A value of E_B represents an addition to or subtraction from biological age: a negative value means the item acts to slow aging, and a positive value means it acts to accelerate aging. Because the lifestyle items are assumed to be expressed in the positive direction, a positive value suggests reducing that lifestyle habit and a negative value suggests increasing it. In this example, then, a lifestyle that increases Diet2 is the most effective for this subject.
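The selection in step 34 can be sketched as a simple ranking. The Diet1 and Diet2 values are those given for User011; the Exer1 value is hypothetical and is included only to show the positive case, and the function name `rank_advice` and the "increase"/"reduce" labels are assumptions rather than the predetermined text format mentioned above.

```python
def rank_advice(influences):
    """Return (item, direction, E_B) tuples, most strongly negative E_B first.

    A negative E_B suggests increasing the habit; a positive E_B suggests reducing it.
    Items with E_B equal to zero carry no advice and are skipped.
    """
    ranked = sorted(influences.items(), key=lambda kv: kv[1])
    return [(item, "increase" if e < 0 else "reduce", e)
            for item, e in ranked if e != 0]

influences = {
    "Diet1": -0.33,   # from the worked example
    "Diet2": -0.915,  # from data example D9
    "Exer1": 0.2,     # hypothetical value, for illustration only
}
advice = rank_advice(influences)
```

With these inputs the first entry is Diet2 with the direction "increase", matching the conclusion drawn in the text.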

As described above, the embodiment of the present invention uses a biological age calculated from genome-level information, which is more fundamental than the information used in conventional health programs, and is therefore expected to be extremely effective in changing behavior to maintain and improve health. In addition, once data have been registered for a sufficient number of user samples, the lifestyle item parameters are determined accurately, so lifestyle data are no longer needed for subsequent user samples. In other words, given only genome-related information (information in which lifestyle is reflected as biological information, such as the genome modification rate) and actual age, it is possible to present lifestyle habits for lowering biological age without any information on what lifestyle the subject currently follows.

That is, although the subject for whom the lifestyle advice generation unit 23 generates advice information in step S4 has been described as being designated from the samples already registered in the database 15, a new, unregistered subject may also be designated. In that case, the actual age n of the new subject and the DNA modification level data obtained by processing the subject's genome-related information in the epigenome data analysis unit 12 may be input to the lifestyle advice generation unit 23.

Various supplementary matters, additional examples, and alternatives are described below.

(1) Because the embodiment of the present invention can generate advice information that encourages behavioral change for maintaining and improving health, it can contribute to Goal 3 of the United Nations-led Sustainable Development Goals (SDGs): "Ensure healthy lives and promote well-being for all at all ages."

(2) In the database 15 constructed in step S1, DNA modification level data obtained from genome-related information are registered for each sample. Instead of or in addition to these data, other information with the same property may be registered, namely evaluation values of one or more arbitrary items in which lifestyle is reflected as biological information. For example, evaluation values from health checkup data may be registered, such as the value of BMI (a health index variable), blood test items (e.g., LDL cholesterol), or some other biomarker. As for lifestyle items, in addition to items related to exercise and diet, actual values of items related to sleep, smoking, stress, and so on may be accepted by the lifestyle data input unit 13 and registered in the database 15, as also shown in the schematic examples of Figs. 1 and 2.

(3) The embodiment of the present invention uses a biological age estimated from epigenome information, which has the following significance. Biological age is an index of the functional capacity and degree of aging of biological tissue. Epigenetic age is one kind of biological age: an age estimated by a mathematical algorithm based on the methylation state of the genome, which can be modeled by machine learning from hundreds to hundreds of thousands of methylation variables. Because the DNA methylation rate is often used as the epigenome information, it is also called DNA methylation age. Its advantages are that it is regarded as the most effective index among biological ages, is strongly associated with mortality, serves as an index of health, and shows how far aging has progressed in each tissue or organ. For example, it has been reported that epigenetic age is advanced in cancer cells and that a high epigenetic age is associated with greater susceptibility to viral infection.

(4) Fig. 12 is a diagram showing an example of the hardware configuration of a general computer device 70. The information processing device 100 can be realized as one or more computer devices 70 having such a configuration. When it is realized by two or more computer devices 70, the information needed for processing may be exchanged over a network. The computer device 70 includes: a CPU (central processing unit) 71 that executes predetermined instructions; a GPU (graphics processing unit) 72, a dedicated processor that executes some or all of the instructions of the CPU 71 in place of or in cooperation with the CPU 71; a RAM 73 as a main storage device that provides a work area for the CPU 71 (and the GPU 72); a ROM 74 as an auxiliary storage device; a communication interface 75; a display 76; an input interface 77 that accepts user input via a mouse, keyboard, touch panel, or the like; and a bus BS for exchanging data among these components.

Each functional unit of the information processing device 100 can be realized by the CPU 71 and/or the GPU 72 reading a predetermined program corresponding to the function of that unit from the ROM 74 and executing it. The CPU 71 and the GPU 72 are both types of arithmetic device (processor). When display-related processing is performed, the display 76 additionally operates in conjunction; when communication-related processing for sending and receiving data is performed, the communication interface 75 additionally operates in conjunction. The database 15 may be realized by the ROM 74 serving as the auxiliary storage device.

100...information processing device, 11...epigenome data set input unit, 12...epigenome data analysis unit, 13...lifestyle data input unit, 14...association storage unit, 15...database, 21...biological age calculation unit, 22...lifestyle influence analysis unit, 23...lifestyle advice generation unit

Claims

a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
and a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
the information processing device being characterized in that, in the second process, the degree of influence is calculated using the partial correlation between each item of the lifestyle and each item of the biological information.

a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
a third process of calculating, by referring to the database, a model value of the evaluation value of the biological information within a certain range of actual ages from samples determined to have a young biological age within that range of actual ages;
and a fourth process of outputting, for a designated sample, the lifestyle items that are determined to be effective in making the biological age of the sample younger by comparing the evaluation value of the biological information of the sample with the model value of the evaluation value of the biological information that corresponds to the actual age of the sample.

3. The information processing device according to claim 2, wherein:
in the second process, a partial correlation (rBk) between each item (B) of the lifestyle and each item (k) of the biological information is calculated; and
in the fourth process, the comparison is made based on the difference between the evaluation value and the model value for each item (k) of the biological information in the designated sample, and the lifestyle item (B) determined to be effective in making the biological age of the sample younger is output using the difference and the partial correlation (rBk).

In the first process, when calculating the biological age by applying the biological age prediction model, the biological age is calculated as a weighted sum of the evaluation values of each item (k) of the biological information of each sample by a predetermined coefficient (bk);
The information processing device according to claim 3, characterized in that in the fourth process, a sum is calculated for each lifestyle item (B) by adding a value (rBk*bk*Dwk) obtained by multiplying the absolute value of the difference (Dwk) by the predetermined coefficient (bk) and the partial correlation (rBk) for each item (k) of the biological information, and a lifestyle item corresponding to a sum whose value is negative and has a larger absolute value is output as an item that is more effective in making the biological age of the sample younger.

5. The information processing device according to claim 1, wherein at least some of the items of biological information reflecting the lifestyle in the database include an epigenetic modification rate.

a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
and a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
the information processing device being characterized in that at least some of the items of biological information reflecting the lifestyle in the database include items of health checkup data.

7. The information processing device according to claim 1, wherein at least some of the lifestyle items in the database include items related to exercise, diet, smoking, sleep, or stress.

A computer-implemented information processing method, comprising:
a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
and a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
wherein, in the second process, the degree of influence is calculated using the partial correlation between each item of the lifestyle and each item of the biological information.

A computer-implemented information processing method, comprising:
a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
a third process of calculating, by referring to the database, a model value of the evaluation value of the biological information within a certain range of actual ages from samples determined to have a young biological age within that range of actual ages;
and a fourth process of outputting, for a designated sample, the lifestyle items that are determined to be effective in making the biological age of the sample younger by comparing the evaluation value of the biological information of the sample with the model value of the evaluation value of the biological information that corresponds to the actual age of the sample.

A computer-implemented information processing method, comprising:
a first process of calculating a biological age for each of a plurality of samples by applying a biological age prediction model to the evaluation value of each sample by referring to a database in which the actual age, the evaluation value of each item of biological information reflecting the lifestyle, and the actual value of each item of the lifestyle are associated with each other;
and a second process of calculating the degree of influence of each of the lifestyle items on the biological age by referring to the database,
wherein at least some of the items of biological information reflecting the lifestyle in the database include items of health checkup data.

8. A program for causing a computer to function as the information processing device according to claim 1.