WO2015071968A1

WO2015071968A1 - Analysis system

Info

Publication number: WO2015071968A1
Application number: PCT/JP2013/080616
Authority: WO
Inventors: 信二垂水; 利昇三好; 泰隆長谷川; 伴　秀行
Original assignee: 株式会社日立製作所
Priority date: 2013-11-13
Filing date: 2013-11-13
Publication date: 2015-05-21

Abstract

The purpose of the present invention is to select a disease to be addressed in the future. The following is an example of a means used for accomplishing this purpose. An analysis system capable of accessing a database that stores healthcare information including medical records and test values for subjects, and formatted information obtained by grouping the healthcare information on a predetermined period basis for each subject, and also capable of accessing a database that stores a graphical model created on the basis of the formatted information and defined by: a first node group which corresponds to a random variable representing a pathological condition, a second node group which corresponds to a random variable representing a factor affecting a change in the pathological condition, and a directed or undirected edge which represents the stochastic dependence between any two nodes included in a set comprising the first node group and the second node group. The analysis system is provided with a disease development probability prediction unit which predicts probability of disease development on the basis of the graphical model, and a unit for selection of a disease to be addressed, which selects at least one pathological condition as a disease to be addressed, on the basis of the probability of disease development.

Description

Analysis system

The present invention relates to a data analysis technique, and more particularly to a healthcare data analysis system for analyzing healthcare data.

The Health Insurance Association conducts an insurance business that provides health guidance for the prevention of lifestyle-related diseases and the prevention of serious diseases for the purpose of reducing medical expenses. However, resources such as public health nurses available for health guidance and costs for health guidance are limited. Therefore, a system that supports effective and efficient insurance business operation is desired.

As a system for supporting the operation of an insurance business, Patent Document 1 discloses a health business support system that selects persons to receive health guidance based on receipt information, medical examination information, and health guidance information. The system described includes: a medical cost model creation unit that creates a medical cost model indicating the predicted medical cost for each severity and test value; a test value improvement model creation unit that creates a test value improvement model indicating the amount of improvement for each severity and test value; a medical cost reduction effect prediction unit that predicts the medical cost reduction achieved by health guidance for each severity and test value; and a target person selection unit that selects, as health guidance targets, health insurance members belonging to a severity and test value with a high predicted medical cost reduction effect.

Patent Document 1: JP 2012-128670 A

In order to conduct an effective and efficient insurance business within the limited resources of a health insurance association, it is important to predict not only the current situation but also the future situation of the association and to implement an appropriate insurance business. For example, if it is possible to predict a disease that currently affects few people but is expected to increase in the future, or an insured person who is likely to be affected by such a disease in the future, appropriate health guidance can be provided to the appropriate subjects, which can be expected to reduce the number of affected people and medical costs in the future. However, in the prior art, it is not easy to predict a disease that will become a problem in the future as described above.

In order to solve the above problem, the following analysis system is provided as one example. The analysis system includes a processor that executes a program, a memory that stores the program, and an input unit that receives an input of information, and analyzes health care data by executing the program. The analysis system can access a database that stores health care information including medical records and test values of subjects, and shaping information in which the health care information is summarized for each subject and for each predetermined period. The analysis system can also access a database that stores a graphical model created based on the shaping information and defined by a first node group corresponding to random variables representing pathological conditions, a second node group corresponding to random variables representing factors that affect changes in the pathological conditions, and directed or undirected edges representing the stochastic dependency between any two nodes included in the set comprising the first node group and the second node group. The analysis system comprises an onset probability prediction unit in which the processor predicts the onset probability of a disease based on the graphical model, and a problem disease extraction unit in which the processor extracts at least one pathological condition as a problem disease based on the onset probability.

According to the representative embodiment of the present invention, it is possible to extract items related to diseases that will be a future problem based on health care data including information such as medical records and test values of the subject.

The drawings are as follows, in order:
A block diagram showing the configuration of the medical data analyzer of the first embodiment.
A block diagram showing another configuration of the medical data analyzer of the first embodiment.
A flowchart of the problem disease extraction process of the first embodiment.
A flowchart of the factor extraction process of the second embodiment.
A flowchart of the factor extraction process of the third embodiment.
A diagram explaining the basic receipt information of the first embodiment.
A diagram explaining the medical examination information of the first embodiment.
A diagram explaining the inquiry information of the first embodiment.
A diagram explaining the wound name information of the first embodiment.
A diagram explaining the wound name classification information of the first embodiment.
A diagram explaining the medical practice information of the first embodiment.
A diagram explaining the medical practice classification information of the first embodiment.
A diagram explaining the pharmaceutical information of the first embodiment.
A diagram explaining the pharmaceutical classification information of the first embodiment.
A diagram explaining an example of the shaping information of the first embodiment.
A diagram explaining another example of the shaping information of the first embodiment.
A diagram explaining an example of the operation screen of the first embodiment.
A diagram explaining an example of the operation screen of the second embodiment.
A diagram explaining the factor extraction in the factor extraction process of the third embodiment.
A diagram explaining the item aggregation process in the visualization process of the third embodiment.
A diagram explaining a model that is a Bayesian network.
A diagram explaining a model composed of two random variables, and the random variables.
A diagram explaining a model composed of four random variables, and the random variables.
A flowchart of the visualization process of the third embodiment.
A flowchart of the high-risk subject selection process of the fourth embodiment.
A diagram explaining an example of the operation screen of the fourth embodiment.
A diagram explaining an example of the shaping information and the virtual shaping information of the fifth embodiment.
A diagram explaining an example of the prediction result based on the shaping information and the prediction result based on the virtual shaping information of the fifth embodiment.
A flowchart of the virtual shaping data creation process of the fifth embodiment.
A diagram explaining an example of the operation screen of the fifth embodiment.
A diagram explaining an example of the shaping information of the sixth embodiment.

Hereinafter, an embodiment for carrying out the invention will be described with reference to the drawings.

In the first embodiment, an example of a health care data analysis device that extracts a disease to be a problem based on health care data will be described.

Health insurance providers need to understand which diseases will become problems in the future. This is because, while it is easy to calculate the current prevalence and medical expenses for each disease, it is not easy to predict future changes, and grasping future problem diseases makes it possible to create a long-term health guidance plan. The definition of a problem disease varies from insurer to insurer and from time to time. Examples include diseases for which the number of affected persons will increase in the future and diseases for which medical expenses will increase in the future.

In the first embodiment, problem diseases are extracted based on a model of the causal relationships between diseases and of the transition structure of disease states, created from the health care data, and on problem disease extraction logic that the user inputs in order to extract problem diseases.

Here, the health care data is data including medical and health information for each individual, such as medical records and test values for each subject. Specific examples of information included in the health care data include the names of the subject's injuries and illnesses, the medical practices performed on the subject, the cost of those medical practices, health guidance, and lifestyle habits obtained through interviews.

Hereinafter, in the present embodiment, a case will be described in which three pieces of information including receipt information, medical examination information, and inquiry information exist in the health care data, but it is not necessary to include all of them. Hereinafter, an outline of three pieces of information, that is, receipt information, medical examination information, and inquiry information will be described.

Receipt information is information that records the name of the injury or illness, the prescribed drugs, the medical practice performed, and the medical expenses (score) when a health insurance member visits a medical institution. An example will be described later with reference to FIG. 6. In the following, prescribed medicines and performed medical practices are collectively referred to as medical practice.

The medical examination information is information recording the test values obtained when a health insurance subscriber receives a health check, and an example thereof will be described later with reference to FIG.

The inquiry information is information recording the results of an interview covering lifestyle habits, past medical history, subjective symptoms, and the like when a health insurance subscriber receives a medical checkup, and an example thereof will be described later with reference to FIG.

FIG. 1 is a block diagram showing the configuration of the health care data analyzer of the first embodiment.

The health care data analysis apparatus according to the first embodiment includes a data analysis apparatus 101 and a database 115.

The data analysis apparatus 101 includes an input unit 102, an output unit 103, an arithmetic unit 104, a memory 105, and a storage medium 106.

The input unit 102 is a human interface such as a mouse and a keyboard, and receives input to the data analysis apparatus 101. The output unit 103 is a display or a printer that outputs calculation results of the health care data analyzer. The storage medium 106 is a storage device that stores the various programs that realize the health care data analysis processing of the data analysis apparatus 101, the execution results of that processing, and the like. For example, the storage medium 106 is a non-volatile storage medium (a magnetic disk drive, non-volatile memory, or the like). A program stored in the storage medium 106 is loaded into the memory 105. The arithmetic unit 104 executes the program loaded in the memory 105, and is, for example, a CPU or a GPU. The arithmetic unit 104 executes the processing and calculations described below.

The health care data analysis system according to the first embodiment is a computer system configured on a single computer or on a plurality of logically or physically configured computers. It may operate separately on the same computer, or may operate on a virtual machine constructed on a plurality of physical computer resources.

The program executed by the arithmetic unit 104 is provided to each server via a removable medium (CD-ROM, flash memory, etc.) or a network, and is stored in a non-volatile storage device that is a non-transitory storage medium. For this reason, the computer system may include an interface for reading removable media.

In the following, problem disease extraction, which is one of the health care data analysis processing, will be described first. Thereafter, various data used in the subject disease extraction process and the process of the various data will be described.

First, the problem disease extraction unit 110 will be described.

The problem disease extraction unit 110 extracts problem diseases that are expected to become a problem in the subject group in the future, such as diseases whose onset probability will increase or diseases whose medical expenses will increase, and provides information on those problem diseases. Here, a case where a health insurer extracts problem diseases from the data of its insured group will be described as an example.

First, the outline of the processing of the problem disease extraction unit 110 will be described. The problem disease extraction unit 110 predicts the probability of occurrence of an unknown item, for example, a future disease, from the known information of the insured group that is the target for extracting the problem disease. Next, a disease evaluation index is calculated for each disease based on the onset probability, and a disease to be a problem is extracted based on the calculated disease evaluation index.

Hereinafter, details of the processing of the problem disease extraction unit 110 will be described.

First, the problem disease extraction unit 110 reads the data of the subject group for which problem disease extraction is to be performed from the shaping information storage unit 113 or the input unit 102. For example, when the data of the insured group used to create the graphical model stored in the graphical model storage unit 114 is used as it is, the shaping information stored in the shaping information storage unit 113 is used. When data of an unknown insured group is used, the data is read from the input unit 102 and shaped by the data shaping unit 107 as necessary. The data of the subject group may be the data of all subjects included in the data, or a sample forming a subset of the subject group. For example, to target insured persons at or above a certain age, a threshold may be set on the age item and only the data of insured persons whose age is equal to or greater than the threshold selected from the shaping information. For sampling, thresholds may be set on items such as age and the number of medical treatments, or a well-known sampling method such as random sampling may be used. Sampling makes it possible to extract the problem diseases of a specific group.
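
The narrowing and sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the record layout, the field names `subscriber_id` and `age`, and the function `select_subjects` are all hypothetical.

```python
import random

def select_subjects(records, min_age=None, sample_size=None, seed=0):
    """Narrow a subject group by an age threshold, then optionally draw a random sample."""
    # Threshold on the age item: keep only subjects at or above min_age.
    selected = [r for r in records if min_age is None or r["age"] >= min_age]
    # Simple random sampling without replacement when a smaller sample is requested.
    if sample_size is not None and sample_size < len(selected):
        selected = random.Random(seed).sample(selected, sample_size)
    return selected

# Hypothetical shaping-information rows (one per insured person).
records = [
    {"subscriber_id": "K0001", "age": 35},
    {"subscriber_id": "K0002", "age": 52},
    {"subscriber_id": "K0003", "age": 47},
    {"subscriber_id": "K0004", "age": 61},
]

aged_40_plus = select_subjects(records, min_age=40)
sampled = select_subjects(records, sample_size=2)
```

Fixing the seed keeps the sample reproducible, which may be convenient when the same subject group has to be analyzed repeatedly.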

Next, among the items included in the graphical model stored in the graphical model storage unit 118, the items related to diseases are selected as disease candidates. The items related to diseases are selected based on, for example, the shaping information stored in the shaping information storage unit 117.

Next, the process of extracting problem diseases using the subject group data and the disease candidates will be described with reference to FIG. 3. FIG. 3 is a flowchart of this process.

Hereinafter, the processing of each step will be described.

In the problem disease extraction logic determination step 301, the problem disease extraction logic is determined based on the information input to the input unit 102. The problem disease extraction logic consists of four elements: a disease evaluation index, which is an index calculated for each disease candidate; a disease evaluation index calculation method, which is a method for calculating the disease evaluation index of each disease based on the onset probability; a disease evaluation index totaling method, which is a method for totaling, by disease, the disease evaluation indices calculated for each subject; and a problem disease extraction condition, which is a condition for extracting a disease as a problem disease based on the totaled disease evaluation index.

The disease evaluation index is an index calculated for each disease in order to extract problem diseases, and is determined based on the onset probability and medical expenses predicted by the onset probability prediction unit 109. Examples of basic indices include the probability of disease onset after N years, the expected number of people who develop the disease after N years, and the expected medical expenses related to the disease after N years. More complex indices can be formed by combining several indices based on onset probabilities in different years. For example, from two indices, the expected number of people with disease onset after N years and the expected number after N + 10 years, a new index can be defined: the rate of increase in the number of people with disease onset over the 10 years from N years to N + 10 years ahead. Here, N represents an arbitrary natural number.

The disease evaluation index calculation method is a method for calculating a disease evaluation index from an onset probability, and is defined for each disease evaluation index. For example, for the expected number of people with disease onset after N years, the onset probability of a subject is itself the expected number of onsets for that one subject. The expected medical expenses related to the disease after N years can be calculated, for example, by multiplying the onset probability of the disease after N years by the average medical expenses for that disease. As the average medical expenses for each disease, for example, the average medical expenses calculated from the shaping information stored in the shaping information storage unit 117 for subjects suffering from the disease may be used. When medical cost information is included in the health care data, a value obtained by summing the expected medical costs, N years ahead, of the medical practices related to the disease may be used instead. For an index based on onset probabilities in different years, such as the rate of increase in the number of people with disease onset over the 10 years from N years to N + 10 years ahead, the expected number of people with onset after N years and after N + 10 years are each predicted, and the rate of increase is then calculated from the two expected values.
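
As a rough illustration of these calculation methods, the per-subject indices might be computed as below. The function names are hypothetical, and the rate of increase is assumed here to be the relative change between the two expected values; the text does not fix a particular formula.

```python
def expected_onset_count(onset_probability):
    """For one subject, the expected number of onsets equals the onset probability."""
    return onset_probability

def expected_medical_cost(onset_probability, average_cost):
    """Expected medical expenses: onset probability times the average cost of the disease."""
    return onset_probability * average_cost

def increase_rate(expected_n, expected_n_plus_10):
    """Relative increase from year N to year N + 10 (assumed definition)."""
    if expected_n == 0:
        raise ValueError("rate of increase is undefined when the year-N value is zero")
    return (expected_n_plus_10 - expected_n) / expected_n

cost = expected_medical_cost(0.25, 40000)   # 10000.0
rate = increase_rate(120.0, 150.0)          # 0.25
```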

The disease evaluation index totaling method is a method of totaling the disease evaluation indices obtained for each subject into a disease evaluation index for the entire target group. For example, when the disease evaluation index is the expected number of people with disease onset after N years, summing the expected number of onsets calculated for each subject over all subjects gives the expected number of people with onset for the entire group. When the disease evaluation index is the rate of increase in the number of people with disease onset over the 10 years from N years to N + 10 years ahead, the per-subject indices cannot simply be summed. Instead, the expected number of people with onset after N years and the expected number after N + 10 years are each summed over all subjects, and the rate of increase is then calculated from the two totals.
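
The difference between the two totaling methods can be sketched as follows. The subject identifiers, disease names, and probabilities are invented for illustration, and the relative-change definition of the rate of increase is an assumption.

```python
from collections import defaultdict

def total_by_disease(per_subject):
    """Sum per-subject expected onset counts into one total per disease."""
    totals = defaultdict(float)
    for indices in per_subject.values():
        for disease, value in indices.items():
            totals[disease] += value
    return dict(totals)

def group_increase_rate(per_subject_n, per_subject_n10):
    """Total each year first, then compute the rate of increase from the two totals."""
    total_n = total_by_disease(per_subject_n)
    total_n10 = total_by_disease(per_subject_n10)
    return {
        disease: (total_n10[disease] - total_n[disease]) / total_n[disease]
        for disease in total_n
        if total_n[disease] > 0
    }

# Hypothetical onset probabilities after N years and after N + 10 years.
year_n = {
    "S1": {"diabetes": 0.25, "hypertension": 0.5},
    "S2": {"diabetes": 0.25, "hypertension": 0.5},
}
year_n10 = {
    "S1": {"diabetes": 0.5, "hypertension": 0.5},
    "S2": {"diabetes": 0.5, "hypertension": 0.75},
}

totals_n = total_by_disease(year_n)
rates = group_increase_rate(year_n, year_n10)
```

Computing the rate from the group totals, rather than averaging per-subject rates, matches the order of operations the paragraph describes.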

The problem disease extraction condition is a condition for extracting a disease as a problem disease based on the disease evaluation index totaled for each disease. For example, based on the expected number of people with disease onset, a threshold may be set for the expected number and diseases whose expected number exceeds the threshold may be extracted as problem diseases. As another example, diseases may be sorted in descending order of the expected number of people with onset, and a predetermined number of diseases selected from the top.
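
Both extraction conditions are simple to express in code. The sketch below uses hypothetical function names and invented totals.

```python
def extract_by_threshold(totals, threshold):
    """Return diseases whose totaled index exceeds the threshold, largest first."""
    hits = [d for d, value in totals.items() if value > threshold]
    return sorted(hits, key=lambda d: totals[d], reverse=True)

def extract_top_k(totals, k):
    """Return the k diseases with the largest totaled index, in descending order."""
    return sorted(totals, key=lambda d: totals[d], reverse=True)[:k]

# Hypothetical expected numbers of people with onset, totaled by disease.
totals = {"diabetes": 12.5, "hypertension": 30.0, "dyslipidemia": 8.0}

over_ten = extract_by_threshold(totals, threshold=10)
top_one = extract_top_k(totals, k=1)
```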

Hereinafter, an example in which the problem disease extraction logic is determined based on information input from the input unit 102 will be described.

As a first example, the user selects and determines a problem disease extraction logic registered in the database in advance.

As a second example, the user selects, as a template, a problem disease extraction logic registered in the database in advance, supplies information that modifies part of it, and thereby determines the final problem disease extraction logic. For example, by changing the prediction year or the threshold used in the problem disease extraction condition, the user can define a logic that extracts the desired problem diseases.
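
One way to picture the template approach is a registered logic record whose fields the user overrides. The class `ExtractionLogic`, its fields, and the registry contents are hypothetical; a dictionary stands in for the database.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ExtractionLogic:
    index_name: str     # which disease evaluation index to use
    years_ahead: int    # prediction year N
    threshold: float    # threshold used in the extraction condition

# Logic registered in advance (hypothetical values).
REGISTERED = {
    "expected_onsets": ExtractionLogic("expected_onsets", years_ahead=1, threshold=100.0),
}

template = REGISTERED["expected_onsets"]
# The user keeps the template but changes the prediction year and threshold.
custom = replace(template, years_ahead=5, threshold=50.0)
```

Because the record is frozen, `replace` returns a new logic and leaves the registered template unchanged.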

The steps from the subject sample selection step 302 to step 307 are performed for each subject, and form one loop over all subjects. The specific processing is described below.

In subject sample selection step 302, one unprocessed insured sample is selected in the cycle. For the following explanation, the subject selected in this step is assumed to be insured S.

The steps from the disease candidate selection step 303 to step 306 are performed for each disease candidate item, and form one loop over all disease candidate items. The specific processing is described below.

In disease candidate selection step 303, one disease candidate item that has not been evaluated in the cycle is selected. For the following description, the item selected in this step is referred to as disease D.

In the onset probability prediction step 304, the probability that the insured S selected in the subject sample selection step 302 will develop the disease D is predicted. When medical cost information is included in the health care data, the expected medical cost of the medical practices related to the disease D is also predicted. The prediction is performed by the onset probability prediction unit 109 based on the known information of the insured S. Which onset probabilities are predicted is determined by the information included in the problem disease extraction logic. For example, when the expected number of people with disease onset after N years is designated as the disease evaluation index and the current year is year X, the onset probability of the disease D and the expected medical expenses in year X + N are predicted. For an index based on onset probabilities in different years, such as the rate of increase in the number of people with disease onset over the 10 years from N years to N + 10 years ahead, the onset probabilities in year X + N and in year X + N + 10 are predicted.

In the disease evaluation index calculation step 305, a disease evaluation index is calculated from the onset probability of the disease D predicted in the disease onset probability prediction step 304 and the medical expenses. The disease evaluation index calculation method follows the disease evaluation index calculation method determined in the problem disease extraction logic determination step 301. The calculated disease evaluation index is stored in the disease evaluation index storage unit 116 together with information on the insured S for which the disease evaluation index has been calculated.

In step 306, if any disease candidate item remains unevaluated in the loop, the process returns to the disease candidate selection step 303 and an unevaluated item is selected. Otherwise, the loop ends and the process proceeds to step 307.

In step 307, if any subject in the subject group remains unprocessed in the loop, the process returns to the subject sample selection step 302 and an unprocessed subject is selected. Otherwise, the loop ends and the process proceeds to step 308.

In the disease-specific evaluation index totaling step 308, the disease evaluation indices for each insured person stored in the disease evaluation index storage unit 116 are totaled by disease. The totaling follows the disease evaluation index totaling method determined in the problem disease extraction logic determination step 301. The totaled disease evaluation indices are stored in the disease evaluation index storage unit 116.

In the problem disease extraction step 309, problem diseases are extracted using the disease-specific disease evaluation indices calculated in the disease-specific evaluation index totaling step 308. The extraction follows the problem disease extraction condition determined in the problem disease extraction logic determination step 301.

In the above description, an example in which one disease candidate item is selected in the disease candidate selection step 303 is shown. However, in the disease candidate selection step 303, a plurality of disease candidates may be selected at a time. In this case, in the disease onset probability prediction step 304, the onset probability of a plurality of diseases is predicted at a time.
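
The overall flow of steps 302 to 309 can be summarized in a short sketch. Everything here is illustrative: `predict` is a stub standing in for the onset probability prediction unit (it performs no graphical-model inference), a list stands in for the disease evaluation index storage unit, and all names and figures are hypothetical.

```python
def extract_problem_diseases(subjects, disease_candidates, predict, calculate, extract):
    """Run steps 302 to 309 for one subject group and return the extracted diseases."""
    index_store = []  # stands in for the disease evaluation index storage unit
    for subject in subjects:                     # steps 302 and 307: loop over subjects
        for disease in disease_candidates:       # steps 303 and 306: loop over candidates
            probability = predict(subject, disease)       # step 304
            index = calculate(probability, disease)       # step 305
            index_store.append((subject["id"], disease, index))
    totals = {}
    for _subject_id, disease, index in index_store:       # step 308: total by disease
        totals[disease] = totals.get(disease, 0.0) + index
    return extract(totals)                                # step 309

# Hypothetical inputs for illustration only.
base_rate = {"diabetes": 0.5, "hypertension": 1.0}
average_cost = {"diabetes": 80000, "hypertension": 20000}
subjects = [{"id": "S1", "risk": 0.5}, {"id": "S2", "risk": 0.25}]

problem_diseases = extract_problem_diseases(
    subjects,
    disease_candidates=["diabetes", "hypertension"],
    predict=lambda s, d: s["risk"] * base_rate[d],
    calculate=lambda p, d: p * average_cost[d],
    extract=lambda totals: [d for d, v in totals.items() if v > 20000],
)
```

Passing the calculation and extraction rules in as callables mirrors the way the extraction logic is decided once in step 301 and then applied unchanged inside the loops.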

As described above, the information on the problem disease extracted by the process of the problem disease extraction unit 110 is stored in the problem disease storage unit 117. The information regarding the problem disease stored in the problem disease storage unit 117 may be output from the output unit 103 in a character format, a table format, or the like, for example.

FIG. 17 is a screen example of a user interface showing an example of a form for realizing the present embodiment.

Reference numeral 1701 denotes an operation window for performing setting of problem disease extraction. Here, an example is shown in which target group narrowing, disease candidate narrowing, and problem disease extraction logic can be set.

Reference numeral 1702 denotes an input window for setting a narrowing condition for the input subject group data. Here, as an example, the males included in the subject group are set as the subject group for problem disease extraction.

1703 is an input window for setting a narrowing condition for narrowing down items of disease candidates related to diseases among items included in the graphical model stored in the graphical model storage unit 118. Here, as an example, conditions are not set, and all disease items are targeted for subject disease extraction.

1704 is an input window for determining the problem disease extraction logic. Here, an example is shown in which next-year medical expenses are selected as the disease evaluation index, and the disease evaluation index calculation method, the disease evaluation index totaling method, and the problem disease extraction condition are read from the database based on the selected disease evaluation index.

1705 is an execution button for starting the problem disease extraction process based on the settings made in 1702, 1703, and 1704.

1706 is a display window for displaying the processing result.

1707 is a display screen for displaying the extracted problem diseases. Here, the problem diseases extracted based on next-year medical expenses are displayed in a table format in descending order of next-year medical expenses.

In this embodiment, the data shaping unit 107 shapes the data stored in the medical information storage unit 116 in the database, and the graphical model creation unit 108 creates the graphical model based on the shaping information stored in the shaping information storage unit 117. However, when the shaping information storage unit 117 already stores shaping information created in advance from the health care data, and the graphical model storage unit 118 already stores a graphical model created in advance from that shaping information, the data shaping unit 107, the graphical model creation unit 108, and the medical information storage unit 116 need not be included in the configuration of this embodiment. FIG. 2 is a diagram illustrating such an alternative configuration, in which the health care data analysis apparatus 101 does not include the data shaping unit 107 or the graphical model creation unit 108, and the database 115 does not include the medical information storage unit 116.

As described above, the health care data analysis apparatus according to the present embodiment can extract, by a simple operation and based on various indices, diseases that will become a future problem from the accumulated health care data of the target group.

Hereinafter, various data and data processing used in the above extraction processing will be described.

First, the health care data handled in the first embodiment will be described. The medical information storage unit 116 stores health care data input to the input unit 102. Hereinafter, the receipt information, the medical examination information, and the inquiry information will be taken as examples of typical health care data, and each will be described.

First, the receipt information will be described.

The receipt information includes basic receipt information, wound name information, medical practice information, drug information, wound name classification information, medical practice classification information, and pharmaceutical classification information.

FIG. 6 is a diagram illustrating an example of basic receipt information.

The basic receipt information 601 is information that holds the correspondence between a receipt and a health insurance subscriber. The basic receipt information 601 includes a search number 602, a health insurance subscriber ID 603, gender 604, age 605, medical treatment month 606, total score 607, and the like.

The search number 602 is an identifier for uniquely identifying a receipt. The health insurance subscriber ID 603 is an identifier for uniquely identifying a health insurance subscriber. Gender 604 and age 605 are the gender and age of the subscriber.

The medical treatment month 606 is the year and month when the subscriber visited the medical institution. The total score 607 is information indicating the total score of one receipt.

FIG. 9 is a diagram illustrating an example of the wound name information 901.

The wound name information 901 includes a search number 602, a wound name code 902, a wound name 903, and the like.

The search number 602 is an identifier for uniquely identifying a receipt, and the same number as the search number of the basic receipt information 601 (FIG. 6) is used. The wound name code 902 is a wound name code written on the receipt. The wound name 903 is the name of the wound corresponding to the wound name code.

FIG. 10 is a diagram for explaining wound name classification information.

The wound name classification information 1001 is information for associating a wound classification with the wound names belonging to that classification, and includes a wound classification 1002, a wound name code 902, a wound name 903, and a complication presence/absence 1003.

The wound classification 1002 is the classification to which the injury or illness belongs. The wound name code 902 is a wound name code described in the receipt, and the same code as the wound name code 902 (FIG. 9) of the wound name information 901 is used. The wound name 903 is the name of the wound corresponding to the wound name code, and the same name as the wound name 903 (FIG. 9) of the wound name information 901 is used. The complication presence/absence 1003 indicates whether or not this wound name is the name of a complication.

FIG. 11 is a diagram illustrating an example of medical practice information.

The medical practice information 1101 includes a search number 602, a medical practice code 1102, a medical practice name 1103, and a medical practice score 1104.

The search number 602 is an identifier for uniquely identifying a receipt, and the same number as the search number 602 (FIG. 6) of the basic receipt information 601 is used. The medical practice code 1102 is an identifier for identifying the medical practice described in the receipt. The medical practice name 1103 is the name of the medical practice corresponding to the medical practice code. The medical practice score 1104 is an insurance score of the medical practice.

In FIG. 11, for example, the receipt whose search number 602 is “11” lists “medical practice A” and “medical practice C” as medical practice names 1103.

FIG. 12 is a diagram illustrating an example of medical practice classification information.

The medical practice classification information 1201 includes a wound classification 1002, a medical practice code 1102, and a medical practice name 1103.

The wound classification 1002 uses the same classification as the wound classification 1002 (FIG. 10) of the wound name classification information 1001. The medical practice code 1102 is a code for identifying a medical practice performed for an injury or illness of the wound classification 1002, and uses the same code as the medical practice code 1102 (FIG. 11) of the medical practice information 1101. The medical practice name 1103 is the name of the medical practice corresponding to the medical practice code, and the same name as the medical practice name 1103 (FIG. 11) of the medical practice information 1101 is used.

FIG. 13 is a diagram illustrating an example of drug information.

The drug information 1301 includes a search number 602, a drug code 1302, a drug name 1303, and a drug score 1304.

The search number 602 is an identifier for uniquely identifying a receipt, and the same number as the search number 602 (FIG. 6) of the basic receipt information 601 is used. The drug code 1302 is a drug code for identifying the drug described in the receipt. The drug name 1303 is the name of the drug described in the receipt. The drug score 1304 is the insurance score of the drug.

In FIG. 13, for example, a receipt with a search number 602 of “11” describes the drug names of diabetes oral drug A and hypertension oral drug A.

FIG. 14 is a diagram for explaining drug classification information.

The drug classification information 1401 includes a wound classification 1002, a drug code 1302, and a drug name 1303.

The wound classification 1002 uses the same classification as the wound classification 1002 (FIG. 10) of the wound name classification information 1001. The drug code 1302 is a code for identifying a drug prescribed for an injury or illness of the wound classification 1002, and the same code as the drug code 1302 (FIG. 13) of the drug information 1301 is used. The drug name 1303 is the name of the drug corresponding to the drug code, and the same name as the drug name 1303 (FIG. 13) of the drug information 1301 is used.

Note that the medical practice information 1101 shown in FIG. 11 and the drug information 1301 shown in FIG. 13 are collectively referred to as medical practice information. Likewise, the medical practice classification information 1201 shown in FIG. 12 and the drug classification information 1401 shown in FIG. 14 are collectively referred to as medical practice classification information.

Next, medical examination information will be described.

FIG. 7 is a diagram for explaining an example of the medical examination information.

The medical examination information 701 is information for managing the medical examination results of a plurality of subscribers over a plurality of years. It includes the health insurance subscriber ID 603, the medical examination date 702, and various examination values (for example, BMI 703, abdominal circumference 704, fasting blood glucose 705, systolic blood pressure 706, and neutral fat 707).

The health insurance subscriber ID 603 is an identifier of a health insurance subscriber who has undergone a medical examination, and uses the same identifier as the health insurance subscriber ID 603 (FIG. 6) of the basic receipt information 601. The medical examination date 702 is the date on which the medical examination was taken. BMI 703 to neutral fat 707 are the results of the medical examination.

Data of the medical examination information may be missing, for example when a specific examination was not taken. For example, in FIG. 7, the systolic blood pressure 706 is missing from the examination items that the subscriber with health insurance subscriber ID “K0004” took in 2004.

Next, the inquiry information will be described.

FIG. 8 is a diagram for explaining an example of the inquiry information.

The inquiry information 801 is information for managing the inquiry results of a plurality of subscribers over a plurality of years. It includes the health insurance subscriber ID 603, the inquiry date 802, and the answers to the inquiry (for example, smoking 803, drinking 804, and walking 805). The inquiry may cover lifestyle habits, medical history, constitution such as allergies, subjective symptoms, and the like.

The health insurance subscriber ID 603 is an identifier of a health insurance subscriber who has answered an inquiry, and uses the same identifier as the health insurance subscriber ID 603 (FIG. 6) of the basic receipt information 601. The inquiry date 802 is the date on which the inquiry was answered. Smoking 803 to walking 805 are the results of the inquiry. Smoking 803 is the average number of cigarettes smoked per day when there is a smoking habit, and “none” when there is no smoking habit. Drinking 804 is the average daily drinking amount (unit: ml) when there is a drinking habit, and “none” when there is no drinking habit. Walking 805 is the average walking time per day (unit: minutes).

Note that detailed information such as the number of steps, the amount of alcohol consumed, and the number of cigarettes smoked may not be obtainable from the inquiry information. In some cases the answer is not a specific amount but one of several categories defined in advance in the questionnaire. For example, only whether or not the subscriber smokes or drinks may be available, or the frequency of drinking may be divided into several levels (e.g., (1) no drinking, (2) 1-2 times a week, (3) 3 or more times a week). In this case, the value of the inquiry information is a number having no quantitative meaning.

If there is no response to a specific item, the data of the inquiry information may be missing. For example, in FIG. 8, walking 805 is missing among the inquiry items that the subscriber with health insurance subscriber ID “K0003” answered in 2004.

Next, processing of the data shaping unit 107 will be described. The data shaping unit 107 aggregates and integrates information for each subscriber and each period from the health care data stored in the medical information storage unit 117, and shapes the information into a table format. In the following description, one period is assumed to be one year, but another period such as six months, two years, or three years may be used. Although an example of shaping that uses all of the receipt information, the medical examination information, and the inquiry information is described, not all of these data need to be available; for example, only the receipt information and the medical examination information may be used. Data other than these may also be added.

FIG. 15 is a diagram for explaining an example of the shaping information 1501. The processing of the data shaping unit 107 is described with reference to this figure.

The shaping information 1501 includes the receipt shaping information obtained by shaping the 2004 receipt information. Each row of the shaping information 1501 is obtained by tabulating data for one year corresponding to one health insurance subscriber ID.

The health insurance subscriber ID 603, gender 604, age 605 and total score 607 are the same as the health insurance subscriber ID 603, gender 604, age 605 and total score 607 (FIG. 6) of the basic receipt information 601, respectively. The data year 1502 is the year of the data from which the shaping information is created.

The wound name code 10 (1503) is the number of receipts having the wound name code 10 among the receipts of the health insurance subscriber ID. Similarly, the wound name code 20 (1504) is the number of receipts having the wound name code 20 among the receipts of the health insurance subscriber ID. The medical practice code 1000 (1505) is the number of receipts on which the medical practice with medical practice code 1000 was performed among the receipts of the health insurance subscriber ID. The drug code 110 (1506) is the number of receipts on which the drug with drug code 110 was prescribed among the receipts of the health insurance subscriber ID.

The processing of the data shaping unit 107 will be specifically described in the case of shaping the 2004 data.

First, one health insurance subscriber ID is selected. The search numbers of the receipts of that health insurance subscriber ID whose medical treatment month falls in 2004 are acquired from the basic receipt information 601. Next, referring to the wound name information 901, the number of receipts in which each wound name code is described is counted. The number of receipts for each wound name code is thereby obtained. Similarly, the number of receipts for each medical practice code is counted with reference to the medical practice information 1101, and the number of receipts for each drug code is counted with reference to the drug information 1301. As a result, the 2004 data row of the selected health insurance subscriber ID is generated. This process is performed for all combinations of health insurance subscriber IDs and years to be analyzed.

For example, in the shaping information 1501 shown in FIG. 15, the search numbers “11”, “12”, and “13” are acquired from the basic receipt information 601 for the 2004 data of the health insurance subscriber ID “K0001” on the first line. According to the wound name information 901, two of these three receipts, those with search numbers “11” and “13”, have the wound name code “10”. Therefore, 2 is registered in the column of the wound name code 10 in the first line of the shaping information 1501.
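The counting step described above may be sketched as follows. This is only a minimal illustration under stated assumptions: the field names (`search_number`, `subscriber_id`, `treatment_month`, `wound_name_code`), the `YYYY-MM` month format, the function name `count_receipts_per_code`, and all sample rows other than the K0001 case described in the text are hypothetical and are not taken from the figures.

```python
from collections import Counter

# Hypothetical miniature tables standing in for the basic receipt information
# (FIG. 6) and the wound name information (FIG. 9). Values are illustrative.
basic_receipts = [
    {"search_number": 11, "subscriber_id": "K0001", "treatment_month": "2004-04"},
    {"search_number": 12, "subscriber_id": "K0001", "treatment_month": "2004-07"},
    {"search_number": 13, "subscriber_id": "K0001", "treatment_month": "2004-11"},
    {"search_number": 21, "subscriber_id": "K0002", "treatment_month": "2004-05"},
]
wound_names = [
    {"search_number": 11, "wound_name_code": 10},
    {"search_number": 12, "wound_name_code": 20},  # assumed code for receipt 12
    {"search_number": 13, "wound_name_code": 10},
    {"search_number": 21, "wound_name_code": 20},
]


def count_receipts_per_code(subscriber_id, year, basic, detail, code_key):
    """Count, per code, the receipts of one subscriber in one year listing that code."""
    # Step 1: search numbers of the subscriber's receipts in the given year.
    selected = {
        r["search_number"]
        for r in basic
        if r["subscriber_id"] == subscriber_id
        and r["treatment_month"].startswith(f"{year}-")
    }
    # Step 2: distinct (receipt, code) pairs, so a code that appears twice
    # on the same receipt is still counted as one receipt.
    pairs = {
        (d["search_number"], d[code_key])
        for d in detail
        if d["search_number"] in selected
    }
    return Counter(code for _, code in pairs)


row = count_receipts_per_code("K0001", 2004, basic_receipts, wound_names, "wound_name_code")
print(row[10])  # number of 2004 receipts of K0001 with wound name code 10
```

The same function could be applied to the medical practice information and the drug information by passing a different detail table and code key.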

The shaping information 1501 shown in FIG. 15 includes the medical examination shaping information shaped from the medical examination information. Each row is a total of data corresponding to one health insurance subscriber ID.

The value of each item is the value of the medical examination data for the subscriber and year indicated by the health insurance subscriber ID 603 and the data year 1502. This medical examination data can be acquired from the medical examination information 701. When the medical examination information 701 includes more than one set of medical examination data for the same health insurance subscriber ID in the same year, the data of any one of the examination dates may be used, or the average of the medical examination results for that year may be used. When using data from a single examination date, it is recommended to use data from a general checkup that is carried out at approximately the same time every year. Alternatively, the data with the fewest missing values may be selected. Missing data are represented by a predetermined numerical value indicating that the value is missing. In the example shown in FIG. 15, −1 is used. All values of subscribers without medical examination information are treated as missing data.

The shaping information 1501 shown in FIG. 15 includes inquiry shaping information shaped from the inquiry information. Each row is a total of data corresponding to one health insurance subscriber ID.

The value of each item is the value of the inquiry data for the subscriber and year indicated by the health insurance subscriber ID 603 and the data year 1502. This inquiry data can be acquired from the inquiry information 801. When the inquiry information 801 includes more than one set of inquiry data for the same health insurance subscriber ID in the same year, the data of any one of the inquiry dates may be used, or the average of the inquiry results for that year may be used. When using data from a single inquiry date, it is recommended to use data from a general checkup that is carried out at approximately the same time every year. Alternatively, the data with the fewest missing values may be selected. Missing data are represented by a predetermined numerical value indicating that the value is missing. In the example shown in FIG. 15, −1 is used. All values of subscribers without inquiry information are treated as missing data.
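One of the options described above, selecting the record with the fewest missing values and filling the remaining gaps with the missing-value marker, may be sketched as follows. The function name `shape_yearly_record`, the item names, and the use of `None` for an unmeasured value in the raw records are assumptions made for illustration; only the marker value −1 comes from the FIG. 15 example.

```python
MISSING = -1  # marker for missing values, as in the FIG. 15 example


def shape_yearly_record(records, items):
    """Return one shaped row for a subscriber and year.

    records: examination or inquiry records of that subscriber in that year,
             each a dict mapping item name to a value, or None if not measured.
    items:   item names that the shaped row must contain.
    """
    if not records:
        # No examination or inquiry at all: every item is treated as missing.
        return {item: MISSING for item in items}
    # Choose the record with the fewest missing items (first one wins on ties).
    best = min(records, key=lambda r: sum(r.get(i) is None for i in items))
    return {i: best.get(i) if best.get(i) is not None else MISSING for i in items}


items = ["bmi", "systolic_bp"]
records = [
    {"bmi": 22.0, "systolic_bp": None},
    {"bmi": 23.0, "systolic_bp": 120},
]
shaped = shape_yearly_record(records, items)
```

Averaging several records of the same year, which the text also permits, would replace the `min` selection with a per-item mean over the non-missing values.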

Through the above processing, the receipt shaping information, the medical examination shaping information, and the inquiry shaping information can be generated. FIG. 15 shows only data for 2004, but shaping data for another year is also created.

Here, when creating the receipt shaping information, similar items may be grouped and integrated into a single item. For example, when diabetic oral drug A and diabetic oral drug B have similar functions, they may be treated together as one drug item. In this case, the sum of the number of prescriptions of diabetic oral drug A and the number of prescriptions of diabetic oral drug B in the same year is set as the value of the integrated item. Whether items are similar may be judged as follows. Medical practice names belonging to the same wound classification in the medical practice classification information 1201 are treated as similar items. Likewise, drug names belonging to the same wound classification in the drug classification information 1401 are treated as similar items. Alternatively, similar-item information may be created manually in advance.

FIG. 16 is a diagram for explaining an example of shaping information 1501 obtained by integrating the wound name code 10 and the wound name code 20 of the receipt shaping information. The value of the wound name code 1601 is the sum of the value of the wound name code 10 (1503) and the value of the wound name code 20 (1504) in FIG. 15, that is, the total of the number of receipts with the wound name code “10” and the number of receipts with the wound name code “20”.
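The integration of similar items can be illustrated with the following sketch, which replaces each group of columns in a shaped row by a single column holding their sum. The function name `integrate_items`, the column names, and the sample counts are hypothetical.

```python
def integrate_items(row, groups):
    """Merge each group of similar columns into one column holding their sum.

    row:    dict mapping column name to a receipt count.
    groups: dict mapping the new column name to the list of columns to merge.
    """
    merged = dict(row)  # leave the caller's row untouched
    for new_name, members in groups.items():
        # Remove the member columns and add their counts together.
        merged[new_name] = sum(merged.pop(m, 0) for m in members)
    return merged


row_2004 = {"wound_10": 2, "wound_20": 1, "drug_110": 3}
groups = {"wound_10_20": ["wound_10", "wound_20"]}
integrated = integrate_items(row_2004, groups)
```

The groups themselves would be derived from the classification information or prepared manually, as described above.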

The shaping information storage unit 118 of the database 116 stores the created receipt shaping information, medical examination shaping information, and inquiry shaping information shown in FIGS. The shaping information 1501 is numerical data in a tabular format.

Although the values of the receipt shaping information are tabulated above as the number of receipts, that is, the number of prescriptions, they may instead indicate the presence or absence of a prescription. That is, the value may be binary: 1 when the number of prescriptions is 1 or more (there is a prescription) and 0 when the number of prescriptions is 0 (there is no prescription). In addition, on the assumption that the number of prescriptions represents severity, the value of the receipt shaping information may be obtained by classifying the number of prescriptions into stages. For example, 0 may be used when the number of prescriptions is 0, 1 when the number of prescriptions is 1 to 4, and 2 when the number of prescriptions is 5 or more.
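The two alternative encodings can be written as small functions. The function names are hypothetical; the stage boundaries are those given in the example above.

```python
def to_presence(count):
    """Binary encoding: 1 if there is at least one prescription, otherwise 0."""
    return 1 if count >= 1 else 0


def to_stage(count):
    """Staged encoding from the example: 0 for 0, 1 for 1 to 4, 2 for 5 or more."""
    if count == 0:
        return 0
    if count <= 4:
        return 1
    return 2


counts = [0, 1, 4, 5, 12]
presence = [to_presence(c) for c in counts]
stages = [to_stage(c) for c in counts]
```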

In the above-described example, the receipt information, the medical examination information, and the inquiry information are collected in a period of one year. However, different periods such as every two years may be used. In the following, the case where the period is summarized every year will be described as an example.

Next, the graphical model creation unit 108 will be described.

The graphical model creation unit 108 treats each item of the shaping information stored in the shaping information storage unit 118 as a random variable, and creates a model consisting of a graph, in which each random variable is a node and each conditional dependency between random variables is an edge, and a conditional probability table. There are two types of edges, directed and undirected. Let the node set be V, the edge set be E, and the graph be G = (V, E). The graphical model creation unit 108 creates a graphical model such as a Bayesian network or a Markov network as the model.

The following describes the graphical model with an example.

FIG. 22A is a simple model composed of two nodes. The number of X-year oral drug prescriptions is a random variable that represents the number of oral drug prescriptions for diabetes in year X, and the number of X + n-year insulin prescriptions is a random variable that represents the number of times of insulin prescription for diabetes of X + n years. If the nodes representing the respective random variables are v1 and v2, the graph of FIG. 22A is composed of v1, v2, and a directed edge e1 from v1 to v2. If V = (v1, v2) and E = (e1), the graph in FIG. 22A becomes G = (V, E).

Next, the conditional probability table will be described. If the random variables represented by the nodes v1 and v2 are x1 and x2, respectively, the graph G shown in FIG. 22A indicates that the joint distribution p(x1, x2) of x1 and x2 is given by p(x1, x2) = p(x2 | x1) p(x1). That is, the probability distribution of x2 depends on the value of x1, and is given by the conditional probability p(x2 | x1). Since the node of the random variable x1 has no parent node, the probability distribution of x1 is p(x1). The conditional probability table holds the values of p(x1) and p(x2 | x1). The probability table of p(x1) gives a probability value for each value of x1. An example is shown at 2201 in FIG. 22B. Table 2201 shows, for example, that p(x1 = 0) = a1, that is, the probability that x1 = 0 is a1. This value is obtained by calculating the proportion of cases in which the number of oral drug prescriptions in year X is 0 among the cases (insured persons) of the receipt shaping information used for model generation. a2, a3, ... can be calculated in the same manner. Since p(x1) is a probability distribution, Σ p(x1) = 1, where the sum is taken over all values of x1. The probability table of p(x2 | x1) is obtained by calculating p(x2 | x1) for each combination of values of x1 and x2. For example, p(x2 = s2 | x1 = s1) is obtained by calculating the proportion of cases where x2 = s2 among the cases where x1 = s1. This calculation yields the probability table.

The graph G shown in FIG. 22A and the probability tables shown in FIG. 22B together constitute the graphical model. By using this model, for example, when the number of oral drug prescriptions of an insured person in a certain year is known, the probability distribution of the number of times that the insured person is prescribed insulin n years later can be obtained. For example, when the number of oral drug prescriptions this year is 1, the probability that insulin is prescribed twice n years later is given by p(x2 = 2 | x1 = 1).
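The proportion-based estimation of the two probability tables may be sketched as follows for the two-node model. The function name `fit_two_node_model` and the sample cases are hypothetical; each case is a pair (x1, x2) of the number of oral drug prescriptions in year X and the number of insulin prescriptions in year X + n for one insured person.

```python
from collections import Counter, defaultdict


def fit_two_node_model(cases):
    """Estimate p(x1) and p(x2 | x1) from a list of (x1, x2) cases by counting."""
    n = len(cases)
    x1_counts = Counter(x1 for x1, _ in cases)
    pair_counts = Counter(cases)
    # p(x1 = s1): proportion of cases with x1 = s1.
    p_x1 = {s1: c / n for s1, c in x1_counts.items()}
    # p(x2 = s2 | x1 = s1): proportion of cases with x2 = s2 among those with x1 = s1.
    p_x2_given_x1 = defaultdict(dict)
    for (s1, s2), c in pair_counts.items():
        p_x2_given_x1[s1][s2] = c / x1_counts[s1]
    return p_x1, dict(p_x2_given_x1)


# Illustrative cases only; not data from the specification.
cases = [(0, 0), (0, 0), (0, 1), (0, 0), (1, 0), (1, 2), (1, 2), (1, 2)]
p_x1, p_x2_given_x1 = fit_two_node_model(cases)

# Probability that insulin is prescribed twice n years later, given one
# oral drug prescription this year: p(x2 = 2 | x1 = 1).
prob = p_x2_given_x1[1][2]
```

Value combinations that never occur in the cases are simply absent from the tables in this sketch; a practical implementation would probably need to decide how to treat them, for example by smoothing.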

Next, the example of FIG. 23, in which the number of random variables is increased from FIG. 22, will be described. In the preceding example, only the number of oral drug prescriptions in year X is used to predict the number of insulin prescriptions in year X + n. However, the number of insulin prescriptions in year X + n can be expected to be larger for people with higher blood glucose levels. It can also be expected to depend on age. Therefore, as shown in FIG. 23, it is assumed that a more accurate prediction can be made by predicting the number of insulin prescriptions in year X + n from the number of oral drug prescriptions, the blood glucose level, and the age in year X.

Let the random variables representing the number of oral drug prescriptions in year X, the blood glucose level in year X, the age in year X, and the number of insulin prescriptions in year X + n be x1, x2, x3, and x4, respectively, and let the nodes representing them be v1, v2, v3, and v4. The node set of this graph is V = (v1, v2, v3, v4). Three directed edges are defined. If the directed edges from v1 to v4, v2 to v4, and v3 to v4 are denoted e1, e2, and e3, respectively, the edge set is E = (e1, e2, e3). The graph is expressed as G = (V, E). From this graph, the joint distribution of x1, ..., x4 is p(x1, x2, x3, x4) = p(x4 | x1, x2, x3) p(x1) p(x2) p(x3). The conditional probability table is obtained by calculating p(x1), p(x2), p(x3), and p(x4 | x1, x2, x3) for each combination of values of x1, ..., x4. With this model, when not only the number of oral drug prescriptions in year X but also the blood glucose level in year X is known, the number of insulin prescriptions in year X + n can be predicted more accurately.

In the case of a small model such as that shown in FIG. 22 or FIG. 23, it is possible to define, based on experience and knowledge, which items the probability distribution of the number of insulin prescriptions in year X + n depends on. However, this becomes difficult as the scale increases. For example, the number of insulin prescriptions in year X + n may also depend on sex, on other diabetes-related drug and medical practice items, and on some medical examination items. In addition, the number of oral drug prescriptions and the blood glucose level themselves depend on other items. Therefore, when the number of random variables becomes large, as with the items of the receipt shaping information, the stochastic dependencies (edges) may be created automatically from the data. At the time of creation, the presence or absence of an edge and whether it is directed or undirected may be constrained by dependencies based on experience and knowledge. An existing technique such as Bayesian network structure learning can be used.

If the graphical model is used, for example, for predicting the probability of onset after 3 years, a graphical model may be created using the items in the receipt shaping information for year X and year X + 3 as random variables. These are created from past data. For example, data of 2008 and 2011, 2009 and 2012 are used. At this time, even if the data is for the same insured, the data for 2008 and 2011 and the data for 2009 and 2012 can be used for learning as different cases.
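The construction of learning cases from pairs of years may be sketched as follows. The function name `build_cases` and the layout of `shaped` (a dictionary keyed by subscriber ID and year) are assumptions made for illustration.

```python
def build_cases(shaped, n):
    """Pair each (subscriber, year X) row with the same subscriber's year X + n row.

    shaped: dict mapping (subscriber_id, year) to a dict of item values.
    Returns a list of (items_in_year_x, items_in_year_x_plus_n) pairs.
    """
    cases = []
    for (subscriber_id, year), items in sorted(shaped.items()):
        later = shaped.get((subscriber_id, year + n))
        if later is not None:
            # The same insured person may contribute several cases,
            # one for each available pair of years.
            cases.append((items, later))
    return cases


# Illustrative shaped rows; the item name and values are hypothetical.
shaped = {
    ("K0001", 2008): {"oral_drug": 0},
    ("K0001", 2009): {"oral_drug": 1},
    ("K0001", 2011): {"oral_drug": 1},
    ("K0001", 2012): {"oral_drug": 2},
    ("K0002", 2008): {"oral_drug": 0},
}
cases_3y = build_cases(shaped, 3)
```

In this example the insured person K0001 yields two cases (2008 with 2011, and 2009 with 2012), while K0002 yields none because no row three years later exists.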

Here, a configuration example of the graphical model will be described with reference to FIG. 21A.

The graphical model in FIG. 21 is composed of items for year X and items for year X + N. There are three types of edges between items: edges between items in the same year, edges between the same item in year X and year X + N, and edges between different items in year X and year X + N. The first type, edges between items of the same year, is indicated by solid arrows, and the remaining two types, edges between items of year X and year X + N, are indicated by dotted arrows. Although not shown in FIG. 21, there are also items indicating basic information such as age, sex, and occupation. These items do not exist separately for year X and year X + N, but exist as one item for the model as a whole. Therefore, they may have edges to items of both year X and year X + N. Since these items have a large influence on the entire model, separate models may be created for each age group, sex, occupation, and the like. In the figure, directed edges are indicated by arrows, but the edges may be undirected.

The three types of edges described above will be described.

First, the edges between items of the same year, indicated by solid lines, will be described. An edge between items of the same year represents the stochastic dependence between those items. For example, when the cholesterol level is high, the BMI value tends to be high. The stochastic dependence between inquiry, medical examination, and receipt items in the same year is considered to be generally the same in every year unless the examination method or the like changes significantly. Therefore, the edge structure between items of the same year does not differ between year X and year X + N. That is, the edge structure indicated by the solid lines is the same in the year X node group and the year X + N node group. This structure may be learned by a structure learning method for Bayesian networks or Markov networks, based on the data of items of the same year.

Next, the edges between the same item in year X and year X + N will be described. An example shown in the figure is the edge from the presence or absence of a diabetic oral drug prescription in year X, which is a receipt item, to the presence or absence of a diabetic oral drug prescription in year X + N. This edge represents the transition of the state over time, and indicates that the presence or absence of a diabetic oral drug prescription in year X is used to predict the presence or absence of a diabetic oral drug prescription in year X + N. For example, a person who was prescribed a diabetic oral drug in year X is likely to be prescribed one in year X + N as well. Since the future state of each item is considered to depend on its current state, this type of edge is defined between all pairs of the same item in year X and year X + N.

Next, the edges between different items in year X and year X + N will be described. These edges represent factors that affect the state transition of an item from year X to year X + N. For example, it is assumed that the probability that a person with no diabetic oral drug prescription in year X will be prescribed a diabetic oral drug in year X + N is higher as the blood glucose level in year X is higher. Therefore, information on the blood glucose level in year X is assumed to be effective for predicting the presence or absence of a diabetic oral drug prescription in year X + N more accurately. Thus, these edges indicate that the state transition of an item from year X to year X + N is probabilistically dependent on the state of other items in year X. These edges are defined between those different items of year X and year X + N whose stochastic dependence is above a certain level. For example, as a simple method, a correlation coefficient may be calculated and an edge defined between items whose coefficient is at or above a certain threshold.
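The simple correlation-based method mentioned above may be sketched as follows. The use of the Pearson correlation coefficient and of its absolute value, the function names, the threshold, and the sample values are all assumptions; the text only states that a correlation coefficient and a threshold may be used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(vx * vy)


def cross_year_edges(year_x, year_xn, threshold):
    """Edges from year X items to different year X + N items with |r| >= threshold.

    year_x, year_xn: dicts mapping item name to a list of values, aligned by case.
    """
    edges = []
    for a, xs in year_x.items():
        for b, ys in year_xn.items():
            if a != b and abs(pearson(xs, ys)) >= threshold:
                edges.append((a, b))
    return edges


# Illustrative values for five cases; not data from the specification.
year_x = {"glucose": [90, 100, 110, 130, 150], "walking": [30, 10, 40, 20, 30]}
year_xn = {"glucose": [95, 105, 115, 135, 155], "oral_drug": [0, 0, 0, 1, 1]}
edges = cross_year_edges(year_x, year_xn, threshold=0.5)
```

Edges between the same item in both years are skipped here because, as described above, they are always defined regardless of the correlation.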

Thus, the created graph and probability table are stored in the graphical model storage unit 117.

Next, the onset probability prediction unit 109 will be described. The onset probability prediction unit 109 predicts the onset probability of a future item using the model stored in the graphical model storage unit 117. With a graphical model, the probability distribution of unknown random variables can be obtained when known values are given to some of the random variables. For example, given this year's medical examination, inquiry, and receipt data, the values of the year X random variables are treated as known and the probability distribution of the remaining year X + n random variables can be obtained. Thus, for example, the probability of onset of a certain disease can be calculated by obtaining the probability distribution of the medical practices and drug prescriptions in year X + n. An algorithm such as the junction tree algorithm can be used for such probabilistic inference. The onset probability after n years can thereby be predicted from this year's data of each insured person. Further, when the medical cost of each medical practice is included in the data, the probability distribution and expected value of the medical expenses for each medical practice in year X + n can be predicted by the same method.

An example of onset probability prediction will be described using FIG. 21A. First, when the medical examination, inquiry, and receipt data for this year are obtained, the data are set as observation data in the year X node group in FIG. 21A. At this time, there may be unknown items. For example, examination items that were not taken and inquiry items that were not answered are unknown. The state of each unknown item is first inferred probabilistically from the observation data based on the edges between year X nodes indicated by solid lines. This gives an estimated probability of each state of the unknown items for this year.

Next, the probability of the state of each item after N years is inferred based on the edge indicated by the dotted line. Thereby, the estimated probability of each state of each item after N years is obtained. By calculating the expected value of each item, a predicted value such as a test value after N years can be obtained.
Next, suppose we want to predict the state after 2N years. In this case, the same structure as that of the current layer and the layer after N years is used for the layer after N years and 2N years. That is, the layers after N years and 2N years in FIG. 21B are the same as the layers of years X and X + N years in FIG. 21A. Then, based on the estimated probability of each state of each item after N years, the estimated probability of each state of each item after 2N years is calculated. As a result, the state after 2N years can be predicted. By repeating this, the future state can be predicted as in 3N years and 4N years later.
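The repeated use of the same layer structure can be illustrated with a deliberately simplified sketch. It assumes a single item whose state after N years depends only on its own current state, so that the model reduces to one transition table p(state after N years | current state); the actual model also contains edges from other items. The function names and the transition probabilities are hypothetical.

```python
def step(dist, transition):
    """Propagate a state distribution over one N-year layer.

    dist:       dict mapping state to probability (current estimate).
    transition: dict mapping state to a dict {next_state: probability}.
    """
    out = {}
    for state, p in dist.items():
        for next_state, q in transition[state].items():
            out[next_state] = out.get(next_state, 0.0) + p * q
    return out


def predict(dist, transition, layers):
    """Apply the same transition table repeatedly (N, 2N, 3N, ... years)."""
    for _ in range(layers):
        dist = step(dist, transition)
    return dist


def expected_value(dist):
    """Expected value of an item whose states are numeric."""
    return sum(state * p for state, p in dist.items())


# State 0: no prescription, state 1: prescription (illustrative probabilities).
transition = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
now = {0: 1.0}
after_n = predict(now, transition, 1)
after_2n = predict(now, transition, 2)
```

The distribution after 2N years is computed from the estimated distribution after N years rather than from an observed value, which corresponds to the layered calculation described above.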

Of the configurations included in FIGS. 1 and 2, configurations not described in the present embodiment will be clarified in the following embodiments.

In the first embodiment, an example of a health care data analysis apparatus that extracts problem diseases based on health care data including receipt information, medical examination information, inquiry information, and the like has been described. Health insurers, however, want to grasp not only the diseases that will become problems in the future but also the factors behind their onset, in order to reduce the onset of those diseases. However, the amount of health care data is enormous and the relationships among the data are complex. Even if the problem disease can be identified, it is not easy to identify its factors.

In the second embodiment, an example of a health care data analysis device that extracts the factors of a problem disease in addition to the problem disease itself, and visualizes the relationship between the disease and the factors in a graph format, will be described.

Since the configuration and processing are the same as those in the first embodiment except for the factor extraction unit 111, the visualization unit 112, and the factor storage unit 122, description of the common parts is omitted.

The factor extraction unit 111 of the health care data analysis system of the second embodiment extracts factors for each problem disease. The visualization unit 112 creates and visualizes a graph structure to which information on the problem disease and the factor is added.

First, the factor extraction unit 111 will be described.

The factor extraction unit 111 provides a function of extracting the items that are factors of the problem diseases stored in the problem disease storage unit 121. Here, a factor extraction function is described with which a health insurer extracts the test values and lifestyle habits that affect the onset of one of the problem diseases extracted from the data of the insured population.

FIG. 4 is a flowchart of the factor extraction function process.

In the problem disease selection step 401, one item is selected as the problem disease item from the problem diseases stored in the problem disease storage unit 121.

In the factor candidate narrowing step 402, the items to be treated as factor candidates are selected from the items of the graphical model stored in the graphical model storage unit 118. For example, items that are fixed for each subject, such as gender, and items that change deterministically with the data acquisition time, such as age, cannot be expected to be affected by health guidance interventions. Therefore, when only items that can be improved by health guidance intervention are to be extracted as factors, for example, these items may be excluded from the factor candidate items.
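The narrowing described above may be sketched, for example, as a simple filter. The item names and the exclusion set below are hypothetical and serve only to illustrate the idea; which items are excluded depends on the purpose of the analysis.

```python
def narrow_factor_candidates(items, excluded):
    """Return the items that remain factor candidates, preserving order."""
    return [item for item in items if item not in excluded]

# Hypothetical items of the graphical model.
model_items = ["gender", "age", "BMI", "smoking", "HbA1c"]
# Items assumed not to be improvable by health guidance.
not_improvable = {"gender", "age"}

candidates = narrow_factor_candidates(model_items, not_improvable)
```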

In the inter-item dependency calculation step 403, the dependency between items is calculated. The dependency represents the degree of similarity or relevance between items, and takes a larger value the more strongly two items are related. The dependency between node vi and node vj is denoted s(i, j).

A first example of the inter-node dependency is as follows. The dependency between two nodes connected by an edge is 1, and the dependency between any other two nodes is 0.

A second example of the inter-node dependency is as follows. The mutual information between the two random variables represented by two nodes is defined as the dependency. Let p(x, y) be the joint probability distribution of the random variables X and Y, and let p(x) and p(y) be their marginal probability distributions. The mutual information I(X, Y) is then

$$I(X, Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

Here, the sum is taken over all values x and y of X and Y. For this calculation, the joint probability distribution p(x, y) for all node pairs and the marginal probability distribution p(x) for all nodes may be calculated in advance and stored in the storage device. Further, the dependency between nodes that are not connected by an edge may be set to 0 regardless of the mutual information.
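The second example may be sketched as follows. The sketch assumes that the probability distributions are estimated by relative frequencies from paired observations and that the natural logarithm is used; neither choice is specified in the description. The function and variable names are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X, Y) in nats from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # counts of each x value
    py = Counter(ys)               # counts of each y value
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def dependency(i, j, samples, edges):
    """s(i, j): mutual information, set to 0 when no edge joins i and j."""
    if (i, j) not in edges and (j, i) not in edges:
        return 0.0
    return mutual_information(samples[i], samples[j])
```

Pairs that never occur together contribute nothing to the sum, which matches the usual convention that 0 log 0 is treated as 0.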

A third example of the inter-node dependency is as follows. Let X1 and X2 be the two random variables represented by two nodes, whose dependency is to be calculated. Based on the receipt shaping information, vectors x1 = (x11, x12, ..., x1n) and x2 = (x21, x22, ..., x2n) are formed by arranging the values of X1 and X2 for each case. In this example, the dependency is calculated based on the correlation coefficient between the vectors x1 and x2.

Here, the correlation coefficient between the vectors x1 and x2 is denoted r(x1, x2). However, since x1 and x2 may contain missing values, any element that is missing in either x1 or x2 is removed from both vectors. For example, when x1i is missing, x2i is also removed. The vectors obtained by removing these elements from x1 and x2 are denoted v1 = (v11, v12, ..., v1m) and v2 = (v21, v22, ..., v2m).

The value of the correlation coefficient r(v1, v2) can be biased by the properties of the values of v1 and v2, even for the same degree of dependency. Therefore, vectors w1 and w2 are created by rearranging the elements of v1 and v2 independently and at random; w1 and w2 can be assumed to have no dependency. Using these, |r(v1, v2)| - |r(w1, w2)| is calculated. If |r(v1, v2)| < |r(w1, w2)|, it can be determined that there is no dependency, so the dependency in this case is 0; otherwise, the dependency is |r(v1, v2)| - |r(w1, w2)|. This makes it possible to calculate the dependency relative to the random case, in which there is no dependency.
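The third example may be sketched as follows. The sketch assumes that missing values are represented by `None`, that the Pearson correlation coefficient is used as r, and that a single random rearrangement serves as the baseline; these are assumptions for illustration, and the names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def correlation_dependency(x1, x2, rng=None):
    """Return max(|r(v1, v2)| - |r(w1, w2)|, 0) after removing missing cases."""
    if rng is None:
        rng = random.Random(0)
    # Keep only the cases observed in both vectors (v1, v2).
    pairs = [(a, b) for a, b in zip(x1, x2) if a is not None and b is not None]
    v1 = [a for a, _ in pairs]
    v2 = [b for _, b in pairs]
    if len(v1) < 2:
        return 0.0  # assumption: too few cases to evaluate
    # Independently shuffled copies (w1, w2) act as the no-dependency baseline.
    w1, w2 = v1[:], v2[:]
    rng.shuffle(w1)
    rng.shuffle(w2)
    return max(abs(pearson(v1, v2)) - abs(pearson(w1, w2)), 0.0)
```

Because the baseline depends on a random rearrangement, the result varies with the random seed; averaging over several rearrangements would be one way to stabilize it.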

A fourth example of the inter-node dependency is as follows. Let X1 and X2 be the two random variables represented by two nodes, whose dependency is to be calculated. Based on the receipt shaping information, vectors x1 = (x11, x12, ..., x1n) and x2 = (x21, x22, ..., x2n) are formed by arranging the values of X1 and X2 for each case. In this example, the dependency is calculated based on the entropy of the pairs of x1 and x2.

First, as in the third example, the vectors from which missing values have been removed are denoted v1 and v2. Next, let S = {(v1i, v2i)} (i is an integer from 1 to m) be the collection of element pairs of the vectors v1 and v2. The number of elements of S is m. For an element p = (p1, p2) of S, let np be the number of elements of S equal to p, and let L be the number of distinct elements of S. The entropy of the pairs of v1 and v2, normalized by L, is then expressed by the following equation.
(Equation 1)
$$e(v_1, v_2) = \frac{1}{L} \sum_{p} \left(-\frac{n_p}{m}\right) \log \frac{n_p}{m}$$
Here, Σ is the sum over the elements p of S. As in the third example, e(w1, w2) is calculated for the randomized vectors w1 and w2. e(v1, v2) is a positive value, and becomes smaller as the degree of co-occurrence of v1 and v2 becomes larger. Therefore, when e(v1, v2) / e(w1, w2), which is normalized by the random case, is larger than 1, it can be determined that there is no dependency between v1 and v2. Further, e(v1, v2) / e(w1, w2) is a value of 0 or more. Therefore, the dependency is set to 0 when e(v1, v2) / e(w1, w2) is greater than 1, and to 1 - e(v1, v2) / e(w1, w2) otherwise.
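The fourth example may be sketched as follows. The sketch assumes that missing values have already been removed, that the sum in Equation 1 runs over the distinct pairs in S (the usual definition of entropy), and that a single random rearrangement is used as the baseline. The handling of a zero baseline is also an assumption, and the names are hypothetical.

```python
import math
import random
from collections import Counter

def normalized_pair_entropy(v1, v2):
    """e(v1, v2): entropy of the value pairs divided by the number L of distinct pairs."""
    m = len(v1)
    counts = Counter(zip(v1, v2))   # n_p for each distinct pair p
    distinct = len(counts)          # L
    entropy = -sum((n_p / m) * math.log(n_p / m) for n_p in counts.values())
    return entropy / distinct

def entropy_dependency(v1, v2, rng=None):
    """Return 1 - e(v1, v2) / e(w1, w2), or 0 when the ratio exceeds 1."""
    if rng is None:
        rng = random.Random(0)
    w1, w2 = list(v1), list(v2)
    rng.shuffle(w1)
    rng.shuffle(w2)
    e_v = normalized_pair_entropy(v1, v2)
    e_w = normalized_pair_entropy(w1, w2)
    if e_w == 0.0:
        return 0.0  # assumption: without a usable baseline, report no dependency
    ratio = e_v / e_w
    return 0.0 if ratio > 1.0 else 1.0 - ratio
```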

Thus, the dependency between nodes is given.

In the problem factor extraction step 404, for each factor candidate item selected in the factor candidate narrowing step 402, the dependency on the problem disease item selected in the problem disease selection step 401 is compared with a preset threshold, and the items whose dependency exceeds the threshold are extracted as factors.

At this time, the factors may be extracted in consideration of the attributes of the edge between the problem disease item and the factor candidate item. As a first example, whether or not to extract a factor candidate item may be determined based on whether an edge exists between it and the problem disease item. For example, when there is no edge between the problem disease item and the factor candidate item, the factor candidate item may be excluded from extraction regardless of the dependency between the two items. As a second example, when there is a directed edge between the problem disease item and the factor candidate item, whether or not the factor candidate item is to be extracted may be determined according to the direction of the edge.

In addition, when the problem disease item and the factor candidate item are each items collected for each predetermined period, the factors may be extracted in consideration of the predetermined period. For example, a factor candidate item may be extracted only when the problem disease item is a random variable node based on data of year X and the factor candidate item is a random variable node based on data of the earlier year X - k (k is a predetermined natural number).

Hereinafter, the cycle consisting of the three processes of the factor registration step 405, step 406, and the factor extraction step 414 is a processing cycle that includes both a process of registering, as factor items, the factors extracted in the problem factor extraction step 404, and a process of extracting items having a high dependency on a registered factor item as new factors and registering them as factor items. The purpose of this cycle is to extract both direct and indirect factors. The specific processing is described below.

In the factor registration step 405, the factor items extracted in the problem factor extraction step 404 are registered as factors of the problem disease selected in the problem disease selection step 401. The factor items extracted in the factor extraction step 414 described later are also registered as factors of the problem disease.

In step 406, it is determined whether, among the factor items registered in the factor registration step 405, there is a factor item for which the existence of further factors has not yet been evaluated. If there is an unevaluated factor item, the process proceeds to the factor extraction step 414. If there is no unevaluated factor item, the process proceeds to the factor DB registration step 407.

In the factor extraction step 414, one unevaluated factor item is selected from the factors registered in the factor registration step 405, and the factors of that item are extracted. The extraction method is the same as that of the problem factor extraction step 404; the description of step 404 applies with the problem disease item selected in the problem disease selection step 401 read as the unevaluated factor item selected from the factors registered in the factor registration step 405.

Note that the dependency calculation method and the threshold set for the dependency may differ between the problem factor extraction step 404 and the factor extraction step 414. Further, the dependency calculation method of the factor extraction step 414 and the threshold set for the dependency may be changed for each processing cycle of the factor registration step 405, step 406, and the factor extraction step 414. For example, the threshold may be changed according to the number of processing cycles.

In the factor DB registration step 407, the problem disease selected in the problem disease selection step 401 and the factors registered in the factor registration step 405 are stored in the factor storage unit 122.
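The flow from step 404 through step 414 may be sketched, for example, as a worklist loop. The sketch assumes a single dependency function `dep` and a single threshold for all cycles, although the description allows both to vary; the item names and dependency values are hypothetical.

```python
from collections import deque

def extract_factors(problem, candidates, dep, threshold):
    """Collect direct and indirect factors of `problem`.

    The first pass (target = problem) plays the role of step 404;
    later passes play the role of step 414.
    """
    registered = []              # factors registered so far (step 405)
    pending = deque([problem])   # items whose factors are not yet evaluated
    while pending:               # step 406: any unevaluated item left?
        target = pending.popleft()
        for item in candidates:
            if item == problem or item == target or item in registered:
                continue
            if dep(item, target) > threshold:
                registered.append(item)
                pending.append(item)
    return registered            # stored in the factor storage (step 407)

# Hypothetical symmetric dependency values between items.
scores = {
    frozenset(("insulin", "dialysis")): 0.8,
    frozenset(("oral diabetes drug", "insulin")): 0.6,
    frozenset(("BMI", "dialysis")): 0.1,
}

def dep(a, b):
    return scores.get(frozenset((a, b)), 0.0)

factors = extract_factors(
    "dialysis", ["insulin", "oral diabetes drug", "BMI"], dep, threshold=0.5)
```

In this illustration, the oral diabetes drug item is reached only through the insulin item, which is the kind of indirect factor the cycle is intended to capture.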

Next, the visualization unit 112 will be described.

The visualization unit 112 visualizes the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118, with information on the problem diseases stored in the problem disease storage unit 121 and the factors stored in the factor storage unit 122 added to it.

In the visualization of the graph, the nodes V are drawn in a two-dimensional or three-dimensional space. Each node is displayed as an appropriate figure such as a circle (○), and a character string representing the node's item may be displayed inside or near the figure. The edges E are drawn as straight lines or curves connecting the nodes, and a directed edge is indicated by an arrow or the like. Note that the edges need not be displayed; even when they are displayed, directed and undirected edges need not be distinguished, and the arrows may be omitted. In addition, information defined between the two nodes V connected by an edge, such as their dependency or relationship, may be displayed as a character string on or near the figure representing the edge. Further, when the shaping information consists of items collected for each predetermined period, the edge display method may be changed according to the predetermined periods to which the two nodes V connected by the edge belong. For example, for two nodes Vi and Vj connected by an edge, the edge may be drawn as a solid line when Vi and Vj both represent random variables based on data obtained in the same predetermined period, and as a dotted line when they represent random variables based on data obtained in different predetermined periods. The change in the edge display method may instead be expressed as, for example, a difference in edge color, a difference in thickness, or a difference between straight lines and curves.

The node placement method is not particularly limited. For example, a generally well-known method may be used that determines the coordinates, based on the presence or absence of edges between nodes, so that nodes connected by edges are placed close to each other. Alternatively, a force-directed algorithm may be used in which attractive and/or repulsive forces between two nodes are defined by an index such as the inter-node dependency, and the coordinates are determined so that the forces acting among all or some of the nodes included in the graph are minimized.
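One possible force-directed placement may be sketched as follows. The sketch follows the general idea of the Fruchterman-Reingold method (repulsion between all node pairs, attraction along edges, and a step size that shrinks over time), which is only one well-known choice and is not necessarily the method intended here. It applies the same force to every edge rather than weighting by dependency; the constants and names are illustrative assumptions.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y)} from a simple attraction/repulsion scheme."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))   # assumed ideal distance between nodes
    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attractive force along every edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, limiting the step by a decreasing "temperature".
        temp = 0.1 * (1.0 - step / iterations)
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            scale = min(length, temp) / length
            pos[v][0] += disp[v][0] * scale
            pos[v][1] += disp[v][1] * scale
    return {v: (x, y) for v, (x, y) in pos.items()}

# Hypothetical graph: nodes is a list, edges is a list of node pairs.
layout = force_directed_layout(
    ["lifestyle A", "oral diabetes drug", "insulin"],
    [("lifestyle A", "oral diabetes drug"), ("oral diabetes drug", "insulin")])
```

To reflect the dependency between nodes, the attractive force could be multiplied by s(i, j) for each edge; that variation is left out here to keep the sketch short.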

In the visualization of the problem diseases, the nodes V corresponding to items representing problem diseases are displayed by a display method different from that of the nodes that are not problem diseases. For example, they are displayed in a color, shape, size, or the like different from the group of nodes representing items that are not problem diseases. Alternatively, the display method of the character string representing the item may be changed in the same manner instead of the figure representing the node itself, or a figure such as a frame line may be added to the graph structure to indicate that the item is a problem disease item. Furthermore, information on the problem diseases, such as a problem disease list, may be displayed as character strings in a table format or the like in a display area separate from the graph structure.

In the visualization of the factors, the nodes V corresponding to items representing factors are displayed by a display method different from that of the nodes that are not factors. For example, they are displayed in a color, shape, size, or the like different from the group of nodes representing items that are not factors. Alternatively, the display method of the character string representing the item may be changed in the same manner instead of the figure representing the node itself, or a figure such as a frame line may be added to the graph structure to indicate that the item is a factor item. Furthermore, information on the factors, such as a factor list, may be displayed as character strings in a table format or the like in a display area separate from the graph structure.

In the visualization of the problem diseases and the factors, the edge display method may be changed depending on whether or not the two nodes V connected by the edge are included in the problem diseases and the factors. For example, in order to emphasize the relationship between a problem disease and its factors, an edge may be displayed thickly when both nodes V at its ends are included in the problem diseases or the factors, and thinly otherwise. The change in the edge display method may instead be expressed as, for example, a difference in edge color, a difference between straight lines and curves, or a difference between solid and dotted lines.

The graph structure to be visualized in the present embodiment may be a partial graph structure of the graphical model G = (V, E) stored in the graphical model storage unit 118. For example, in the graphical model, a subgraph structure including nodes related to diseases and edges existing between nodes related to diseases may be visualized.

FIG. 18 is a screen example of a user interface showing an example of a form for realizing the present embodiment.

Reference numeral 1801 denotes an operation window for configuring problem disease extraction and factor extraction. Here, an example is shown in which target group narrowing, disease candidate narrowing, problem disease extraction logic setting, and factor candidate narrowing are possible. Reference numerals 1802, 1803, and 1804 are the same as 1702, 1703, and 1704 in FIG. 17 described in the first embodiment. Reference numeral 1808 denotes an input window for setting the narrowing-down condition used to select the factor candidate items from the items of the graphical model stored in the graphical model storage unit 118. Here, as an example, the gender item is excluded. Reference numeral 1805 denotes an execution button for starting the problem disease extraction and factor extraction processing based on the settings made in 1802, 1803, 1804, and 1808.

Reference numeral 1806 denotes a display window for displaying the processing results. Reference numeral 1807 denotes a display screen that shows the extracted problem diseases and the extracted factors. Here, the extracted problem diseases are displayed in a table in descending order of medical expenses for the next year, and the factors of each problem disease are listed in the row corresponding to that disease.

Reference numeral 1809 denotes a graph display screen for displaying the visualization graph created by the visualization unit 112. Here, renal failure, extracted as the first problem disease, and the items that are its factors are represented by round nodes, and myocardial infarction and heart failure, extracted as the second problem disease, and the items that are their factors are represented by square nodes. In addition, the subgraph consisting of each problem disease and its factors is highlighted by drawing it with bolder lines than the other nodes and edges.

As described above, the health care data analysis apparatus according to the present embodiment extracts the problem diseases and their factors based on the health care data, and further visualizes the graph structure with information on the problem diseases and the factors added to it. In this way, it can support the understanding of the problem diseases and their factors.

In the second embodiment, the factor extraction unit 111 extracts the factor items of the problem diseases stored in the problem disease storage unit 121, and the visualization unit 112 visualizes the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118 with that information added. In this process, nodes of the same item collected in different predetermined periods are treated and displayed as different nodes. In the present embodiment, an example will be described in which nodes of the same item collected in different predetermined periods are treated as the same node for factor extraction and visualization. According to the present embodiment, the relationships among lifestyle habits, test values, and pathological conditions, and the relationships among pathological conditions over time, can be visualized in a more easily understandable manner.

Since the configuration and processing are the same as those in the second embodiment except for the factor extraction unit 111 and the visualization unit 112, description thereof is omitted.

First, the processing of the factor extraction unit 111 will be described.

As in the second embodiment, the factor extraction unit 111 provides a function of extracting the items that are factors of the problem diseases stored in the problem disease storage unit 121. In the second embodiment, an example of extracting factors based on the stochastic dependency between the problem disease and the factor candidates was described. In the present embodiment, a factor extraction function for expressing the graph with the same items aggregated is described.

FIG. 5 is a flowchart of the factor extraction function process.

The problem disease selection step 401, the factor candidate narrowing step 402, and the inter-item dependency calculation step 403 perform the same processing as described in the second embodiment, and description thereof is omitted. Hereinafter, the processing of the same item problem factor extraction step 501, the same item factor registration step 502, step 503, the same item factor extraction step 504, and the same item factor DB registration step 505 will be described.

In the same item problem factor extraction step 501, first, factors are extracted using the same method as that described in the second embodiment.

Next, a node that is the same item as the extracted factor item and whose collected period is different from the factor item is extracted as an additional factor. At this time, it is not necessary to consider the dependency between the problem disease selected in the problem disease selection step 401 and the additional factor. Moreover, it is not necessary to consider the dependency between the factor and the additional factor. For example, when the test value L acquired in year X is extracted as a factor of the disease D acquired in year X + 1, the test value L acquired in year X + 1 is extracted as an additional factor.

In the following, the cycle consisting of the three processes of the same item factor registration step 502, step 503, and the same item factor extraction step 504 is a processing cycle that includes both processing for extracting items having a high dependency on the factors extracted in the same item problem factor extraction step 501 as new factors and registering them as factor items, and processing for extracting items having a high dependency on the factor items registered in the same item factor registration step 502 as new factors and registering them as factor items. The purpose of this cycle is to extract both direct and indirect factors. The specific processing is described below.

In the same item factor registration step 502, the factor items and the additional factor items extracted in the same item problem factor extraction step 501 are registered as factors of the problem disease selected in the problem disease selection step 401. The factor items extracted in the same item factor extraction step 504 described later are also registered as factors of the problem disease.

In step 503, it is determined whether, among the factor items registered in the same item factor registration step 502, there is a factor item for which the existence of further factors has not yet been evaluated. If there is an unevaluated factor item, the process proceeds to the same item factor extraction step 504. If there is no unevaluated factor item, the process proceeds to the same item factor DB registration step 505.

In the same item factor extraction step 504, one unevaluated factor item is selected from the factors registered in the same item factor registration step 502, and the factors of that item are extracted. The extraction method is the same as that of the same item problem factor extraction step 501; the description of step 501 applies with the problem disease selected in the problem disease selection step 401 read as the unevaluated factor item selected from the factors registered in the same item factor registration step 502.

Note that the dependency calculation method and the threshold set for the dependency may differ between the same item problem factor extraction step 501 and the same item factor extraction step 504. Further, the dependency calculation method of the same item factor extraction step 504 and the threshold set for the dependency may be changed for each processing cycle of the same item factor registration step 502, step 503, and the same item factor extraction step 504. For example, the threshold may be changed according to the number of processing cycles.

In the same item factor DB registration step 505, the problem disease selected in the problem disease selection step 401 and the factor registered in the same item factor registration step 502 are stored in the factor storage unit 122.

Here, the effect of the processing of the factor extraction unit 111 in this embodiment will be described with reference to FIG.

FIG. 19 is an example of a graph stored in the graphical model storage unit 118. This graph includes six nodes: a node representing a random variable for the prescription of oral diabetes drugs, a node representing a random variable for the prescription of insulin, and a node representing a random variable for dialysis, each calculated from data acquired in year N, together with the corresponding three nodes calculated from data acquired in year N + 1.

A dotted arrow in FIG. 19A represents an edge representing a stochastic dependency between nodes of the same item grouped in different predetermined periods, and a broken arrow represents a probability between nodes of different items grouped in different predetermined periods. Represents an edge representing a dependency relationship.

Hereinafter, the effect will be described using an example in which the factors are extracted with the dialysis node of year N + 1 as the problem disease.

FIG. 19A shows that there is an edge between the dialysis node of year N + 1 and the insulin node of year N, so that a stochastic dependency between dialysis and insulin is expressed. Therefore, the insulin node of year N may be extracted as a factor of the dialysis node of year N + 1. On the other hand, there is no edge between the dialysis node of year N + 1 and either the oral diabetes drug node of year N or that of year N + 1, so no stochastic dependency is directly expressed between the oral diabetes drug item and the dialysis item. However, there is a directed edge between the oral diabetes drug node of year N and the insulin node of year N + 1, and a stochastic dependency is expressed there. In other words, oral diabetes drugs affect insulin in the following year, and insulin affects dialysis in the following year. It can therefore be seen from the graph that a stochastic dependency also exists between oral diabetes drugs and dialysis.

Hereinafter, an example of extracting factors based on these relationships will be described according to the present embodiment.

FIG. 19B is an example in which items directly connected to the problem disease are extracted as factors in the same item problem factor extraction step 501. In this factor extraction, an example is shown in which a factor candidate is extracted when there is a directed edge from the factor candidate toward the problem disease node.

FIG. 19C shows the result of extracting, in the same item problem factor extraction step 501, the node of the same item as the extracted factor as an additional factor. In this factor extraction, the insulin node of year N + 1 is extracted as an additional factor.

FIG. 19D shows the result after the cycle of the same item factor registration step 502, step 503, and the same item factor extraction step 504 has been repeated until all processing is complete. From the result, it can be seen that the oral diabetes drug item, which is not directly connected to the dialysis node, can be extracted as a factor.
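The extraction illustrated in FIGS. 19B to 19D may be sketched as follows. Each node is written as an (item, year) pair, a factor is taken to be any node with a directed edge into the current target, and nodes of the same item in a different period are added as additional factors. The edge list is a hypothetical reading of FIG. 19A restricted to edges between different items, and the names are illustrative assumptions.

```python
def same_item_factors(problem, nodes, edges):
    """Collect factors via directed edges, adding same-item nodes of other periods."""
    registered = []      # same item factor registration (step 502)
    pending = [problem]  # items whose factors are not yet evaluated (step 503)
    while pending:
        target = pending.pop(0)
        # Factors: nodes with a directed edge toward the target (steps 501 and 504).
        direct = [src for src, dst in edges if dst == target]
        for factor in direct:
            # Additional factors: same item, different collection period.
            additional = [n for n in nodes
                          if n[0] == factor[0] and n[1] != factor[1]]
            for node in [factor] + additional:
                if node != problem and node not in registered:
                    registered.append(node)
                    pending.append(node)
    return registered

items = ("oral diabetes drug", "insulin", "dialysis")
nodes = [(item, year) for item in items for year in ("N", "N+1")]
# Hypothetical directed edges between different items.
edges = [
    (("insulin", "N"), ("dialysis", "N+1")),
    (("oral diabetes drug", "N"), ("insulin", "N+1")),
]
result = same_item_factors(("dialysis", "N+1"), nodes, edges)
```

Under these assumptions, the insulin node of year N is found first, the insulin node of year N + 1 is added as an additional factor, and the oral diabetes drug nodes are then reached through the insulin node of year N + 1.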

Next, processing of the visualization unit 112 will be described with reference to FIGS. FIG. 24 is a flowchart of the processing of the visualization unit 112. FIG. 20 is an example showing a change in the graph to which the process is applied.

The flowchart of FIG. 24 will be described.

In the visualization edge selection step 2401, an edge representing a probabilistic dependency between nodes of different items collected in different predetermined periods is selected as an edge to be displayed. Edges not selected in this step are excluded from visualization targets in this process.

An example of processing in this step will be described with reference to FIGS. 20A and 20B.

FIG. 20A is an example of a graph created by the graphical model creation unit 108. This graph includes eight nodes: a node representing a random variable for lifestyle habit A, a node representing a random variable for the prescription of oral diabetes drugs, a node representing a random variable for the prescription of insulin, and a node representing a random variable for dialysis, each calculated from data acquired in year N, together with the corresponding four nodes calculated from data acquired in year N + 1. In FIG. 20A, the solid arrows represent edges expressing stochastic dependencies between nodes of different items collected in the same predetermined period, the dotted arrows represent edges expressing stochastic dependencies between nodes of the same item collected in different predetermined periods, and the broken-line arrows represent edges expressing stochastic dependencies between nodes of different items collected in different predetermined periods.

FIG. 20B is an example in which the visualization edge selection step 2401 is applied to the graph shown in FIG. 20A. Solid arrows in FIG. 20B represent edges between nodes of different items selected in the visualization edge selection step 2401 and collected in different predetermined periods.
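The selection performed in the visualization edge selection step 2401 may be sketched as a filter over the edges. Each node is written as an (item, period) pair; the edge list below is hypothetical and does not reproduce FIG. 20A exactly.

```python
def select_visualization_edges(edges):
    """Keep only edges joining different items collected in different periods."""
    return [(src, dst) for src, dst in edges
            if src[0] != dst[0] and src[1] != dst[1]]

# Hypothetical edges, one of each kind.
edges = [
    (("lifestyle A", "N"), ("oral diabetes drug", "N")),    # different items, same period
    (("insulin", "N"), ("insulin", "N+1")),                 # same item, different periods
    (("lifestyle A", "N"), ("oral diabetes drug", "N+1")),  # different items, different periods
]
selected = select_visualization_edges(edges)
```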

In the same item aggregation step 2402, the nodes of the same item collected in different predetermined periods are aggregated into the same node, and then the coordinates are calculated for each aggregated node.

A first example of the aggregation method is as follows. The nodes of the same item collected in different predetermined periods are superimposed at the same coordinates. If a character string representing the node's item is displayed inside or near the node, the contents of the character string are changed as appropriate.

A second example of the aggregation method is as follows. The nodes of the same item collected in different predetermined periods are detached from the edges connected to them and excluded from the visualization target, and one new node connected to all of the detached edges is added. The graph structure for visualization created in this way becomes the visualization target. If a character string representing the node's item is displayed inside or near the node, the contents of the character string are changed as appropriate.

A first example of the coordinate calculation method is as follows. A widely known node coordinate calculation method is applied to the graph stored in the graphical model storage unit 118, before the processing of the visualization edge selection step 2401 is applied, to calculate the coordinates of each node. The coordinates after aggregation are then calculated from the coordinates of the nodes of the same item collected in different predetermined periods. For example, when there are two nodes of the same item, the midpoint of the original coordinates, or a position obtained by weighted averaging, is used as the coordinates after aggregation.

A second example of the coordinate calculation method is as follows. The coordinates are calculated from the graph structure for visualization obtained by applying the processing of the visualization edge selection step 2401 to the graph stored in the graphical model storage unit 118. For example, when a new graph structure for visualization with new nodes added is created as in the second example of the aggregation method, the coordinates are calculated by applying a widely known node coordinate calculation method to that graph structure.
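The aggregation may be sketched as follows, combining the merging of same-item nodes into one node with the first coordinate calculation example. The sketch uses an unweighted average of the original coordinates, although weighted averaging is equally possible; nodes are written as (item, period) pairs, and the coordinates and names are hypothetical.

```python
from collections import defaultdict

def aggregate_same_items(coords, edges):
    """Merge (item, period) nodes per item and average their coordinates."""
    grouped = defaultdict(list)
    for (item, _period), xy in coords.items():
        grouped[item].append(xy)
    merged_coords = {
        item: (sum(x for x, _ in pts) / len(pts),
               sum(y for _, y in pts) / len(pts))
        for item, pts in grouped.items()
    }
    # Reattach every edge to the merged nodes, dropping duplicates.
    merged_edges = sorted({(src[0], dst[0]) for src, dst in edges})
    return merged_coords, merged_edges

# Hypothetical coordinates computed before aggregation.
coords = {
    ("lifestyle A", "N"): (0.0, 0.0),
    ("lifestyle A", "N+1"): (2.0, 0.0),
    ("insulin", "N"): (0.0, 2.0),
    ("insulin", "N+1"): (2.0, 4.0),
}
edges = [(("lifestyle A", "N"), ("insulin", "N+1"))]
merged_coords, merged_edges = aggregate_same_items(coords, edges)
```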

An example of processing in this step will be described with reference to FIG. 20C.

FIG. 20C is an example in which the same item node aggregation step 2402 is applied to the graph to which the processing of the visualization edge selection step 2401 shown in FIG. Here, an example is shown in which the processing described in the first example is used as the aggregation method, and the processing described in the first example is used as the coordinate calculation method.

In visualization step 2403, nodes and edges are visualized based on the coordinates obtained in the same item node aggregation step 2402. The nodes and edges are displayed by the method described for the processing of the visualization unit 112 of the first embodiment.

Here, the effect of the processing of the visualization unit 112 will be described with reference to FIG.

The stochastic dependency between nodes of different items collected in different predetermined periods represents the strength of the influence of one item on the transition of another item. Here, a transition is a stochastic dependency between nodes of the same item over a plurality of years. For example, in FIG. 20A, the node of lifestyle A in year N has a directed edge to the node of the oral diabetes drug in year N + 1, and thus has a stochastic dependency with it. This means that the lifestyle A item affects the transition of the oral diabetes drug item to the next year. Likewise, the oral diabetes drug in year N has directed edges to insulin in year N + 1 and to dialysis in year N + 1, and has stochastic dependencies with them. This means that the oral diabetes drug item affects the transition of the insulin item and the dialysis item to the next year. In summary, the lifestyle A item affects the transition of the oral diabetes drug item to the next year, and the oral diabetes drug item affects the transition of the insulin item and the dialysis item to the next year. Therefore, lifestyle A indirectly affects the transition of the insulin item and the dialysis item. However, in the graph visualization method shown in FIG. 20A, the stochastic dependency between lifestyle A and the oral diabetes drug can be read from the existence of the directed edge, but the relationships between lifestyle A and insulin, and between lifestyle A and dialysis, are difficult to read because no edge exists between them. In contrast, in the graph shown in FIG. 20C, obtained after applying the processing of the visualization unit 112, the influence of the four items of lifestyle A, oral diabetes drug, insulin, and dialysis on each other's transitions is visualized in an easily understandable form.

In the present embodiment, for the purpose of explanation, only edges representing stochastic dependencies between different items collected in different predetermined periods are selected in the visualization edge selection step 2401. However, edges representing stochastic dependencies between different items collected in the same predetermined period may be selected instead, or both kinds of edges may be selected. Further, an example has been described in which an edge that was not selected in the visualization edge selection step is excluded from visualization and is not displayed. Alternatively, whether or not an edge was selected may be expressed by the color, shape, or thickness of the edge.

As described above, the health care data analysis apparatus according to the present embodiment can visualize the relationship between lifestyle habits / test values / pathological conditions and the relationship between time-series pathological conditions in a more easily understandable manner.

In the first embodiment, an example of a health care data analysis device that extracts problem diseases based on health care data including receipt information, medical checkup information, and inquiry information has been described. Health insurance providers, however, also want to identify insured persons who have a high risk of developing a disease, in addition to the diseases that will become future problems. It is not easy to search health care data for insured persons with a high risk of developing a disease in the future, because deep knowledge of the causal relationships between diseases and data is necessary and the amount of data is enormous.

In the fourth embodiment, an example of a health care data analysis device that extracts a subject who has a high risk of onset using information on a disease to be a problem will be described.

Since the configuration and processing are the same as those in the first embodiment except for the high-risk target person selecting unit 113, the description thereof is omitted.

The high-risk target person selection unit 113 provides a high-risk target person selection function that selects target persons with a high risk of developing a disease, based on the information on the problem disease stored in the problem disease storage unit 121 and the disease evaluation index for each target person stored in the disease evaluation index storage unit 120. Hereinafter, a case where a health insurance provider selects insured persons who have a high risk of developing the disease from a group of insured persons will be described as an example.

First, the high-risk target person selection unit 113 reads the data of the insured group from which high-risk target persons are to be selected, from the shaping information storage unit 117 or the input unit 102. For example, when using the data of the insured group that was used for creating the graphical model stored in the graphical model storage unit 118, the shaping information stored in the shaping information storage unit 117 is used as it is. When using data of an unknown insured group, data read from the input unit 102 and shaped by the data shaping unit 107 as necessary is used. Note that the data of the subject group may be the data of all subjects, or a subset of the subject group may be sampled and used. For example, when targeting a group of insured persons at or above a certain age, a threshold may be set for the age item, and only the data of insured persons whose age is equal to or greater than the threshold may be selected from the data included in the shaping information. For sampling, thresholds may also be provided for other items, such as the number of medical treatments. Well-known sampling methods, such as random sampling, may also be used.
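As a rough sketch of this subset selection, the following combines an age threshold with optional random sampling. The record layout (`id`, `age`), the function name `select_subjects`, and the fixed random seed are assumptions for illustration only.

```python
import random

def select_subjects(records, min_age=None, sample_size=None, seed=0):
    """Keep records at or above an age threshold, then optionally draw a
    random sample of the requested size."""
    selected = [r for r in records if min_age is None or r["age"] >= min_age]
    if sample_size is not None and sample_size < len(selected):
        # Seeded generator so that the sample is reproducible.
        selected = random.Random(seed).sample(selected, sample_size)
    return selected

# Hypothetical shaping records for three insured persons.
records = [
    {"id": "S1", "age": 38},
    {"id": "S2", "age": 52},
    {"id": "S3", "age": 61},
]
older = select_subjects(records, min_age=50)
sampled = select_subjects(records, sample_size=2)
```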

Next, the high risk target person selection function is applied to the insured group to select high risk target persons.

FIG. 25 is a flowchart of processing of the high risk target person selection function.

In the problem disease selection step 2501, the problem disease for which high-risk target persons are to be selected is chosen from the problem diseases stored in the problem disease storage unit 121. In the following description, the disease selected in this step is referred to as disease D.

Hereinafter, the steps from the target person sample selection step 2502 to step 2504 are performed for each target person, and constitute one cycle of a loop over all target persons. The specific processing is described below.

In subject sample selection step 2502, one unprocessed insured sample is selected in the cycle. For the following explanation, the subject selected in this step is assumed to be insured S.

In the disease evaluation index reading step 2503, the disease evaluation index related to the disease D of the insured S stored in the disease evaluation index storage unit 120 is read.

In step 2504, if an unprocessed target person remains in the data of the target group, the process returns to the target person sample selection step 2502 to select an unprocessed target person. If not, the cycle is terminated and the process proceeds to step 2505.

In the high risk target person selection step 2505, the disease evaluation index for each target person read in the disease evaluation index reading step 2503 is compared, and an insured person having a high onset risk is selected.

A first example of the selection method is given. A threshold is set for the disease evaluation index, and target persons whose index is equal to or higher than the threshold are selected as high-risk insured persons. For example, when the onset probability in the next year is selected as the disease evaluation index, insured persons whose predicted onset probability in the next year is equal to or higher than the threshold can be selected as having a high risk.

A second example of the selection method is given. The disease evaluation indexes are arranged in descending or ascending order, and a predetermined number of target persons from the top or bottom of the order are selected as high-risk insured persons. For example, when the onset probability in the next year is selected as the disease evaluation index, a predetermined number of insured persons are selected in descending order of the onset probability in the next year.
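The two selection methods can be sketched as below. The function names, the subject identifiers, and the probability values are hypothetical and serve only to illustrate the threshold rule and the ranking rule.

```python
def select_by_threshold(index_by_subject, threshold):
    """First example: subjects whose index is at or above the threshold."""
    return [s for s, v in index_by_subject.items() if v >= threshold]

def select_top_n(index_by_subject, n, descending=True):
    """Second example: the first n subjects after sorting by the index."""
    ranked = sorted(index_by_subject, key=index_by_subject.get, reverse=descending)
    return ranked[:n]

# Predicted next-year onset probability of disease D per insured person
# (illustrative values).
onset_probability = {"S1": 0.05, "S2": 0.42, "S3": 0.18, "S4": 0.30}

high_risk_by_threshold = select_by_threshold(onset_probability, 0.30)
high_risk_top_two = select_top_n(onset_probability, 2)
```

The threshold rule returns a variable number of persons, while the ranking rule fixes the number selected, which may suit cases where health guidance resources are limited.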

The information on the target selected by the high risk target selection unit 113 may be output by the output unit 103 in a character format, a table format, or the like.

FIG. 26 is a screen example of a user interface showing an example of a form for realizing the present embodiment.

Reference numeral 2601 denotes an operation window for setting the problem disease selection. Reference numeral 2602 denotes an operation window for selecting a problem disease. Reference numeral 2603 denotes an execution button for executing the high-risk target person selection process for the problem disease selected in 2602. Reference numeral 2604 denotes a display window for displaying the processing result. Reference numeral 2605 denotes a display area for displaying the problem disease selected as the target of the high-risk target person selection process. Reference numeral 2606 denotes a display screen that displays, in a table format, information on the target persons selected as having a high risk for the selected problem disease. Examples of the display items include the onset probability of the problem disease, name, ID, and age of each target person.

As described above, the health care data analysis apparatus according to the present embodiment can select, for each problem disease, insured persons who have a high onset risk, in addition to the diseases that will become future problems.

In Example 2 and Example 3, examples of a health care data analysis system that extracts and visualizes a problem disease and its factors have been described. In this example, a health care data analysis system is described that simulates the effect that changes in factors or other items of a subject or group of subjects have on the onset probability of diseases and on the stochastic dependence between items, and that visualizes the results.

The configuration and processing are the same as those in the second or third embodiment except for the onset probability prediction unit 109, the visualization unit 112, and the virtual shaping data creation unit 114, and thus the description thereof is omitted.

In the fifth embodiment, changes in the probability of disease onset caused by changes in factors and other items are simulated based on the shaping information, and the resulting differences in the number of onsets, medical expenses, and the like are visualized.

First, the virtual shaping data creation unit 114 will be described.

The virtual shaping data creation unit 114 creates virtual shaping data by changing a part of the shaping data included in the shaping information storage unit 117.

FIG. 29 is a flowchart of processing of the virtual shaping data creation unit 114.

Hereinafter, the processing of each step will be described.

In item change information setting step 2901, information such as the item to be changed, the amount of change, and the value after the change is set. An example is shown below. Assume that the problem disease storage unit 121 stores disease D as a problem disease, and the factor storage unit 122 stores the average daily drinking amount (ml) as a factor of disease D. For example, to predict how the onset probability of the problem disease D and the onset probabilities of other diseases will change after N years when subjects with an average daily drinking amount of 500 ml or more reduce their drinking amount by 500 ml, the average daily drinking amount (ml) item is set as the item to be changed, and -500 is set as the amount of change.

In subject sample selection step 2902, the data of the subjects to be simulated is selected from the shaping data stored in the shaping information storage unit 117. For example, when predicting the influence on the onset probability of one insured person, the data corresponding to that insured person is selected from the shaping data stored in the shaping information storage unit 117 and used. When predicting the influence on the onset probability of an insured group composed of a plurality of insured persons, the data corresponding to the target insured group is selected from the shaping data stored in the shaping information storage unit 117 and used.

In the virtual shaping data creation step 2903, new data is created by changing the value of the item whose effect on the disease is to be evaluated from the data selected in the subject sample selection step 2902. In the above-described example, the value of the daily average drinking amount (ml) item is changed to be new data. This data is called virtual shaping data. The created virtual shaping data is stored in the virtual shaping information storage unit 123.
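Steps 2901 to 2903 can be illustrated with the following sketch, which applies the drinking example above to two hypothetical records. The field names, the `applies_to` predicate, and the clamping of the changed value at zero are assumptions introduced for this example; the description does not state how negative results would be handled.

```python
import copy

def create_virtual_shaping_data(shaping_data, item, delta,
                                applies_to=lambda row: True, minimum=0):
    """Return a changed copy of the shaping data; the original is untouched."""
    virtual = copy.deepcopy(shaping_data)
    for row in virtual:
        if applies_to(row):
            # Assumption: the changed value is not allowed to fall below `minimum`.
            row[item] = max(minimum, row[item] + delta)
    return virtual

# Hypothetical shaping data selected in step 2902.
shaping_data = [
    {"id": "S1", "drinking_ml": 700, "disease_A_visits": 2},
    {"id": "S2", "drinking_ml": 200, "disease_A_visits": 0},
]

# Item change information from step 2901: reduce drinking by 500 ml for
# subjects who drink 500 ml or more per day.
virtual_data = create_virtual_shaping_data(
    shaping_data, "drinking_ml", -500,
    applies_to=lambda row: row["drinking_ml"] >= 500,
)
```

Keeping the original shaping data unchanged allows predictions with and without the change to be compared later.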

FIG. 27A shows an example of the shaping data selected in the subject sample selection step 2902. FIG. 27B shows an example of virtual shaping data created by changing items related to drinking in the virtual shaping data creation step 2903. Here, the disease A item and the disease B item indicate the number of times of consultation with the corresponding disease per year, and alcohol consumption is the average daily alcohol consumption (ml).

Next, the onset probability prediction unit 109 will be described. The onset probability prediction unit 109 predicts the future onset probability of each item using the model stored in the graphical model storage unit 118. In a graphical model, the probability distribution of unknown random variables can be obtained when known values are given to some of the random variables. In the present embodiment, prediction is performed using the data stored in the shaping information storage unit 117 and the data stored in the virtual shaping information storage unit 123 as known values. Since the prediction method for each item after the known data is input is the same as the process described in Example 1, its description is omitted. Thereby, in the above example, two future states after N years (N is any natural number) can be predicted from the subject's year-X health care data: one for the case where the amount of drinking does not change, and one for the case where it changes. The prediction results are stored in the prediction result storage unit 119.
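The idea of obtaining the distribution of an unknown variable from known values can be shown on a toy model. The sketch below assumes a three-node chain (drinking level, an unobserved test value, onset in the next year) with made-up conditional probability tables; neither the structure nor the numbers come from the description, and the actual prediction process is the one described in Example 1.

```python
# Hypothetical conditional probability tables; the numbers are illustrative only.
P_TEST_HIGH_GIVEN_DRINKING = {"heavy": 0.6, "light": 0.2}
P_ONSET_GIVEN_TEST_HIGH = {True: 0.3, False: 0.05}

def onset_probability(drinking):
    """P(onset next year | drinking), marginalizing over the unobserved
    test value: sum over t of P(t | drinking) * P(onset | t)."""
    p_high = P_TEST_HIGH_GIVEN_DRINKING[drinking]
    return (p_high * P_ONSET_GIVEN_TEST_HIGH[True]
            + (1.0 - p_high) * P_ONSET_GIVEN_TEST_HIGH[False])

# Known value from the shaping data versus the virtual shaping data.
p_unchanged = onset_probability("heavy")
p_changed = onset_probability("light")
```

Running the same query once with the original known value and once with the changed value yields the two future states that are compared in this embodiment.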

FIG. 28A is an example showing the future state predicted by applying the process of the onset probability prediction unit 109 to the shaping information shown in FIG. 27A. Here, an example is shown in which a graphical model is created using two years of medical data and the state after one year is predicted. The value of each item is the expected value calculated from the predicted occurrence probability of that item. FIG. 28B shows the future state predicted by applying the process of the onset probability prediction unit 109 to the shaping information created by changing the values of the items relating to smoking and drinking shown in FIG. 27B. In this example, it can be seen that the change in the drinking item influences the predicted expected number of consultations for disease A and disease B in the following year.

Next, the visualization unit 112 will be described.

The visualization unit 112 visualizes the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118 by adding information related to the prediction results stored in the prediction result storage unit 119. Visualization of the graph structure is the same processing as that described in the second and third embodiments, and thus the description thereof is omitted. In the present embodiment, as many graphs are visualized as there are prediction results stored in the prediction result storage unit 119. All the displayed graphs have the same structure, and each node is displayed at the same coordinates in the coordinate system of each display area. At this time, the node and edge display methods are changed according to the corresponding prediction results. Specifically, the display method of each node is changed according to the difference in the predicted probability of the item represented by the node, and the display method of each edge connecting nodes is changed according to the difference in the stochastic dependency between the nodes.

The following is a first example of changes in the node display method. For each prediction result, the average of the expected occurrence value for each item is calculated, and the size, shape, color, etc. are changed depending on the magnitude of the value. For example, when shaping information and virtual shaping information exist for a plurality of persons, the average per person of each onset probability is calculated.

A second example of changes in the node display method is given. Based on the occurrence probability for each item, the number of persons who develop each disease is calculated, and the size, shape, color, etc. are changed according to the number of persons for each disease. For example, in the case of an item indicating the presence or absence of occurrence by 0 and 1, the expected number of onsets in the subject group can be obtained by summing the onset probability predicted for that item over all subjects.

A third example of changes in the node display method is given. Based on the occurrence probability for each item, the medical cost for each item is calculated, and the size, shape, color, etc. are changed according to the medical cost.

In any of these examples of changes in the node display method, a character string representing the calculated onset probability, predicted number of persons, or medical expenses may be displayed in the vicinity of the node when the node is displayed.
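The quantities used in the second and third examples can be sketched as follows. The expected number of onsets is the sum of the per-subject probabilities of a 0/1 item. The cost calculation assumes a fixed average cost per case, which is a simplification introduced here; the description does not specify how medical costs are derived, and all values are illustrative.

```python
def expected_onset_count(probabilities):
    """Expected number of onsets: sum of per-subject onset probabilities."""
    return sum(probabilities)

def expected_medical_cost(probabilities, cost_per_case):
    """Assumed cost model: a fixed average cost for each expected case."""
    return cost_per_case * expected_onset_count(probabilities)

# Predicted onset probabilities of one disease for three subjects,
# without and with a change in a factor item.
baseline = [0.20, 0.10, 0.40]
with_change = [0.10, 0.10, 0.25]

count_baseline = expected_onset_count(baseline)
count_with_change = expected_onset_count(with_change)
cost_saving = (expected_medical_cost(baseline, 1000.0)
               - expected_medical_cost(with_change, 1000.0))
```

A node's size or color could then be driven by `count_baseline` in one graph and `count_with_change` in the other.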

The following is a first example of changes in the edge display method. The degree of dependence between items in the prediction result is calculated, and the size, shape, color, etc. of the edge are changed according to the degree of dependence between items corresponding to the nodes at both ends of the edge. Since the calculation method of the dependence is the same as that of the second embodiment, the description thereof is omitted.

A second example of changes in the edge display method is given. The size, shape, color, and the like of the edge are changed according to the difference in values, such as the onset probability, the expected number of onsets, and the medical expenses, of the items corresponding to the nodes connected to both ends of the edge. For example, when there is a directed edge from disease D1 to disease D2, the edge display method is changed based on the value obtained by subtracting the expected number of onsets of disease D1 from the expected number of onsets of disease D2.
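A minimal sketch of this edge calculation follows. The subtraction matches the example above, while the mapping from the difference to a line width (`edge_width`, with its `scale` and `min_width` parameters) is a hypothetical choice made only to show how the value might drive the display.

```python
def edge_value(expected_onsets, source, target):
    """Expected onsets at the head of the directed edge minus those at its tail."""
    return expected_onsets[target] - expected_onsets[source]

def edge_width(value, scale=0.1, min_width=1.0):
    """Hypothetical display mapping: wider edge for a larger absolute difference."""
    return min_width + scale * abs(value)

# Illustrative expected numbers of onsets for two diseases joined by D1 -> D2.
expected_onsets = {"D1": 30.0, "D2": 12.0}
value = edge_value(expected_onsets, "D1", "D2")
width = edge_width(value)
```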

In this embodiment, an example has been described in which as many graphs are visualized as there are prediction results stored in the prediction result storage unit 119, but only some of the prediction results may be visualized. For example, suppose that there are one set of shaping data and two different sets of virtual shaping data created from it by changing the item change information setting, and that three prediction results based on them are stored in the prediction result storage unit 119. In this case, all three prediction results may be visualized in three graphs, or only the prediction result based on the shaping data and one of the prediction results based on the virtual shaping data may be visualized. Alternatively, only the two prediction results based on the virtual shaping data may be visualized, without visualizing the prediction result based on the shaping data.

FIG. 30 is a screen example of a user interface showing an example of a form for realizing the present embodiment. Here, an example of a screen for setting two conditions and comparing the respective prediction results is shown. Reference numeral 3001 denotes an operation window for setting target selection information, item change information, and the like. Reference numeral 3002 is the same as 1702 in FIG. 17 described in the first embodiment. Reference numeral 3003 denotes a condition setting table for setting item change information. Each row corresponds to one condition. In this example, an item shown as − is not changed, and an item for which a numerical value is entered is changed to that value. Reference numeral 3004 denotes an operation button for adding a new condition to the condition setting table displayed in 3003. Reference numeral 3005 denotes an operation button for creating virtual shaping data based on the conditions set in the condition setting table and performing prediction using the shaping data and the virtual shaping data. Reference numeral 3006 denotes a display window for displaying prediction results, which displays a graph representing the result predicted under each condition shown in the condition setting table 3003. Reference numeral 3007 denotes an operation window for switching the node display method; nodes are displayed with their appearance changed according to the selected index. Reference numeral 3008 denotes an operation window for switching the edge display method; edges are displayed with their appearance changed according to the selected index.

As described above, the health care data analysis apparatus according to the present embodiment can predict changes in the future situation, such as the onset probability of diseases, the number of onsets, and medical costs, caused by changes in the factors of the problem disease and in other items, and can visualize them with high visibility using a plurality of graphs.

Example 2 and Example 3 described examples of medical analysis systems that extract items related to test values and lifestyle habits as factors. Further, in the fifth embodiment, an example of an analysis system that predicts and visualizes changes in future situations such as the onset probability of the disease, the number of cases of onset, and medical costs due to changes in the factors of the subject disease and other items has been described. In this embodiment, health guidance effective for the target disease is extracted using health care data including the presence or absence of health guidance. Furthermore, an example of an analysis system that predicts and visualizes the effect of health guidance will be described.

Since the configuration and processing are the same as those of the second embodiment, the third embodiment, and the fifth embodiment except for the shaping information storage unit 117, the health guidance selection unit 124, and the virtual shaping data creation unit 114, the description thereof is omitted.

The shaping information storage unit 117 stores shaping data including items indicating the presence or absence of health guidance.

FIG. 31 is an example of shaping data including items indicating the presence or absence of health guidance. Reference numerals 603, 604, 605, and 607 in FIG. 31 are the same as those in FIG.

In FIG. 31, reference numerals 1501, 1502, 1503, 1504, 1505, 1506, 1508, 1509, 1510, 1511, 1512, 1514, 1515 and 1516 are the same as those in FIG.

Reference numerals 3102 and 3103 are items relating to the presence or absence of health guidance; 1 is entered for subjects who have received the health guidance, and 0 is entered for subjects who have not. In this example, shaping data including information on the presence or absence of health guidance has been described, but values such as the number of times each health guidance was implemented may be used instead.

Hereinafter, the health guidance selection unit 124, which provides a function of extracting health guidance effective for the problem disease, is described first.

The health guidance selection unit 124 selects, for each problem disease, the items containing information on the implementation of health guidance from among the factors of that problem disease stored in the factor storage unit 122. For example, when the test value V, the lifestyle S, and the health guidance G are stored as the factors of the problem disease D, the health guidance G is selected as the health guidance effective for the problem disease D.
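This selection amounts to filtering the stored factors by whether they are health guidance items, as in the sketch below. The function name `select_health_guidance` and the way health guidance items are identified (membership in a given set) are assumptions for illustration.

```python
def select_health_guidance(factors_by_disease, guidance_items):
    """For each problem disease, keep only the factors that are health
    guidance items."""
    return {
        disease: [f for f in factors if f in guidance_items]
        for disease, factors in factors_by_disease.items()
    }

# Factors stored for problem disease D, as in the example above.
factors_by_disease = {"D": ["test value V", "lifestyle S", "health guidance G"]}
# Hypothetical set of all items that record health guidance implementation.
guidance_items = {"health guidance G", "health guidance H"}

effective_guidance = select_health_guidance(factors_by_disease, guidance_items)
```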

The reason why health guidance effective for the problem disease can be selected by the processing of the health guidance selection unit 124 is explained below in terms of the relationship between the problem disease and health guidance. Assume that the shaping data of year X includes a group that received a certain health guidance G and a group that did not, and that the two groups have the same prevalence of the problem disease D. If the prevalence of the problem disease D differs between the groups in year X + N, and in particular if the prevalence in the group that received health guidance G is lower than in the group that did not, health guidance G can be expected to be effective in reducing the incidence of disease D. A similar relationship holds between test values and health guidance, and between lifestyle and health guidance. In the graphical model stored in the graphical model storage unit 118, the stochastic dependence between items is expressed by edges, and the factor extraction unit 111 extracts the factor items of the problem disease based on the dependency defined between the items. Therefore, a health guidance item included in the factor items extracted by the factor extraction unit 111 is considered to be health guidance that affects the onset of the problem disease, or that affects the test values or lifestyle habits that are factors of the problem disease.

Next, the processing of the virtual shaping data creation unit 114 necessary for providing a function for predicting and visualizing the effect of health guidance will be described.

The flowchart of the process of the virtual shaping data creation unit 114 is the same as the flowchart of FIG. 29 except for the item change information setting step 2901 in FIG.

In item change information setting step 2901, information such as the type of item to be changed, the amount of change, and the value after the change is set. At this time, only items relating to the implementation of health guidance are selected as items to be changed.

Prediction is performed on the virtual shaping information stored in the virtual shaping information storage unit 123 by the same processing as that of the onset probability prediction unit 109 described in the fifth embodiment, and the prediction result is stored in the prediction result storage unit 119. The prediction result is visualized by the same processing as that of the visualization unit 112 described in the fifth embodiment.

By obtaining multiple prediction results in which the content of the health guidance implementation is varied, and by visualizing and comparing them as graphs, it is possible to grasp the future situation that corresponds to each health guidance implementation.

As described above, the health care data analysis apparatus according to the present embodiment can extract the health guidance effective for the problem disease, and can further predict and visualize the future situation resulting from the implementation of health guidance, such as the onset probability of diseases, the number of onsets, and medical costs.

The present invention is not limited to the above-described embodiments, and includes various modifications. The above embodiments have been described in detail for easy understanding of the present invention, and are not necessarily limited to those having all the configurations described. Also, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment. Moreover, the structure of another Example can also be added to the structure of a certain Example. Further, with respect to a part of the configuration of each embodiment, another configuration can be added, deleted, or replaced.

DESCRIPTION OF SYMBOLS
101 Health care data analysis device
102 Input unit
103 Output unit
104 Arithmetic unit
105 Memory
106 Storage medium
107 Data shaping unit
108 Graphical model creation unit
109 Onset probability prediction unit
110 Problem disease extraction unit
111 Factor extraction unit
112 Visualization unit
113 High-risk target person selection unit
114 Virtual shaping data creation unit
115 Database
116 Medical information storage unit
117 Shaping information storage unit
118 Graphical model storage unit
119 Prediction result storage unit
120 Disease evaluation index storage unit
121 Problem disease storage unit
122 Factor storage unit
123 Virtual shaping information storage unit
124 Health guidance selection unit

Claims

A processor that executes a program; a memory that stores the program; and an input unit that receives input of information; and an analysis system that analyzes healthcare data by executing the program,
The analysis system is accessible to a database that stores health care information including medical records and test values of the subject, and shaping information that summarizes the health care information for each subject and every predetermined period,
The analysis system includes a first node group corresponding to a random variable representing a pathological condition and a second node group corresponding to a random variable representing a factor that affects a change in the pathological condition created based on the shaping information. And a directed or undirected edge representing a stochastic dependency between any two nodes included in the set of the first node group and the second node group, and a graphical model defined by Access to the database to store,
The processor predicts the onset probability of a disease based on the graphical model;
The processor extracts at least one pathological condition as a target disease based on the onset probability;
An analysis system comprising:
The analysis system according to claim 1,
A factor extraction unit in which the processor extracts, as a factor of the problem disease, a disease state or factor associated with the extracted problem disease, based on a stochastic dependency between a random variable representing the extracted problem disease and the disease state or the factor,
An analysis system further comprising:
The analysis system according to claim 1,
The processor is a high-risk target person selecting unit that selects a high-risk target person related to the target disease based on an onset probability of the target disease and a predetermined threshold value,
An analysis system further comprising:
The analysis system according to claim 1,
The problem disease extraction unit calculates a disease evaluation index for evaluating a disease state for each disease state based on the information received by the input unit and the onset probability, and extracts at least one of the disease states as a problem disease based on the calculated disease evaluation index.
The analysis system according to claim 4,
The onset probability prediction unit predicts the onset probability of the disease for each subject from the shaping information,
The problem disease extraction unit calculates, for each subject and for each disease state, the disease evaluation index for evaluating a disease state based on the information received by the input unit and the onset probability predicted for each subject, and extracts at least one of the disease states as a problem disease using, as a new disease evaluation index, a value obtained by aggregating the calculated per-subject disease evaluation indexes for each disease state.
The analysis system according to claim 2,
wherein the factor extraction unit extracts, as a new factor, a node that has the same item as the extracted factor and that is created from shaping information summarized over a predetermined period different from the predetermined period of the shaping information from which the extracted factor was created.
The analysis system according to claim 6,
wherein the factor extraction unit repeats, at least once, the processing of extracting, as a new factor, a pathological condition or factor associated with the extracted target disease based on a stochastic dependency between the random variable representing the extracted target disease and the pathological condition or the factor.
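One possible reading of the repeated extraction is a round-by-round walk over the graphical model, where each round starts from the nodes found in the previous round. The sketch below follows that reading. The edge representation (a dictionary mapping node pairs to a dependency strength), the `min_strength` cutoff, and the node names are assumptions for illustration and are not taken from the claims.

```python
from typing import Dict, Set, Tuple


def extract_factors(edges: Dict[Tuple[str, str], float], target: str,
                    rounds: int = 1, min_strength: float = 0.0) -> Set[str]:
    """Collect nodes linked to the target by sufficiently strong edges,
    repeating the extraction from newly found nodes for `rounds` rounds."""
    # Build an undirected adjacency map from the retained edges.
    neighbors: Dict[str, Set[str]] = {}
    for (a, b), strength in edges.items():
        if strength >= min_strength:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    found: Set[str] = set()
    frontier = {target}
    for _ in range(rounds):
        candidates: Set[str] = set()
        for node in frontier:
            candidates |= neighbors.get(node, set())
        candidates -= found | {target}  # keep only factors not seen before
        if not candidates:
            break
        found |= candidates
        frontier = candidates
    return found


# Hypothetical dependency strengths between nodes of the model.
edges = {
    ("bmi", "diabetes"): 0.6,
    ("exercise", "bmi"): 0.4,
    ("age", "diabetes"): 0.1,
    ("smoking", "exercise"): 0.5,
}
first_round = extract_factors(edges, "diabetes", rounds=1, min_strength=0.3)
second_round = extract_factors(edges, "diabetes", rounds=2, min_strength=0.3)
```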
The analysis system according to claim 1,
further comprising a visualization unit that visualizes the created graphical model with information on the extracted target disease added.
The analysis system according to claim 2, wherein
the analysis system is able to access a database storing healthcare information including information relating to the implementation of health guidance, and
the analysis system further comprises a health guidance content selection unit that selects, from among the extracted factors, a factor representing a random variable related to the implementation of health guidance as health guidance effective for the target disease.
An analysis system that analyzes healthcare data, comprising a processor that executes a program, a memory that stores the program, and an input unit that receives input of information, the analysis system analyzing the healthcare data by executing the program, wherein
the analysis system is able to access a database that stores healthcare information including medical records and test values of subjects, and shaping information in which the healthcare information is summarized for each subject and for each predetermined period,
the analysis system is able to access a database that stores a graphical model created based on the shaping information and defined by a first node group corresponding to random variables representing pathological conditions, a second node group corresponding to random variables representing factors that affect changes in the pathological conditions, and directed or undirected edges each representing a stochastic dependency between any two nodes included in the set consisting of the first node group and the second node group,
the processor comprises a virtual shaping data creation unit that creates at least one piece of virtual shaping information in which part of the information included in the shaping information is changed, and stores the virtual shaping information in the database,
an onset probability prediction unit predicts, based on the graphical model, an onset probability for each of the shaping information and the at least one piece of virtual shaping information, and stores the predicted onset probabilities in the database, and
a visualization unit displays as many graphical models as there are sets of predicted onset probabilities, each with its display form changed in accordance with the corresponding set of predicted onset probabilities.
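To make the role of the virtual shaping information concrete, the sketch below changes one item of a subject record and predicts the onset probability for both the actual and the virtual record. The conditional probability table stands in for inference over the stored graphical model. Its items (`bmi_high`, `exercises`), its values, and the function names are hypothetical and serve only to illustrate the comparison.

```python
from typing import Dict, Tuple

# Hypothetical table: P(onset | bmi_high, exercises). A real system would
# obtain these values by inference over the stored graphical model.
CPT: Dict[Tuple[bool, bool], float] = {
    (True, False): 0.30,
    (True, True): 0.18,
    (False, False): 0.10,
    (False, True): 0.05,
}


def predict_onset(record: Dict[str, bool]) -> float:
    """Look up the onset probability for one (actual or virtual) record."""
    return CPT[(record["bmi_high"], record["exercises"])]


def make_virtual(record: Dict[str, bool], **changes: bool) -> Dict[str, bool]:
    """Copy a record and change part of its information."""
    virtual = dict(record)
    virtual.update(changes)
    return virtual


actual = {"bmi_high": True, "exercises": False}
virtual = make_virtual(actual, exercises=True)

# One prediction per record, in the order (actual, virtual).
predictions = [predict_onset(r) for r in (actual, virtual)]
```

Each entry of `predictions` corresponds to one set of predicted onset probabilities, so a viewer following the claim would render one graphical model per entry.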
The analysis system according to claim 10,
wherein the visualization unit changes the display form of each node according to the onset probability predicted for that node, and changes the display form of each edge between nodes according to the stochastic dependency between those nodes.
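A display-form mapping of this kind could, for example, be emitted as Graphviz DOT text, with node fill color driven by onset probability and edge width driven by dependency strength. The color bands, the width formula, and the node names below are assumptions chosen for this sketch; the claim does not specify any particular display form.

```python
from typing import Dict, Tuple


def node_color(p: float) -> str:
    """Map an onset probability to a fill color (bands are assumed)."""
    if p >= 0.5:
        return "red"
    if p >= 0.2:
        return "orange"
    return "gray"


def to_dot(node_prob: Dict[str, float],
           edge_strength: Dict[Tuple[str, str], float]) -> str:
    """Render the model as an undirected Graphviz DOT graph."""
    lines = ["graph model {"]
    for name, p in sorted(node_prob.items()):
        lines.append(f'  "{name}" [style=filled, fillcolor={node_color(p)}];')
    for (a, b), w in sorted(edge_strength.items()):
        # Stronger dependency gives a thicker edge (formula assumed).
        lines.append(f'  "{a}" -- "{b}" [penwidth={1 + 4 * w:.1f}];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({"diabetes": 0.6, "bmi": 0.1}, {("bmi", "diabetes"): 0.5})
```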