WO2015071968A1 - Analysis system - Google Patents

Analysis system

Info

Publication number
WO2015071968A1
Authority
WO
WIPO (PCT)
Prior art keywords
disease
information
factor
analysis system
item
Application number
PCT/JP2013/080616
Other languages
French (fr)
Japanese (ja)
Inventor
信二 垂水
利昇 三好
泰隆 長谷川
伴 秀行
Original Assignee
株式会社日立製作所 (Hitachi, Ltd.)
Application filed by 株式会社日立製作所 (Hitachi, Ltd.)
Priority to PCT/JP2013/080616
Publication of WO2015071968A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • The present invention relates to a data analysis technique, and more particularly to a healthcare data analysis system for analyzing healthcare data.
  • Health insurance associations run insurance programs whose purpose is to reduce medical expenses. These programs provide health guidance to prevent lifestyle-related diseases and to keep such diseases from becoming serious.
  • However, the resources available for health guidance are limited, including public health nurses and budget. A system that supports effective and efficient operation of insurance programs is therefore desired.
  • Patent Document 1 discloses a health business support system for selecting health guidance target persons based on receipt information, medical examination information, and health guidance information.
  • The health business support system described there includes the following units:
    • a medical cost model creation unit, which creates a medical cost model indicating the predicted medical cost for each severity level and test value;
    • a test value improvement model creation unit, which creates a test value improvement model indicating the amount of improvement for each severity level and test value;
    • a target person selection unit, which selects target persons.
  • One aspect of the invention is an analysis system that analyzes healthcare data by executing a program. It includes a processor that executes the program, a memory that stores the program, and an input unit that receives input of information.
  • The analysis system can access a database that stores two kinds of data. The first is health care information, including the medical records and test values of subjects. The second is shaping information, in which the health care information is summarized for each subject and each predetermined period.
  • The analysis system can also access a database that stores a graphical model created from the shaping information. The model is defined by two node groups:
    • a first node group, corresponding to random variables that represent pathological conditions;
    • a second node group, corresponding to random variables that represent factors affecting changes in those conditions.
  • The processor predicts the onset probability of a disease based on the graphical model, and extracts at least one problem disease based on the onset probability.
  • Health insurers need to understand which diseases will become problems in the future. The current prevalence and medical expenses of each disease are easy to calculate, but future changes are not easy to predict. Knowing which diseases will become problems makes it possible to plan long-term health guidance.
  • What counts as a problem disease varies between health insurers and over time. Examples include diseases whose number of patients will increase in the future and diseases whose medical expenses will increase in the future.
  • In this embodiment, problem diseases are extracted from two inputs:
    • a model of the causal relationships between diseases and of the transition structure of disease states, created from the health care data;
    • problem disease extraction logic input by the user.
  • Health care data contains medical and health information for each individual, such as the medical records and test values of each subject.
  • Specific examples of information in the health care data include:
    • the subject's injury and disease names;
    • the medical practices performed on the subject and their costs;
    • health guidance;
    • lifestyle habits obtained from interviews.
  • Receipt information (medical claim information) records what happens when a health insurance subscriber visits a medical institution: the disease names, prescribed drugs, medical practices performed, and medical expenses (scores). An example is described later with reference to FIG. 6.
  • Prescribed drugs and performed medical practices are collectively referred to as medical practices.
  • Medical examination information records the test values obtained when a health insurance subscriber undergoes a medical checkup. An example is described later with reference to FIG. 7.
  • Interview information records the results of interviews conducted when a health insurance subscriber undergoes a medical checkup. The interviews cover lifestyle habits, medical history, subjective symptoms, and the like. An example is described later with reference to FIG. 8.
  • FIG. 1 is a block diagram showing the configuration of the health care data analysis apparatus of the first embodiment.
  • The health care data analysis apparatus includes a data analysis apparatus 101 and a database 115.
  • The data analysis apparatus 101 includes an input unit 102, an output unit 103, an arithmetic unit 104, a memory 105, and a storage medium 106.
  • The input unit 102 is a human interface, such as a mouse and keyboard, that receives input to the data analysis apparatus 101.
  • The output unit 103 is a display or printer that outputs the calculation results of the health care data analysis apparatus.
  • The storage medium 106 is a storage device. It stores the programs that implement the healthcare data analysis processing of the data analysis apparatus 101, the results of that processing, and the like.
  • The storage medium 106 is a non-volatile storage medium, such as a magnetic disk drive or non-volatile memory.
  • Programs stored in the storage medium 106 are loaded into the memory 105.
  • The arithmetic unit 104 is a processor, such as a CPU or GPU, that executes the programs loaded in the memory 105. It performs the processing and calculations described below.
  • The health care data analysis system is a computer system. It may be configured on a single computer or on multiple logically or physically configured computers. It may run as separate components on the same computer, or on a virtual machine built on multiple physical computer resources.
  • The programs executed by the arithmetic unit 104 are provided to each server via removable media (CD-ROM, flash memory, or the like) or a network. They are stored in a non-volatile storage device, which is a non-transitory storage medium.
  • The computer system may include an interface for reading removable media.
  • Problem disease extraction, one of the health care data analysis processes, is described first. The data used in problem disease extraction, and how those data are processed, are described afterward.
  • The problem disease extraction unit 110 extracts problem diseases, that is, diseases expected to become problems in the future. Examples are diseases whose probability of onset will increase and diseases whose medical expenses will increase. The unit then provides information on those problem diseases in the subject group.
  • As an example, the following describes a case in which a health insurer extracts problem diseases from data on its insured group.
  • The problem disease extraction unit 110 starts from the known information about the insured group being analyzed. From it, the unit predicts the probability of unknown items occurring, such as future diseases.
  • A disease evaluation index is then calculated for each disease based on the onset probability, and problem diseases are extracted based on the calculated indices.
  • The problem disease extraction unit 110 reads data on the subject group to be analyzed from the shaping information storage unit 113 or from the input unit 102.
  • When the data are read from the shaping information storage unit 113, the stored shaping information is used.
  • When the data are read from the input unit 102, the input data are used after being shaped by the data shaping unit 107 as necessary.
  • The subject group data may cover all subjects included in the data, or a subset sampled from the subject group.
  • For example, thresholds may be set on items such as age and the number of medical treatments. Well-known sampling methods, such as random sampling, may also be used. Sampling makes it possible to extract problem diseases for a specific group.
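The threshold-based narrowing and random sampling described above can be sketched as follows. This is a minimal illustration only. It assumes each subject is a flat record with hypothetical fields `subscriber_id`, `gender`, `age`, and `visits` (number of medical treatments); the patent does not prescribe a data layout or a sampling interface.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subject:
    # Hypothetical subject record; the field names are illustrative only.
    subscriber_id: str
    gender: str
    age: int
    visits: int  # number of medical treatments in the period


def select_subjects(subjects: List[Subject],
                    min_age: Optional[int] = None,
                    max_age: Optional[int] = None,
                    min_visits: Optional[int] = None,
                    gender: Optional[str] = None,
                    sample_size: Optional[int] = None,
                    seed: Optional[int] = None) -> List[Subject]:
    """Narrow the subject group with thresholds, then optionally random-sample it."""
    selected = [
        s for s in subjects
        if (min_age is None or s.age >= min_age)
        and (max_age is None or s.age <= max_age)
        and (min_visits is None or s.visits >= min_visits)
        and (gender is None or s.gender == gender)
    ]
    if sample_size is not None and sample_size < len(selected):
        # Simple random sampling without replacement; the seed makes it reproducible.
        selected = random.Random(seed).sample(selected, sample_size)
    return selected
```

Take a hypothetical group of four subscribers. There, `select_subjects(group, min_age=40, gender="M")` keeps only the male subscribers aged 40 or over. Passing `sample_size` and `seed` instead draws a reproducible random subset.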
  • Items related to diseases are selected as disease candidates.
  • The disease-related items are selected, for example, based on the shaping information stored in the shaping information storage unit 117.
  • FIG. 3 is a flowchart of the process for extracting problem diseases using the subject group data and the disease candidates.
  • In problem disease extraction logic determination step 301, the problem disease extraction logic is determined based on the information input to the input unit 102.
  • The problem disease extraction logic consists of four elements:
    • a disease evaluation index, calculated for each subject and each disease candidate;
    • a disease evaluation index calculation method, which calculates the disease evaluation index for each disease from the onset probability;
    • a disease evaluation index totaling method, which totals the per-subject disease evaluation indices for each disease;
    • a problem disease extraction condition, which is the condition for extracting a disease as a problem disease based on the totaled disease evaluation index.
  • The disease evaluation index is calculated for each disease in order to extract problem diseases. It is determined from the onset probabilities and medical expenses predicted by the onset probability prediction unit 109.
  • Examples of basic indices include:
    • the probability of onset of the disease after N years;
    • the expected number of people who develop the disease after N years;
    • the expected medical expenses related to the disease after N years.
  • Another example is an index that combines multiple indices based on onset probabilities in different years.
  • N represents an arbitrary natural number.
  • The disease evaluation index calculation method calculates a disease evaluation index from an onset probability, and is defined separately for each disease evaluation index.
    • For the expected number of onsets after N years, a subject's onset probability is itself the expected number of onsets for that one subject.
    • For the expected medical expenses related to the disease after N years, the index can be calculated, for example, by multiplying the onset probability of the disease after N years by the average medical expense for that disease.
    • The average medical expense for each disease can be taken, for example, from the shaping information stored in the shaping information storage unit 117. It is the average medical expense of the subjects suffering from that disease.
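The two calculation methods just described can be sketched in a few lines. This is a minimal sketch under stated assumptions. The onset probability for each subject is taken as already predicted; the graphical-model prediction of the onset probability prediction unit 109 is not reproduced. The shaping information is stood in for by hypothetical `(subscriber_id, disease, annual_cost)` tuples, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_cost_by_disease(records: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Average medical cost per disease over the subjects who had that disease.

    records: hypothetical (subscriber_id, disease, annual_cost) rows standing in
    for the shaping information.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _subscriber_id, disease, cost in records:
        totals[disease] += cost
        counts[disease] += 1
    return {d: totals[d] / counts[d] for d in totals}


def expected_onsets(onset_probability: float) -> float:
    # For one subject, the expected number of onsets equals the onset probability.
    return onset_probability


def expected_cost(onset_probability: float, disease: str,
                  avg_cost: Dict[str, float]) -> float:
    # Expected medical cost = onset probability x average cost of the disease.
    return onset_probability * avg_cost[disease]
```

Consider two hypothetical diabetes patients costing 120,000 and 80,000 points. They give an average of 100,000, so a subject with onset probability 0.25 contributes an expected cost of 25,000.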
  • Alternatively, the index may be the sum of the expected medical costs, N years later, of the medical practices related to the disease.
  • Some indices are based on onset probabilities in different years. An example is the rate of increase in the number of onsets over the 10 years from N years to N + 10 years later. For such an index, the expected numbers of onsets after N years and after N + 10 years may both be predicted first, and the rate of increase then calculated from them.
  • The disease evaluation index totaling method combines the indices obtained for each subject into one disease evaluation index for the entire subject group.
    • For example, when the index is the expected number of onsets after N years, the per-subject expected numbers are summed over all subjects. This gives the expected number of onsets for the entire group.
  • When the disease evaluation index is the rate of increase in onsets from N years to N + 10 years later, summing the per-subject indices does not give a valid total. Instead, the expected numbers of onsets after N years and after N + 10 years are each summed over all subjects, and the rate of increase is calculated from those totals.
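The difference between additive and non-additive totaling can be illustrated with a short sketch. It assumes per-subject onset probabilities are available as plain lists. The rate of increase is defined here as (E[N+10] - E[N]) / E[N]; this formula is an assumption, since the text only says the rate is calculated from the two expected values.

```python
from typing import Sequence


def total_additive(per_subject_values: Sequence[float]) -> float:
    # Additive indices (expected onsets, expected cost): sum over all subjects.
    return sum(per_subject_values)


def total_increase_rate(p_n: Sequence[float], p_n10: Sequence[float]) -> float:
    """Group-level rate of increase in onsets from N to N + 10 years later.

    p_n, p_n10: per-subject onset probabilities after N and N + 10 years.
    Each list is summed first to get the group's expected number of onsets;
    the rate is then computed from the two totals, not from per-subject rates.
    The formula (E[N+10] - E[N]) / E[N] is an assumed definition.
    """
    e_n = sum(p_n)
    e_n10 = sum(p_n10)
    if e_n == 0:
        raise ValueError("no expected onsets after N years")
    return (e_n10 - e_n) / e_n
```

Take two hypothetical subjects with probabilities 0.5 and 0.25 after N years, and 0.75 and 0.25 after N + 10 years. The group rate is (1.0 - 0.75) / 0.75, about 0.33. Summing the per-subject rates (0.5 and 0.0) would give 0.5, which is not the group's rate. A subject whose onset probability after N years is zero would also have no defined per-subject rate.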
  • The problem disease extraction condition decides which diseases count as problem diseases, based on the disease evaluation indices totaled by disease.
  • For example, a condition based on the expected number of onsets can set a threshold. Diseases whose expected number of onsets exceeds the threshold are then extracted as problem diseases.
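A threshold condition of this kind reduces to a filter over the per-disease totals. The following is a minimal sketch, assuming the totals are held in a plain dictionary keyed by a hypothetical disease name. Results are sorted in descending order of the index, matching the descending table display of the FIG. 17 example; the function name is illustrative.

```python
from typing import Dict, List, Tuple


def extract_problem_diseases(totals: Dict[str, float],
                             threshold: float) -> List[Tuple[str, float]]:
    """Return (disease, total) pairs whose total exceeds threshold, largest first."""
    hits = [(disease, value) for disease, value in totals.items() if value > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

With hypothetical totals `{"diabetes": 12.5, "hypertension": 30.0, "gout": 2.0}` and a threshold of 10, hypertension and diabetes are extracted, in that order.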
  • The user may select one of the problem disease extraction logics registered in advance in the database.
  • Alternatively, the user may use a registered logic as a template. The user provides information to modify part of it, and this determines the final problem disease extraction logic.
  • This lets the user build extraction logic that finds the desired problem diseases. For example, the user can change the prediction year or the threshold used in the problem disease extraction condition.
  • Steps 302 (subject sample selection) through 307 are performed for each subject; one cycle processes all subjects. The specific processing is described below.
  • In subject sample selection step 302, one insured person not yet processed in the current cycle is selected.
  • The subject selected in this step is referred to as insured person S.
  • Steps 303 (disease candidate selection) through 306 are performed for each disease candidate item; one cycle processes all disease candidate items. The specific processing is described below.
  • In disease candidate selection step 303, one disease candidate item not yet evaluated in the current cycle is selected.
  • The item selected in this step is referred to as disease D.
  • In disease onset probability prediction step 304, the unit predicts the probability that insured person S will develop disease D. Insured person S is the subject selected in subject sample selection step 302.
  • The expected medical cost of the medical practices related to disease D is also predicted.
  • The prediction is performed by the onset probability prediction unit 109, based on the known information about insured person S.
  • Which onset probabilities are predicted is determined by the information in the problem disease extraction logic.
    • For example, suppose the expected number of onsets after N years is designated as the disease evaluation index, and the current year is year X. Then the onset probability of disease D in year X + N and the expected medical expenses are predicted.
  • When the index is based on onset probabilities in different years, the onset probabilities in year X + N and in year X + N + 10 are predicted. An example of such an index is the rate of increase in onsets over the 10 years from N years to N + 10 years later.
  • Next, a disease evaluation index is calculated from the medical expenses and from the onset probability of disease D predicted in disease onset probability prediction step 304.
  • The calculation follows the disease evaluation index calculation method determined in problem disease extraction logic determination step 301.
  • The calculated disease evaluation index is stored in the disease evaluation index storage unit 116, together with information on insured person S.
  • In step 306, the unit checks whether any disease candidate item is still unevaluated in the current cycle. If so, the process returns to disease candidate selection step 303 and an unevaluated item is selected. Otherwise, the cycle ends and the process proceeds to step 307.
  • In step 307, the unit checks whether any subject in the subject group is still unprocessed in the current cycle. If so, the process returns to subject sample selection step 302 and an unprocessed subject is selected. Otherwise, the cycle ends and the process proceeds to step 308.
  • In disease-specific evaluation index totaling step 308, the per-insured disease evaluation indices stored in the disease evaluation index storage unit 116 are totaled by disease.
  • The totaling follows the disease evaluation index totaling method determined in problem disease extraction logic determination step 301.
  • The totaled disease evaluation indices are stored in the disease evaluation index storage unit 116.
  • Problem diseases are then extracted using the disease-specific evaluation indices totaled in disease-specific evaluation index totaling step 308.
  • The extraction follows the problem disease extraction method determined in problem disease extraction logic determination step 301.
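The FIG. 3 flow can be summarized in a compact sketch: a loop over subjects and disease candidates, totaling in step 308, then the final extraction. It is illustrative only, under several assumptions:
- `predict` is a hypothetical stand-in for the onset probability prediction unit 109, whose graphical-model inference is not reproduced.
- Per-subject indices are kept in memory rather than in the disease evaluation index storage unit 116.
- `ExtractionLogic` is a hypothetical container for the calculation, totaling, and extraction methods chosen in step 301.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ExtractionLogic:
    # Hypothetical container for the logic determined in step 301.
    index_fn: Callable[[float, str], float]   # calculation method: (onset prob, disease) -> index
    total_fn: Callable[[List[float]], float]  # totaling method over all subjects
    condition: Callable[[float], bool]        # extraction condition on the total


def extract(subjects: Sequence[str],
            diseases: Sequence[str],
            predict: Callable[[str, str], float],
            logic: ExtractionLogic) -> List[str]:
    per_subject: Dict[str, List[float]] = {d: [] for d in diseases}
    for s in subjects:                # steps 302 and 307: cycle over subjects
        for d in diseases:            # steps 303 and 306: cycle over disease candidates
            p = predict(s, d)         # step 304: onset probability of disease d for subject s
            per_subject[d].append(logic.index_fn(p, d))  # index kept per subject
    totals = {d: logic.total_fn(v) for d, v in per_subject.items()}  # step 308
    return [d for d, t in totals.items() if logic.condition(t)]      # final extraction
```

In this sketch, the index is the expected number of onsets (`index_fn=lambda p, d: p`), totaled with `sum`, and the condition is a threshold. With hypothetical probabilities of 0.5 and 0.75 for diabetes, the total of 1.25 exceeds a threshold of 1.0, so diabetes is extracted.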
  • The example above selects one disease candidate item at a time in disease candidate selection step 303.
  • However, multiple disease candidates may be selected at once.
  • In that case, disease onset probability prediction step 304 predicts the onset probabilities of those multiple diseases at once.
  • Information on the problem diseases extracted by the problem disease extraction unit 110 is stored in the problem disease storage unit 117.
  • This information may be output from the output unit 103, for example as text or as a table.
  • FIG. 17 shows an example user interface screen for this embodiment.
  • Reference numeral 1701 denotes an operation window for configuring problem disease extraction.
  • In this window, the user can narrow the subject group, narrow the disease candidates, and set the problem disease extraction logic.
  • Reference numeral 1702 denotes an input window for setting a narrowing condition on the input subject group data.
  • In the example shown, the males in the subject group are set as the subject group for problem disease extraction.
  • Reference numeral 1703 denotes an input window for setting a narrowing condition on the disease candidates. The condition narrows them down from the disease-related items in the graphical model stored in the graphical model storage unit 118.
  • Reference numeral 1704 denotes an input window for determining the problem disease extraction logic.
  • In the example shown, next year's medical expenses are selected as the disease evaluation index.
  • Based on the selected disease evaluation index, the remaining methods are read from the database: the disease evaluation index calculation method, the disease evaluation index totaling method, and the problem disease extraction method.
  • Reference numeral 1705 denotes an execution button. It starts the problem disease extraction process using the settings made in 1702, 1703, and 1704.
  • Reference numeral 1706 denotes a display window for displaying the processing results.
  • Reference numeral 1707 denotes a display screen for displaying the extracted problem diseases.
  • In the example shown, the problem diseases extracted based on next year's medical expenses are listed in a table, in descending order of next year's medical expenses.
  • The data shaping unit 107 shapes the data stored in the medical information storage unit 116 of the database. The graphical model creation unit 108 then creates the graphical model from the shaped data stored in the shaping information storage unit 117.
  • Some components may be omitted from the configuration of this embodiment: the data shaping unit 107, the graphical model creation unit 108, and the medical information storage unit 116. This is possible when both of the following hold:
    • the shaping information storage unit 117 stores shaping information created in advance from the health care data;
    • the graphical model storage unit 118 stores a graphical model created in advance from that shaping information.
  • FIG. 2 illustrates such a configuration example. There, the healthcare data analysis apparatus 101 includes neither the data shaping unit 107 nor the graphical model creation unit 108, and the database 115 does not include the medical information storage unit 116.
  • As described above, the health care data analysis apparatus can extract diseases that will become problems in the future. It does so from the accumulated health care data of the subject group, using various indices and simple operations.
  • The medical information storage unit 116 stores the health care data input to the input unit 102.
  • Receipt information, medical examination information, and interview information are described below as examples of typical health care data.
  • The receipt information includes the following:
    • basic receipt information;
    • disease name information;
    • medical practice information;
    • pharmaceutical information;
    • disease name classification information;
    • medical practice classification information;
    • pharmaceutical classification information.
  • FIG. 6 is a diagram illustrating an example of basic receipt information.
  • The basic receipt information 601 holds the correspondence between receipts and health insurance subscribers.
  • The basic receipt information 601 includes a search number 602, a health insurance subscriber ID 603, gender 604, age 605, treatment month 606, total score 607, and the like.
  • The search number 602 is an identifier that uniquely identifies a receipt.
  • The health insurance subscriber ID 603 is an identifier that uniquely identifies a health insurance subscriber.
  • Gender 604 and age 605 are the gender and age of the subscriber.
  • The treatment month 606 is the year and month in which the subscriber visited the medical institution.
  • The total score 607 is the total score of one receipt.
  • FIG. 9 is a diagram illustrating an example of the disease name information 901.
  • The disease name information 901 includes a search number 602, a disease name code 902, a disease name 903, and the like.
  • The search number 602 is an identifier that uniquely identifies a receipt. It is the same number as the search number of the basic receipt information 601 (FIG. 6).
  • The disease name code 902 is the disease name code written on the receipt.
  • The disease name 903 is the name of the injury or disease corresponding to the disease name code.
  • FIG. 10 is a diagram illustrating the disease name classification information.
  • The disease name classification information 1001 associates each disease classification with the disease names that belong to it. It includes a disease classification 1002, a disease name code 902, a disease name 903, and a complication presence/absence 1003.
  • The disease classification 1002 is the classification to which the injury or disease belongs.
  • The disease name code 902 is the disease name code written on the receipt. It is the same code as the disease name code 902 (FIG. 9) of the disease name information 901.
  • The disease name 903 is the name of the injury or disease corresponding to the disease name code. It is the same name as the disease name 903 (FIG. 9) of the disease name information 901.
  • The complication presence/absence 1003 indicates whether the disease name is the name of a complication.
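The disease name information (FIG. 9) and the disease name classification information (FIG. 10) share the disease name code 902. Joining on that code maps each receipt to its disease classifications. The following is a minimal sketch, assuming both tables are plain lists of tuples in the column order described above. The codes, names, and the function `classify_receipts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_receipts(
    disease_name_info: List[Tuple[int, int]],
    classification_info: List[Tuple[str, int, str, bool]],
) -> Dict[int, Dict[str, bool]]:
    """Map each search number to {disease classification: has_complication}.

    disease_name_info: (search_number, disease_name_code) rows, FIG. 9 layout.
    classification_info: (classification, disease_name_code, disease_name,
        is_complication) rows, FIG. 10 layout.
    """
    by_code: Dict[int, List[Tuple[str, bool]]] = defaultdict(list)
    for classification, code, _name, is_complication in classification_info:
        by_code[code].append((classification, is_complication))
    result: Dict[int, Dict[str, bool]] = defaultdict(dict)
    for search_number, code in disease_name_info:
        # Codes with no classification entry are skipped.
        for classification, is_complication in by_code.get(code, []):
            seen = result[search_number].get(classification, False)
            result[search_number][classification] = seen or is_complication
    return dict(result)
```

A receipt is marked as having a complication in a classification if any of its disease names in that classification is flagged as a complication.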
  • FIG. 11 is a diagram illustrating an example of medical practice information.
  • The medical practice information 1101 includes a search number 602, a medical practice code 1102, a medical practice name 1103, and a medical practice score 1104.
  • The search number 602 is an identifier that uniquely identifies a receipt. It is the same number as the search number 602 (FIG. 6) of the basic receipt information 601.
  • The medical practice code 1102 identifies a medical practice described on the receipt.
  • The medical practice name 1103 is the name of the medical practice corresponding to the medical practice code.
  • The medical practice score 1104 is the insurance score of the medical practice.
  • FIG. 12 is a diagram illustrating an example of medical practice classification information.
  • The medical practice classification information 1201 includes a disease classification 1002, a medical practice code 1102, and a medical practice name 1103.
  • The disease classification 1002 uses the same classifications as the disease classification 1002 (FIG. 10) of the disease name classification information 1001.
  • The medical practice code 1102 identifies a medical practice performed for injuries or diseases of the disease classification 1002. It is the same code as the medical practice code 1102 (FIG. 11) of the medical practice information 1101.
  • The medical practice name 1103 is the name of the medical practice corresponding to the medical practice code. It is the same name as the medical practice name 1103 (FIG. 11) of the medical practice information 1101.
  • FIG. 13 is a diagram illustrating an example of pharmaceutical information.
  • The pharmaceutical information 1301 includes a search number 602, a drug code 1302, a drug name 1303, and a drug score 1304.
  • The search number 602 is an identifier that uniquely identifies a receipt. It is the same number as the search number 602 (FIG. 6) of the basic receipt information 601.
  • The drug code 1302 identifies a drug described on the receipt.
  • The drug name 1303 is the name of the drug described on the receipt.
  • The drug score 1304 is the insurance score of the drug.
  • For example, the receipt with search number 602 “11” lists the drugs “diabetes oral drug A” and “hypertension oral drug A”.
  • FIG. 14 is a diagram illustrating pharmaceutical classification information.
  • The pharmaceutical classification information 1401 includes a disease classification 1002, a drug code 1302, and a drug name 1303.
  • The disease classification 1002 uses the same classifications as the disease classification 1002 (FIG. 10) of the disease name classification information 1001.
  • The drug code 1302 identifies a drug prescribed for the classification registered in the disease classification 1002. It is the same code as the drug code 1302 (FIG. 13) of the pharmaceutical information 1301.
  • The drug name 1303 is the name of the drug corresponding to the drug code. It is the same name as the drug name 1303 (FIG. 13) of the pharmaceutical information 1301.
  • The medical practice information 1101 shown in FIG. 11 and the pharmaceutical information 1301 shown in FIG. 13 are collectively referred to as medical practice information.
  • The medical practice classification information 1201 shown in FIG. 12 and the pharmaceutical classification information 1401 shown in FIG. 14 are collectively referred to as medical practice classification information.
  • FIG. 7 is a diagram illustrating an example of the medical examination information.
  • The medical examination information 701 manages medical examination results for multiple subscribers over multiple years.
  • It includes the health insurance subscriber ID 603, the medical examination date 702, and various test values. Examples of test values are BMI 703, abdominal circumference 704, fasting blood glucose 705, systolic blood pressure 706, and neutral fat (triglycerides) 707.
  • The health insurance subscriber ID 603 identifies the subscriber who underwent the medical examination. It is the same identifier as the health insurance subscriber ID 603 (FIG. 6) of the basic receipt information 601.
  • The medical examination date 702 is the date on which the medical examination was taken.
  • BMI 703 through neutral fat 707 are the results of the medical examination.
  • Medical examination data may be missing, for example when a particular test was not taken. In FIG. 7, the systolic blood pressure 706 value is missing from the 2004 examination of the subscriber with health insurance subscriber ID “K0004”.
  • FIG. 8 is a diagram illustrating an example of the interview information.
  • The interview information 801 manages interview results for multiple subscribers over multiple years.
  • It includes the health insurance subscriber ID 603, the interview date 802, and the answers to the interview, for example smoking 803, drinking 804, and walking 805.
  • The interview may also cover lifestyle habits, medical history, constitution (such as allergies), subjective symptoms, and the like.
  • The health insurance subscriber ID 603 identifies the subscriber who was interviewed. It is the same identifier as the health insurance subscriber ID 603 (FIG. 6) of the basic receipt information 601.
  • The interview date 802 is the date on which the interview was conducted.
  • Smoking 803 through walking 805 are the interview results.
  • Smoking 803 is the average number of cigarettes smoked per day for a subscriber who smokes, and “none” for a subscriber who does not.
  • Interview data may also be missing. In FIG. 8, the walking 805 value is missing from the 2004 interview of the subscriber with health insurance subscriber ID “K0003”.
  • the data shaping unit 107 aggregates and integrates information for each subscriber and each period from the health care data stored in the medical information storage unit 117, and shapes the information into a table format.
  • one period is assumed to be one year, but another period such as six months, two years, or three years may be used.
  • Not all of these data need to be prepared; for example, only receipt information and medical examination information may be used. Data other than these may also be added.
  • FIG. 15 is a diagram for explaining an example of the shaping information 1501. The processing of the data shaping unit 107 is described with reference to FIG. 15.
  • the shaping information 1501 includes the receipt shaping information obtained by shaping the 2004 receipt information. Each row of the shaping information 1501 is obtained by tabulating data for one year corresponding to one health insurance subscriber ID.
  • The health insurance subscriber ID 603, gender 604, age 605 and total score 607 are the same as the corresponding items (FIG. 6) of the basic receipt information 601.
  • the data year 1502 is the year of the data from which the shaping information is created.
  • Disease name code 10 (1503) is the number of receipts of the health insurance subscriber ID that list disease name code 10.
  • Disease name code 20 (1504) is the number of receipts of the health insurance subscriber ID that list disease name code 20.
  • Medical practice code 1000 (1505) is the number of receipts of the health insurance subscriber ID on which the medical practice with code 1000 was performed.
  • Drug code 110 (1506) is the number of receipts of the health insurance subscriber ID on which the drug with code 110 was prescribed.
  • The processing of the data shaping unit 107 is described concretely for the case of shaping the 2004 data.
  • First, the receipt search numbers of the selected health insurance subscriber ID whose medical treatment date falls in 2004 are acquired from the basic receipt information 601.
  • Next, referring to the disease name information 901, the number of receipts listing each disease name code is counted. This gives the number of receipts for each disease name code.
  • Similarly, the number of receipts for each medical practice code is counted with reference to the medical practice information 1101, and the number of receipts for each drug code with reference to the pharmaceutical information 1301.
  • From these counts, the 2004 data row of the selected health insurance subscriber ID is generated. This process is performed for all combinations of health insurance subscriber IDs and years to be analyzed.
  • For example, for the 2004 data of health insurance subscriber ID “K0001” in the first row, the search numbers “11”, “12” and “13” are acquired from the basic receipt information 601.
  • According to the disease name information 901, two of these three receipts, search numbers “11” and “13”, have disease name code “10”. Therefore, 2 is registered in the disease name code 10 column of the first row of the shaping information 1501.
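The counting procedure above can be sketched as follows. This is a minimal illustration, not the system's implementation: the dictionaries standing in for the basic receipt information 601 and the disease name information 901, the function name `count_receipts_by_code`, and all toy records are hypothetical. The same function would count medical practice codes or drug codes if given those lookups instead.

```python
from collections import defaultdict


def count_receipts_by_code(receipt_basic, code_info, subscriber, year):
    """Count, for each code, how many receipts of `subscriber` dated in `year`
    list that code. Each receipt is counted at most once per code."""
    counts = defaultdict(int)
    for search_no, (sub_id, treat_year) in receipt_basic.items():
        if sub_id != subscriber or treat_year != year:
            continue
        for code in set(code_info.get(search_no, [])):
            counts[code] += 1
    return dict(counts)


# Hypothetical toy records: search number -> (subscriber ID, treatment year)
receipt_basic = {11: ("K0001", 2004), 12: ("K0001", 2004),
                 13: ("K0001", 2004), 14: ("K0002", 2004)}
# Hypothetical disease name codes listed on each receipt
disease_names = {11: [10], 12: [20], 13: [10, 20], 14: [10]}

row = count_receipts_by_code(receipt_basic, disease_names, "K0001", 2004)
print(row)  # {10: 2, 20: 2}
```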
  • The shaping information 1501 likewise includes medical examination shaping information, in which each row aggregates the data corresponding to one health insurance subscriber ID.
  • the value of each item is the value of the medical examination data for the subscriber and year indicated by the health insurance subscriber ID 603 and the data year 1502.
  • This medical examination data can be acquired from the medical examination information 701.
  • When the medical examination information 701 contains several records for the same health insurance subscriber ID in the same year, the data of any one examination date may be used, or the average of that year's examination results may be used.
  • When data from a single examination date is used, it is preferable to use the regular checkup carried out at approximately the same time every year.
  • Alternatively, the record with the fewest missing values may be selected.
  • Missing data are represented by a predetermined numerical value indicating a missing value. In the example shown in FIG. 15, −1 is used.
  • For subscribers without any medical examination information, all values are treated as missing.
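A minimal sketch of the per-year aggregation options just described, assuming each checkup record is a dict in which an absent key or `None` marks an unmeasured item. The function name `shape_checkups`, the mode names and the item keys are hypothetical; only the −1 missing marker comes from the FIG. 15 example.

```python
MISSING = -1  # missing-value marker, following the FIG. 15 example


def shape_checkups(records, items, mode="fewest_missing"):
    """Collapse one subscriber's checkups for one year into a single row.

    records: list of dicts, one per checkup date; an absent key or None
             means the item was not measured on that date
    mode:    "fewest_missing" keeps the record with the most measured items;
             "average" averages each item over the records that measured it
    """
    if mode not in ("fewest_missing", "average"):
        raise ValueError(f"unknown mode: {mode}")
    if not records:
        # No checkup at all: every value is treated as missing.
        return {item: MISSING for item in items}
    if mode == "average":
        row = {}
        for item in items:
            vals = [r[item] for r in records if r.get(item) is not None]
            row[item] = sum(vals) / len(vals) if vals else MISSING
        return row
    best = max(records, key=lambda r: sum(r.get(i) is not None for i in items))
    return {i: (best[i] if best.get(i) is not None else MISSING) for i in items}


ITEMS = ["bmi", "sbp"]
records_2004 = [{"bmi": 22.0, "sbp": None}, {"bmi": 24.0, "sbp": 130}]
print(shape_checkups(records_2004, ITEMS))             # {'bmi': 24.0, 'sbp': 130}
print(shape_checkups(records_2004, ITEMS, "average"))  # {'bmi': 23.0, 'sbp': 130.0}
```

A sentinel such as −1 is only safe when no legitimate value can equal it; that holds for counts and for typical checkup values, but is worth checking per item.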
  • the shaping information 1501 shown in FIG. 15 includes inquiry shaping information shaped from the inquiry information. Each row is a total of data corresponding to one health insurance subscriber ID.
  • the value of each item is the value of the inquiry data for the subscriber and year shown in the health insurance subscriber ID 603 and the data year 1502.
  • This inquiry data can be acquired from the inquiry information 801.
  • When the inquiry information 801 contains several records for the same health insurance subscriber ID in the same year, the data of any one inquiry date may be used, or the average of that year's inquiry results may be used.
  • When data from a single date is used, it is preferable to use the regular checkup date carried out at approximately the same time every year. Alternatively, the record with the fewest missing values may be selected.
  • Missing data are represented by a predetermined numerical value indicating a missing value. In the example shown in FIG. 15, −1 is used.
  • For subscribers without medical examination information, all values are likewise treated as missing.
  • FIG. 15 shows only data for 2004, but shaping data for another year is also created.
  • Similar items may also be merged into a single item.
  • For example, if oral diabetes drug A and oral diabetes drug B have similar functions among the drug items, they may be treated together as one item.
  • In that case, the sum of the number of prescriptions of oral diabetes drug A and of oral diabetes drug B in the same year is used as the value of the merged item.
  • Whether items are similar may be judged by criteria such as the following.
  • Medical practices belonging to the same disease classification in the medical practice classification information 1201 are treated as similar items.
  • Drugs belonging to the same disease classification in the drug classification information 1401 are treated as similar items.
  • Similar-item information is prepared manually in advance.
  • FIG. 16 is a diagram for explaining an example of the shaping information 1501 in which disease name code 10 and disease name code 20 of the receipt shaping information are merged.
  • The value of the merged disease name code 1601 is the sum of the values of disease name code 10 (1503) and disease name code 20 (1504) in FIG. 15, that is, the number of receipts with disease name code “10” plus the number of receipts with disease name code “20”.
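Merging similar items amounts to summing columns of a shaped row. The sketch below assumes the similar-item information is available as a mapping from a new item name to the items it replaces; `merge_similar_items` and all item names are hypothetical.

```python
def merge_similar_items(row, groups):
    """Replace each group of similar items with one summed item.

    row:    dict item -> count for one subscriber and one year
    groups: dict new_item -> list of items to merge (the similar-item
            information); members absent from the row count as 0
    """
    merged = dict(row)
    for new_item, members in groups.items():
        merged[new_item] = sum(merged.pop(m, 0) for m in members)
    return merged


row = {"disease10": 2, "disease20": 1, "drug110": 3}
print(merge_similar_items(row, {"disease10+20": ["disease10", "disease20"]}))
# {'drug110': 3, 'disease10+20': 3}
```

As in FIG. 16, the merged value is a plain sum, so a receipt listing both codes contributes twice.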
  • the shaping information storage unit 118 of the database 116 stores the created receipt shaping information, medical examination shaping information, and inquiry shaping information shown in FIGS.
  • The shaping information 1501 is numerical data in tabular format.
  • Although the receipt shaping information above is tabulated as the number of receipts, that is, the number of prescriptions, it may instead hold the presence or absence of a prescription: a count of 1 or more (prescribed) is summarized as 1 and a count of 0 (not prescribed) as 0, giving a binary value.
  • The value of the receipt shaping information may also be the number of prescriptions classified into stages. For example, 0 may be used when the number of prescriptions is 0, 1 when it is 1 to 4, and 2 when it is 5 or more.
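Both recodings can be written as small functions. The staging bounds (1, 5) reproduce the 0 / 1 to 4 / 5 or more example; the function names are hypothetical.

```python
def to_binary(count):
    """Presence/absence of a prescription: 1 if prescribed at least once."""
    return 1 if count >= 1 else 0


def to_stage(count, bounds=(1, 5)):
    """Stage = number of bounds the count reaches.
    With the default bounds: 0 -> 0, 1..4 -> 1, 5 or more -> 2."""
    return sum(count >= b for b in bounds)


print([to_stage(c) for c in (0, 1, 4, 5, 12)])  # [0, 1, 1, 2, 2]
```

Coarser values shrink the probability tables of the graphical model described next, at the cost of detail.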
  • Here, the receipt information, medical examination information and inquiry information are aggregated over one-year periods.
  • Other periods, such as every two years, may be used.
  • The case of yearly aggregation is described below as an example.
  • The graphical model creation unit 108 creates a graphical model such as a Bayesian network or a Markov network.
  • FIG. 22A shows a simple model composed of two nodes.
  • The random variable x1, the number of X-year oral drug prescriptions, represents the number of oral diabetes drug prescriptions in year X.
  • The random variable x2, the number of X + n-year insulin prescriptions, represents the number of insulin prescriptions for diabetes in year X + n.
  • The probability table of $p(x_1)$ gives a probability for each value of $x_1$. An example is shown at 2201 in FIG. 22B.
  • The conditional probability table of $p(x_2 \mid x_1)$ is obtained by calculating $p(x_2 \mid x_1)$ for every combination of values of $x_1$ and $x_2$; for example, $p(x_2 \mid x_1 = 1)$ gives the distribution of $x_2$ when $x_1 = 1$.
  • The graph G shown in FIG. 22A together with the probability tables shown in FIG. 22B constitutes a graphical model.
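The source does not say how the probability tables are filled. One common choice, assumed here, is maximum-likelihood estimation by relative frequencies over (x1, x2) observation pairs; the function name `estimate_tables` and the data are hypothetical.

```python
from collections import Counter


def estimate_tables(pairs):
    """pairs: list of observed (x1, x2) tuples.
    Returns p(x1) as {x1: prob} and p(x2 | x1) as {(x1, x2): prob},
    both as relative frequencies (maximum-likelihood estimates)."""
    n = len(pairs)
    c1 = Counter(x1 for x1, _ in pairs)
    c12 = Counter(pairs)
    p_x1 = {a: c / n for a, c in c1.items()}
    p_x2_given_x1 = {(a, b): c / c1[a] for (a, b), c in c12.items()}
    return p_x1, p_x2_given_x1


# Hypothetical (oral drug stage in year X, insulin stage in year X + n) pairs
pairs = [(0, 0), (0, 0), (0, 1), (0, 0), (1, 1), (1, 0), (1, 1), (1, 1)]
p_x1, p_x2_given_x1 = estimate_tables(pairs)
print(p_x1)                   # {0: 0.5, 1: 0.5}
print(p_x2_given_x1[(1, 1)])  # 0.75
```

Combinations never observed get no entry, which means probability 0; adding a small pseudo-count (Laplace smoothing) is a common remedy when data are sparse.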
  • Next, the example of FIG. 23, in which the number of random variables is increased from FIG. 22, is described.
  • In FIG. 22, only the number of X-year oral drug prescriptions is used to predict the number of X + n-year insulin prescriptions.
  • the number of prescriptions of insulin in X + n years can be expected to be greater for people with higher blood sugar levels. It can also be expected to depend on age. Therefore, for example, as shown in FIG. 23, it is assumed that more accurate prediction can be made by predicting the number of X + n-year insulin prescriptions using the number of X-year oral drug prescriptions, the year X blood glucose level, and the year X age.
  • Let x1, x2, x3 and x4 be the random variables representing the number of X-year oral drug prescriptions, the year-X blood glucose level, the year-X age and the number of X + n-year insulin prescriptions, respectively, and let v1, v2, v3 and v4 be the nodes representing them.
  • The probability tables are obtained by calculating $p(x_1)$, $p(x_2)$, $p(x_3)$ and $p(x_4 \mid x_1, x_2, x_3)$.
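Assuming, as the list of tables suggests, that v1, v2 and v3 each have an edge only to v4, the joint distribution of FIG. 23 factorizes as below. The second part shows, as an illustration, how a prediction is still possible when the year-X blood glucose level x2 is unobserved (for example, no checkup): it is marginalized out, and because x2 has no edge to x1 or x3, $p(x_2 \mid x_1, x_3) = p(x_2)$.

```latex
p(x_1, x_2, x_3, x_4) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)

p(x_4 \mid x_1, x_3) = \sum_{x_2} p(x_2 \mid x_1, x_3)\, p(x_4 \mid x_1, x_2, x_3)
                     = \sum_{x_2} p(x_2)\, p(x_4 \mid x_1, x_2, x_3)
```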
  • The number of X + n-year insulin prescriptions may also depend on other items, such as sex, other diabetes-related drug prescriptions and medical practices, and other medical examination items.
  • Moreover, the number of oral drug prescriptions and the blood glucose level themselves depend on other items. Therefore, when the number of random variables becomes large, as with the items of the receipt shaping information, the probabilistic dependencies (edges) may be created automatically from the data. During creation, the presence or absence of an edge and whether it is directed or undirected may be constrained by dependencies based on experience and knowledge.
  • For this, existing techniques such as Bayesian network structure learning can be used.
  • For example, a graphical model may be created using the items of the receipt shaping information for year X and year X + 3 as random variables. The model is learned from past data, for example the pairs 2008/2011 and 2009/2012. Even when the data belong to the same insured person, the 2008/2011 pair and the 2009/2012 pair can be used for learning as separate cases.
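A minimal sketch of turning shaped rows into year-pair training cases, assuming rows are keyed by (subscriber ID, year). The function name and the "@X" / "@X+n" suffixes are hypothetical; basic items such as age and sex, which form single nodes in the model, are not treated specially here.

```python
def make_training_cases(rows, n):
    """Pair each subscriber's year-X row with the same subscriber's
    year-(X + n) row; every available pair becomes one training case.

    rows: dict (subscriber_id, year) -> dict item -> value (one shaped row)
    """
    cases = []
    for (sub_id, year), row_x in sorted(rows.items()):
        row_xn = rows.get((sub_id, year + n))
        if row_xn is None:
            continue  # no data n years later for this subscriber
        case = {f"{item}@X": v for item, v in row_x.items()}
        case.update({f"{item}@X+{n}": v for item, v in row_xn.items()})
        cases.append(case)
    return cases


rows = {("K0001", 2008): {"oral": 1}, ("K0001", 2009): {"oral": 0},
        ("K0001", 2011): {"oral": 1}, ("K0001", 2012): {"oral": 1}}
print(make_training_cases(rows, 3))
# [{'oral@X': 1, 'oral@X+3': 1}, {'oral@X': 0, 'oral@X+3': 1}]
```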
  • The graphical model in FIG. 21 consists of nodes for the items of year X and nodes for the items of year X + n.
  • There are three types of edges between items: edges between items of the same year, edges between the same item in year X and year X + N, and edges between different items in year X and year X + N.
  • Edges of the first type, between items of the same year, are drawn as solid arrows; the remaining edges between year-X and year-X + N items are drawn as dotted arrows.
  • Items representing basic information, such as age, sex and occupation, are not duplicated for year X and year X + N; each forms a single item for the whole model.
  • the edge between items of the same year indicated by a solid line will be described.
  • These edges represent the probabilistic dependence between items of the same year. For example, when the cholesterol level is high, the BMI value tends to be high.
  • The probabilistic dependence between same-year items of the inquiries, medical examinations and receipts is generally the same in all years unless, for example, the test method changes significantly. Therefore, the edge structure between items of the same year is identical in year X and year X + n; that is, the solid-line edge structure is the same in the year-X node group and the year-X + n node group. This structure may be learned from data on items of the same year by a structure learning method for Bayesian networks or Markov networks.
  • Next, edges between the same item in year X and year X + N are described. An example is the edge from the presence/absence of an oral diabetes drug prescription in year X, a receipt item, to the presence/absence of an oral diabetes drug prescription in year X + N.
  • This edge represents the transition of the state over time: the presence/absence of an oral diabetes drug prescription in year X is used to predict its presence/absence in year X + N.
  • A person who was prescribed an oral diabetes drug in year X is likely to be prescribed one in year X + N as well. Since the future state of each item is considered to depend on its current state, this type of edge is defined between every item in year X and the same item in year X + N.
  • Next, edges between different items in year X and year X + N are described. They represent causes that influence the state transition of an item between year X and year X + N.
  • For example, the higher the blood glucose level in year X, the more likely a person with no oral diabetes drug prescription in year X is to receive one in year X + N. Information on the year-X blood glucose level is therefore expected to improve the prediction of the presence/absence of an oral diabetes drug prescription in year X + N.
  • These edges indicate that the state transition of an item from year X to year X + N depends probabilistically on the state of other items in year X.
  • These edges are defined only between different items of year X and year X + N whose probabilistic dependence exceeds a certain level. In a simple method, for example, the correlation coefficient between items is calculated and an edge is defined wherever it exceeds a threshold.
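The simple correlation-threshold method can be sketched as follows, using training cases keyed as `item@X` / `item@X+n` (a hypothetical naming). Two further choices are assumptions rather than anything stated in the source: the absolute value of the correlation is compared, so strong negative correlation also yields an edge, and the threshold 0.3 is arbitrary.

```python
import math


def pearson(a, b):
    """Pearson correlation; a constant column is treated as uncorrelated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cross_year_edges(cases, items, n, threshold=0.3):
    """Directed edges item_i@X -> item_j@X+n (i != j) whose absolute
    correlation over the training cases exceeds the threshold."""
    edges = []
    for src in items:
        a = [c[f"{src}@X"] for c in cases]
        for dst in items:
            if dst == src:
                continue  # same-item edges are always defined separately
            b = [c[f"{dst}@X+{n}"] for c in cases]
            if abs(pearson(a, b)) > threshold:
                edges.append((f"{src}@X", f"{dst}@X+{n}"))
    return edges


cases = [
    {"glucose@X": 90, "oral@X": 0, "glucose@X+3": 92, "oral@X+3": 0},
    {"glucose@X": 100, "oral@X": 0, "glucose@X+3": 105, "oral@X+3": 0},
    {"glucose@X": 130, "oral@X": 0, "glucose@X+3": 140, "oral@X+3": 1},
    {"glucose@X": 150, "oral@X": 1, "glucose@X+3": 160, "oral@X+3": 1},
]
print(cross_year_edges(cases, ["glucose", "oral"], 3))
```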
  • the created graph and probability table are stored in the graphical model storage unit 117.
  • the onset probability prediction unit 109 will be described.
  • The onset probability prediction unit 109 predicts the onset probability of future items using the model stored in the graphical model storage unit 117.
  • With a graphical model, when values are given for some random variables, the probability distribution of the unknown random variables can be obtained. For example, given this year's health checkup, inquiry and receipt data, the values of the year-X random variables are known, and the probability distribution of the remaining year-X + n random variables can be obtained. In this way, for example, the probability of onset of a certain disease can be calculated by obtaining the probability distribution of medical practices and drug prescriptions in year X + n. The junction tree algorithm can be used for such probabilistic inference.
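The junction tree algorithm is the scalable choice. For a handful of variables, exact inference by plain enumeration gives the same answer and shows what is being computed, so that simpler technique is swapped in here. The toy network (x1: year-X oral drug prescription, x2: discretized year-X blood glucose, x4: year-X+n insulin prescription, all binary) and all probabilities are hypothetical.

```python
from itertools import product


def query(domains, factors, target, evidence):
    """Exact inference by enumeration: returns p(target | evidence).

    domains:  dict variable -> list of possible values
    factors:  list of (variables, table); table maps a tuple of values of
              `variables` to a probability (e.g. p(x1) or p(x4 | x1, x2))
    evidence: dict variable -> observed value
    """
    hidden = [v for v in domains if v != target and v not in evidence]
    scores = {}
    for t in domains[target]:
        total = 0.0
        # Sum the joint probability over every assignment of hidden variables.
        for combo in product(*(domains[h] for h in hidden)):
            assign = {**evidence, **dict(zip(hidden, combo)), target: t}
            p = 1.0
            for variables, table in factors:
                p *= table[tuple(assign[v] for v in variables)]
            total += p
        scores[t] = total
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}


domains = {"x1": [0, 1], "x2": [0, 1], "x4": [0, 1]}
factors = [
    (("x1",), {(0,): 0.7, (1,): 0.3}),
    (("x2",), {(0,): 0.6, (1,): 0.4}),
    (("x1", "x2", "x4"), {(0, 0, 0): 0.95, (0, 0, 1): 0.05,
                          (0, 1, 0): 0.8, (0, 1, 1): 0.2,
                          (1, 0, 0): 0.7, (1, 0, 1): 0.3,
                          (1, 1, 0): 0.4, (1, 1, 1): 0.6}),
]
# Oral drug observed, blood glucose unknown (e.g. no checkup this year):
print(query(domains, factors, "x4", {"x1": 1}))
```

Unobserved year-X items are handled exactly as the text describes: they are simply summed out as hidden variables. Enumeration is exponential in the number of hidden variables, which is why message-passing methods such as the junction tree are used on real models.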
  • the onset probability after n years can be predicted based on this year's data of each insured. Further, when the medical cost information for each medical practice is included in the data, the probability distribution and expected value of the medical expenses for each medical practice in X + n years can be predicted by using the same method.
  • onset probability prediction will be described using the example of the diagram shown in FIG. 21A.
  • This year's data of an insured person are set as observed data in the year-X node group of FIG. 21A.
  • Items that were not examined and inquiry items that were not answered remain unknown.
  • The state of each unknown item is inferred probabilistically from the observed data through the edges between year-X nodes, shown as solid lines. This gives the estimated probability of each state of this year's unknown items.
  • Next, the probability of the state of each item after N years is inferred through the edges shown as dotted lines. This gives the estimated probability of each state of each item after N years. By calculating the expected value of each item, predicted values such as test values after N years can be obtained.
  • By repeating the inference with the estimated probabilities for year X + N as input, the estimated probability of each state of each item after 2N years is calculated, so that the state after 2N years can be predicted.
  • In the same way, future states 3N and 4N years later can be predicted.
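The repeated step-ahead prediction can be illustrated for a single item under a strong simplification, assumed here: the item's state after N years depends only on its own current state, through a fixed transition table (the dotted same-item edge alone). The full model would instead feed all year-X + N estimates back into the network. The transition probabilities and function names are hypothetical.

```python
def propagate(dist, transition, steps):
    """dist: probabilities over states in year X.
    transition[i][j] = p(state j after N years | state i now).
    Returns the distributions after N, 2N, ..., steps * N years."""
    out = []
    for _ in range(steps):
        dist = [sum(dist[i] * transition[i][j] for i in range(len(dist)))
                for j in range(len(transition[0]))]
        out.append(dist)
    return out


def expected_value(dist, values):
    """Expected value of an item whose states map to numeric `values`."""
    return sum(p * v for p, v in zip(dist, values))


# States: 0 = no oral diabetes drug prescription, 1 = prescription
transition = [[0.9, 0.1],
              [0.2, 0.8]]
after = propagate([1.0, 0.0], transition, steps=3)
print(after[1])                          # distribution after 2N years
print(expected_value(after[1], [0, 1]))  # probability of a prescription
```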
  • The first embodiment described a health care data analysis apparatus that extracts problem diseases based on health care data including receipt information, medical examination information, inquiry information and the like.
  • In addition to knowing which diseases will become problems in the future, health insurers want to understand the causes of disease onset in order to reduce it.
  • However, the amount of health care data is enormous and the relationships among the data are complex, so even when a problem disease has been identified, its causes are not easy to grasp.
  • the factor extraction unit 111 of the health care data analysis system of the second embodiment extracts factors for each problem disease.
  • the visualization unit 112 creates and visualizes a graph structure to which information on the problem disease and the factor is added.
  • the factor extraction unit 111 will be described.
  • the factor extraction unit 111 provides a function of extracting items that cause the problem disease stored in the problem disease storage unit 121.
  • Below, a factor extraction function is described with which a health insurer, for one of the problem diseases extracted from the data of the insured group, extracts the test values and lifestyle habits that affect its onset.
  • FIG. 4 is a flowchart of the factor extraction function process.
  • In the problem disease selection step 401, one item is selected as the problem disease item from the problem diseases stored in the problem disease storage unit 121.
  • In the factor candidate narrowing step 402, factor candidate items are selected from the items of the graphical model stored in the graphical model storage unit 118.
  • Items that do not change for each person, such as gender, and items that depend strongly on the time of data acquisition, such as age, are either unique to each person or change deterministically with the acquisition time.
  • They therefore cannot be expected to respond to guidance interventions. When only items that can be improved through health guidance interventions are to be extracted as factors, for example, these items may be excluded from the factor candidate items.
  • In the inter-item dependency calculation step 403, the dependency between items is calculated.
  • The dependency represents the degree of similarity or relevance between items and takes a larger value the stronger the dependence.
  • Let s(i, j) denote the dependency between node vi and node vj.
  • For example, the mutual information between the two random variables represented by two nodes may be used as the dependency.
  • For random variables X and Y with joint probability distribution p(x, y) and marginal probability distributions p(x) and p(y), the mutual information is $I(X, Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$, where the sum is taken over all values of X and Y.
  • The joint probability distribution p(x, y) for every node pair and the marginal probability distribution p(x) for every node may be calculated in advance and kept in the storage device. The dependency between nodes that have no edge between them may also be set to 0 regardless of their mutual information.
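A minimal sketch of this dependency, assuming the precomputed joint tables are stored per node pair as `{(x, y): p(x, y)}` dicts. Marginals are derived from the joint table, the natural log is used (nats), and the "no edge means 0" option is applied. The function names and storage layout are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X, Y) in nats from a joint table {(x, y): p(x, y)} summing to 1.
    Zero-probability cells contribute nothing (0 log 0 = 0)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def dependency(u, v, joints, edges):
    """s(u, v): mutual information, or 0 for node pairs without an edge.
    joints: dict (node, node) -> joint table, stored in either order."""
    if (u, v) not in edges and (v, u) not in edges:
        return 0.0
    # Mutual information is symmetric, so either stored orientation works.
    table = joints[(u, v)] if (u, v) in joints else joints[(v, u)]
    return mutual_information(table)


identical = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(identical))    # log 2, about 0.693
print(mutual_information(independent))  # 0.0
```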
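  • Alternatively, the correlation coefficient may be used. Let r(x1, x2) denote the correlation coefficient between vectors x1 and x2.
  • When x1 and x2 contain missing values, every element that is missing in either x1 or x2 is removed from both vectors.
  • That is, if x1i is missing, x2i is removed as well, and vice versa.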
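Pairwise removal of missing elements before computing the correlation can be sketched as below, assuming missing values carry the −1 marker of the shaped data. Returning 0.0 when fewer than two complete pairs remain, or when a vector is constant, is an assumption; the source does not cover these cases.

```python
import math

MISSING = -1  # missing-value marker used in the shaped data (FIG. 15 example)


def pairwise_complete_corr(x1, x2, missing=MISSING):
    """Pearson correlation after dropping every position where either
    vector holds the missing marker."""
    pairs = [(a, b) for a, b in zip(x1, x2) if a != missing and b != missing]
    if len(pairs) < 2:
        return 0.0
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


# The third position is dropped because x1 is missing there.
print(pairwise_complete_corr([1, 2, -1, 4], [2, 4, 6, 8]))  # about 1.0
```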
  • Even for the same degree of dependence, the value of the correlation coefficient r(v1, v2) shifts with the properties of the values of v1 and v2. Therefore, vectors w1 and w2, obtained by independently and randomly permuting the elements of v1 and v2, are first introduced; they can be assumed to have no dependence. Using this,
  • is the sum of all elements p of S.
  • e(w1, w2) is likewise calculated for the randomized vectors w1 and w2.
  • e(v1, v2) takes a positive value that becomes smaller as the co-occurrence of v1 and v2 becomes stronger. Therefore, when the ratio e(v1, v2) / e(w1, w2), normalized by the random case, is greater than 1, it can be judged that there is no dependency between v1 and v2.
  • Since e(v1, v2) / e(w1, w2) is 0 or more, the dependency is defined as $s = \begin{cases} 0, & e(v_1, v_2)/e(w_1, w_2) > 1,\\ 1 - e(v_1, v_2)/e(w_1, w_2), & \text{otherwise}, \end{cases}$ which always lies between 0 and 1.
  • In the problem factor extraction step 404, for each factor candidate item selected in the factor candidate narrowing step 402, the dependency with the problem disease item selected in the problem disease selection step 401 is compared with a preset threshold, and the items whose dependency exceeds the threshold are extracted as factors.
  • Factors may also be extracted taking into account the attributes of the edge between the problem disease item and a factor candidate item.
  • For example, whether to extract a factor item may be decided by whether an edge exists between the problem disease item and the factor item. When there is no edge between them, the factor item may be excluded from extraction regardless of their dependency.
  • When the edge is directed, whether to extract the factor item may be decided according to the direction of the edge.
  • Factors may also be extracted taking into account the predetermined period. For example, a factor candidate item may be extracted only when the problem disease item is a random variable node based on data of year X and the candidate is a random variable node based on data of an earlier year X − k (k is a predetermined natural number).
  • The cycle of the three processes, factor registration step 405, step 406 and factor extraction step 414, extracts items that depend strongly on the factors extracted in the problem factor extraction step 404 as new factors.
  • It is a processing cycle that both registers the extracted factors as factor items and extracts items highly dependent on registered factor items as new factors, registering them as factor items in turn.
  • the purpose of this cycle is to extract both direct and indirect factors. Specific processing will be described below.
  • In the factor registration step 405, the factor items extracted in the problem factor extraction step 404 are registered as factors of the problem disease selected in the problem disease selection step 401. Factor items extracted in the factor extraction step 414, described later, are also registered as factors of that problem disease.
  • In step 406, it is determined whether any of the factor items registered in step 405 has not yet been evaluated for further factors. If such an unevaluated factor item exists, the process proceeds to the factor extraction step 414; otherwise, it proceeds to the factor DB registration step 407.
  • In the factor extraction step 414, one unevaluated factor item is selected from the factors registered in the factor registration step 405, and the factors of that item are extracted.
  • The extraction method is the same as in the problem factor extraction step 404,
  • with the problem disease item selected in the problem disease selection step 401 replaced by the unevaluated factor item selected from the factors registered in the factor registration step 405.
  • The dependency calculation method and the threshold used in the problem factor extraction step 404 may differ from those used in the factor extraction step 414. The dependency calculation method and threshold of the factor extraction step 414 may also be changed for each cycle of the factor registration step 405, step 406 and factor extraction step 414; for example, the threshold may be changed according to the number of cycles completed.
  • In the factor DB registration step 407, the problem disease selected in the problem disease selection step 401 and the factors registered in the factor registration step 405 are stored in the factor storage unit 122.
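The flow of steps 404 to 407 can be sketched as a worklist loop. This is a hedged sketch, not the patented procedure: `dep` stands for any dependency function from step 403, and supporting only a single alternative threshold for step 414 simplifies the per-cycle variation the text allows. All names and toy values are hypothetical.

```python
def extract_factors(problem, candidates, dep, threshold, next_threshold=None):
    """problem: the problem disease item (step 401).
    candidates: set of factor candidate items (step 402).
    dep(a, b): dependency between items (step 403), e.g. mutual information.
    Returns the set of registered factor items, direct and indirect."""
    if next_threshold is None:
        next_threshold = threshold
    # Step 404: direct factors of the problem disease.
    factors = {c for c in candidates if dep(problem, c) > threshold}
    unevaluated = list(factors)          # step 405: register them
    while unevaluated:                   # step 406: any unevaluated factor left?
        f = unevaluated.pop()
        # Step 414: items highly dependent on factor f become new factors.
        for c in candidates:
            if c != problem and c not in factors and dep(f, c) > next_threshold:
                factors.add(c)           # step 405: register the new factor
                unevaluated.append(c)
    return factors                       # step 407: store with the problem


deps = {frozenset({"renal", "bp"}): 0.5, frozenset({"bp", "salt"}): 0.4,
        frozenset({"salt", "walk"}): 0.05, frozenset({"renal", "walk"}): 0.1}


def dep(a, b):
    return deps.get(frozenset({a, b}), 0.0)


print(sorted(extract_factors("renal", {"bp", "salt", "walk"}, dep, 0.3)))
# ['bp', 'salt']: bp is direct, salt is indirect via bp
```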
  • The nodes V are displayed in a two-dimensional or three-dimensional space.
  • Each node is displayed as an appropriate figure.
  • a character string representing a node item may be displayed inside or around the figure.
  • Edges E connect nodes with straight lines or curves, and directed edges are represented with arrows or the like. Edges need not be displayed, however, and even when they are, directed and undirected edges need not be distinguished, so arrows may be omitted.
  • information defined between two nodes such as dependency and relationship between two nodes V connected by the edge may be displayed as a character string inside or around the graphic representing the edge.
  • the edge display method may be changed in consideration of the predetermined period to which the two nodes V connected by the edge belong. For example, when two nodes connected by an edge are Vi and Vj, when both Vi and Vj are nodes representing random variables based on data obtained in the same predetermined period, the edge is represented by a solid line. When Vi and Vj are nodes representing random variables based on data obtained in different predetermined periods, the edges may be represented by dotted lines.
  • the change of the edge display method may be expressed as, for example, a difference in edge color, a difference in thickness, or a difference in straight lines or curves.
  • The method of placing nodes is not particularly limited.
  • For example, a generally known method may be used that determines coordinates so that nodes connected by edges are placed close to each other. Alternatively, attractive and/or repulsive forces may be defined between pairs of nodes by an index such as the inter-node dependency,
  • and a force-directed algorithm may be used that determines the coordinates so that the force among all nodes, or some of the nodes, in the graph is minimized.
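The text leaves the force-directed algorithm open. One widely used scheme is Fruchterman-Reingold, sketched here as an illustration only: every node pair repels, edges attract in proportion to a weight (for example, the inter-node dependency), and step sizes shrink over iterations. The function names, the cooling schedule and the constants are assumptions.

```python
import math
import random


def _delta(pos, u, v):
    """Unit vector from v to u and the distance between them."""
    dx = pos[u][0] - pos[v][0]
    dy = pos[u][1] - pos[v][1]
    d = math.hypot(dx, dy) or 1e-9   # avoid division by zero
    return dx / d, dy / d, d


def force_layout(nodes, edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout.
    edges: dict (u, v) -> weight, e.g. a dependency in [0, 1].
    Returns dict node -> (x, y)."""
    nodes = list(nodes)
    rng = random.Random(seed)                  # fixed seed: reproducible layout
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / max(len(nodes), 1))    # ideal edge length
    temp = 0.1                                 # maximum move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):          # repulsion between every pair
            for v in nodes[i + 1:]:
                ux, uy, d = _delta(pos, u, v)
                f = k * k / d
                disp[u][0] += ux * f
                disp[u][1] += uy * f
                disp[v][0] -= ux * f
                disp[v][1] -= uy * f
        for (u, v), w in edges.items():        # attraction along edges
            ux, uy, d = _delta(pos, u, v)
            f = w * d * d / k
            disp[u][0] -= ux * f
            disp[u][1] -= uy * f
            disp[v][0] += ux * f
            disp[v][1] += uy * f
        for v in nodes:                        # move, capped by temperature
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
        temp *= 0.98                           # cool down
    return {v: (p[0], p[1]) for v, p in pos.items()}


layout = force_layout(["renal", "bp", "salt", "walk"],
                      {("renal", "bp"): 0.5, ("bp", "salt"): 0.4})
```

The heuristic settles into a low-energy arrangement rather than guaranteeing a true minimum; graph libraries typically ship tuned implementations of the same idea.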
  • The node V corresponding to an item representing a problem disease is displayed differently from nodes that are not problem diseases, for example in a different color, shape or size from the node group representing other items.
  • Instead of the figure representing the node itself, the display of the character string representing the item may be changed in the same way, or a figure such as a frame line may be added to the graph structure to indicate that the item is a problem disease item.
  • information regarding problem diseases such as a problem disease list may be displayed as a character string expressed in a table format or the like in a display area different from the graph structure.
  • The node V corresponding to an item representing a factor is displayed differently from nodes that are not factors, for example in a different color, shape or size from the node group representing other items.
  • Instead of the figure representing the node itself, the display of the character string representing the item may be changed in the same way, or a figure such as a frame line may be added to the graph structure to indicate that the item is a factor item.
  • information regarding factors such as a factor list may be displayed as a character string expressed in a table format or the like in a display area different from the graph structure.
  • The edge display method may be changed depending on whether the two nodes V connected by the edge belong to the problem diseases and factors. For example, to emphasize the relationship between a problem disease and its factors, an edge may be drawn thick when both of its nodes are problem disease or factor nodes, and thin otherwise.
  • the change in the edge display method may be expressed as, for example, a difference in edge color, a difference in straight line or curve, or a difference in solid line and dotted line.
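The two edge display rules above (solid versus dotted by collection period, thick versus thin by problem disease/factor membership) can be combined into one small helper. The attribute names and widths are hypothetical rendering hints, not part of the source.

```python
def edge_style(u, v, period, highlighted):
    """Display attributes for the edge u-v.

    period:      dict node -> predetermined period (e.g. data year); nodes
                 without a period (basic items such as sex) may be omitted,
                 in which case their period is taken as None
    highlighted: set of nodes that are problem diseases or their factors
    """
    same_period = period.get(u) == period.get(v)
    return {
        "line": "solid" if same_period else "dotted",
        "width": 3 if (u in highlighted and v in highlighted) else 1,
    }


period = {"renal": 2007, "bp": 2004, "bmi": 2004}
highlighted = {"renal", "bp"}
print(edge_style("bp", "renal", period, highlighted))
# {'line': 'dotted', 'width': 3}
```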
  • Instead of the entire graph G(V, E), a subgraph consisting of the nodes related to the diseases and the edges between those nodes may be visualized.
  • FIG. 18 is a screen example of a user interface showing an example of a form for realizing the present embodiment.
  • Reference numeral 1801 denotes an operation window for configuring problem disease extraction and factor extraction.
  • In this window, the target group, the disease candidates and the factor candidates can be narrowed down, and the problem disease extraction logic can be set.
  • 1802, 1803 and 1804 are the same as 1702, 1703 and 1704 in FIG. 17 described in the first embodiment.
  • Reference numeral 1808 denotes an input window for setting a narrowing-down condition for narrowing down the factor candidate items to be factor candidates among the items of the graphical model stored in the graphical model storage unit 118.
  • In this example, gender items are excluded from the factor candidates.
  • Reference numeral 1805 denotes an execution button that starts problem disease extraction and factor extraction based on the settings made in 1802, 1803, 1804 and 1808.
  • Reference numeral 1806 is a display window for displaying the processing result.
  • Reference numeral 1807 denotes a display screen that shows the extracted problem diseases and the extracted factors.
  • The extracted problem diseases are displayed in a table in descending order of next year's medical expenses, and the factors of each problem disease are listed in the row for that disease.
  • Renal failure, extracted as the first problem disease, and the items that are its factors are represented by round nodes,
  • and myocardial infarction and heart failure, extracted as the second problem disease, and the items that are their factors are represented by square nodes.
  • The subgraph composed of each problem disease and its factors is highlighted with thicker lines than the other nodes and edges.
  • In this way, the health care data analysis apparatus extracts problem diseases and their factors from the health care data and visualizes the graph structure annotated with information on them, helping users understand the problem diseases and their factors.
  • the nodes of the same item collected in different predetermined periods are treated as different nodes and displayed.
  • an example will be described in which nodes of the same item collected in different predetermined periods are treated as the same node, and factor extraction and visualization are performed. According to the present embodiment, it is possible to visualize the relationship between lifestyle habits / test values / pathological conditions and the relationship between time-series pathological conditions in a more easily understandable manner.
  • the factor extraction unit 111 provides a function of extracting items that are factors of the target disease stored in the target disease storage unit 121 as in the second embodiment.
  • In the second embodiment, an example was described in which factors are extracted based on the probabilistic dependency between the problem disease and each candidate factor.
  • In the present embodiment, a factor extraction function that aggregates nodes of the same item when expressing the graph is described.
  • FIG. 5 is a flowchart of the factor extraction function process.
  • the problem disease selection step 401, the factor candidate narrowing step 402, and the inter-item dependency calculation step 403 perform the same processing as the processing described in the first embodiment, and thus description thereof is omitted.
  • processing of the same item problem factor extraction step 501, the same item factor registration step 502, step 503, the same item factor extraction step 504, and the same item factor DB registration step 505 will be described.
  • In the same item problem factor extraction step 501, any node that represents the same item as an extracted factor item but was collected in a different period is also extracted as an additional factor.
  • The cycle consisting of the three processes of the same item factor registration step 502, step 503, and the same item factor extraction step 504 treats items having a high dependency on the factors extracted in the same item problem factor extraction step 501 as new factors.
  • In other words, it is a processing cycle that registers extracted items as factor items, and then extracts items having a high dependency on the factor items registered in the same item factor registration step 502 as new factors and registers them as factor items in turn.
  • the purpose of this cycle is to extract both direct and indirect factors. Specific processing will be described below.
  • In the same item factor registration step 502, the factor items extracted in the same item problem factor extraction step 501 and the additional factor items are registered as factors of the problem disease selected in the problem disease selection step 401.
  • the factor item extracted in the same item factor extraction step 504 described later is registered as a factor of the problem disease.
  • In step 503, it is determined whether any factor item registered in the same item factor registration step 502 has not yet been evaluated for further factors. If an unevaluated factor item remains, the process proceeds to the same item factor extraction step 504; otherwise, it proceeds to the same item factor DB registration step 505.
  • In the same item factor extraction step 504, one unevaluated factor item is selected from the factors registered in the same item factor registration step 502, and the factors of that item are extracted.
  • The extraction method is the same as that of the same item problem factor extraction step 501, with "the problem disease selected in the problem disease selection step 401" in the description of step 501 read as "the unevaluated factor item registered in the same item factor registration step 502".
  • The dependency calculation method and the dependency threshold used in the same item problem factor extraction step 501 may differ from those used in the same item factor extraction step 504. Furthermore, the dependency calculation method and threshold of step 504 may be changed each time the cycle of the same item factor registration step 502, step 503, and the same item factor extraction step 504 is repeated; for example, the threshold may be changed according to the number of processing cycles.
  • In the same item factor DB registration step 505, the problem disease selected in the problem disease selection step 401 and the factors registered in the same item factor registration step 502 are stored in the factor storage unit 122.
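  • The cycle of steps 501 to 505 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the graphical model is given as a hypothetical `parents` mapping from each (item, period) node to its parent nodes and a dependency score, one fixed threshold is used for every step, and nodes of the problem disease's own item are skipped (the source does not say how those are handled).

```python
from collections import deque

# A node is (item, period), e.g. ("insulin", 2013).
Node = tuple[str, int]


def extract_factors(parents: dict[Node, dict[Node, float]],
                    disease: Node, threshold: float = 0.1) -> set[Node]:
    """Sketch of the same-item factor extraction cycle (steps 501-505).

    parents maps each node to {parent_node: dependency score}, i.e. the
    directed edges of the graphical model pointing into that node.
    """
    all_nodes = set(parents) | {p for ps in parents.values() for p in ps}

    def direct_factors(target: Node) -> set[Node]:
        # Parents whose dependency meets the threshold. Assumption: nodes of
        # the problem disease's own item (e.g. last year's dialysis) are skipped.
        return {p for p, dep in parents.get(target, {}).items()
                if dep >= threshold and p[0] != disease[0]}

    # Step 501: direct factors, plus same-item nodes from other periods.
    factors = direct_factors(disease)
    factor_items = {item for item, _ in factors}
    factors |= {n for n in all_nodes if n[0] in factor_items}

    # Steps 502-504: repeat until every registered factor has been evaluated.
    pending = deque(sorted(factors))
    while pending:                      # step 503: any unevaluated factor left?
        current = pending.popleft()     # step 504: evaluate one factor item
        for p in sorted(direct_factors(current)):
            if p not in factors:
                factors.add(p)          # step 502: register as a new factor
                pending.append(p)
    # Step 505: the caller would store {disease: factors} in the factor storage.
    return factors
```

With the FIG. 19 structure (year N insulin pointing to year N + 1 dialysis, year N oral medication pointing to year N + 1 insulin), the sketch returns both insulin nodes and the year N oral medication node, mirroring FIG. 19D.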
  • FIG. 19 is an example of a graph stored in the graphical model storage unit 118. The graph contains six nodes: three calculated from data acquired in year N (nodes representing random variables related to the prescription of oral diabetes medication, the prescription of insulin, and dialysis) and the corresponding three calculated from data acquired in year N + 1.
  • In FIG. 19A, a dotted arrow represents an edge expressing a stochastic dependency between nodes of the same item collected in different predetermined periods, and a broken arrow represents an edge expressing a stochastic dependency between nodes of different items collected in different predetermined periods.
  • FIG. 19A shows that the year N + 1 dialysis node has an edge with the year N insulin node, i.e., that there is a stochastic dependency between dialysis and insulin. Therefore, the year N insulin node may be extracted as a factor of the year N + 1 dialysis node.
  • By contrast, the year N + 1 dialysis node has no edge with either the year N or the year N + 1 oral diabetes medication node, so no stochastic dependency is expressed directly between the oral diabetes medication item and the dialysis item. However, there is a directed edge from year N oral diabetes medication to year N + 1 insulin, expressing a stochastic dependency. In other words, oral diabetes medication affects the next year's insulin, and insulin affects the next year's dialysis. The graph therefore shows that there is also a stochastic dependency, albeit an indirect one, between oral diabetes medication and dialysis.
  • FIG. 19B is an example in which items directly connected to the problem disease are extracted as factors in the same item problem factor extraction step 501.
  • In this factor extraction, a factor candidate is extracted when there is a directed edge from the candidate toward the problem disease node.
  • FIG. 19C shows the result of extracting, in the same item problem factor extraction step 501, nodes of the same item as an extracted factor as additional factors.
  • N + 1 year insulin is extracted as an additional factor.
  • FIG. 19D shows the result of repeating the cycle of the same item factor registration step 502, step 503, and the same item factor extraction step 504 until all factor items have been evaluated. The result shows that oral diabetes medication, which is not directly connected to the dialysis node, can be extracted as a factor.
  • FIG. 24 is a flowchart of the processing of the visualization unit 112.
  • FIG. 20 shows an example of how the graph changes as this processing is applied.
  • In the visualization edge selection step 2401, edges representing probabilistic dependencies between nodes of different items collected in different predetermined periods are selected as the edges to be displayed. Edges not selected in this step are excluded from visualization in this processing.
  • FIG. 20A is an example of a graph created by the graphical model creation unit 108.
  • This graph contains eight nodes: four calculated from data acquired in year N (nodes representing random variables related to lifestyle A, the prescription of oral diabetes medication, the prescription of insulin, and dialysis) and the corresponding four calculated from data acquired in year N + 1.
  • In FIG. 20A, solid arrows represent edges expressing stochastic dependencies between nodes of different items collected in the same predetermined period, dotted arrows represent edges expressing stochastic dependencies between nodes of the same item collected in different predetermined periods, and broken arrows represent edges expressing stochastic dependencies between nodes of different items collected in different predetermined periods.
  • FIG. 20B is an example in which the visualization edge selection step 2401 is applied to the graph shown in FIG. 20A.
  • Solid arrows in FIG. 20B represent edges between nodes of different items selected in the visualization edge selection step 2401 and collected in different predetermined periods.
  • In the same item node aggregation step 2402, nodes of the same item collected in different predetermined periods are aggregated into a single node, and coordinates are then calculated for each aggregated node.
  • A first example of the coordinate calculation method is as follows. A widely known node coordinate calculation (layout) method is applied to the graph stored in the graphical model storage unit 118, before the processing of the visualization edge selection step 2401 is applied, to calculate the coordinates of each node.
  • The coordinates of each aggregated node are then calculated from the coordinates of the nodes of the same item collected in different predetermined periods. For example, when there are two nodes of the same item, the midpoint of their original coordinates, or a weighted average of them, is used as the coordinates after aggregation.
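  • The aggregation in the first example can be sketched as follows, assuming the per-node coordinates have already been produced by some standard layout algorithm. The function name `aggregate_coordinates` and the optional `weights` argument are hypothetical; with no weights, the result for two nodes of the same item is their midpoint.

```python
from collections import defaultdict


def aggregate_coordinates(coords, weights=None):
    """Merge same-item nodes from different periods into one node position.

    coords: {(item, period): (x, y)} from any standard layout algorithm.
    weights: optional {(item, period): w}; equal weights give the midpoint.
    Returns {item: (x, y)} as the (weighted) average of the original positions.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # item -> [sum wx, sum wy, sum w]
    for node, (x, y) in coords.items():
        w = 1.0 if weights is None else weights[node]
        acc = sums[node[0]]
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w
    return {item: (sx / sw, sy / sw) for item, (sx, sy, sw) in sums.items()}
```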
  • In a second example, coordinates are calculated from the graph structure for visualization obtained by applying the processing of the visualization edge selection step 2401 to the graph stored in the graphical model storage unit 118.
  • In this case, the coordinates are calculated by applying a widely known node coordinate calculation method to this graph structure.
  • FIG. 20C is an example in which the same item node aggregation step 2402 is applied to the graph shown in FIG. 20B, to which the processing of the visualization edge selection step 2401 has been applied.
  • In this example, the processing described in the first example is used both as the aggregation method and as the coordinate calculation method.
  • Nodes and edges are then visualized based on the coordinates obtained in the same item node aggregation step 2402.
  • the node and edge display method uses the method shown in the description of the processing of the visualization unit 112 of the first embodiment.
  • the probabilistic dependency between nodes of different items collected in different predetermined periods represents the strength of the influence of one item on the transition of another item.
  • Here, a transition refers to the stochastic dependency between nodes of the same item across multiple years.
  • The year N lifestyle A node has a directed edge to the year N + 1 oral diabetes medication node, expressing a stochastic dependency. This means that the lifestyle A item affects the transition of the oral diabetes medication item to the next year.
  • Year N oral diabetes medication has directed edges to year N + 1 insulin and year N + 1 dialysis, expressing stochastic dependencies.
  • the oral diabetes drug item affects the transition of the insulin item and the dialysis item to the next year.
  • Since the lifestyle A item affects the next-year transition of the oral diabetes medication item, and the oral diabetes medication item affects the next-year transitions of the insulin and dialysis items, lifestyle A can be seen to indirectly affect the transitions of the insulin and dialysis items.
  • With the graph visualization shown in FIG. 20A, the stochastic dependency between lifestyle A and oral diabetes medication can be read from the directed edge between them, but the relationships between lifestyle A and insulin, and between lifestyle A and dialysis, are difficult to read because there are no edges between them.
  • In contrast, the visualization by the visualization unit 112 shown in FIG. 20C presents how the four items (lifestyle A, oral diabetes medication, insulin, and dialysis) influence one another's transitions in an easily understandable form.
  • In this embodiment, the visualization edge selection step 2401 selects edges representing stochastic dependencies between different items collected in different predetermined periods. Alternatively, edges representing stochastic dependencies between different items collected in the same predetermined period may be selected, or both kinds of edges may be selected.
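  • A minimal sketch of the edge selection in step 2401, covering the three variants described above. Edges are assumed to be pairs of hypothetical (item, period) tuples, and the `mode` names are illustrative; after selection, each endpoint can be mapped to its item to obtain the edges between the aggregated nodes.

```python
VALID_MODES = {"cross_period", "same_period", "both"}


def select_edges(edges, mode="cross_period"):
    """Select edges to display, as in the visualization edge selection step 2401.

    edges: iterable of (src, dst) pairs, each node an (item, period) tuple.
    mode: "cross_period" keeps edges between different items in different
          periods, "same_period" keeps edges between different items in the
          same period, and "both" keeps either kind. Edges between nodes of
          the same item are never selected.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    selected = []
    for src, dst in edges:
        if src[0] == dst[0]:
            continue  # same-item edge: not displayed
        cross_period = src[1] != dst[1]
        if mode == "both" or (mode == "cross_period") == cross_period:
            selected.append((src, dst))
    return selected


def to_aggregated_edges(selected):
    """Map selected edges onto item-level (aggregated) nodes without duplicates."""
    return sorted({(src[0], dst[0]) for src, dst in selected})
```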
  • the health care data analysis apparatus can visualize the relationship between lifestyle habits / test values / pathological conditions and the relationship between time-series pathological conditions in a more easily understandable manner.
  • In addition to identifying diseases that will become future problems, health insurers want to identify insured persons at high risk of developing those diseases.
  • However, searching health care data for insured persons at high risk of developing future diseases is not easy, because it requires deep knowledge of the causal relationships between diseases and the data, and the volume of data is enormous.
  • The high-risk target person selection unit 113 provides a high-risk target person selection function that selects persons at high risk of developing a disease, based on the information on the problem diseases stored in the problem disease storage unit 121 and the disease evaluation index of each target person stored in the disease evaluation index storage unit 120.
  • a case where the health insurance company selects an insured person who has a high risk of developing the disease from the group of insured persons will be described as an example.
  • The high-risk target person selection unit 113 reads the data of the insured group from which high-risk target persons are to be selected, either from the shaping information storage unit 117 or from the input unit 102.
  • When the data is read from the shaping information storage unit 117, the stored shaping information is used as is.
  • When the data is read from the input unit 102, it is used after being shaped by the data shaping unit 107 as necessary.
  • The data of the target group may cover all target persons, or a subset of the group may be sampled.
  • For example, the subset may be selected by setting thresholds on other items such as age or the number of medical consultations, or by using a well-known sampling method such as random sampling.
  • the high risk target person selection function is applied to the insured group to select high risk target persons.
  • FIG. 25 is a flowchart of processing of the high risk target person selection function.
  • In the problem disease selection step 2501, the problem disease for which high-risk target persons are to be selected is chosen from the problem diseases stored in the problem disease storage unit 121.
  • the disease selected in this step is referred to as disease D.
  • Steps 2502 to 2504 are performed for each target person and form one cycle that is repeated over all target persons. The specific processing is described below.
  • In the target person sample selection step 2502, one unprocessed insured person is selected.
  • The insured person selected in this step is referred to as insured person S.
  • In the disease evaluation index reading step 2503, the disease evaluation index related to disease D of insured person S, stored in the disease evaluation index storage unit 120, is read.
  • In step 2504, if any target person in the target group has not yet been processed, the process returns to the target person sample selection step 2502 to select that person. Otherwise, the cycle ends and the process proceeds to step 2505.
  • In step 2505, the disease evaluation indexes read for each target person in the disease evaluation index reading step 2503 are compared, and insured persons with a high onset risk are selected.
  • In one method, a threshold is set for the disease evaluation index, and target persons whose index is equal to or higher than the threshold are selected as high-risk insured persons. For example, when the predicted onset probability for the next year is used as the disease evaluation index, insured persons whose predicted next-year onset probability is at or above the threshold are selected as high-risk.
  • In another method, the disease evaluation indexes are sorted in descending or ascending order, and a predetermined number of target persons at the top (or bottom) of the order are selected as high-risk insured persons. For example, when the next-year onset probability is used as the disease evaluation index, insured persons are selected in descending order of that probability.
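  • Both selection methods of step 2505 can be sketched as follows. This is an illustrative sketch, assuming the indexes read in steps 2502 to 2504 are collected into a hypothetical `{insured_id: index}` dictionary and that a larger index means a higher risk (for an index where smaller means riskier, the sort order would be reversed).

```python
def select_high_risk(index_by_person, threshold=None, top_n=None):
    """Select insured persons with a high onset risk for one problem disease D.

    index_by_person: {insured_id: disease evaluation index}, where a larger
    index means a higher risk (e.g. predicted next-year onset probability).
    Give exactly one of `threshold` (first method) or `top_n` (second method).
    """
    if (threshold is None) == (top_n is None):
        raise ValueError("specify exactly one of threshold or top_n")

    # Steps 2502-2504: visit each insured person S once and read the index.
    scored = [(index, person) for person, index in index_by_person.items()]

    # Step 2505, first method: keep everyone at or above the threshold.
    if threshold is not None:
        return sorted(person for index, person in scored if index >= threshold)

    # Step 2505, second method: the top_n persons in descending order of index
    # (ties broken by ID so the result is deterministic).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [person for _, person in scored[:top_n]]
```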
  • Information on the target persons selected by the high-risk target person selection unit 113 may be output by the output unit 103 as text, a table, or the like.
  • FIG. 26 is a screen example of a user interface showing an example of a form for realizing the present embodiment.
  • Reference numeral 2601 denotes an operation window for problem disease selection settings.
  • Reference numeral 2602 denotes an operation window for selecting a problem disease.
  • Reference numeral 2603 denotes an execution button for executing a high-risk target person selection process related to the target disease selected in 2602.
  • Reference numeral 2604 denotes a display window for displaying the processing result.
  • Reference numeral 2605 denotes a display area for displaying the target disease selected as the target of the high risk target person selection process.
  • Reference numeral 2606 denotes a display screen that displays, in a table, information on the target persons selected as having a high risk for the selected problem disease. Examples of the displayed items include the onset probability of the problem disease, the name, the ID, and the age of each target person.
  • As described above, the health care data analysis apparatus can identify not only diseases that will become future problems but also, for each problem disease, insured persons with a high onset risk.
  • In Examples 2 and 3, examples of a health care data analysis system that extracts and visualizes problem diseases and their factors were described.
  • In this embodiment, an example of a medical data analysis system is described that simulates how changes in factors or other items for a target person or target group affect the onset probability of diseases and the stochastic dependencies between items, and then visualizes the results.
  • the configuration and processing are the same as those in the second or third embodiment except for the onset probability prediction unit 109, the visualization unit 112, and the virtual shaping data creation unit 114, and thus the description thereof is omitted.
  • Specifically, the change in the probability of disease onset caused by changes in factors and other items is simulated, and the resulting differences in the number of onsets, medical expenses, and so on are visualized.
  • the virtual shaping data creation unit 114 creates virtual shaping data by changing a part of the shaping data included in the shaping information storage unit 117.
  • FIG. 29 is a flowchart of processing of the virtual shaping data creation unit 114.
  • In the item change information setting step 2901, information such as the item to be changed, the amount of change, and the value after the change is set.
  • An example is shown below. Assume that the problem disease storage unit 121 stores disease D as a problem disease, and the factor storage unit 122 stores the average daily drinking amount (ml) as a factor of disease D. To predict how the onset probability of problem disease D and the onset probabilities of other diseases N years later would change if target persons whose average daily drinking amount is 500 ml or more reduced it by 500 ml, the average daily drinking amount (ml) item is set as the item to be changed and -500 is set as the amount of change.
  • In the target person sample selection step 2902, the data of the target persons to be simulated is selected from the shaping data stored in the shaping information storage unit 117.
  • When the simulation targets an individual insured person, the data corresponding to that insured person is selected from the shaping data stored in the shaping information storage unit 117 and used.
  • When the simulation targets a group of insured persons, the data corresponding to that group is selected from the shaping data stored in the shaping information storage unit 117 and used.
  • In the virtual shaping data creation step 2903, new data is created from the data selected in the target person sample selection step 2902 by changing the value of the item whose effect on the disease is to be evaluated.
  • In the example above, the value of the average daily drinking amount (ml) item is changed to create the new data.
  • This data is called virtual shaping data.
  • the created virtual shaping data is stored in the virtual shaping information storage unit 123.
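  • The drinking example of steps 2901 to 2903 can be sketched as follows, assuming each row of shaping data is a dictionary with hypothetical column names such as `drinking_ml`. The change is applied only to rows that satisfy an optional condition, and the original rows are copied so the stored shaping data stays unchanged.

```python
import copy


def make_virtual_data(records, item, change, condition=None):
    """Create virtual shaping data (cf. virtual shaping data creation step 2903).

    records: shaping-data rows selected in step 2902, one dict per person.
    item, change: item change information from step 2901 (e.g. -500 ml).
    condition: optional predicate; only rows for which it is true are changed.
    The input rows are deep-copied so the original shaping data is untouched.
    """
    virtual = copy.deepcopy(records)
    for row in virtual:
        if condition is None or condition(row):
            row[item] += change
    return virtual
```

For instance, `make_virtual_data(rows, "drinking_ml", -500, lambda r: r["drinking_ml"] >= 500)` lowers the value by 500 ml only for persons who drink 500 ml or more per day on average.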
  • FIG. 27A shows an example of the shaping data selected in the subject sample selection step 2902.
  • FIG. 27B shows an example of virtual shaping data created by changing items related to drinking in the virtual shaping data creation step 2903.
  • In FIG. 27, the disease A and disease B items indicate the number of consultations per year for the corresponding disease, and the drinking item indicates the average daily alcohol consumption (ml).
  • the onset probability prediction unit 109 predicts the onset probability of a future item using the model stored in the graphical model storage unit 118.
  • With the graphical model, the probability distribution of unknown random variables can be obtained when known values are given to some of the random variables.
  • Here, prediction is performed using the data stored in the shaping information storage unit 117 and the data stored in the virtual shaping information storage unit 123 as known values. The method of predicting each item after the known data is input is the same as the processing described in Example 1, so its description is omitted.
  • In this way, the future state after N years (where N is any natural number) can be predicted.
  • FIG. 28A shows an example of the future state predicted by applying the processing of the onset probability prediction unit 109 to the shaping information shown in FIG. 27A.
  • This example shows the case in which a graphical model is created from two years of medical data and the state one year later is predicted.
  • The value of each item is the expected value computed from that item's predicted occurrence probabilities.
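  • The expected value shown for each item can be computed from its predicted distribution as below. This is a generic sketch, assuming the prediction yields a hypothetical `{value: probability}` dictionary per item; for a 0/1 onset item the expected value equals the onset probability itself.

```python
import math


def expected_value(distribution):
    """Expected value of one item from its predicted distribution {value: probability}."""
    if not math.isclose(sum(distribution.values()), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())
```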
  • FIG. 28B shows the future state predicted by applying the processing of the onset probability prediction unit 109 to the shaping information shown in FIG. 27B, which was created by changing the values of the items relating to smoking and drinking.
  • The change in the drinking item affects the predicted expected number of consultations for disease A and disease B in the following year.
  • Visualization of the graph structure is the same processing as that described in the second and third embodiments, and thus the description thereof is omitted.
  • One graph is visualized for each prediction result stored in the prediction result storage unit 119. All displayed graphs have the same structure, and each node is displayed at the same coordinates within the coordinate system of each display area.
  • the node and edge display methods are changed according to the corresponding prediction results. Specifically, the display method of the node is changed according to the difference in the prediction probability of the item represented by each node, and the display method of the edge connecting the nodes is changed according to the difference in the stochastic dependency between the nodes.
  • A first example of changing the node display method is as follows.
  • The average of the expected occurrence values is calculated for each item, and the size, shape, color, and so on of the node are changed according to the magnitude of that value.
  • For a disease item, for example, the average onset probability per person is calculated.
  • A second example of changing the node display method is as follows.
  • The number of onsets is calculated for each disease, and the size, shape, color, and so on are changed according to that number of persons. For example, for an item that indicates the presence or absence of onset as 0 or 1, the expected number of onsets in the target group can be obtained by summing the onset probabilities predicted for each target person over all target persons.
  • A third example of changing the node display method is as follows. The medical cost of each item is calculated from its occurrence probability, and the size, shape, color, and so on are changed according to that medical cost.
  • When a node is displayed, the calculated onset probability, predicted number of persons, or medical expenses may also be shown as a character string near the node.
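  • The three node metrics can be sketched as follows. Assumptions to note: onset items are 0/1, each disease has a hypothetical fixed `cost_per_case` (the source does not specify how medical cost is derived from the probabilities), and `node_size` is an illustrative linear mapping onto a display size.

```python
def node_metrics(onset_probs, cost_per_case):
    """Per-disease values that could drive node size, shape, or color.

    onset_probs: {disease: [predicted onset probability of each target person]}
    cost_per_case: {disease: assumed medical cost per onset (hypothetical)}
    """
    metrics = {}
    for disease, probs in onset_probs.items():
        expected_cases = sum(probs)  # expected onsets for a 0/1 item
        metrics[disease] = {
            "mean_prob": expected_cases / len(probs),                  # first example
            "expected_cases": expected_cases,                          # second example
            "expected_cost": expected_cases * cost_per_case[disease],  # third example
        }
    return metrics


def node_size(value, vmin, vmax, smin=10.0, smax=40.0):
    """Linearly map a metric in [vmin, vmax] onto a display size in [smin, smax]."""
    if vmax == vmin:
        return (smin + smax) / 2
    return smin + (smax - smin) * (value - vmin) / (vmax - vmin)
```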
  • A first example of changing the edge display method is as follows.
  • the degree of dependence between items in the prediction result is calculated, and the size, shape, color, etc. of the edge are changed according to the degree of dependence between items corresponding to the nodes at both ends of the edge. Since the calculation method of the dependence is the same as that of the second embodiment, the description thereof is omitted.
  • In a second example, the size, shape, color, and so on of the edge are changed according to the difference in values, such as the onset probability, expected number of onsets, or medical expenses, between the items corresponding to the nodes at both ends of the edge. For example, when there is a directed edge from disease D1 to disease D2, the display method of the edge is changed based on the value obtained by subtracting the expected number of onsets of disease D1 from the expected number of onsets of disease D2.
  • Although the example above visualizes one graph for each prediction result stored in the prediction result storage unit 119, only some of the prediction results may be visualized.
  • For example, when there are three prediction results, all of them may be visualized as three separate graphs, or only the prediction result based on the shaping data and one of the prediction results based on virtual shaping data may be visualized. Alternatively, only the two prediction results based on virtual shaping data may be visualized, without visualizing the prediction result based on the shaping data.
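  • The second edge display example can be sketched as follows: for a directed edge from D1 to D2, the difference "expected onsets of D2 minus expected onsets of D1" sets the edge width and color. The color names, `base_width`, and `scale` are illustrative assumptions, not values from the source.

```python
def edge_style(expected_cases, src, dst, base_width=1.0, scale=0.01):
    """Display style for the directed edge src -> dst (second edge example).

    The difference is expected onsets of dst minus expected onsets of src.
    Width grows with its magnitude; the color encodes its sign. The color
    names and scale factor are illustrative assumptions.
    """
    diff = expected_cases[dst] - expected_cases[src]
    if diff > 0:
        color = "red"
    elif diff < 0:
        color = "blue"
    else:
        color = "gray"
    return {"diff": diff, "width": base_width + scale * abs(diff), "color": color}
```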
  • FIG. 30 is a screen example of a user interface showing an example of a form for realizing the present embodiment.
  • Reference numeral 3001 denotes an operation window for setting target selection information, item change information, and the like.
  • 3002 is the same as 1702 in FIG. 17 described in the first embodiment.
  • Reference numeral 3003 denotes a condition setting table for setting item change information. Each row corresponds to one condition.
  • In this table, an item shown with the no-change mark is left unchanged, and an item for which a numerical value is entered is changed to that value.
  • Reference numeral 3004 denotes an operation button for performing an operation for adding a new condition to the condition setting table displayed in 3003.
  • Reference numeral 3005 denotes an operation button for creating virtual shaping data based on the conditions set in the condition setting table and performing prediction using the shaping data and the virtual shaping data.
  • Reference numeral 3006 denotes a display window for displaying a prediction result, and displays a graph representing a result predicted based on each condition displayed in the condition setting table 3003.
  • Reference numeral 3007 denotes an operation window for switching the node display method, and displays the node by changing its appearance according to the selected index.
  • Reference numeral 3008 denotes an operation window for switching the edge display method, and displays the edge by changing the appearance according to the selected index.
  • As described above, the health care data analysis apparatus can predict how changes in the factors of a problem disease and in other items alter the future situation, such as the onset probability of the disease, the number of onsets, and medical costs, and can present the results clearly using multiple graphs.
  • Examples 2 and 3 described medical analysis systems that extract items related to test values and lifestyle habits as factors. The fifth embodiment described an analysis system that predicts and visualizes changes in the future situation, such as the onset probability of a disease, the number of onsets, and medical costs, caused by changes in the factors of the problem disease and in other items. In this embodiment, health guidance effective for a problem disease is extracted using health care data that includes the presence or absence of health guidance, and an example of an analysis system that predicts and visualizes the effect of that health guidance is described.
  • Since the configuration and processing are the same as those of the second, third, and fifth embodiments except for the shaping information storage unit 117, the health guidance selection unit 124, and the virtual shaping data creation unit 114, their description is omitted.
  • the shaping information storage unit 117 stores shaping data including items indicating the presence or absence of health guidance.
  • FIG. 31 is an example of shaping data including items indicating the presence or absence of health guidance.
  • 603, 604, 605, and 607 in FIG. 31 are the same as those in FIG.
  • reference numerals 1501, 1502, 1503, 1504, 1505, 1506, 1508, 1509, 1510, 1511, 1512, 1514, 1515 and 1516 are the same as those in FIG.
  • Reference numerals 3102 and 3103 denote items relating to the presence or absence of health guidance; they contain 1 for target persons who received the health guidance and 0 for target persons who did not.
  • Although shaping data containing health guidance presence/absence information has been described here, other values, such as the number of times each health guidance was given, may be used.
  • The health guidance selection unit 124, which provides a function of extracting health guidance effective for a problem disease, is described first.
  • For each problem disease, the health guidance selection unit 124 selects, from among the factors of that problem disease stored in the factor storage unit 122, the items containing information on the implementation of health guidance. For example, when test value V, lifestyle S, and health guidance G are stored as the factors of problem disease D, health guidance G is selected as health guidance effective for problem disease D.
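  • The selection rule can be sketched in a few lines, assuming the factor storage is available as a hypothetical `{problem disease: [factor items]}` dictionary and that the set of item names recording health guidance (such as the columns 3102 and 3103) is known in advance.

```python
def select_effective_guidance(factors_by_disease, guidance_items):
    """Keep, for each problem disease, only the factors that are health guidance items.

    factors_by_disease: {problem disease: [factor items]} from the factor storage.
    guidance_items: names of items that record health guidance (assumed known).
    """
    return {disease: [f for f in factors if f in guidance_items]
            for disease, factors in factors_by_disease.items()}
```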
  • The reason the processing of the health guidance selection unit 124 can select health guidance effective for a problem disease is explained below in terms of the relationship between problem diseases and health guidance. Suppose that the shaping data of year X includes a group that received a certain health guidance G and a group that did not, and that both groups have the same prevalence of problem disease D. If the prevalence of disease D in year X + N differs between the groups, and in particular if it is lower in the group that received health guidance G, then health guidance G can be expected to be effective in reducing the incidence of disease D. Similar relationships hold between test values and health guidance, and between lifestyle habits and health guidance.
  • The probabilistic dependence between items is expressed by edges.
  • The factor extraction unit 110 extracts the factor items of a problem disease based on the dependencies defined between the items.
  • Some of the factor items extracted by the factor extraction unit 110 are health guidance items. These are considered to be health guidance that affects either the onset of the problem disease itself or the test values and lifestyle habits that cause it.
  • The processing of the virtual shaping data creation unit 114 follows the flowchart of FIG. 29, except for item change information setting step 2901 in FIG.
  • In item change information setting step 2901, information is set such as the type of item to be changed, the amount of change, and the value after the change. Only items relating to the implementation of health guidance are selected as items to change.
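A minimal sketch of how virtual shaping data could be produced under the restriction above. The function copies the shaping data and overwrites one health-guidance item. Several details are hypothetical assumptions, not taken from the source: the row layout (a dict from item name to value), the `guidance:` prefix, and the function name.

```python
import copy
from typing import Dict, List

# Hypothetical shaping-data row: item name -> value.
Row = Dict[str, float]

def make_virtual_shaping_data(rows: List[Row], item: str, new_value: float,
                              guidance_prefix: str = "guidance:") -> List[Row]:
    """Copy the shaping data and overwrite one health-guidance item in every row.

    Mirrors the restriction of item change information setting step 2901 in this
    embodiment: only health-guidance items may be chosen as the item to change.
    """
    if not item.startswith(guidance_prefix):
        raise ValueError(f"{item!r} is not a health-guidance item")
    virtual = copy.deepcopy(rows)  # leave the original shaping data untouched
    for row in virtual:
        row[item] = new_value
    return virtual

rows = [{"age": 52, "guidance:G": 0}, {"age": 61, "guidance:G": 1}]
virtual = make_virtual_shaping_data(rows, "guidance:G", 1)  # "what if everyone received G?"
```

The resulting virtual rows would then be passed to the same onset probability prediction as the real shaping data, so the two predictions can be compared.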
  • Onset probabilities are predicted for the virtual shaping information stored in the virtual shaping information storage unit 123. The same processing is used as in the onset probability prediction unit 108 described in the fifth embodiment, and the prediction results are stored in the prediction result storage unit 119.
  • The prediction results are visualized by the same processing as that of the visualization unit 112 described in the fifth embodiment.
  • As described above, the healthcare data analysis apparatus can extract health guidance effective against a problem disease. It can also predict and visualize the future conditions that would result from implementing that health guidance, such as the probability of disease onset, the number of cases, and medical costs.
  • The present invention is not limited to the embodiments described above and includes various modifications.
  • The above embodiments have been described in detail to make the present invention easy to understand. The invention is not necessarily limited to configurations that have all of the elements described.
  • Part of the configuration of one embodiment can be replaced with the configuration of another embodiment.
  • The configuration of another embodiment can also be added to the configuration of a given embodiment.
  • For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Abstract

The purpose of the present invention is to select diseases that will need to be addressed in the future. As one example of a means for achieving this purpose, an analysis system is provided that can access a database storing healthcare information, including subjects' medical records and test values, and formatted information obtained by grouping the healthcare information for each subject over predetermined periods. The analysis system can also access a database storing a graphical model created on the basis of the formatted information. The graphical model is defined by three elements: a first node group corresponding to random variables that represent pathological conditions; a second node group corresponding to random variables that represent factors affecting changes in the pathological conditions; and directed or undirected edges that represent the stochastic dependence between any two nodes in the set consisting of the first and second node groups. The analysis system includes a disease onset probability prediction unit, which predicts disease onset probabilities on the basis of the graphical model, and a problem disease selection unit, which selects at least one pathological condition as a disease to be addressed on the basis of those probabilities.

Description

Analysis system
The present invention relates to data analysis technology, and more particularly to a healthcare data analysis system for analyzing healthcare data.
Health insurance associations run programs that provide health guidance to prevent lifestyle-related diseases and to prevent those diseases from becoming severe, with the aim of reducing medical expenses. However, the resources for health guidance are limited, including the public health nurses available to provide it and the budget for it. A system that supports effective and efficient operation of these programs is therefore desired.
Patent Document 1 describes such a system: a health program support system that selects people to receive health guidance based on receipt information, medical examination information, and health guidance information. The system includes the following units:
  • a medical cost model creation unit, which creates a medical cost model indicating the predicted medical cost of health insurance members for each severity level and test value;
  • a test value improvement model creation unit, which creates a test value improvement model indicating the amount of improvement for each severity level and test value;
  • a predicted medical cost reduction effect calculation unit, which calculates, for each severity level and test value, the predicted reduction in medical costs achieved by health guidance;
  • a target person selection unit, which selects as health guidance targets the health insurance members whose severity level and test values are associated with a high predicted reduction in medical costs.
JP 2012-128670 A
A health insurance association must run effective and efficient programs within its limited resources. To do so, it is important to predict not only the association's current situation but also its future situation, and to implement appropriate programs accordingly. Suppose it were possible to predict diseases that currently affect few people but are expected to affect more in the future, or insured persons who are likely to develop such a disease. Appropriate health guidance could then be given to the appropriate subjects, which can be expected to reduce the future number of patients and medical costs. With the prior art, however, it is not easy to predict diseases that will become future problems in this way.
To solve the above problems, there is provided, for example, the following analysis system. The analysis system has a processor that executes a program, a memory that stores the program, and an input unit that receives input of information, and it analyzes healthcare data by executing the program. The analysis system can access a database storing healthcare information, including subjects' medical records and test values, and shaping information in which the healthcare information is summarized for each subject and for each predetermined period. The analysis system can also access a database storing a graphical model created based on the shaping information. The graphical model is defined by three elements: a first node group corresponding to random variables representing pathological conditions; a second node group corresponding to random variables representing factors that affect changes in the pathological conditions; and directed or undirected edges representing the stochastic dependency between any two nodes in the set consisting of the first and second node groups. The analysis system includes an onset probability prediction unit, implemented by the processor, which predicts the probability of disease onset based on the graphical model. It also includes a problem disease extraction unit, implemented by the processor, which extracts at least one pathological condition as a problem disease based on the onset probability.
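The following toy example illustrates how an onset probability can be read off such a graphical model. It uses a hypothetical two-node Bayesian network in which a factor node V (a test value category) points to a pathological-condition node D. All probabilities are made-up illustration values. The source does not say that the onset probability prediction unit uses exact enumeration like this; the sketch only shows the basic conditioning and marginalization steps.

```python
from typing import Dict, Optional

# Hypothetical parameters of a two-node model V -> D (illustration values only).
P_V: Dict[str, float] = {"high": 0.3, "normal": 0.7}            # P(V = v)
P_D_GIVEN_V: Dict[str, float] = {"high": 0.2, "normal": 0.05}   # P(D = 1 | V = v)

def onset_probability(observed_v: Optional[str] = None) -> float:
    """Return P(D = 1), conditioning on V when the subject's value of V is known."""
    if observed_v is not None:
        return P_D_GIVEN_V[observed_v]
    # V unknown: marginalize it out, P(D=1) = sum_v P(V=v) * P(D=1 | V=v).
    return sum(P_V[v] * P_D_GIVEN_V[v] for v in P_V)

print(onset_probability("high"))  # 0.2
print(onset_probability())        # approximately 0.095
```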
According to a representative embodiment of the present invention, items related to diseases that will become future problems can be extracted based on healthcare data, which includes information such as subjects' medical records and test values.
  • A block diagram showing the configuration of the medical data analysis apparatus of the first embodiment.
  • A block diagram showing another configuration of the medical data analysis apparatus of the first embodiment.
  • A flowchart of the problem disease extraction process of the first embodiment.
  • A flowchart of the factor extraction process of the second embodiment.
  • A flowchart of the factor extraction process of the third embodiment.
  • A diagram explaining the basic receipt information of the first embodiment.
  • A diagram explaining the medical examination information of the first embodiment.
  • A diagram explaining the interview information of the first embodiment.
  • A diagram explaining the disease name information of the first embodiment.
  • A diagram explaining the disease name classification information of the first embodiment.
  • A diagram explaining the medical practice information of the first embodiment.
  • A diagram explaining the medical practice classification information of the first embodiment.
  • A diagram explaining the drug information of the first embodiment.
  • A diagram explaining the drug classification information of the first embodiment.
  • A diagram explaining an example of the shaping information of the first embodiment.
  • A diagram explaining another example of the shaping information of the first embodiment.
  • A diagram explaining an example of the operation screen of the first embodiment.
  • A diagram explaining an example of the operation screen of the second embodiment.
  • A diagram explaining how factors are extracted in the factor extraction process of the third embodiment.
  • A diagram explaining the item aggregation process in the visualization process of the third embodiment.
  • A diagram explaining a model that is a Bayesian network.
  • A diagram explaining a model composed of two random variables, and those random variables.
  • A diagram explaining a model composed of four random variables, and those random variables.
  • A flowchart of the visualization process of the third embodiment.
  • A flowchart of the high-risk subject selection process of the fourth embodiment.
  • A diagram explaining an example of the operation screen of the fourth embodiment.
  • A diagram explaining an example of the shaping information and virtual shaping information of the fifth embodiment.
  • A diagram explaining an example of a prediction result based on the shaping information and a prediction result based on the virtual shaping information of the fifth embodiment.
  • A flowchart of the virtual shaping data creation process of the fifth embodiment.
  • A diagram explaining an example of the operation screen of the fifth embodiment.
  • A diagram explaining an example of the shaping information of the sixth embodiment.
Embodiments for carrying out the invention are described below with reference to the drawings.
The first embodiment describes an example of a healthcare data analysis apparatus that extracts problem diseases based on healthcare data.
Health insurance providers need to understand which diseases will become problems in the future. There are two main reasons. First, the current prevalence and medical costs of each disease are easy to calculate, but future changes are not easy to predict. Second, knowing future problem diseases makes it possible to draw up long-term health guidance plans. What counts as a problem disease differs between health insurance providers and over time. Examples include diseases whose number of patients will increase in the future and diseases whose medical costs will increase in the future.
In the first embodiment, problem diseases are extracted based on two inputs. One is a model, created from healthcare data, of the causal relationships between diseases and the transition structure of pathological conditions. The other is problem disease extraction logic entered by the user.
Here, healthcare data means data containing each individual's medical and health information, such as each subject's medical records and test values. Specific examples of information in healthcare data include the subject's disease names, the medical acts performed on the subject, the cost of those medical acts, health guidance, and lifestyle habits recorded in interviews.
This embodiment describes a case in which the healthcare data contains three kinds of information: receipt information, medical examination information, and interview information. However, the healthcare data need not contain all three. An outline of each kind of information is given below.
Receipt information is recorded each time a health insurance member visits a medical institution. It includes the disease name, prescribed drugs, medical practices performed, and medical expenses (in points). An example is described later with reference to FIG. 6. Prescribed drugs and medical practices performed are collectively referred to as medical acts.
Medical examination information stores the test values obtained when a health insurance member undergoes a health checkup. An example is described later with reference to FIG. 7.
Interview information stores the results of the interview conducted when a health insurance member undergoes a health checkup, such as lifestyle habits, medical history, and subjective symptoms. An example is described later with reference to FIG. 8.
FIG. 1 is a block diagram showing the configuration of the healthcare data analysis apparatus of the first embodiment.
The healthcare data analysis apparatus of the first embodiment has a data analysis device 101 and a database 115.
The data analysis device 101 has an input unit 102, an output unit 103, an arithmetic unit 104, a memory 105, and a storage medium 106.
The input unit 102 is a human interface, such as a mouse or keyboard, that receives input to the data analysis device 101. The output unit 103 is a display or printer that outputs the computation results of the healthcare data analysis apparatus. The storage medium 106 is a storage device that stores the programs implementing the healthcare data analysis processing of the data analysis device 101, the results of that processing, and similar data. It is, for example, a non-volatile storage medium such as a magnetic disk drive or non-volatile memory. Programs stored in the storage medium 106 are loaded into the memory 105. The arithmetic unit 104, for example a CPU or GPU, executes the programs loaded into the memory 105. The processing and computations described below are executed by the arithmetic unit 104.
The healthcare data analysis system of the first embodiment is a computer system built either on a single computer or on multiple logically or physically configured computers. It may run as separate threads on the same computer, or on a virtual machine built on multiple physical computer resources.
The programs executed by the arithmetic unit 104 are provided to each server via removable media (CD-ROM, flash memory, etc.) or a network. They are stored in a non-volatile storage device, which is a non-transitory storage medium. The computer system may therefore include an interface for reading removable media.
The following first describes problem disease extraction, which is one of the healthcare data analysis processes. It then describes the various data used in the problem disease extraction process and other processes, and how those data are processed.
First, the problem disease extraction unit 110 is described.
The problem disease extraction unit 110 provides a function of extracting, for a group of subjects, problem diseases that are likely to become future problems, together with information on those diseases. Examples are diseases whose future onset probability will increase and diseases whose future medical costs will increase. The example used here is a health insurance provider extracting problem diseases from data on its group of insured persons.
First, an outline of the processing of the problem disease extraction unit 110 is given. The unit starts from the known information on the group of insured persons for which problem diseases are to be extracted. From that information, it predicts unknown items, for example the future onset probability of each disease. Next, it calculates a disease evaluation index for each disease based on the onset probabilities. Finally, it extracts problem diseases based on the calculated indices.
The processing of the problem disease extraction unit 110 is described in detail below.
First, the problem disease extraction unit 110 reads the data of the subject group to be analyzed, either from the shaping information storage unit 113 or from the input unit 102. If the data of the insured group used to create the graphical model stored in the graphical model storage unit 114 is used as-is, the shaping information stored in the shaping information storage unit 113 is used. If data of a different, previously unseen insured group is used, the data are read from the input unit 102 and, if necessary, shaped by the data shaping unit 107. The subject group data may include all subjects in the data, or only a sampled subset of the group. For example, to target insured persons at or above a certain age, a threshold is set on the age item, and only the data of insured persons whose age is at or above the threshold is selected from the shaping information. Thresholds may also be set on other items, such as the number of medical practices. Alternatively, a known sampling technique such as random sampling may be used. Sampling makes it possible to extract problem diseases within a specific population.
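A minimal sketch of the subject-group selection just described. It applies an optional age threshold and then optional simple random sampling. The row layout (dicts with `id` and `age` keys) and the function name are hypothetical; the source leaves the sampling technique open.

```python
import random
from typing import Dict, List, Optional

def select_subject_group(rows: List[Dict], min_age: Optional[int] = None,
                         sample_size: Optional[int] = None, seed: int = 0) -> List[Dict]:
    """Restrict the subject group by an age threshold, then optionally random-sample it."""
    group = [r for r in rows if min_age is None or r["age"] >= min_age]
    if sample_size is not None and sample_size < len(group):
        # Simple random sampling without replacement; seeded for reproducibility.
        group = random.Random(seed).sample(group, sample_size)
    return group

rows = [{"id": "S1", "age": 35}, {"id": "S2", "age": 42},
        {"id": "S3", "age": 58}, {"id": "S4", "age": 61}]
print([r["id"] for r in select_subject_group(rows, min_age=40)])  # ['S2', 'S3', 'S4']
```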
Next, disease candidates are chosen: from the items in the graphical model stored in the graphical model storage unit 118, the items related to diseases are selected. These disease-related items are identified, for example, from the shaping information stored in the shaping information storage unit 117.
Next, the process of extracting problem diseases from the subject group data and the disease candidates is described with reference to FIG. 3, which is a flowchart of this process.
The processing of each step is described below.
In problem disease extraction logic determination step 301, the problem disease extraction logic is determined based on the information entered through the input unit 102. The problem disease extraction logic consists of four elements:
  • the disease evaluation index, an index calculated for each disease candidate;
  • the disease evaluation index calculation method, which calculates the disease evaluation index for each disease from the onset probability;
  • the disease evaluation index aggregation method, which aggregates by disease the indices calculated for each subject;
  • the problem disease extraction condition, which is the condition for extracting a disease as a problem based on the aggregated indices.
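One way to hold the four elements of the extraction logic together is sketched below as a hypothetical Python dataclass. The field names, the simplified one-probability-per-subject signatures, and the threshold of 10 are illustrative assumptions, not part of the source. Multi-year indices, such as the 10-year rate of increase, would need richer signatures than this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ExtractionLogic:
    """Hypothetical container for the four elements of the problem disease extraction logic."""
    index_name: str                                     # disease evaluation index
    calculate: Callable[[float], float]                 # calculation method: onset probability -> per-subject index
    aggregate: Callable[[List[float]], float]           # aggregation method: per-subject indices -> group index
    condition: Callable[[Dict[str, float]], List[str]]  # extraction condition: group index per disease -> problem diseases

# Example: expected number of people developing each disease N years later,
# summed over subjects; diseases with more than 10 expected cases are extracted.
expected_cases_logic = ExtractionLogic(
    index_name="expected cases after N years",
    calculate=lambda p: p,
    aggregate=sum,
    condition=lambda idx: [d for d, v in idx.items() if v > 10],
)
```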
The disease evaluation index is calculated for each disease in order to extract problem diseases. It is determined from the onset probability and medical costs predicted by the onset probability prediction unit 109. Basic examples include the probability of disease onset N years later, the expected number of people who develop the disease N years later, and the expected medical costs of the disease N years later. More complex indices combine several indices based on onset probabilities for different years. For example, two indices can be combined: the expected number of people developing the disease N years later, and the expected number N+10 years later. From these, a new index can be defined: the rate of increase in the number of people developing the disease over the 10 years from N years later to N+10 years later. Here, N is an arbitrary natural number.
The disease evaluation index calculation method is the method for calculating a disease evaluation index from onset probabilities, and it is defined separately for each index. Several cases are possible:
  • Expected number of people developing a disease N years later: a subject's onset probability is itself that subject's expected contribution to the number of people developing the disease.
  • Expected medical costs of a disease N years later: the index can be calculated, for example, by multiplying the expected onset probability of the disease N years later by the average medical cost of that disease. The average medical cost can be, for example, the average cost of subjects with that disease, calculated from the shaping information stored in the shaping information storage unit 117. When the healthcare data includes medical cost information, the sum of the expected medical costs, N years later, of the medical acts related to the problem disease may be used instead.
  • An index based on onset probabilities in different years, such as the rate of increase in the number of people developing the disease over the 10 years from N years later to N+10 years later: the expected numbers of people developing the disease N years later and N+10 years later are each predicted, and the rate of increase is then calculated from these two expected values.
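The calculation methods above reduce to a few small functions, sketched here. The row layout (a 0/1 disease flag plus a `cost` field in hypothetical units such as points) and the function names are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List

def average_cost(rows: List[Dict], disease: str) -> float:
    """Average medical cost of subjects in the shaping data who have `disease`."""
    return mean(r["cost"] for r in rows if r[disease] == 1)

def expected_cases(onset_prob: float) -> float:
    """A subject's expected contribution to the case count is its onset probability."""
    return onset_prob

def expected_cost(onset_prob: float, avg_cost: float) -> float:
    """Expected medical cost = onset probability x average cost of the disease."""
    return onset_prob * avg_cost

def increase_rate(expected_at_n: float, expected_at_n10: float) -> float:
    """Rate of increase from N years later to N+10 years later."""
    return (expected_at_n10 - expected_at_n) / expected_at_n

rows = [{"D": 1, "cost": 100}, {"D": 1, "cost": 300}, {"D": 0, "cost": 50}]
print(expected_cost(0.25, average_cost(rows, "D")))  # 50.0
```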
The disease evaluation index aggregation method combines the indices obtained for each subject into a disease evaluation index for the entire subject group. Take the case where the index is the expected number of people developing a disease N years later. The expected number for the whole group is then the sum of the per-subject expected numbers over all subjects. The rate of increase over the 10 years from N years later to N+10 years later is different: summing the per-subject indices does not give a meaningful total. Instead, the expected numbers N years later and N+10 years later are each summed over all subjects, and the rate of increase is then calculated from the two totals.
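The following sketch shows why the rate of increase must be computed on the group totals rather than from per-subject rates. The per-subject numbers are hypothetical illustration values. With these values, the per-subject rates are 3.0 and about -0.33, so their sum or mean says little about the group. The rate taken on the totals is 0.5.

```python
from typing import List, Tuple

def aggregate_expected_cases(per_subject: List[float]) -> float:
    """Group expected case count: the sum of per-subject onset probabilities."""
    return sum(per_subject)

def aggregate_increase_rate(per_subject: List[Tuple[float, float]]) -> float:
    """Group rate of increase from N years later to N+10 years later.

    Each tuple holds one subject's expected cases (at N years, at N+10 years).
    The two expected counts are summed over all subjects first, and the rate
    is computed once on the totals.
    """
    total_n = sum(n for n, _ in per_subject)
    total_n10 = sum(n10 for _, n10 in per_subject)
    return (total_n10 - total_n) / total_n

subjects = [(0.125, 0.5), (0.375, 0.25)]
print(aggregate_increase_rate(subjects))  # 0.5
```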
The problem disease extraction condition is the condition for extracting a disease as a problem, based on the disease evaluation indices aggregated by disease. When the index is the expected number of people developing each disease, two approaches are possible. One is to set a threshold on the expected number and extract as problems the diseases whose expected number exceeds it. The other is to sort the diseases in descending order of expected number and select a predetermined number of them from the top as problems.
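Both extraction conditions can be sketched in a few lines of Python. The disease names and index values below are hypothetical.

```python
from typing import Dict, List

def extract_above_threshold(index: Dict[str, float], threshold: float) -> List[str]:
    """Diseases whose aggregated index exceeds the threshold."""
    return [d for d, v in index.items() if v > threshold]

def extract_top_k(index: Dict[str, float], k: int) -> List[str]:
    """The k diseases with the largest aggregated index, in descending order."""
    return sorted(index, key=index.__getitem__, reverse=True)[:k]

index = {"diabetes": 42.0, "hypertension": 55.5, "gout": 7.2}
print(extract_above_threshold(index, 40.0))  # ['diabetes', 'hypertension']
print(extract_top_k(index, 2))               # ['hypertension', 'diabetes']
```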
The following examples show how the problem disease extraction logic can be determined from information entered through the input unit 102.
In the first example, the user selects a problem disease extraction logic already registered in the database and uses it unchanged.
In the second example, the user selects a problem disease extraction logic already registered in the database as a template. The user then supplies information that modifies part of it, which gives the final problem disease extraction logic. For example, the user can adjust the year to be predicted or the threshold used in the problem disease extraction condition. This produces a problem disease extraction logic that extracts the problem diseases the user wants.
Steps 302 through 307, starting with subject sample selection step 302, are performed once for each subject. Together they form one cycle that loops over all subjects. The specific processing is described below.
In subject sample selection step 302, one insured-person sample that has not yet been processed in the current cycle is selected. In the following description, the subject selected in this step is called insured person S.
Steps 303 through 306, starting with disease candidate selection step 303, are performed once for each disease candidate item. Together they form one cycle that loops over all disease candidate items. The specific processing is described below.
In disease candidate selection step 303, one disease candidate item that has not yet been evaluated in the current cycle is selected. In the following description, the item selected in this step is called disease D.
 確率予測ステップ304では、対象者サンプル選択ステップ302で選択した被保険者Sが疾病Dを発症する確率を予測する。ヘルスケアデータに医療費情報が含まれる場合は、疾病Dに関係する医療行為の医療費の期待値も予測する。予測は、被保険者Sの既知情報に基づき、発症確率予測部109を用いて行う。予測する発症確率の時期は、課題疾病抽出論理に含まれる情報に基づいて決定する。例えば、疾病評価指標に、N年後の病態の発症人数期待値が指定されていた場合、現在をX年とするとき、疾病DのX+N年の発症確率および医療費の期待値を予測する。また、N年後からN+10年後の10年間の疾病発症人数増加率のように、異なる年度の発症確率に基づいた指標であれば、X+N年の発症確率と、N+10年の発症人数期待値を予測する。 In the probability prediction step 304, the probability that the insured S selected in the subject sample selection step 302 will develop the disease D is predicted. When medical cost information is included in the health care data, the expected value of the medical cost of the medical practice related to the disease D is also predicted. The prediction is performed using the onset probability prediction unit 109 based on the known information of the insured person S. The predicted onset probability is determined based on information included in the problem disease extraction logic. For example, when an expected number of people with onset of a disease state after N years is designated as the disease evaluation index, assuming the current year as year X, the onset probability of disease D in X + N years and the expected value of medical expenses are predicted. In addition, if the index is based on the probability of onset in different years, such as the rate of increase in the number of disease onset for 10 years from N years to N + 10 years, the probability of onset in X + N years and the expected number of people in N + 10 years Predict.
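Indices such as the "expected number of onsets" can be built from the individual probabilities predicted in step 304. By linearity of expectation, the expected number of onsets in a group equals the sum of the members' onset probabilities. The sketch below illustrates this. The growth-rate formula, a relative increase between year X+N and year X+N+10, is an assumption, because the text does not define it. The function names are hypothetical.

```python
def expected_onsets(probabilities):
    """Expected number of people who develop the disease: by linearity of
    expectation, the sum of each insured person's predicted onset probability."""
    return sum(probabilities)


def onset_growth_rate(probs_at_n, probs_at_n_plus_10):
    """Assumed definition: relative increase of the expected onset count
    from year X+N to year X+N+10 (each list holds one probability per person)."""
    start = expected_onsets(probs_at_n)
    end = expected_onsets(probs_at_n_plus_10)
    if start == 0:
        raise ValueError("expected onset count in year X+N is zero")
    return (end - start) / start
```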
In disease evaluation index calculation step 305, a disease evaluation index is calculated from the onset probability and medical cost of disease D predicted in probability prediction step 304. The calculation follows the disease evaluation index calculation method determined in problem disease extraction logic determination step 301. The calculated index is stored in the disease evaluation index storage unit 116, together with information on the insured person S for whom it was calculated.
In step 306, if any disease candidate item has not yet been evaluated in the current cycle, the process returns to disease candidate selection step 303 and selects an unevaluated item. Otherwise, the cycle ends and the process proceeds to step 307.
In step 307, if any subject in the subject group has not yet been processed in the current cycle, the process returns to subject sample selection step 302 and selects that subject. Otherwise, the cycle ends and the process proceeds to step 308.
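The two nested cycles of steps 302 to 307 amount to a double loop over subjects and disease candidates, with one stored index per (subject, disease) pair. This is a minimal sketch. The `predict` callable stands in for the onset probability prediction unit 109, and `index_fn` stands in for the calculation method chosen in step 301. Both are hypothetical parameters, and a plain dictionary stands in for the disease evaluation index storage unit 116.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_all(subjects: Iterable[dict],
                 disease_candidates: Iterable[str],
                 predict: Callable[[dict, str], float],
                 index_fn: Callable[[float], float] = lambda p: p,
                 ) -> Dict[Tuple[str, str], float]:
    """Steps 302-307: for every subject, and for every disease candidate,
    predict the onset probability and store the resulting evaluation index."""
    store: Dict[Tuple[str, str], float] = {}  # stands in for storage unit 116
    diseases = list(disease_candidates)        # reused in every subject cycle
    for subject in subjects:                   # outer cycle: steps 302 and 307
        for disease in diseases:               # inner cycle: steps 303 and 306
            probability = predict(subject, disease)                  # step 304
            store[(subject["id"], disease)] = index_fn(probability)  # step 305
    return store
```

With `predict = lambda s, d: 0.5 if d == "diabetes" and s["age"] >= 50 else 0.125`, a 55-year-old subject receives an index of 0.5 for "diabetes". This toy predictor is only for illustration.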
In disease-specific evaluation index aggregation step 308, the per-insured-person disease evaluation indices stored in the disease evaluation index storage unit 116 are aggregated by disease. The aggregation follows the disease evaluation index aggregation method determined in problem disease extraction logic determination step 301. The aggregated indices are stored in the disease evaluation index storage unit 116.
In problem disease extraction step 309, problem diseases are extracted using the per-disease evaluation indices aggregated in disease-specific evaluation index aggregation step 308. The extraction follows the problem disease extraction method determined in problem disease extraction logic determination step 301.
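Steps 308 and 309 can be pictured as a group-by over the stored (subject, disease) indices, followed by a filter and a sort. This is a sketch under two assumptions. First, aggregation is assumed to be a sum over insured persons. Second, extraction is assumed to keep the diseases whose aggregated index reaches a threshold, highest first. The text leaves both methods to the logic chosen in step 301, and the function names are hypothetical.

```python
from collections import defaultdict


def aggregate_by_disease(store):
    """Step 308 (assumed method: sum of the per-person indices for each disease).
    store maps (subject ID, disease) -> disease evaluation index."""
    totals = defaultdict(float)
    for (_subject_id, disease), value in store.items():
        totals[disease] += value
    return dict(totals)


def extract_problem_diseases(totals, threshold):
    """Step 309 (assumed method: keep diseases whose aggregated index is at
    least the threshold, sorted from highest to lowest)."""
    hits = [(disease, value) for disease, value in totals.items() if value >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

With expected-onset indices, summing per disease gives the expected number of patients in the target group. The threshold then plays the role of the extraction condition that a user may adjust, as in the second template example.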
In the above description, disease candidate selection step 303 selects one disease candidate item at a time, but it may also select several disease candidates at once. In that case, probability prediction step 304 predicts the onset probabilities of those diseases together.
The problem disease information extracted by the processing of the problem disease extraction unit 110 is stored in the problem disease storage unit 117. This information may be output from the output unit 103, for example as text or as a table.
FIG. 17 shows an example of a user interface screen for implementing this embodiment.
Reference numeral 1701 denotes an operation window for configuring problem disease extraction. In this example, the user can narrow down the target group, narrow down the disease candidates, and set the problem disease extraction logic.
Reference numeral 1702 denotes an input window for setting a narrowing condition on the input subject group data. In this example, the males in the subject group are set as the target group for problem disease extraction.
Reference numeral 1703 denotes an input window for setting a condition that narrows down the disease candidate items. The candidates are drawn from the items of the graphical model stored in the graphical model storage unit 118. In this example, no condition is set, so all disease items are subject to problem disease extraction.
Reference numeral 1704 denotes an input window for determining the problem disease extraction logic. In this example, next-year medical cost is selected as the disease evaluation index. The disease evaluation index calculation method, the disease evaluation index aggregation method, and the problem disease extraction method are then read from the database according to the selected index.
Reference numeral 1705 denotes an execute button that starts problem disease extraction with the settings made in 1702, 1703, and 1704.
Reference numeral 1706 denotes a display window that shows the processing results.
Reference numeral 1707 denotes a display screen that shows the extracted problem diseases. In this example, the problem diseases extracted on the basis of next-year medical cost are listed in a table, in descending order of next-year medical cost.
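The screen flow above involves a field filter on the subject group (1702) and a table sorted by next-year medical cost (1707). This is a hypothetical sketch of both, with illustrative function names. The disease names and costs in the usage note are made up.

```python
def narrow_subjects(subjects, **conditions):
    """Input window 1702: keep subjects matching every field condition,
    e.g. narrow_subjects(rows, sex="male"). No conditions keeps everyone."""
    return [s for s in subjects if all(s.get(k) == v for k, v in conditions.items())]


def rank_by_next_year_cost(cost_by_disease):
    """Display 1707: problem diseases in descending order of next-year cost."""
    return sorted(cost_by_disease.items(), key=lambda kv: kv[1], reverse=True)


def format_table(rows):
    """Render (disease, cost) pairs as a simple fixed-width text table."""
    lines = [f"{'Disease':<14}{'Next-year cost':>16}"]
    lines += [f"{name:<14}{cost:>16,.0f}" for name, cost in rows]
    return "\n".join(lines)
```

For instance, `format_table(rank_by_next_year_cost({"gout": 2.0e5, "diabetes": 9.0e5}))` lists diabetes first, with its cost shown as 900,000.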
This embodiment has described an example in which the data shaping unit 107 shapes the data stored in the medical information storage unit 116 of the database. The graphical model creation unit 108 then creates a graphical model from the shaped data stored in the shaping information storage unit 117. However, the shaping information storage unit 117 may already store shaping information created in advance from the health care data. Likewise, the graphical model storage unit 118 may already store a graphical model created in advance from that shaping information. In that case, the data shaping unit 107, the graphical model creation unit 108, and the medical information storage unit 116 may be omitted from the configuration of this embodiment. FIG. 2 shows such an alternative configuration. There, the health care data analysis apparatus 101 has neither the data shaping unit 107 nor the graphical model creation unit 108, and the database 115 has no medical information storage unit 116.
As described above, the health care data analysis apparatus of this embodiment can extract diseases likely to become future problems from the accumulated health care data of a target group. It does so based on a variety of indices and requires only simple operations.
The various data and data processing used in the above extraction processing are described below.
First, the health care data handled in the first embodiment is described. The medical information storage unit 116 stores the health care data input via the input unit 102. Receipt information, health checkup information, and interview information are taken below as representative examples of health care data, and each is described in turn.
First, the receipt information is described. Receipts are the itemized medical fee statements used for health insurance claims.
The receipt information includes basic receipt information, disease name information, medical procedure information, drug information, disease name classification information, medical procedure classification information, drug classification information, and so on.
FIG. 6 is a diagram illustrating an example of basic receipt information.
The basic receipt information 601 holds the correspondence between receipts and health insurance subscribers. It includes a search number 602, a health insurance subscriber ID 603, a sex 604, an age 605, a treatment year and month 606, a total score 607, and so on.
The search number 602 uniquely identifies a receipt. The health insurance subscriber ID 603 uniquely identifies a health insurance subscriber. The sex 604 and age 605 are the sex and age of that subscriber.
The treatment year and month 606 is the year and month in which the subscriber visited the medical institution. The total score 607 is the total number of insurance points on one receipt.
FIG. 9 is a diagram illustrating an example of the disease name information 901.
The disease name information 901 includes the search number 602, a disease name code 902, a disease name 903, and so on.
The search number 602 uniquely identifies a receipt and uses the same numbers as the search number of the basic receipt information 601 (FIG. 6). The disease name code 902 is the disease name code written on the receipt. The disease name 903 is the name of the disease corresponding to that code.
FIG. 10 is a diagram illustrating the disease name classification information.
The disease name classification information 1001 associates each disease classification with the disease names belonging to it. It includes a disease classification 1002, the disease name code 902, the disease name 903, and a complication flag 1003.
The disease classification 1002 is the classification to which the disease belongs. The disease name code 902 is the code written on the receipt, and it uses the same codes as the disease name code 902 of the disease name information 901 (FIG. 9). The disease name 903 is the name of the disease corresponding to that code, and it uses the same names as the disease name 903 of the disease name information 901 (FIG. 9). The complication flag 1003 indicates whether the disease name denotes a complication.
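Because the disease name information 901 and the disease name classification information 1001 share the disease name code 902, the classification of each disease on a receipt can be looked up by joining the two tables on that code. This is a minimal sketch. The table contents (codes, names, classifications) are hypothetical sample rows rather than the contents of FIGS. 9 and 10. The `classify_receipt` helper is also hypothetical.

```python
from typing import NamedTuple


class DiseaseClass(NamedTuple):
    classification: str  # disease classification 1002
    name: str            # disease name 903
    complication: bool   # complication flag 1003


# Hypothetical rows of disease name classification information 1001, keyed by code 902.
CLASSIFICATION = {
    "10": DiseaseClass("diabetes", "type 2 diabetes", False),
    "20": DiseaseClass("diabetes", "diabetic nephropathy", True),
}

# Hypothetical rows of disease name information 901: (search number 602, code 902).
DISEASE_NAME_INFO = [(11, "10"), (12, "30"), (13, "10"), (13, "20")]


def classify_receipt(search_number):
    """Return (classification, complication flag) for each coded disease on one
    receipt; codes absent from the classification table are skipped."""
    result = []
    for number, code in DISEASE_NAME_INFO:
        if number == search_number and code in CLASSIFICATION:
            entry = CLASSIFICATION[code]
            result.append((entry.classification, entry.complication))
    return result
```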
FIG. 11 is a diagram illustrating an example of medical procedure information.
The medical procedure information 1101 includes the search number 602, a medical procedure code 1102, a medical procedure name 1103, and medical procedure points 1104.
The search number 602 uniquely identifies a receipt and uses the same numbers as the search number 602 of the basic receipt information 601 (FIG. 6). The medical procedure code 1102 identifies a medical procedure written on the receipt. The medical procedure name 1103 is the name of the procedure corresponding to that code. The medical procedure points 1104 are the insurance points for that procedure.
In FIG. 11, for example, the receipt with search number 602 “11” lists the medical procedure names 1103 “medical procedure A” and “medical procedure C”.
FIG. 12 is a diagram illustrating an example of medical procedure classification information.
The medical procedure classification information 1201 includes the disease classification 1002, the medical procedure code 1102, and the medical procedure name 1103.
The disease classification 1002 uses the same classifications as the disease classification 1002 of the disease name classification information 1001 (FIG. 10). The medical procedure code 1102 identifies a medical procedure performed for diseases in the disease classification 1002. It uses the same codes as the medical procedure code 1102 of the medical procedure information 1101 (FIG. 11). The medical procedure name 1103 is the name of the procedure corresponding to that code, and it uses the same names as the medical procedure name 1103 of the medical procedure information 1101 (FIG. 11).
FIG. 13 is a diagram illustrating an example of drug information.
The drug information 1301 includes the search number 602, a drug code 1302, a drug name 1303, and drug points 1304.
The search number 602 uniquely identifies a receipt and uses the same numbers as the search number 602 of the basic receipt information 601 (FIG. 6). The drug code 1302 identifies a drug written on the receipt. The drug name 1303 is the name of that drug. The drug points 1304 are the insurance points for that drug.
In FIG. 13, for example, the receipt with search number 602 “11” lists the drug names “oral diabetes drug A” and “oral hypertension drug A”.
FIG. 14 is a diagram illustrating the drug classification information.
The drug classification information 1401 includes the disease classification 1002, the drug code 1302, and the drug name 1303.
The disease classification 1002 uses the same classifications as the disease classification 1002 of the disease name classification information 1001 (FIG. 10). The drug code 1302 identifies a drug prescribed for the classification registered in the disease classification 1002. It uses the same codes as the drug code 1302 of the drug information 1301 (FIG. 13). The drug name 1303 is the name of the drug corresponding to that code, and it uses the same names as the drug name 1303 of the drug information 1301 (FIG. 13).
The medical procedure information 1101 shown in FIG. 11 and the drug information 1301 shown in FIG. 13 are collectively referred to as medical practice information. Likewise, the medical procedure classification information 1201 shown in FIG. 12 and the drug classification information 1401 shown in FIG. 14 are collectively referred to as medical practice classification information.
Next, the health checkup information is described.
FIG. 7 is a diagram illustrating an example of health checkup information.
The health checkup information 701 manages several years of checkup data for multiple subscribers. It includes the health insurance subscriber ID 603, a checkup date 702, and the values of the various checkup tests (for example, BMI 703, waist circumference 704, fasting blood glucose 705, systolic blood pressure 706, and triglycerides 707).
The health insurance subscriber ID 603 identifies the subscriber who underwent the checkup, and it uses the same identifiers as the health insurance subscriber ID 603 of the basic receipt information 601 (FIG. 6). The checkup date 702 is the date of the checkup. BMI 703 through triglycerides 707 are the checkup test results.
Checkup data may be missing, for example when a particular test was not taken. In FIG. 7, for example, the systolic blood pressure 706 value is missing from the 2004 checkup of health insurance subscriber ID “K0004”.
Next, the medical interview (questionnaire) information is described.
FIG. 8 is a diagram illustrating an example of interview information.
The interview information 801 manages several years of interview data for multiple subscribers. It includes the health insurance subscriber ID 603, an interview date 802, and the interview answers (for example, smoking 803, drinking 804, and walking 805). The interview may also cover lifestyle habits, medical history, constitutional traits such as allergies, subjective symptoms, and so on.
The health insurance subscriber ID 603 identifies the subscriber who was interviewed, and it uses the same identifiers as the health insurance subscriber ID 603 of the basic receipt information 601 (FIG. 6). The interview date 802 is the date of the interview. Smoking 803 through walking 805 are the interview results. Smoking 803 is the average number of cigarettes smoked per day for a subscriber who smokes, and “none” for one who does not. Drinking 804 is the average daily amount of alcohol consumed (in ml) for a subscriber who drinks, and “none” for one who does not. Walking 805 is the average daily walking time (in minutes).
Interview information does not always provide detailed values such as step counts, amount of alcohol, or number of cigarettes. Instead of giving a specific amount, the subscriber may choose one of the frequency categories predefined in the questionnaire. For example, only whether the subscriber smokes or drinks may be available. Alternatively, drinking frequency may be answered in a few levels, e.g. (1) no drinking, (2) once or twice a week, (3) three or more times a week. In such cases, the value in the interview information is a category number with no quantitative meaning.
Interview data may be missing if a particular item was not answered. In FIG. 8, for example, the walking 805 value is missing from the 2004 interview of health insurance subscriber ID “K0003”.
Next, the processing of the data shaping unit 107 is described. The data shaping unit 107 takes the health care data stored in the medical information storage unit 116, aggregates and integrates it by subscriber and by period, and shapes it into a table. One period is assumed to be one year below, but another period such as six months, two years, or three years may be used. The example uses all of the receipt, checkup, and interview information, but not all of these are required; for example, only the receipt and checkup information may be used. Other data may also be added.
FIG. 15 is a diagram illustrating an example of the shaping information 1501. The processing of the data shaping unit 107 is described with reference to FIG. 15.
The shaping information 1501 includes receipt shaping information obtained by shaping the 2004 receipt information. Each row of the shaping information 1501 aggregates one year of data for one health insurance subscriber ID.
The health insurance subscriber ID 603, sex 604, age 605, and total score 607 are the same as the corresponding fields of the basic receipt information 601 (FIG. 6). The data year 1502 is the year of the data from which the shaping information was created.
Disease name code 10 (1503) is the number of this subscriber's receipts that carry disease name code 10. Likewise, disease name code 20 (1504) is the number of this subscriber's receipts that carry disease name code 20. Medical procedure code 1000 (1505) is the number of this subscriber's receipts on which the medical procedure with code 1000 was performed. Drug code 110 (1506) is the number of this subscriber's receipts on which the drug with code 110 was prescribed.
The processing of the data shaping unit 107 is described concretely below for the case of shaping the 2004 data.
First, one health insurance subscriber ID is selected. The search numbers of that subscriber's receipts with a treatment year of 2004 are obtained from the basic receipt information 601. Next, the disease name information 901 is consulted, and for each disease name code the number of receipts on which that code appears is counted. This gives the receipt count for each disease name code. Similarly, the medical procedure information 1101 is consulted to count the receipts per medical procedure code, and the drug information 1301 is consulted to count the receipts per drug code. The result is the 2004 data row for the selected subscriber ID. This process is repeated for every combination of subscriber ID and year to be analyzed.
Consider, for example, the 2004 data of health insurance subscriber ID “K0001” in the first row of the shaping information 1501 in FIG. 15. Search numbers “11”, “12”, and “13” are obtained from the basic receipt information 601. According to the disease name information 901, two of these three receipts, search numbers “11” and “13”, carry disease name code “10”. Therefore, 2 is entered in the disease name code 10 column of the first row of the shaping information 1501.
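The counting procedure above can be expressed as two steps. First, collect each subscriber's search numbers for the year. Then count, per code, the distinct receipts on which the code appears. This is a minimal sketch. The tuple layouts, the column prefixes such as "disease_", and the `shape_receipts` helper are hypothetical. Only the K0001 example (receipts 11, 12, 13, with code 10 on receipts 11 and 13) comes from the text, and the remaining sample rows are made up.

```python
from collections import Counter, defaultdict


def shape_receipts(basic, detail_tables, year):
    """basic: (search number, subscriber ID, treatment YYYYMM) rows, as in 601.
    detail_tables: column prefix -> list of (search number, code) rows, as in
    901, 1101 or 1301. Returns {subscriber ID: {column: receipt count}}."""
    receipts = defaultdict(set)  # subscriber ID -> search numbers in that year
    for search_no, subscriber_id, treatment_ym in basic:
        if treatment_ym // 100 == year:
            receipts[subscriber_id].add(search_no)

    rows = {}
    for subscriber_id, numbers in receipts.items():
        counts = Counter()
        for prefix, table in detail_tables.items():
            # Deduplicate (receipt, code) pairs so a code counts once per receipt.
            pairs = {(no, code) for no, code in table if no in numbers}
            counts.update(f"{prefix}{code}" for _no, code in pairs)
        rows[subscriber_id] = dict(counts)
    return rows
```

A subscriber with no receipts in the year gets no row here. Adding zero-filled rows for such subscribers is left to the caller.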
The shaping information 1501 shown in FIG. 15 also includes checkup shaping information shaped from the health checkup information. Each row aggregates the data for one health insurance subscriber ID.
The value of each item is the checkup value for the subscriber and year given by the health insurance subscriber ID 603 and the data year 1502. This value is obtained from the health checkup information 701. The health checkup information 701 may contain several checkups for the same subscriber ID in the same year. In that case, either the data from any one checkup date or the average of that year's results may be used. When a single date is used, the annual group checkup held at about the same time each year is preferable. Alternatively, the checkup with the fewest missing values may be chosen. Missing data is represented by a predetermined value indicating a missing entry, and the example in FIG. 15 uses -1. All values for a subscriber with no checkup information are treated as missing.
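The per-year selection rules above can be sketched as a function that either averages all checkups in the year or keeps the single checkup with the fewest missing values. It writes -1 for anything missing. In this minimal sketch, the record layout (dicts with "id", an ISO "date", and None for missing tests), the `how` option names, and the `shape_checkup` helper are assumptions. Selecting the annual group checkup date is not shown.

```python
from statistics import mean

MISSING = -1  # predetermined value marking missing data, as in FIG. 15


def shape_checkup(records, subscriber_id, year, items, how="average"):
    """Return {item: value} for one subscriber and year.
    how="average": mean of that year's checkups, per item.
    how="fewest_missing": the single checkup with the fewest missing items."""
    visits = [r for r in records
              if r["id"] == subscriber_id and r["date"].startswith(f"{year}-")]
    if not visits:
        return {item: MISSING for item in items}  # no checkup: everything missing
    if how == "fewest_missing":
        # min() keeps the first visit on ties.
        visits = [min(visits, key=lambda r: sum(r.get(i) is None for i in items))]
    row = {}
    for item in items:
        values = [r[item] for r in visits if r.get(item) is not None]
        row[item] = mean(values) if values else MISSING
    return row
```

The same function applies unchanged to interview records. However, averaging a category number (such as a drinking-frequency level) has no quantitative meaning, so the "fewest_missing" option is the safer choice there.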
The shaping information 1501 shown in FIG. 15 also includes interview shaping information shaped from the interview information. Each row aggregates the data for one health insurance subscriber ID.
The value of each item is the interview value for the subscriber and year given by the health insurance subscriber ID 603 and the data year 1502. This value is obtained from the interview information 801. The interview information 801 may contain several interviews for the same subscriber ID in the same year. In that case, either the data from any one interview date or the average of that year's results may be used. When a single date is used, the annual group checkup held at about the same time each year is preferable. Alternatively, the interview with the fewest missing values may be chosen. Missing data is represented by a predetermined value indicating a missing entry, and the example in FIG. 15 uses -1. All values for a subscriber with no interview information are treated as missing.
Through the above processing, the receipt shaping information, checkup shaping information, and interview shaping information are generated. FIG. 15 shows only the 2004 data, but shaping data is created for the other years as well.
 ここで、レセプト整形情報を作成する際に、類似の項目を纏めて、複数の項目を統合してもよい。例えば、医薬品の項目のうち、糖尿病経口薬Aの機能と糖尿病経口薬Bの機能とが類似している場合、これらを纏めて一つの項目として扱ってもよい。このとき、同一年度の糖尿病経口薬Aの処方回数と糖尿病経口薬Bの処方回数とを加算した値を、新しく纏めた項目の値とする。項目が類似するかを判断するための基準は、以下の通りとするよい。診療行為分類情報1201で同一傷病分類に属する診療行為名を類似項目とする。また、医薬品分類情報1401で同一傷病分類に属する医薬品名を類似項目とする。また、予め類似項目情報を人手により作成しておく。 Here, when creating the receipt shaping information, similar items may be collected and a plurality of items may be integrated. For example, when the function of the diabetic oral drug A and the function of the diabetic oral drug B are similar among the items of pharmaceuticals, these may be collectively treated as one item. At this time, a value obtained by adding the number of prescriptions of the oral diabetes drug A and the prescription number of the oral diabetes drug B in the same year is set as the value of the newly summarized item. The criteria for judging whether items are similar may be as follows. The medical practice name belonging to the same injury and illness classification in the medical practice classification information 1201 is set as a similar item. In addition, the names of drugs belonging to the same injury and illness classification in the drug classification information 1401 are set as similar items. Also, similar item information is created in advance by hand.
FIG. 16 illustrates an example of shaping information 1501 in which disease name code 10 and disease name code 20 of the receipt shaping information are merged. The value of disease name code 1601 is the sum of the values of disease name code 1503 and disease name code 1504 in FIG. 15. That is, it is the number of receipts with disease name code "10" plus the number of receipts with disease name code "20".
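Merging similar items amounts to summing their counts for the same subscriber and year. This is a minimal sketch under two assumptions. First, each row of receipt shaping information is a dict from item code to count. Second, the grouping (the hypothetical `groups` mapping) has already been derived from the classification information 1201 and 1401, or prepared by hand.

```python
def merge_items(row, groups):
    """Merge similar items of one row of receipt shaping information.

    row: {item_code: count} for one subscriber and year.
    groups: {merged_item: [item_code, ...]} (hypothetical, e.g. from classification tables).
    Counts of grouped items are summed; ungrouped items are kept unchanged.
    """
    grouped_codes = {code for codes in groups.values() for code in codes}
    merged = {code: count for code, count in row.items() if code not in grouped_codes}
    for merged_item, codes in groups.items():
        merged[merged_item] = sum(row.get(code, 0) for code in codes)
    return merged


# FIG. 16-style example: disease name codes "10" and "20" merged into one item.
print(merge_items({"10": 2, "20": 3, "30": 1}, {"10+20": ["10", "20"]}))
# {'30': 1, '10+20': 5}
```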
The receipt shaping information, medical examination shaping information and inquiry shaping information created as shown in FIGS. 15 and 16 are stored in the shaping information storage unit 118 of the database 116. The shaping information 1501 is numerical data in tabular form.
In the above, the values of the receipt shaping information are tallied as the number of receipts, that is, the number of prescriptions. They may instead record only whether a prescription exists. In that case the value is binary: 1 when the number of prescriptions is 1 or more (prescribed), and 0 when it is 0 (not prescribed). Alternatively, since the number of prescriptions can be taken to reflect severity, the value may be the number of prescriptions classified into levels. For example, three levels may be used: 0 for no prescriptions, 1 for 1 to 4 prescriptions, and 2 for 5 or more prescriptions.
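Both alternative encodings are simple functions of the prescription count. This sketch assumes the level boundaries from the example above (1 and 5 prescriptions). The function names are illustrative, and missing values are assumed to have been handled before encoding.

```python
def to_binary(count):
    """Presence/absence encoding: 1 if prescribed at least once in the period, else 0."""
    return 1 if count >= 1 else 0


def to_level(count, bounds=(1, 5)):
    """Ordinal severity encoding; `bounds` are the lower edges of levels 1 and 2.
    With the defaults: 0 prescriptions -> 0, 1 to 4 -> 1, 5 or more -> 2."""
    return sum(1 for lower in bounds if count >= lower)


print([to_level(c) for c in (0, 1, 4, 5, 12)])  # [0, 1, 1, 2, 2]
```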
In the example above, the receipt information, medical examination information and inquiry information are aggregated over one-year periods. Other periods, such as every two or three years, may also be used. The following description assumes one-year periods.
Next, the graphical model creation unit 108 will be described.
The graphical model creation unit 108 creates a model consisting of a graph and conditional probability tables. In the graph, each item of the shaping information stored in the shaping information storage unit 118 is a random variable represented as a node. Conditional dependencies between random variables are represented as edges, which may be either directed or undirected. Let V be the set of nodes, E the set of edges, and G = (V, E) the graph. As the model, the graphical model creation unit 108 creates a graphical model such as a Bayesian network or a Markov network.
The graphical model is described below with examples.
FIG. 22A shows a simple model composed of two nodes. "Year-X oral drug prescription count" is a random variable representing the number of oral diabetes drug prescriptions in year X. "Year-(X+n) insulin prescription count" is a random variable representing the number of insulin prescriptions for diabetes in year X+n. Let v1 and v2 denote the nodes for these random variables. The graph in FIG. 22A then consists of v1, v2, and a directed edge e1 from v1 to v2. With V = (v1, v2) and E = (e1), the graph in FIG. 22A is G = (V, E).
Next, the conditional probability table will be described. Let x1 and x2 be the random variables represented by nodes v1 and v2. The graph G in FIG. 22A indicates that the joint distribution of x1 and x2 is given by p(x1, x2) = p(x2 | x1) p(x1). That is, the distribution of x2 depends on the value of x1 and is given by the conditional probability p(x2 | x1). Because x1 has no parent node, its distribution is simply p(x1). The conditional probability tables are the values of p(x1) and p(x2 | x1).

The table for p(x1) gives a probability for each value of x1; an example is shown as table 2201 in FIG. 22B. In table 2201, for example, p(x1 = 0) = a1 means that the probability that x1 = 0 is a1. This value is the proportion of cases (insured persons) whose year-X oral drug prescription count is 0, among the cases in the receipt shaping information used for model generation. The values a2, a3, ... are calculated in the same way. Because p(x1) is a probability distribution, Σ p(x1) = 1, where the sum is over all values of x1.

The table for p(x2 | x1) is obtained by calculating p(x2 | x1) for every combination of values of x1 and x2. For example, p(x2 = s2 | x1 = s1) is the proportion of cases with x2 = s2 among the cases with x1 = s1. These calculations yield the probability tables.
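The counting procedure just described is a maximum-likelihood estimate of the two tables. This minimal sketch assumes the year-X oral drug counts and the year-(X+n) insulin counts of the training cases are given as two equal-length lists. The toy data and the names are hypothetical.

```python
from collections import Counter


def marginal_table(x1):
    """p(x1): the proportion of cases taking each value of x1."""
    n = len(x1)
    return {value: count / n for value, count in Counter(x1).items()}


def conditional_table(x1, x2):
    """p(x2 | x1): among cases with x1 = s1, the proportion with x2 = s2.
    Keys are (s1, s2); combinations never observed are absent (probability 0)."""
    pair_counts = Counter(zip(x1, x2))
    parent_counts = Counter(x1)
    return {(s1, s2): c / parent_counts[s1] for (s1, s2), c in pair_counts.items()}


oral_x = [0, 0, 1, 1, 1, 2]      # year-X oral drug prescription counts (toy data)
insulin_xn = [0, 0, 0, 1, 2, 2]  # year-(X+n) insulin prescription counts (toy data)
p_x1 = marginal_table(oral_x)
p_x2_given_x1 = conditional_table(oral_x, insulin_xn)
print(p_x2_given_x1.get((1, 2), 0.0))  # p(x2 = 2 | x1 = 1), which is 1/3 for this toy data
```

Here, a combination that never occurs in the training data gets probability 0. Adding a small smoothing count is a common refinement when data are sparse.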
In the simple example of FIGS. 22A and 22B, the graph G in FIG. 22A and the probability tables in FIG. 22B together form the graphical model. With this model, if the number of oral drug prescriptions for an insured person in a given year is known, the probability distribution of the number of insulin prescriptions n years later can be obtained. For example, if the oral drug prescription count this year is 1, the probability that insulin is prescribed twice n years later is given by p(x2 = 2 | x1 = 1).
Next, the example of FIG. 23 is described, which has more random variables than FIG. 22. To predict the insulin prescription count in year X+n, FIG. 22 uses only the oral drug prescription count in year X. However, the insulin prescription count in year X+n can be expected to be higher for people with higher blood glucose levels, and it is also expected to depend on age. As in FIG. 23, the year-(X+n) insulin prescription count can therefore be predicted from three variables: the year-X oral drug prescription count, the year-X blood glucose level, and the age in year X. This is expected to give a more accurate prediction.
Let x1, x2, x3 and x4 be the random variables for, respectively:
- the year-X oral drug prescription count,
- the year-X blood glucose level,
- the age in year X,
- the year-(X+n) insulin prescription count.

Let v1, v2, v3 and v4 be the corresponding nodes. The node set of this graph is V = (v1, v2, v3, v4). Three directed edges are defined: e1 from v1 to v4, e2 from v2 to v4, and e3 from v3 to v4. The edge set is E = (e1, e2, e3), and the graph is G = (V, E). From this graph, the joint distribution of x1, ..., x4 is p(x1, x2, x3, x4) = p(x4 | x1, x2, x3) p(x1) p(x2) p(x3). The conditional probability tables are obtained by calculating p(x1), p(x2), p(x3) and p(x4 | x1, x2, x3) for every value of x1, ..., x4. With this model, the year-(X+n) insulin prescription count can be predicted more accurately when the year-X blood glucose level is known in addition to the year-X oral drug prescription count.
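With several parents, the same counting is done for each combination of parent values. The joint probability is then the product of each node's table entry given its parents. This minimal sketch assumes that all four variables have already been discretized, in particular blood glucose and age. It also assumes that cases are dicts keyed by the hypothetical names x1 to x4.

```python
from collections import Counter


def cpt(cases, child, parents):
    """Estimate p(child | parents) by counting over cases (dicts of name -> value).
    Keys are (tuple_of_parent_values, child_value); a root node uses the empty tuple."""
    parent_values = [tuple(case[p] for p in parents) for case in cases]
    joint = Counter(zip(parent_values, (case[child] for case in cases)))
    totals = Counter(parent_values)
    return {(pa, v): n / totals[pa] for (pa, v), n in joint.items()}


def joint_probability(assignment, structure, tables):
    """p(x1, ..., xk) = product over nodes of p(node | parents of node)."""
    prob = 1.0
    for node, parents in structure.items():
        pa = tuple(assignment[p] for p in parents)
        prob *= tables[node].get((pa, assignment[node]), 0.0)
    return prob


# FIG. 23 structure: x4 (year X+n insulin) has parents x1, x2, x3; the rest are roots.
structure = {"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}
cases = [
    {"x1": 0, "x2": 0, "x3": 0, "x4": 0},
    {"x1": 1, "x2": 1, "x3": 0, "x4": 1},
    {"x1": 1, "x2": 1, "x3": 0, "x4": 0},
    {"x1": 0, "x2": 1, "x3": 1, "x4": 1},
]
tables = {node: cpt(cases, node, parents) for node, parents in structure.items()}
print(joint_probability({"x1": 1, "x2": 1, "x3": 0, "x4": 1}, structure, tables))  # 0.140625
```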
For small models such as those in FIGS. 22 and 23, what the distribution of the year-(X+n) insulin prescription count depends on can be defined from experience and knowledge. This becomes difficult as the model grows. For example, the year-(X+n) insulin prescription count may depend on sex, on other diabetes-related medical practice and drug prescription items, or on some inquiry or medical examination items. The oral drug prescription count and the blood glucose level themselves also depend on other items. Therefore, when the number of random variables is large, as with the items of the receipt shaping information, the probabilistic dependencies (edges) may be created automatically from the data. During this process, dependencies based on experience and knowledge may be used to constrain whether edges exist and whether they are directed or undirected. Existing techniques, such as Bayesian network structure learning, can be used for this.
To predict, for example, the probability of onset three years later, a graphical model may be created whose random variables are the items of the receipt shaping information for year X and year X+3. The model is created from past data, for example from the year pairs 2008 and 2011, and 2009 and 2012. Data for the same insured person from 2008/2011 and from 2009/2012 can then be used as separate training cases.
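Turning per-year shaping information into training cases for an X / X+3 model can be sketched as below. The nested-dict layout and the '@X' / '@X+n' suffixes are assumptions made for illustration. As described above, each valid year pair of one subscriber becomes a separate case.

```python
def year_pair_cases(data, gap):
    """Build one training case per (subscriber, year X) for which year X + gap also exists.

    data: {subscriber_id: {year: {item: value}}} (hypothetical layout of shaping information).
    Each case maps '<item>@X' and '<item>@X+n' to values, so every case fits the same
    two-layer model; one subscriber can contribute several cases.
    """
    cases = []
    for by_year in data.values():
        for year in sorted(by_year):
            later = by_year.get(year + gap)
            if later is None:
                continue
            case = {f"{item}@X": value for item, value in by_year[year].items()}
            case.update({f"{item}@X+n": value for item, value in later.items()})
            cases.append(case)
    return cases


data = {"A": {2008: {"oral": 1}, 2009: {"oral": 0}, 2011: {"oral": 1}, 2012: {"oral": 1}}}
print(year_pair_cases(data, 3))
# [{'oral@X': 1, 'oral@X+n': 1}, {'oral@X': 0, 'oral@X+n': 1}]
```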
A configuration example of the graphical model will now be described with reference to FIG. 21A.
The graphical model in FIG. 21 consists of items for year X and items for year X+N. There are three types of edges between items:
- edges between items of the same year;
- edges between the same item in year X and in year X+N;
- edges between different items in year X and year X+N.

Edges between items of the same year are shown as solid arrows. The remaining edges, between year-X and year-(X+N) items, are shown as dotted arrows. Although not shown in FIG. 21, there are also items representing basic information such as age, sex and occupation. These items are not duplicated for year X and year X+N; each is a single item for the whole model, so it may have edges to items of either year. Because these items have a large effect on the whole model, separate models may be created for each age, sex, occupation and so on. The edges are drawn as directed edges (arrows) in the figure, but they may also be undirected.
The three types of edges are described below.
First, the solid-line edges between items of the same year are described. These edges represent probabilistic dependencies between items of the same year; for example, a high cholesterol level tends to go with a high BMI. Unless testing methods change substantially, these dependencies among same-year inquiry, medical examination and receipt items are roughly the same in every year. The edge structure among same-year items is therefore the same for year X and year X+N. That is, the solid-line edge structure is identical in the year-X node group and the year-(X+N) node group. This structure may be learned from same-year item data using a structure learning method for Bayesian networks or Markov networks.
Next, the edges between the same item in year X and year X+N are described. An example, shown in the figure, is the edge between two instances of the receipt item "oral diabetes drug prescribed (yes/no)". The edge runs from this item in year X to the same item in year X+N. Such an edge represents a transition of state over time: the year-X state is used to predict whether an oral diabetes drug is prescribed in year X+N. For example, a person prescribed an oral diabetes drug in year X is likely to be prescribed one in year X+N as well. The future state of each item is considered to depend on its current state. This type of edge is therefore defined between the year-X and year-(X+N) nodes of every item.
Next, the edges between different items in year X and year X+N are described. These edges represent causes that affect the same-item state transitions from year X to year X+N described above. Consider a person who is not prescribed an oral diabetes drug in year X. The probability that this person is prescribed one in year X+N is assumed to rise with their year-X blood glucose level. Year-X blood glucose information is therefore expected to help predict more accurately whether an oral diabetes drug will be prescribed in year X+N. In this way, these edges indicate that the transition of one item's state from year X to year X+N depends probabilistically on the states of other items in year X. Such edges are defined between different items of year X and year X+N whose probabilistic dependency is at or above a certain level. In a simple method, for example, a correlation coefficient may be calculated and an edge defined between items whose coefficient is at or above a threshold.
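The cross-year part of the structure can be assembled mechanically. The sketch below adds a transition edge for every item. It also adds an edge between different items when their absolute Pearson correlation reaches a threshold, which is the simple method mentioned above. The same-year (solid-line) edges are assumed to come from a separate structure-learning step and are not shown. The case layout and names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def cross_year_edges(cases, items, threshold):
    """Directed edges from year-X nodes to year-(X+n) nodes.

    cases: list of dicts with keys '<item>@X' and '<item>@X+n' (hypothetical layout).
    Same-item transition edges are always added; an edge between different items
    is added when |correlation| >= threshold.
    """
    edges = []
    for a in items:
        xa = [c[f"{a}@X"] for c in cases]
        for b in items:
            if a == b or abs(pearson(xa, [c[f"{b}@X+n"] for c in cases])) >= threshold:
                edges.append((f"{a}@X", f"{b}@X+n"))
    return edges


cases = [
    {"glucose@X": 90, "oral@X": 0, "glucose@X+n": 95, "oral@X+n": 0},
    {"glucose@X": 130, "oral@X": 0, "glucose@X+n": 140, "oral@X+n": 1},
    {"glucose@X": 150, "oral@X": 1, "glucose@X+n": 150, "oral@X+n": 1},
    {"glucose@X": 100, "oral@X": 0, "glucose@X+n": 100, "oral@X+n": 0},
]
edges = cross_year_edges(cases, ["glucose", "oral"], threshold=0.9)
```

With this toy data, the year-X glucose level correlates with the year-(X+n) oral drug item at about 0.94, so that edge is kept. The reverse pair (about 0.69) is dropped.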
The graph and probability tables created as described above are stored in the graphical model storage unit 117.
Next, the onset probability prediction unit 109 will be described. The onset probability prediction unit 109 uses the model stored in the graphical model storage unit 117 to predict the future onset probability of each item. With a graphical model, the probability distribution of unknown random variables can be obtained when known values are given for some of the random variables. For example, given this year's medical examination, inquiry and receipt data, the year-X random variables are set to their known values. The probability distributions of the remaining year-(X+n) random variables are then obtained. From, for example, the probability distributions of medical practice and drug prescriptions in year X+n, the onset probability of a given disease can be calculated. The Junction Tree Algorithm, among others, can be used for this probabilistic inference. In this way, the onset probability n years later can be predicted from each insured person's data for this year. When the data includes medical cost information for each medical practice, the same method can also predict the medical cost of each practice in year X+n. Both its probability distribution and its expected value can be obtained.
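To illustrate what the inference computes, the following sketch answers a conditional query by brute-force enumeration over the unobserved variables. It is a deliberately simple stand-in for the Junction Tree Algorithm named above. On tiny models it returns the same exact posterior, but its cost grows exponentially with the number of hidden variables. The table format and all names are assumptions.

```python
from itertools import product


def joint(assignment, structure, tables):
    """Product of p(node | parents) over all nodes for one full assignment."""
    p = 1.0
    for node, parents in structure.items():
        pa = tuple(assignment[x] for x in parents)
        p *= tables[node].get((pa, assignment[node]), 0.0)
    return p


def posterior(query, evidence, domains, structure, tables):
    """p(query | evidence) by summing the joint over every unobserved variable."""
    hidden = [v for v in domains if v != query and v not in evidence]
    scores = {}
    for q in domains[query]:
        scores[q] = sum(
            joint({**evidence, query: q, **dict(zip(hidden, values))}, structure, tables)
            for values in product(*(domains[h] for h in hidden))
        )
    total = sum(scores.values())
    if total == 0:
        raise ValueError("evidence has zero probability under the model")
    return {q: s / total for q, s in scores.items()}


# Two-node model in the shape of FIG. 22A (x1 -> x2) with made-up tables.
structure = {"x1": [], "x2": ["x1"]}
domains = {"x1": [0, 1], "x2": [0, 1]}
tables = {
    "x1": {((), 0): 0.6, ((), 1): 0.4},
    "x2": {((0,), 0): 0.9, ((0,), 1): 0.1, ((1,), 0): 0.3, ((1,), 1): 0.7},
}
dist = posterior("x2", {"x1": 1}, domains, structure, tables)
expected = sum(v * p for v, p in dist.items())  # expected value of x2 given x1 = 1
```

The same expected-value line applies unchanged to a cost item whose states are monetary amounts.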
An example of onset probability prediction is described using the example shown in FIG. 21A. First, when this year's medical examination, inquiry and receipt data are obtained, the data is set as observed data in the year-X node group of FIG. 21A. Some items may be unknown, for example examination items that were not tested or inquiry items that were not answered. The states of these unknown items are first inferred probabilistically from the observed data, based on the solid-line edges between the year-X nodes. This gives the estimated probability of each state of this year's unknown items.
Next, the state probabilities of each item N years later are inferred based on the dotted-line edges. This gives the estimated probability of each state of each item after N years. Calculating the expected value of each item gives predicted values, such as test values, after N years.
Next, suppose the state 2N years later is to be predicted. In this case, the structure used between the current layer and the layer N years later is reused between the layer N years later and the layer 2N years later. That is, the layers for N and 2N years later in FIG. 21B have the same structure as the year-X and year-(X+N) layers in FIG. 21A. The estimated state probabilities of each item after 2N years are then calculated from those after N years, which predicts the state 2N years later. Repeating this allows future states such as those 3N and 4N years later to be predicted.
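In the simplest case, reusing one X → X+N layer repeatedly behaves like a Markov chain. The sketch assumes, for illustration only, that the predicted state is a single discrete variable with a fixed transition table. That variable could be one item with its cross-item parents ignored, or the joint state of all items encoded as one value. In practice, inference over the full copied layers would replace the single table step.

```python
def propagate(initial, transition, steps):
    """Return the state distributions N, 2N, ..., steps*N years ahead.

    initial: {state: probability} for year X (e.g. from inference on the year-X layer).
    transition: {(s, t): p(t at +N | s now)}, reused unchanged for every layer.
    """
    states = {s for s, _ in transition} | {t for _, t in transition} | set(initial)
    dist = dict(initial)
    history = []
    for _ in range(steps):
        dist = {
            t: sum(dist.get(s, 0.0) * transition.get((s, t), 0.0) for s in states)
            for t in states
        }
        history.append(dist)
    return history


# States: 0 = no oral diabetes drug, 1 = prescribed (made-up transition probabilities).
history = propagate({0: 1.0}, {(0, 0): 0.8, (0, 1): 0.2, (1, 1): 1.0}, steps=3)
```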
Configurations in FIGS. 1 and 2 that are not described in this embodiment will be explained in the subsequent embodiments.
The first embodiment described an example of a health care data analysis apparatus that extracts problem diseases from health care data, including receipt information, medical examination information and inquiry information. Health insurance providers, however, want to understand the factors behind the onset of the diseases that will become future problems, so that they can reduce that onset. Health care data is enormous and the relationships among the data are complex. As a result, it has not been easy to identify these factors even when the problem diseases are known.
The second embodiment describes an example of a health care data analysis apparatus that also extracts the factors of the problem diseases. It then visualizes the relationships between the problem diseases and their factors in graph form.
The configuration and processing are the same as in the first embodiment except for the factor extraction unit 111, the visualization unit 112 and the factor storage unit 122. The description of the common parts is therefore omitted.
The factor extraction unit 111 of the health care data analysis system of the second embodiment extracts the factors for each problem disease. The visualization unit 112 creates a graph structure annotated with information on the problem diseases and their factors, and visualizes it.
First, the factor extraction unit 111 will be described.
The factor extraction unit 111 provides a function for extracting the items that are factors of the problem diseases stored in the problem disease storage unit 121. The factor extraction function is described here from the viewpoint of a health insurance provider. Starting from one of the problem diseases extracted from the data of a group of insured persons, the provider extracts test values and lifestyle habits that are factors affecting its onset.
FIG. 4 is a flowchart of the factor extraction function.
In the problem disease selection step 401, one item is selected as the problem disease item from the problem diseases stored in the problem disease storage unit 121.
In the factor candidate narrowing step 402, factor candidate items are selected from the items of the graphical model stored in the graphical model storage unit 117. Some items cannot be influenced by health guidance interventions. Examples are items that never change for a person, such as sex, and items that depend strongly on when the data was acquired, such as age. Such items are either fixed for each person or change deterministically with the acquisition time. Therefore, when only items that health guidance can improve are to be extracted as factors, these items may be excluded from the factor candidate items.
In the inter-item dependency calculation step 403, the dependency between items is calculated. The dependency represents the similarity or relatedness between items and is larger the stronger the dependence. Let s(i, j) be the dependency between node vi and node vj.
First example of inter-node dependency: the dependency is 1 between nodes joined by an edge and 0 between nodes with no edge.
Second example of inter-node dependency: the dependency is the mutual information between the two random variables represented by the two nodes. Let p(x, y) be the joint probability distribution of random variables X and Y, and p(x) and p(y) their marginal distributions. The mutual information is I(X, Y) = ΣΣ p(x, y) log(p(x, y) / (p(x) p(y))), where the sums are over all values of X and Y. The joint distributions p(x, y) for all node pairs and the marginal distributions p(x) for all nodes may be calculated in advance and saved in the storage device. The dependency between nodes with no edge may also be set to 0 regardless of their mutual information.
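The mutual-information dependency can be estimated from the paired case values of two items. This minimal sketch uses empirical frequencies and the natural logarithm; the base is an assumption, since the text does not fix it. Pairs that never occur contribute nothing, following the usual convention 0 log 0 = 0. Missing values are assumed to have been removed first.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X, Y) = sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((x_counts[x] / n) * (y_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    )


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.693... (= log 2)
```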
Third example of inter-node dependency: let X1 and X2 be the two random variables represented by the two nodes whose dependency is to be calculated. From the receipt shaping information, vectors of their case values are computed as x1 = (x11, x12, ..., x1n) and x2 = (x21, x22, ..., x2n). In this example, the dependency is calculated from the correlation coefficient between x1 and x2 treated as vectors.
Let r(x1, x2) be the correlation coefficient between vectors x1 and x2. The elements of x1 and x2 may contain missing values. Every element position that is missing in either x1 or x2 is therefore removed from both; for example, if x1i is missing, x2i is also removed. The vectors obtained by removing these missing dimensions from x1 and x2 are denoted v1 = (v11, v12, ..., v1m) and v2 = (v21, v22, ..., v2m).
Even for the same degree of dependence, the value of r(v1, v2) is shifted by differences in the nature of the values of v1 and v2. To correct for this, the elements of v1 and v2 are independently and randomly reordered to give vectors w1 and w2. These can be assumed to have no dependence, and |r(v1, v2)| − |r(w1, w2)| is calculated. If |r(v1, v2)| < |r(w1, w2)|, it can be judged that there is no dependence, and the dependency is set to 0. Otherwise, the dependency is |r(v1, v2)| − |r(w1, w2)|. This gives a dependency measured relative to the random (no-dependence) case.
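Below is a sketch of the third dependency measure. It assumes the missing-value marker −1 from FIG. 15. As the text describes, a single seeded random permutation serves as the no-dependence baseline. Averaging over several permutations would reduce the noise of that baseline, but this optional variant is not described in the text. All names are illustrative.

```python
import math
import random

MISSING = -1  # missing-value marker used in the shaping information


def pearson(a, b):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def correlation_dependency(x1, x2, seed=0):
    """|r(v1, v2)| - |r(w1, w2)|, floored at 0 (third example of inter-node dependency)."""
    # Drop every position that is missing in either vector.
    pairs = [(a, b) for a, b in zip(x1, x2) if a != MISSING and b != MISSING]
    if len(pairs) < 2:
        return 0.0
    v1 = [a for a, _ in pairs]
    v2 = [b for _, b in pairs]
    rng = random.Random(seed)
    w1, w2 = v1[:], v2[:]
    rng.shuffle(w1)  # independent random reorderings: a no-dependence baseline
    rng.shuffle(w2)
    return max(0.0, abs(pearson(v1, v2)) - abs(pearson(w1, w2)))
```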
Fourth example of inter-node dependency: as in the third example, let X1 and X2 be the two random variables represented by the two nodes. From the receipt shaping information, compute the case vectors x1 = (x11, x12, ..., x1n) and x2 = (x21, x22, ..., x2n). In this example, the dependency is calculated from the entropy of the pairs of values of x1 and x2.
First, as in the third example, let v1 and v2 be the vectors with missing values removed. Next, let S = {(v1i, v2i)} (i = 1, ..., m) be the collection of element pairs of v1 and v2, so that S has m elements. For an element p = (p1, p2) of S, let np be the number of elements of S equal to p. Let L be the number of distinct elements of S. The entropy of the pairs of v1 and v2, normalized by L, is then given by the following equation.
(Equation 1)
$$e(v_1, v_2) = \frac{1}{L} \sum_{p} \left[ -\frac{n_p}{m} \log \frac{n_p}{m} \right]$$
Here the sum is taken over the distinct elements p of S. As in the third example, e(w1, w2) is also calculated for the randomized vectors w1 and w2. The value e(v1, v2) is non-negative and becomes smaller as the co-occurrence of v1 and v2 becomes stronger. The ratio e(v1, v2)/e(w1, w2) normalizes e(v1, v2) by the random case. If this ratio is greater than 1, it can be judged that there is no dependence between v1 and v2, and the dependency is set to 0. Since the ratio is always 0 or greater, the dependency in all other cases is 1 − e(v1, v2)/e(w1, w2).
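Below is a sketch of the fourth dependency measure, following Equation 1 as written: the entropy is divided by L and the sum is taken over distinct pairs. As in the correlation-based measure, −1 marks missing values and a single seeded permutation provides the random baseline. The natural logarithm and the names are assumptions.

```python
import math
import random
from collections import Counter

MISSING = -1  # missing-value marker used in the shaping information


def pair_entropy(v1, v2):
    """Equation 1: entropy of the (v1i, v2i) pairs divided by L, the number of distinct pairs."""
    counts = Counter(zip(v1, v2))
    m = len(v1)
    h = -sum((n_p / m) * math.log(n_p / m) for n_p in counts.values())
    return h / len(counts)


def entropy_dependency(x1, x2, seed=0):
    """Fourth example: 1 - e(v1, v2) / e(w1, w2), or 0 when the ratio exceeds 1."""
    pairs = [(a, b) for a, b in zip(x1, x2) if a != MISSING and b != MISSING]
    if not pairs:
        return 0.0
    v1 = [a for a, _ in pairs]
    v2 = [b for _, b in pairs]
    rng = random.Random(seed)
    w1, w2 = v1[:], v2[:]
    rng.shuffle(w1)  # independent random reorderings: a no-dependence baseline
    rng.shuffle(w2)
    e_w = pair_entropy(w1, w2)
    if e_w == 0:  # both vectors constant: no measurable dependence
        return 0.0
    ratio = pair_entropy(v1, v2) / e_w
    return 0.0 if ratio > 1 else 1.0 - ratio
```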
The dependency between nodes is obtained as described above.
The problem factor extraction step 404 considers each factor candidate item selected in the factor candidate narrowing step 402. The dependency of that item on the problem disease item selected in the problem disease selection step 401 is compared with a preset threshold. Items whose dependency is at or above the threshold are extracted as factors.
At this time, factors may be extracted taking into account the attributes of the edge between the problem disease item and the factor candidate item. As a first example, whether a factor item is extracted may depend on whether an edge exists between it and the problem disease item. For instance, when there is no edge between them, the factor item may be excluded from extraction regardless of their dependency. As a second example, when a directed edge exists between the problem disease item and the factor candidate item, the edge's direction may decide whether the factor item is extracted.
Further, when the problem disease item and the factor candidate items are each grouped by predetermined period, factors may be extracted by taking those periods into account. For example, a factor candidate item may be treated as an extraction target only when the problem disease item is a random variable node based on data from year X and the factor candidate item is a random variable node based on data from an earlier year X-k (where k is a predetermined natural number).
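The threshold test of step 404, together with the optional edge and period filters just described, might look like the sketch below. It assumes, hypothetically, that each node is encoded as an `(item, period)` tuple and that the dependency values come from step 403 through a caller-supplied function; the edge filter implements only one of the possible choices (keep a candidate when a directed edge points from it to the problem node).

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Node = Tuple[str, int]  # hypothetical encoding: (item name, period index, e.g. year)

def extract_factors(problem: Node,
                    candidates: Iterable[Node],
                    dependency: Callable[[Node, Node], float],
                    threshold: float,
                    directed_edges: Optional[Set[Tuple[Node, Node]]] = None,
                    require_earlier_period: bool = False) -> List[Node]:
    """Return candidates whose dependency on `problem` is >= threshold.

    Optional filters mirror the variants in the text:
    - directed_edges: keep a candidate only if an edge points from it to the problem node;
    - require_earlier_period: keep a candidate only if its period precedes the problem's.
    """
    factors: List[Node] = []
    for cand in candidates:
        if dependency(problem, cand) < threshold:
            continue
        if directed_edges is not None and (cand, problem) not in directed_edges:
            continue
        if require_earlier_period and cand[1] >= problem[1]:
            continue
        factors.append(cand)
    return factors
```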
The cycle consisting of the three processes described below, the factor registration step 405, step 406, and the factor extraction step 414, includes both a process that extracts items highly dependent on the factors extracted in the problem factor extraction step 404 as new factors and registers them as factor items, and a process that extracts items highly dependent on already registered factor items as new factors and registers them as factor items. The purpose of this cycle is to extract both direct and indirect factors. The specific processing is described below.
In the factor registration step 405, the factor items extracted in the problem factor extraction step 404 are registered as factors of the problem disease selected in the problem disease selection step 401. Factor items extracted in the factor extraction step 414, described later, are also registered as factors of the problem disease.
In step 406, it is determined whether any of the factor items registered in the factor registration step 405 has not yet been evaluated for further factors. If such an unevaluated factor item exists, the process proceeds to the factor extraction step 414; otherwise, it proceeds to the factor DB registration step 407.
In the factor extraction step 414, one unevaluated factor item is selected from the factors registered in the factor registration step 405, and the factors of that item are extracted. The extraction method is the same as in the problem factor extraction step 404: it is obtained by reading "the problem disease item selected in the problem disease item selection step 401" and "the problem disease item" in the description of step 404 as "the unevaluated factor item selected from the factors registered in the factor registration step 405".
Note that the dependency calculation method and the dependency threshold used in the problem factor extraction step 404 may differ from those used in the factor extraction step 414. Furthermore, the dependency calculation method and threshold of the factor extraction step 414 may be changed in each iteration of the cycle formed by the factor registration step 405, step 406, and the factor extraction step 414; for example, the threshold may be varied according to the number of cycles performed.
In the factor DB registration step 407, the problem disease selected in the problem disease selection step 401 and the factors registered in the factor registration step 405 are stored in the factor storage unit 122.
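Steps 404 to 407 amount to a worklist (breadth-first) traversal over the dependency structure. The sketch below is one hedged reading of that loop: it assumes a caller-supplied `dependency` function and a hypothetical `threshold(cycle)` callback so the threshold can vary per cycle as the text allows. Step 407's database write is represented only by the returned list.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, List, Set

def extract_all_factors(problem: Hashable,
                        candidates: Iterable[Hashable],
                        dependency: Callable[[Hashable, Hashable], float],
                        threshold: Callable[[int], float]) -> List[Hashable]:
    """Collect direct and indirect factors of `problem` (steps 404-407).

    threshold(cycle) returns the threshold for a cycle: cycle 0 is step 404,
    cycles 1, 2, ... are the repeated executions of step 414.
    """
    candidates = list(candidates)
    registered: List[Hashable] = []
    seen: Set[Hashable] = {problem}
    queue: Deque[Hashable] = deque()      # registered but not yet evaluated

    def extract_and_register(source: Hashable, cycle: int) -> None:
        t = threshold(cycle)
        for c in candidates:
            if c not in seen and dependency(source, c) >= t:
                seen.add(c)               # step 405: register as a factor
                registered.append(c)
                queue.append(c)

    extract_and_register(problem, 0)      # step 404
    cycle = 1
    while queue:                          # step 406: any unevaluated factor left?
        extract_and_register(queue.popleft(), cycle)  # step 414
        cycle += 1
    return registered                     # step 407 would store (problem, registered)
```

With symmetric toy dependencies P-A 0.9, A-B 0.8 and B-C 0.3, a fixed threshold of 0.5 yields A (direct) and B (indirect), while lowering it to 0.25 also reaches C.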
Next, the visualization unit 112 will be described.
The visualization unit 112 visualizes the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118, with information on the problem diseases stored in the problem disease storage unit 121 and on the factors stored in the factor storage unit 122 added to it.
In graph visualization, the nodes V are placed in a two- or three-dimensional space. Each node is displayed as a suitable figure such as a circle (○), and a character string representing the node's item may be displayed inside or near the figure. Edges E are drawn as straight lines or curves connecting nodes, and directed edges are represented by arrows or the like. Edges need not be displayed at all, and even when they are displayed, directed and undirected edges need not be distinguished and arrows may be omitted. Information defined between the two nodes V connected by an edge, such as their dependency or relationship, may also be displayed as a character string inside or near the figure representing the edge. Furthermore, when the shaped information consists of items each grouped by predetermined period, the way an edge is displayed may be changed according to the periods to which its two end nodes V belong. For example, for an edge connecting nodes Vi and Vj, the edge may be drawn as a solid line when Vi and Vj both represent random variables based on data obtained in the same predetermined period, and as a dotted line when they represent random variables based on data obtained in different periods. Such a change may also be expressed by, for example, a difference in edge color, thickness, or straight versus curved lines.
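The period-dependent edge style can be expressed as a small rule. This sketch assumes the hypothetical `(item, period)` node encoding used in the other examples and returns a style name that a drawing library would then interpret.

```python
from typing import Tuple

Node = Tuple[str, int]  # hypothetical encoding: (item name, period)

def edge_line_style(u: Node, v: Node) -> str:
    """Solid line if both end nodes come from the same period, dotted otherwise."""
    return "solid" if u[1] == v[1] else "dotted"
```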
The node placement method is not particularly limited. For example, a widely known method may be used that determines coordinates, based on the presence or absence of edges between nodes, so that nodes connected by an edge are placed close to each other. Alternatively, a force-directed algorithm may be used, which defines an attractive force, a repulsive force, or both between pairs of nodes based on an index such as inter-node dependency, and determines coordinates that reach a stable state in which the forces among all or some of the nodes in the graph are minimized.
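As one possible force-directed variant, the sketch below follows the Fruchterman-Reingold scheme: every node pair repels with force k²/d, each edge attracts with force proportional to d²/k scaled by its weight (here assumed to be the inter-node dependency), and a cooling "temperature" caps each move. The weighting choice and all parameter defaults are assumptions for illustration.

```python
import math
import random
from typing import Dict, Hashable, Sequence, Tuple

def force_directed_layout(nodes: Sequence[Hashable],
                          weighted_edges: Dict[Tuple[Hashable, Hashable], float],
                          iterations: int = 200,
                          size: float = 1.0,
                          seed: int = 0) -> Dict[Hashable, Tuple[float, float]]:
    """Fruchterman-Reingold-style layout with dependency-weighted attraction."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))   # ideal distance between nodes
    temperature = size / 10.0          # maximum displacement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along edges: weight * d^2 / k.
        for (a, b), w in weighted_edges.items():
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node, capped by the current temperature, then cool down.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
        temperature *= 0.95
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

Passing dependencies as weights pulls strongly dependent items closer together; a library implementation such as NetworkX's `spring_layout` could be substituted.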
In visualizing the problem diseases, the nodes V corresponding to items representing problem diseases are displayed differently from nodes that are not problem diseases, for example in a different color, shape, or size. Instead of changing the node figure itself, the display of the character string representing the item may be changed in the same way, or a figure such as a frame may be added on the graph structure to indicate that an item is a problem disease item. Furthermore, information on the problem diseases, such as a list of problem diseases, may be displayed as text, for example in table form, in a display area separate from the graph structure.
In visualizing the factors, the nodes V corresponding to items representing factors are displayed differently from nodes that are not factors, for example in a different color, shape, or size. Instead of changing the node figure itself, the display of the character string representing the item may be changed in the same way, or a figure such as a frame may be added on the graph structure to indicate that an item is a factor item. Furthermore, information on the factors, such as a list of factors, may be displayed as text, for example in table form, in a display area separate from the graph structure.
In visualizing both the problem diseases and their factors, the display of an edge may be changed depending on whether the two nodes V it connects are among the problem diseases and factors. For example, to emphasize the relationship between problem diseases and factors, an edge may be drawn thick when both of its end nodes V are problem disease or factor nodes, and thin otherwise. The change may also be expressed by, for example, a difference in edge color, straight versus curved lines, or solid versus dotted lines.
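The highlighting rules above reduce to assigning a style class per node and a width per edge. This sketch assumes hypothetical style names ("problem", "factor", "default") and widths (3.0 and 1.0) that a renderer would map to colors, shapes, or line weights.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]

def style_graph(nodes: Iterable[Hashable], edges: Iterable[Edge],
                problems: Set[Hashable], factors: Set[Hashable]
                ) -> Tuple[Dict[Hashable, str], Dict[Edge, float]]:
    """Assign a style class to each node and a line width to each edge."""
    node_style: Dict[Hashable, str] = {}
    for n in nodes:
        if n in problems:
            node_style[n] = "problem"   # e.g. drawn in a distinct color or shape
        elif n in factors:
            node_style[n] = "factor"
        else:
            node_style[n] = "default"
    highlighted = problems | factors
    # Thick edge only when both end nodes are problem-disease or factor nodes.
    edge_width = {(u, v): 3.0 if u in highlighted and v in highlighted else 1.0
                  for (u, v) in edges}
    return node_style, edge_width
```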
The graph structure visualized in this embodiment may be a subgraph of the graphical model G = (V, E) stored in the graphical model storage unit 118. For example, the subgraph consisting of the disease-related nodes of the graphical model and the edges between them may be visualized.
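Extracting such a subgraph is an induced-subgraph operation. In the sketch below, the `keep` predicate standing in for "is a disease-related node" is an assumption supplied by the caller.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]

def induced_subgraph(nodes: Sequence[Hashable], edges: Sequence[Edge],
                     keep: Callable[[Hashable], bool]) -> Tuple[List[Hashable], List[Edge]]:
    """Keep the nodes satisfying `keep` and only the edges between kept nodes."""
    kept = [n for n in nodes if keep(n)]
    kept_set = set(kept)
    return kept, [(u, v) for (u, v) in edges if u in kept_set and v in kept_set]
```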
FIG. 18 is an example user-interface screen illustrating one way of implementing this embodiment.
Reference numeral 1801 denotes an operation window for configuring problem disease extraction and factor extraction. In this example, the target group can be narrowed down, the disease candidates can be narrowed down, the problem disease extraction logic can be set, and the factor candidates can be narrowed down. 1802, 1803, and 1804 are the same as 1702, 1703, and 1704 in FIG. 17 described in the first embodiment. 1808 is an input window for setting a narrowing condition that limits which items of the graphical model stored in the graphical model storage unit 118 are treated as factor candidate items; here, as an example, the gender item is excluded. 1805 is an execution button that starts problem disease extraction and factor extraction based on the settings made in 1802, 1803, 1804, and 1808.
1806 is a display window that shows the processing results. 1807 is a display screen showing the extracted problem diseases and their extracted factors. Here, the extracted problem diseases are listed in a table in descending order of next-year medical costs, and the factors of each problem disease are shown in the row corresponding to that disease.
1809 is a graph display screen showing the visualization graph created by the visualization unit 112. Here, renal failure, extracted as the first problem disease, and the items that are its factors are shown as round nodes, and myocardial infarction, extracted as the second problem disease, and the items that are its factors are shown as square nodes. In addition, the subgraphs formed by each problem disease and its factors are highlighted with thicker lines than the other nodes and edges.
As described above, the healthcare data analysis apparatus according to this embodiment extracts problem diseases and their factors from healthcare data and visualizes them on the graph structure with the problem disease and factor information added, thereby helping users understand the problem diseases and their factors.
In the second embodiment, the factor extraction unit 111 extracted the factor items of the problem diseases stored in the problem disease storage unit 121, and the visualization unit 112 visualized the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118 with the problem disease and factor information added. In that process, nodes of the same item grouped in different predetermined periods were treated and displayed as different nodes. This embodiment describes an example in which such nodes are treated as the same node for factor extraction and visualization. This makes it possible to visualize the relationships among lifestyle habits, test values, and pathological conditions, as well as the relationships between pathological conditions over time, in a form that is easier to grasp.
Since the configuration and processing are the same as in the second embodiment except for the factor extraction unit 111 and the visualization unit 112, their description is omitted.
First, the processing of the factor extraction unit 111 will be described.
As in the second embodiment, the factor extraction unit 111 provides a function for extracting items that are factors of the problem diseases stored in the problem disease storage unit 121. The second embodiment described extracting factors based on the probabilistic dependencies between problem diseases and factor candidates; this embodiment describes a factor extraction function for aggregating identical items when representing the graph.
FIG. 5 is a flowchart of the factor extraction function.
The problem disease selection step 401, the factor candidate narrowing step 402, and the inter-item dependency calculation step 403 perform the same processing as described in the first embodiment, so their description is omitted. The processing of the same-item problem factor extraction step 501, the same-item factor registration step 502, step 503, the same-item factor extraction step 504, and the same-item factor DB registration step 505 is described below.
In the same-item problem factor extraction step 501, factors are first extracted using the same method as described in the second embodiment.
Next, nodes of the same item as an extracted factor item but grouped in a different period are extracted as additional factors. The dependency between the problem disease selected in the problem disease selection step 401 and an additional factor need not be considered, nor does the dependency between a factor and its additional factor. For example, when the test value L obtained in year X is extracted as a factor of the disease D obtained in year X+1, the test value L obtained in year X+1 is extracted as an additional factor.
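The additional-factor rule can be sketched as a single pass over the graph's nodes. This assumes the hypothetical `(item, period)` node encoding. The `exclude` parameter is also an assumption, not taken from the text; it keeps nodes such as the problem disease itself from being added.

```python
from typing import Iterable, List, Tuple

Node = Tuple[str, int]  # hypothetical encoding: (item name, period)

def add_same_item_factors(factors: Iterable[Node], all_nodes: Iterable[Node],
                          exclude: Iterable[Node] = ()) -> List[Node]:
    """Append every node that shares an item with a factor but is not yet listed.

    Dependencies are deliberately not checked, as the text allows.
    `exclude` (an assumption) keeps e.g. the problem disease node out.
    """
    result = list(factors)
    items = {item for item, _period in result}
    skip = set(result) | set(exclude)
    for node in all_nodes:
        if node[0] in items and node not in skip:
            result.append(node)
            skip.add(node)
    return result
```

In the text's example, test value L from year X is the extracted factor, so L from year X+1 is appended.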
The cycle consisting of the three processes described below, the same-item factor registration step 502, step 503, and the same-item factor extraction step 504, includes both a process that extracts items highly dependent on the factors extracted in the same-item problem factor extraction step 501 as new factors and registers them as factor items, and a process that extracts items highly dependent on the factor items registered in the same-item factor registration step 502 as new factors and registers them as factor items. The purpose of this cycle is to extract both direct and indirect factors. The specific processing is described below.
In the same-item factor registration step 502, the factor items and additional factor items extracted in the same-item problem factor extraction step 501 are registered as factors of the problem disease selected in the problem disease selection step 401. Factor items extracted in the same-item factor extraction step 504, described later, are also registered as factors of the problem disease.
In step 503, it is determined whether any of the factor items registered in the same-item factor registration step 502 has not yet been evaluated for further factors. If such an unevaluated factor item exists, the process proceeds to the same-item factor extraction step 504; otherwise, it proceeds to the same-item factor DB registration step 505.
In the same-item factor extraction step 504, one unevaluated factor item is selected from the factors registered in the same-item factor registration step 502, and the factors of that item are extracted. The extraction method is the same as in the same-item problem factor extraction step 501: it is obtained by reading "the problem disease selected in the problem disease item selection step 401" in the description of step 501 as "the unevaluated factor item selected from the factors registered in the same-item factor registration step 502".
Note that the dependency calculation method and the dependency threshold used in the same-item problem factor extraction step 501 may differ from those used in the same-item factor extraction step 504. Furthermore, the dependency calculation method and threshold of the same-item factor extraction step 504 may be changed in each iteration of the cycle formed by the same-item factor registration step 502, step 503, and the same-item factor extraction step 504; for example, the threshold may be varied according to the number of cycles performed.
In the same-item factor DB registration step 505, the problem disease selected in the problem disease selection step 401 and the factors registered in the same-item factor registration step 502 are stored in the factor storage unit 122.
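Steps 501 to 505 can be read as the worklist loop of steps 404 to 407 with one extra rule: whenever a factor is registered, every other-period node of the same item is registered with it. In the sketch below, the extraction criterion is a caller-supplied predicate `is_factor(candidate, target)`, which is an assumption. It could wrap the dependency threshold or, as in the FIG. 19B example, the presence of a directed edge from candidate to target.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set, Tuple

Node = Tuple[str, int]  # hypothetical encoding: (item name, period)

def extract_same_item_factors(problem: Node, nodes: Iterable[Node],
                              is_factor: Callable[[Node, Node], bool]) -> List[Node]:
    """Steps 501-505 as a worklist: extract factors, add same-item nodes, repeat."""
    nodes = list(nodes)
    registered: List[Node] = []
    seen: Set[Node] = {problem}
    queue: Deque[Node] = deque()       # registered but not yet evaluated (step 503)

    def register(found: List[Node]) -> None:
        # Step 502: register new factors plus all other-period nodes of their items.
        items = {item for item, _period in found}
        for n in nodes:
            if n not in seen and n[0] in items:
                seen.add(n)
                registered.append(n)
                queue.append(n)

    register([n for n in nodes if n != problem and is_factor(n, problem)])  # step 501
    while queue:                                                             # step 503
        target = queue.popleft()                                             # step 504
        register([n for n in nodes if n not in seen and is_factor(n, target)])
    return registered                        # step 505 would store (problem, registered)
```

As a usage example, consider a hypothetical edge set loosely modeled on FIG. 19A (it is not a transcription of the figure): oral(N)→oral(N+1), insulin(N)→insulin(N+1), oral(N)→insulin(N+1), insulin(N)→dialysis(N+1). With year-N+1 dialysis as the problem, the loop first finds year-N insulin, adds year-N+1 insulin as its same-item node, and then reaches the oral diabetes drug nodes through year-N+1 insulin.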
Here, the effect of the processing of the factor extraction unit 111 in this embodiment is explained with reference to FIG. 19.
FIG. 19 is an example of a graph stored in the graphical model storage unit 118. The graph contains six nodes: three nodes representing random variables for the prescription of oral diabetes drugs, the prescription of insulin, and dialysis, calculated from data obtained in year N, and the corresponding three nodes calculated from data obtained in year N+1.
In FIG. 19A, dotted arrows represent edges expressing probabilistic dependencies between nodes of the same item grouped in different predetermined periods, and broken arrows represent edges expressing probabilistic dependencies between nodes of different items grouped in different predetermined periods.
The effect is explained below using an example in which the year-N+1 dialysis node is taken as the problem and its factors are extracted.
In FIG. 19A, there is an edge between the year-N+1 dialysis node and the year-N insulin node, expressing a probabilistic dependency between dialysis and insulin. The year-N insulin node may therefore be extracted as a factor of the year-N+1 dialysis node. On the other hand, neither the year-N nor the year-N+1 oral diabetes drug node has an edge to the year-N+1 dialysis node, so no probabilistic dependency between the oral diabetes drug item and the dialysis item is expressed directly. However, there is a directed edge between the year-N oral diabetes drug node and the year-N+1 insulin node, expressing a probabilistic dependency. In other words, oral diabetes drugs affect insulin in the following year, and insulin affects dialysis in the following year. The graph thus shows that a probabilistic dependency also exists between oral diabetes drugs and dialysis.
An example of extracting factors based on these relationships according to this embodiment is shown below.
FIG. 19B shows an example in which items directly connected to the problem disease are extracted as factors in the same-item problem factor extraction step 501. In this example, a factor candidate is extracted when there is a directed edge between it and the problem disease node and that edge points from the candidate toward the problem disease node.
FIG. 19C shows the result of extracting, in the same-item problem factor extraction step 501, nodes of the same item as the extracted factor as additional factors. Here, the year-N+1 insulin node is extracted as an additional factor.
FIG. 19D shows the result after the cycle of the same-item factor registration step 502, step 503, and the same-item factor extraction step 504 has been repeated until all processing is complete. It shows that oral diabetes drugs, which are not directly connected to the dialysis node, are extracted as a factor.
Next, the processing of the visualization unit 112 is described with reference to FIGS. 20 and 24. FIG. 24 is a flowchart of the processing of the visualization unit 112, and FIG. 20 shows how a graph changes as the processing is applied.
The flowchart of FIG. 24 is described below.
In the visualization edge selection step 2401, edges representing probabilistic dependencies between nodes of different items grouped in different predetermined periods are selected as the edges to display. Edges not selected in this step are excluded from visualization in this processing.
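With nodes encoded as `(item, period)` tuples (a hypothetical representation), the selection rule of step 2401 keeps an edge only when its endpoints differ in both item and period. This corresponds to the broken arrows in FIG. 20A.

```python
from typing import Iterable, List, Tuple

Node = Tuple[str, int]  # hypothetical encoding: (item name, period)
Edge = Tuple[Node, Node]

def select_visual_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Keep only edges whose endpoints differ in both item and period."""
    return [(u, v) for (u, v) in edges if u[0] != v[0] and u[1] != v[1]]
```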
An example of this step is described with reference to FIGS. 20A and 20B.
FIG. 20A is an example of a graph created by the graphical model creation unit 108. The graph contains eight nodes: four nodes representing random variables for lifestyle habit A, the prescription of oral diabetes drugs, the prescription of insulin, and dialysis, calculated from data obtained in year N, and the corresponding four nodes calculated from data obtained in year N+1. In FIG. 20A, solid arrows represent edges expressing probabilistic dependencies between nodes of different items grouped in the same predetermined period, dotted arrows represent those between nodes of the same item grouped in different periods, and broken arrows represent those between nodes of different items grouped in different periods.
FIG. 20B shows the result of applying the visualization edge selection step 2401 to the graph in FIG. 20A. The solid arrows in FIG. 20B represent the edges selected in step 2401, that is, the edges between nodes of different items grouped in different predetermined periods.
In the same-item aggregation step 2402, nodes of the same item grouped in different predetermined periods are aggregated into a single node, and coordinates are then calculated for each aggregated node.
A first example of the aggregation method: the same-item nodes grouped in different periods are superimposed at the same coordinates. If a character string representing the node's item is displayed inside or near the node, its content is changed as appropriate.
A second example of the aggregation method: the same-item nodes grouped in different periods are each detached from the edges they were connected to and excluded from visualization, and a single new node connected to all of the detached edges is added. This produces a graph structure for visualization, which becomes the visualization target. If a character string representing the node's item is displayed inside or near the node, its content is changed as appropriate.
A first example of the coordinate calculation method: a widely known node coordinate calculation method is applied to the graph stored in the graphical model storage unit 118, before the visualization edge selection step 2401 is applied, to calculate the coordinates of each node. The aggregated coordinates are then calculated from the coordinates of the same-item nodes grouped in different periods. For example, when there are two nodes of the same item, the midpoint of their original coordinates, or a position obtained by weighted averaging, is used as the aggregated coordinates.
A second example of the coordinate calculation method is as follows. Coordinates are calculated from the visualization graph structure obtained after the processing of the visualization edge selection step 2401 has been applied to the graph stored in the graphical model storage unit 118. For example, when a new visualization graph structure with an added node is created, as in the second example of the aggregation method, a widely known node layout method is applied to this graph structure to calculate the coordinates.
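The first aggregation method combined with the first coordinate calculation method can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: nodes are assumed to be (item, period) tuples, the pre-aggregation layout is assumed to come from any standard layout algorithm, and the function name aggregate_same_item_nodes and its optional weights mapping are hypothetical.

```python
from collections import defaultdict


def aggregate_same_item_nodes(coords, edges, weights=None):
    """Merge nodes of the same item from different periods into one node.

    coords:  dict mapping (item, period) -> (x, y) from any layout algorithm
    edges:   iterable of ((item, period), (item, period)) directed edges
    weights: optional dict mapping (item, period) -> positive weight; when None,
             every node has weight 1, so two nodes collapse to their midpoint
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0.0])  # item -> [sum w*x, sum w*y, sum w]
    for node, (x, y) in coords.items():
        w = 1.0 if weights is None else weights[node]
        acc[node[0]][0] += w * x
        acc[node[0]][1] += w * y
        acc[node[0]][2] += w
    new_coords = {item: (sx / sw, sy / sw) for item, (sx, sy, sw) in acc.items()}
    # Re-attach each edge to the aggregated endpoints; an edge between two
    # periods of the same item would become a self-loop, so it is dropped.
    new_edges = {(u[0], v[0]) for u, v in edges if u[0] != v[0]}
    return new_coords, new_edges


coords = {("LifestyleA", "N"): (0.0, 0.0), ("LifestyleA", "N+1"): (2.0, 0.0),
          ("OralDrug", "N"): (0.0, 2.0), ("OralDrug", "N+1"): (2.0, 2.0)}
edges = [(("LifestyleA", "N"), ("OralDrug", "N+1"))]
print(aggregate_same_item_nodes(coords, edges))
```

Passing weights gives the weighted-average variant; how those weights would be chosen (for example, by node degree) is not specified in the source and is left to the caller here.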
An example of the processing in this step will be described with reference to FIG. 20C.
FIG. 20C shows an example in which the same-item node aggregation step 2402 is applied to the graph of FIG. 20B, to which the processing of the visualization edge selection step 2401 has already been applied, so that nodes of the same item are aggregated. Here, the first example is used as the aggregation method, and the first example is also used as the coordinate calculation method.
In the visualization step 2403, the nodes and edges are visualized based on the coordinates obtained in the same-item node aggregation step 2402. Nodes and edges are displayed using the method described for the processing of the visualization unit 112 in the first embodiment.
The effect of the processing of the visualization unit 112 will now be described with reference to FIG. 20.
A probabilistic dependency between nodes of different items grouped into different predetermined periods represents how strongly one item influences the transition of another item. Here, a transition is a probabilistic dependency between nodes of the same item across multiple years. For example, in FIG. 20A, the node for lifestyle A in year N has a directed edge to the node for oral diabetes medication in year N+1, so a probabilistic dependency exists between them. This means that the lifestyle A item influences the transition of the oral diabetes medication item into the following year. Likewise, the node for oral diabetes medication in year N has directed edges to the nodes for insulin and for dialysis in year N+1, so probabilistic dependencies exist there as well. This means that the oral diabetes medication item influences the transitions of the insulin item and the dialysis item into the following year. Taken together, the lifestyle A item influences the next-year transition of the oral diabetes medication item, which in turn influences the next-year transitions of the insulin and dialysis items. It follows that lifestyle A indirectly influences the transitions of the insulin and dialysis items. With the graph visualization method shown in FIG. 20A, however, although the probabilistic dependency between lifestyle A and oral diabetes medication can be read from the directed edge between them, the relationships between lifestyle A and insulin and between lifestyle A and dialysis are hard to read because no edge connects those nodes. In contrast, the graph in FIG. 20C, obtained after applying the processing of the visualization unit 112, shows the influence of the four items (lifestyle A, oral diabetes medication, insulin, and dialysis) on one another's transitions in an easily understandable form.
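The readability argument above can be checked mechanically. The sketch below uses only the three edges described for FIG. 20A, with hypothetical item labels, and compares what is reachable from lifestyle A in the time-sliced graph and in the aggregated graph. Reachability is used here only as a proxy for the indirect influence the text describes; it does not measure the strength of any dependency, and reachable_from is a hypothetical helper.

```python
from collections import deque


def reachable_from(source, edges):
    """Return every node reachable from `source` along directed edges (BFS)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = set(), deque([source])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen


# Edges described for FIG. 20A; nodes are (item, year) pairs.
sliced = [(("LifestyleA", "N"), ("OralDrug", "N+1")),
          (("OralDrug", "N"), ("Insulin", "N+1")),
          (("OralDrug", "N"), ("Dialysis", "N+1"))]
# After same-item aggregation, each node is identified by its item name only.
aggregated = {(u[0], v[0]) for u, v in sliced}

print(reachable_from(("LifestyleA", "N"), sliced))  # only the year N+1 oral drug node
print(reachable_from("LifestyleA", aggregated))     # oral drug, insulin and dialysis
```

In the time-sliced graph the path stops at the year N+1 oral drug node, because the onward edges start from the year N copy of that item; merging the copies is what joins the two hops into one readable chain.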
For the purpose of explanation, in this embodiment only edges representing probabilistic dependencies between different items grouped into different predetermined periods are selected in the visualization edge selection step 2401. However, edges representing probabilistic dependencies between different items grouped into the same predetermined period may be selected instead, or both kinds of edges may be selected. In addition, although an example was described in which edges not selected in the visualization edge selection step are excluded from visualization and not displayed, whether an edge was selected may instead be expressed through its color, shape, thickness, or similar attributes.
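The three selection options for step 2401 can be expressed as a single filter. This is a minimal sketch assuming edges are stored as pairs of (item, period) nodes; the function name select_edges and the mode strings are hypothetical.

```python
def select_edges(edges, mode="cross_period"):
    """Split time-sliced edges into (selected, unselected) lists.

    Each edge is ((item, period), (item, period)). Only edges between
    different items are candidates; `mode` picks which of those are kept:
      "cross_period": endpoints in different periods (the choice used for FIG. 20B)
      "same_period":  endpoints in the same period
      "both":         either of the above
    """
    if mode not in ("cross_period", "same_period", "both"):
        raise ValueError(f"unknown mode: {mode}")
    selected, unselected = [], []
    for edge in edges:
        (u_item, u_period), (v_item, v_period) = edge
        cross = u_period != v_period
        keep = u_item != v_item and (mode == "both" or (mode == "cross_period") == cross)
        (selected if keep else unselected).append(edge)
    return selected, unselected


edges = [(("A", "N"), ("A", "N+1")),   # same item: a transition, never selected
         (("A", "N"), ("B", "N+1")),   # different items, different periods
         (("A", "N"), ("B", "N"))]     # different items, same period
print(select_edges(edges))
```

Instead of discarding the unselected list, a renderer could draw those edges thin, dashed, or in a faint color, which corresponds to the alternative mentioned in the last sentence above.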
As described above, the healthcare data analysis apparatus according to this embodiment can visualize the relationships among lifestyle habits, test values, and pathological conditions, as well as the time-series relationships between pathological conditions, in a form that is easier to grasp.
The first embodiment described an example of a healthcare data analysis apparatus that extracts diseases that will become problems, based on medical and healthcare data including receipt (medical claims) information, medical checkup information, and medical questionnaire information. Health insurance providers, however, want to identify not only the diseases that will become future problems but also the insured persons at high risk of developing them. Finding insured persons with a high future onset risk in healthcare data has not been easy, because it requires deep knowledge of the causal relationships between diseases and the data, and because the volume of data is enormous.
The fourth embodiment describes an example of a healthcare data analysis apparatus that uses information on problem diseases to extract subjects with a high risk of onset.
The configuration and processing are the same as in the first embodiment except for the high-risk subject selection unit 113, so their description is omitted.
The high-risk subject selection unit 113 provides a high-risk subject selection function that selects subjects at high risk of disease onset, based on the information on problem diseases stored in the problem disease storage unit 121 and the per-subject disease evaluation indices stored in the disease evaluation index storage unit 120. The following describes, as an example, a case in which a health insurance provider selects insured persons at high risk of developing a problem disease from a group of insured persons.
First, the high-risk subject selection unit 113 reads the data of the insured group from which high-risk subjects are to be selected, either from the shaping information storage unit 117 or from the input unit 102. For example, when using the data of the insured group that was used to create the graphical model stored in the graphical model storage unit 118, the shaping information stored in the shaping information storage unit 117 is used as it is. When using data of a previously unseen insured group, the data are read from the input unit 102 and, if necessary, shaped by the data shaping unit 107 before use. The data of all subjects in the group may be used, or a sampled subset of the group may be used. For example, to target insured persons at or above a certain age, a threshold is set on the age item and only the data of insured persons whose age is at or above the threshold are selected from the data contained in the shaping information. Sampling may also apply thresholds to other items, such as the number of medical procedures, or use a known sampling method such as random sampling.
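The two ways of narrowing the insured group (an age threshold and random sampling) can be combined in a few lines. This is a minimal sketch assuming each record is a dict with a hypothetical "age" key; the function name sample_subjects is also hypothetical, and simple random sampling without replacement stands in for "a known sampling method".

```python
import random


def sample_subjects(records, min_age=None, sample_size=None, seed=None):
    """Choose which insured persons enter high-risk screening.

    records:     list of dicts; "age" is a hypothetical field name
    min_age:     keep only records with age >= min_age (None keeps everyone)
    sample_size: if given and smaller than the filtered pool, draw a simple
                 random sample of that many records without replacement
    seed:        optional seed so the same subset can be reproduced
    """
    pool = [r for r in records if min_age is None or r["age"] >= min_age]
    if sample_size is not None and sample_size < len(pool):
        pool = random.Random(seed).sample(pool, sample_size)
    return pool


insured = [{"id": "S1", "age": 38}, {"id": "S2", "age": 52},
           {"id": "S3", "age": 67}, {"id": "S4", "age": 45}]
print([r["id"] for r in sample_subjects(insured, min_age=40)])  # ['S2', 'S3', 'S4']
```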
Next, the high-risk subject selection function is applied to the insured group to select high-risk subjects.
FIG. 25 is a flowchart of the processing of the high-risk subject selection function.
In the problem disease selection step 2501, the problem disease for which high-risk subjects are to be selected is chosen from the problem diseases stored in the problem disease storage unit 121. In the following description, the disease selected in this step is referred to as disease D.
Steps 2502 (subject sample selection) through 2504 are performed for each subject individually and form a loop that visits every subject once. The specific processing is described below.
In the subject sample selection step 2502, one insured sample that has not yet been processed in the current loop is selected. In the following description, the subject selected in this step is referred to as insured person S.
In the disease evaluation index reading step 2503, the disease evaluation index of insured person S for disease D is read from the disease evaluation index storage unit 120.
In step 2504, if the target group data still contain a subject that has not been processed in the current loop, the process returns to the subject sample selection step 2502 and selects an unprocessed subject. Otherwise, the loop ends and the process proceeds to step 2505.
In the high-risk subject selection step 2505, the disease evaluation indices read for each subject in the disease evaluation index reading step 2503 are compared, and insured persons with a high onset risk are selected.
A first example of the selection method is as follows. A threshold is set for the disease evaluation index, and subjects whose index is at or above the threshold (or at or below it, depending on the index) are selected as high-risk insured persons. For example, when the next-year onset probability is chosen as the disease evaluation index, insured persons whose predicted next-year onset probability is at or above a threshold can be selected as high-risk.
A second example of the selection method is as follows. The disease evaluation indices are sorted in descending or ascending order, and a predetermined number of subjects at the top or bottom of the ranking are selected as high-risk insured persons. For example, when the next-year onset probability is chosen as the disease evaluation index, insured persons are selected in descending order of next-year onset probability.
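Steps 2502 to 2505 of FIG. 25, with both selection methods, can be sketched as below. It assumes a hypothetical in-memory stand-in for the disease evaluation index storage unit (subject ID to per-disease index) and an index where larger values mean higher risk, such as the next-year onset probability; for an index where smaller means riskier, the sort order and comparison would be flipped. The function name and sample values are illustrative only.

```python
def select_high_risk(index_store, disease, threshold=None, top_k=None):
    """Steps 2502-2505 of FIG. 25 for one problem disease.

    index_store: dict subject_id -> dict disease -> evaluation index
                 (assumed: larger value = higher risk)
    threshold:   selection method 1, keep subjects with index >= threshold
    top_k:       selection method 2, keep the top_k subjects by index
    Exactly one of threshold / top_k must be given.
    """
    if (threshold is None) == (top_k is None):
        raise ValueError("give exactly one of threshold or top_k")
    # Steps 2502-2504: visit each subject once and read its index for `disease`.
    scores = {}
    for subject_id, per_disease in index_store.items():
        scores[subject_id] = per_disease[disease]
    # Step 2505: compare the indices, highest risk first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        return [s for s in ranked if scores[s] >= threshold]
    return ranked[:top_k]


store = {"S1": {"D": 0.82}, "S2": {"D": 0.15}, "S3": {"D": 0.47}, "S4": {"D": 0.66}}
print(select_high_risk(store, "D", threshold=0.5))  # ['S1', 'S4']
print(select_high_risk(store, "D", top_k=3))        # ['S1', 'S4', 'S3']
```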
Information on the subjects selected by the high-risk subject selection unit 113 may be output by the output unit 103 as text, in table form, or in a similar format.
FIG. 26 is an example of a user interface screen illustrating one way of implementing this embodiment.
Reference numeral 2601 denotes an operation window for configuring the problem disease selection. Reference numeral 2602 denotes an operation window for selecting a problem disease. Reference numeral 2603 denotes an execute button that runs the high-risk subject selection process for the problem disease selected in 2602. Reference numeral 2604 denotes a display window that shows the processing results. Reference numeral 2605 denotes a display area that shows the problem disease selected as the target of the high-risk subject selection process. Reference numeral 2606 denotes a display screen that shows, in table form, information on the subjects selected as high-risk for the selected problem disease. Display items include, for example, each subject's probability of developing the problem disease, name, ID, and age.
As described above, the healthcare data analysis apparatus according to this embodiment can select, in addition to the diseases that will become future problems, the insured persons with a high onset risk for each problem disease.
The second and third embodiments described examples of a healthcare data analysis system that extracts and visualizes problem diseases and their factors. This embodiment describes an example of a medical data analysis system that, for a subject or a group of subjects, simulates the effect that changes in factors or other items have on the onset probability of problem diseases and on the probabilistic dependencies between items, and visualizes the results.
The configuration and processing are the same as in the second or third embodiment except for the onset probability prediction unit 109, the visualization unit 112, and the virtual shaping data creation unit 114, so their description is omitted.
In the fifth embodiment, changes in disease onset probability caused by changes in factors and other items are simulated based on the shaping information, and the resulting differences in the number of onset cases, medical costs, and so on are visualized.
First, the virtual shaping data creation unit 114 will be described.
The virtual shaping data creation unit 114 creates virtual shaping data by modifying part of the shaping data held in the shaping information storage unit 117.
FIG. 29 is a flowchart of the processing of the virtual shaping data creation unit 114.
The processing of each step is described below.
In the item change information setting step 2901, information such as the item to be changed, the amount of change, and the value after the change is set. An example follows. Assume that the problem disease storage unit 121 stores disease D as a problem disease and that the factor storage unit 122 stores average daily alcohol consumption (ml) as a factor of disease D. To predict, for example, how the onset probability of problem disease D and the onset probabilities of other diseases N years later would change if subjects whose average daily alcohol consumption is 500 ml or more reduced their consumption by 500 ml, the average daily alcohol consumption (ml) item is set as the item to be changed and -500 is set as the amount of change.
In the subject sample selection step 2902, the data of the subjects to be simulated are selected from the shaping data stored in the shaping information storage unit 117. For example, to predict the effect on the onset probability of a single insured person, the data corresponding to that insured person are selected from the shaping data in the shaping information storage unit 117 and used. To predict the effect on the onset probability of an insured group consisting of multiple insured persons, the data corresponding to that group are selected in the same way and used.
In the virtual shaping data creation step 2903, new data are created from the data selected in the subject sample selection step 2902 by changing the value of the item whose effect on disease is to be evaluated. In the example above, the value of the average daily alcohol consumption (ml) item is changed to produce the new data. These data are called virtual shaping data. The created virtual shaping data are stored in the virtual shaping information storage unit 123.
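Step 2903 amounts to copying the selected records and applying the change set in step 2901 to one column. The sketch below assumes records are dicts and uses a hypothetical column name alcohol_ml_per_day; the function name make_virtual_shaping_data is also hypothetical. The "500 ml or more" condition from the example is applied here as an optional predicate, although it could equally be handled when subjects are selected in step 2902.

```python
import copy


def make_virtual_shaping_data(records, item, delta=None, new_value=None,
                              condition=None):
    """Step 2903: copy the selected records and change one item.

    item:      column to change (hypothetical name in the usage below)
    delta:     amount of change, e.g. -500; mutually exclusive with new_value
    new_value: value after the change
    condition: optional predicate; only matching records are changed
    Records are deep-copied, so the original shaping data stay untouched.
    """
    if (delta is None) == (new_value is None):
        raise ValueError("give exactly one of delta or new_value")
    virtual = copy.deepcopy(records)
    for rec in virtual:
        if condition is None or condition(rec):
            rec[item] = (rec[item] + delta) if delta is not None else new_value
    return virtual


shaped = [{"id": "S1", "alcohol_ml_per_day": 650},
          {"id": "S2", "alcohol_ml_per_day": 200}]
virtual = make_virtual_shaping_data(
    shaped, "alcohol_ml_per_day", delta=-500,
    condition=lambda r: r["alcohol_ml_per_day"] >= 500)
print(virtual)
```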
FIG. 27A shows an example of the shaping data selected in the subject sample selection step 2902. FIG. 27B shows an example of virtual shaping data created in the virtual shaping data creation step 2903 by changing the drinking-related item. Here, the disease A and disease B items indicate the number of consultations per year for the corresponding disease, and the drinking item is the average daily alcohol consumption (ml).
Next, the onset probability prediction unit 109 will be described. The onset probability prediction unit 109 predicts the future onset probability of each item using the model stored in the graphical model storage unit 118. In a graphical model, when known values are given to some random variables, the probability distribution of the remaining unknown random variables can be obtained. In this embodiment, prediction is performed twice, using as known values the data stored in the shaping information storage unit 117 and the data stored in the virtual shaping information storage unit 123, respectively. The method of predicting each item after the known data are input is the same as the processing described in the first embodiment, so its description is omitted. In the example above, this yields two predictions of the state N years later (N being any natural number), both based on the subject's healthcare data for year X: one in which alcohol consumption did not change and one in which it did. The prediction results are stored in the prediction result storage unit 119.
FIG. 28A shows an example of the future state predicted by applying the processing of the onset probability prediction unit 109 to the shaping information shown in FIG. 27A. In this example, the graphical model was created using two years of medical data, and the state one year later is predicted. The value of each item is the expected value calculated from the predicted occurrence probabilities of that item. FIG. 28B shows the future state predicted by applying the processing of the onset probability prediction unit 109 to the shaping information shown in FIG. 27B, which was created by changing the values of the items relating to smoking and drinking. In this example, the change in the drinking item can be seen to affect the predicted expected numbers of consultations for disease A and disease B in the following year.
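How a per-item expected value can differ between original and virtual data is shown by the toy below. A real graphical model conditions on many observed variables at once (for example, by variable elimination or sampling); this sketch deliberately keeps a single parent so the arithmetic stays visible. The conditional distribution, the 300 ml cutoff, and all names are made-up assumptions for illustration, not values from the source.

```python
# Made-up conditional distribution:
# P(visits for disease A next year | drinking level this year).
CPT_DISEASE_A = {
    "heavy": {0: 0.5, 1: 0.3, 2: 0.2},
    "light": {0: 0.8, 1: 0.15, 2: 0.05},
}


def drinking_level(ml_per_day, cutoff=300):
    """Discretize daily alcohol intake; the 300 ml cutoff is an assumption."""
    return "heavy" if ml_per_day >= cutoff else "light"


def expected_visits(record, cpt=CPT_DISEASE_A):
    """Expected next-year visit count: sum over k of k * P(visits = k | evidence)."""
    dist = cpt[drinking_level(record["alcohol_ml_per_day"])]
    return sum(k * p for k, p in dist.items())


original = {"alcohol_ml_per_day": 650}
virtual = {"alcohol_ml_per_day": 150}  # the same person after the -500 ml change
print(expected_visits(original), expected_visits(virtual))
```

With these illustrative numbers the expected visit count drops from about 0.7 to about 0.25, which is the kind of difference FIG. 28A and FIG. 28B are meant to expose.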
Next, the visualization unit 112 will be described.
The visualization unit 112 visualizes the structure of the graphical model G = (V, E) stored in the graphical model storage unit 118, adding information on the prediction results stored in the prediction result storage unit 119. The graph structure is visualized in the same way as described in the second and third embodiments, so that description is omitted. In this embodiment, one graph is visualized for each prediction result stored in the prediction result storage unit 119. All displayed graphs have the same structure, and each node is drawn at the same coordinates within the coordinate system of its display area. The display of nodes and edges is varied according to the corresponding prediction result. Specifically, the display of each node is varied according to differences in the predicted probability of the item it represents, and the display of each edge is varied according to differences in the probabilistic dependency between the nodes it connects.
A first example of varying the node display is as follows. For each prediction result, the average expected occurrence value of each item is calculated, and the size, shape, color, or other attributes of the node are varied according to the magnitude of that value. For example, when shaping information and virtual shaping information exist for multiple persons, the per-person average of each onset probability is calculated.
A second example of varying the node display is as follows. The number of onset cases for each disease is calculated from the occurrence probability of each item, and the size, shape, color, or other attributes of the node are varied according to that number. For example, for an item that represents the presence or absence of onset as 1 or 0, summing the predicted onset probabilities of that item over all subjects gives the predicted expected number of onset cases in the subject group.
A third example of varying the node display is as follows. The medical cost of each item is calculated from its occurrence probability, and the size, shape, color, or other attributes of the node are varied according to that cost.
In any of these examples, the calculated onset probability, predicted number of cases, or medical cost may also be shown as a character string near the node when it is displayed.
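The three node metrics can be computed from per-subject predictions in one pass. This sketch assumes 0/1 onset items, a list of per-subject predicted probabilities, and a fixed cost per onset case; the source does not say how medical costs are derived, so unit_cost and all function names here are hypothetical. The expected case count is the sum of probabilities by linearity of expectation, and a simple linear mapping turns any metric into a node radius.

```python
def node_metrics(predictions, unit_cost):
    """Per-item quantities that can drive node size, shape, or color.

    predictions: list with one dict per subject, item -> predicted probability
                 that the 0/1 onset item equals 1
    unit_cost:   item -> assumed medical cost per onset case (hypothetical)
    """
    n = len(predictions)
    metrics = {}
    for item in predictions[0]:
        expected_cases = sum(p[item] for p in predictions)  # linearity of expectation
        metrics[item] = {
            "mean_probability": expected_cases / n,             # first example
            "expected_cases": expected_cases,                   # second example
            "expected_cost": expected_cases * unit_cost[item],  # third example
        }
    return metrics


def node_radius(value, v_min, v_max, r_min=10.0, r_max=40.0):
    """Linearly map a metric onto a node radius in pixels."""
    if v_max == v_min:
        return (r_min + r_max) / 2
    return r_min + (value - v_min) / (v_max - v_min) * (r_max - r_min)


preds = [{"D1": 0.2, "D2": 0.5}, {"D1": 0.4, "D2": 0.1}]
m = node_metrics(preds, {"D1": 1000, "D2": 3000})
print(m)
```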
A first example of varying the edge display is as follows. The degree of dependency between items in the prediction result is calculated, and the size, shape, color, or other attributes of each edge are varied according to the degree of dependency between the items corresponding to the nodes at its two ends. The degree of dependency is calculated in the same way as in the second embodiment, so that description is omitted.
A second example of varying the edge display is as follows. The size, shape, color, or other attributes of each edge are varied according to the difference in values, such as onset probability, expected number of onset cases, or medical cost, between the items corresponding to the nodes at its two ends. For example, when a directed edge runs from disease D1 to disease D2, the edge display is varied based on the value obtained by subtracting the expected number of onset cases for disease D1 from that for disease D2.
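The second edge-display example can be sketched as a mapping from the head-minus-tail difference to a color and width. The color scheme (red for an increase, blue for a decrease), the width range, and the function name edge_styles are assumptions for illustration; the source only says that the display varies with the difference.

```python
def edge_styles(edges, expected_cases, max_width=8.0):
    """Style each directed edge (u, v) by expected_cases[v] - expected_cases[u].

    The sign picks the color and the magnitude, scaled against the largest
    absolute difference, picks the width (between 1.0 and max_width).
    """
    diffs = {(u, v): expected_cases[v] - expected_cases[u] for u, v in edges}
    scale = max((abs(d) for d in diffs.values()), default=0.0)
    styles = {}
    for edge, d in diffs.items():
        width = 1.0 if scale == 0 else 1.0 + (max_width - 1.0) * abs(d) / scale
        color = "red" if d > 0 else "blue" if d < 0 else "gray"
        styles[edge] = {"diff": d, "width": width, "color": color}
    return styles


cases = {"D1": 10.0, "D2": 14.0, "D3": 12.0}
print(edge_styles([("D1", "D2"), ("D2", "D3")], cases))
```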
In this embodiment, an example was described in which one graph is visualized for each prediction result stored in the prediction result storage unit 119, but only some of the prediction results may be visualized. For example, suppose there is one set of shaping data and two sets of virtual shaping data created from it with different item change settings, and the three resulting predictions are stored in the prediction result storage unit 119. All three predictions may be visualized as three separate graphs, or only two may be visualized: one prediction based on the shaping data and one based on virtual shaping data. Alternatively, only the two predictions based on virtual shaping data may be visualized, without the prediction based on the shaping data.
FIG. 30 is an example of a user interface screen illustrating one way of implementing this embodiment. It shows a screen on which two conditions are set and their prediction results are compared. Reference numeral 3001 denotes an operation window for setting subject selection information, item change information, and the like. Reference numeral 3002 is the same as 1702 in FIG. 17 described in the first embodiment. Reference numeral 3003 denotes a condition setting table for setting item change information, in which each row corresponds to one condition. In this example, an item shown as "-" is left unchanged, and an item for which a numerical value is entered is changed to that value. Reference numeral 3004 denotes an operation button for adding a new condition to the condition setting table 3003. Reference numeral 3005 denotes an operation button that creates virtual shaping data based on the conditions set in the condition setting table and performs prediction using the shaping data and the virtual shaping data. Reference numeral 3006 denotes a display window that shows the prediction results, with one graph for the result predicted under each condition listed in the condition setting table 3003. Reference numeral 3007 denotes an operation window for switching the node display method; the appearance of the nodes changes according to the selected index. Reference numeral 3008 denotes an operation window for switching the edge display method; the appearance of the edges changes according to the selected index.
As described above, the healthcare data analysis apparatus according to this embodiment can predict changes in future conditions, such as disease onset probabilities, numbers of onset cases, and medical costs, that result from changes in the values of problem disease factors and other items, and can visualize them with high visibility using multiple graphs.
The second and third embodiments described examples of a medical analysis system that extracts items relating to test values and lifestyle habits as factors. The fifth embodiment described an example of an analysis system that predicts and visualizes changes in future conditions, such as disease onset probabilities, numbers of onset cases, and medical costs, resulting from changes in the values of problem disease factors and other items. This embodiment uses healthcare data that record whether health guidance was provided in order to extract health guidance that is effective against a problem disease. It further describes an example of an analysis system that predicts and visualizes the effect of health guidance.
The configuration and processing are the same as in the second, third, and fifth embodiments except for the shaping information storage unit 117, the health guidance selection unit 124, and the virtual shaping data creation unit 114, so their description is omitted.
The shaping information storage unit 117 stores shaping data that includes items indicating whether health guidance was given.
Reference numeral 3101 in FIG. 31 denotes an example of shaping data that includes items indicating whether health guidance was given. Items 603, 604, 605, and 607 in FIG. 31 are the same as in FIG. 6, and items 1501, 1502, 1503, 1504, 1505, 1506, 1508, 1509, 1510, 1511, 1512, 1514, 1515, and 1516 in FIG. 31 are the same as in FIG. 15. Items 3102 and 3103 each indicate whether a particular health guidance was given: the value is 1 for subjects who received that guidance and 0 for subjects who did not. Although this example uses presence/absence information for health guidance, other values, such as the number of times each health guidance was given, may be used instead.
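A minimal sketch of how presence/absence columns such as 3102 and 3103 could be derived from per-session guidance records. It assumes each record is a hypothetical `(subject_id, year, guidance_item)` tuple and that one shaping-data row exists per subject and period; the function name `guidance_columns` and the item names are illustrative. Setting `count=True` yields the alternative mentioned above, the number of times each guidance was given.

```python
def guidance_columns(keys, records, items, count=False):
    """Build health guidance columns for each (subject_id, year) row of shaping data.

    keys:    (subject_id, year) pairs, one per shaping-data row
    records: (subject_id, year, guidance_item) tuples, one per guidance session
    items:   health guidance item names to turn into columns
    Returns {key: {item: value}} with 1/0 presence flags, or session counts if count=True.
    """
    table = {key: dict.fromkeys(items, 0) for key in keys}
    for subject_id, year, item in records:
        row = table.get((subject_id, year))
        if row is not None and item in row:  # ignore unknown subjects and items
            row[item] += 1
    if not count:
        # Collapse session counts to presence flags (1 = guidance given, 0 = not given).
        for row in table.values():
            for item in row:
                row[item] = int(row[item] > 0)
    return table


rows = guidance_columns(
    keys=[("s1", 2012), ("s2", 2012)],
    records=[("s1", 2012, "G1"), ("s1", 2012, "G1"), ("s2", 2012, "G2")],
    items=["G1", "G2"],
)
print(rows)
```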
First, the health guidance selection unit 124, which provides the function of extracting health guidance that is effective for the target disease, is described.
For each target disease, the health guidance selection unit 124 selects, from among the factors of that target disease stored in the factor storage unit 122, the items that contain information on the implementation of health guidance. For example, if test value V, lifestyle habit S, and health guidance G are stored as the factors of target disease D, health guidance G is selected as health guidance effective for target disease D.
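The selection step amounts to filtering each target disease's stored factors down to the health guidance items. A minimal sketch, assuming the contents of the factor storage unit are available as a hypothetical dictionary from target disease to factor item names, and that health guidance items are identified by membership in a known set:

```python
def select_effective_guidance(factors_by_disease, guidance_items):
    """For each target disease, keep only the extracted factors that are health guidance items."""
    guidance_items = set(guidance_items)
    return {
        disease: [factor for factor in factors if factor in guidance_items]
        for disease, factors in factors_by_disease.items()
    }


# Mirrors the example in the text: factors V (test value), S (lifestyle), G (guidance) for disease D.
selected = select_effective_guidance({"D": ["V", "S", "G"]}, guidance_items={"G"})
print(selected)
```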
Why the processing of the health guidance selection unit 124 selects health guidance that is effective for a target disease is explained below with an example of the relationship between a target disease and health guidance. Suppose the shaping data for year X contains a group that received a certain health guidance G and a group that did not, and the prevalence of target disease D is the same in both groups. If the prevalence of target disease D differs between the two groups in year X+N, and in particular is lower in the group that received health guidance G than in the group that did not, health guidance G can be expected to be effective in reducing the prevalence of target disease D. Analogous relationships exist between test values and health guidance, and between lifestyle habits and health guidance. In the graphical model stored in the graphical model storage unit 118, probabilistic dependencies between items are represented by edges, and the factor extraction unit 110 extracts the factor items of the target disease based on the degree of dependency defined between items. Therefore, a health guidance item included among the factor items extracted by the factor extraction unit 110 is considered to be health guidance that affects either the onset of the target disease or the test values and lifestyle habits that are factors of the target disease.
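The year X versus year X+N reasoning can be illustrated with a crude two-group comparison. This sketch is not the graphical-model-based extraction the document describes; it only shows the arithmetic of the argument, ignores confounding, and the function names and tolerance are hypothetical. Outcomes are 0/1 disease indicators, one per subject.

```python
def prevalence(outcomes):
    """Share of subjects with the disease; outcomes are 0/1 indicators."""
    if not outcomes:
        raise ValueError("empty group")
    return sum(outcomes) / len(outcomes)


def guidance_effect(base_with, base_without, later_with, later_without, tol=0.01):
    """Difference in year X+N prevalence (without guidance minus with guidance).

    Only meaningful when both groups had (approximately) equal prevalence in year X,
    as assumed in the text; otherwise a ValueError is raised.
    """
    if abs(prevalence(base_with) - prevalence(base_without)) > tol:
        raise ValueError("groups differ at baseline; simple comparison not valid")
    return prevalence(later_without) - prevalence(later_with)


effect = guidance_effect(
    base_with=[0] * 10, base_without=[0] * 10,                   # year X: no cases in either group
    later_with=[1] + [0] * 9, later_without=[1] * 3 + [0] * 7,   # year X+N: 10% vs 30%
)
print(round(effect, 3))
```

A positive value means lower prevalence among subjects who received the guidance, which is the situation in which the text expects the guidance to be effective.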
Next, the processing of the virtual shaping data creation unit 114, which is needed to provide the function of predicting and visualizing the effect of health guidance, is described.
The processing of the virtual shaping data creation unit 114 follows the flowchart of FIG. 29, except for the item change information setting step 2901.
In the item change information setting step 2901, information such as the type of item to be changed, the amount of change, and the value after the change is set. In this example, only items related to the implementation of health guidance are selected as the items to be changed.
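Step 2901, restricted to health guidance items, can be sketched as producing a modified copy of the shaping data. A minimal sketch, assuming shaping data is a list of per-subject dictionaries; `make_virtual_shaping_data`, the optional `target` predicate for choosing which subjects receive the change, and the item names are hypothetical.

```python
import copy


def make_virtual_shaping_data(rows, changes, guidance_items, target=None):
    """Return a modified deep copy of shaping data; the original rows are left untouched.

    rows:           list of per-subject dicts (item name -> value)
    changes:        item name -> value after the change, e.g. {"G1": 1}
    guidance_items: names of health guidance items; only these may be changed
    target:         optional predicate choosing which subjects receive the change
    """
    not_guidance = set(changes) - set(guidance_items)
    if not_guidance:
        raise ValueError(f"only health guidance items may be changed: {sorted(not_guidance)}")
    virtual = copy.deepcopy(rows)
    for row in virtual:
        if target is None or target(row):
            row.update(changes)
    return virtual


shaping = [{"id": "s1", "V": 130, "G1": 0}, {"id": "s2", "V": 150, "G1": 0}]
# Scenario: give guidance G1 only to subjects whose test value V is 140 or higher.
virtual = make_virtual_shaping_data(shaping, {"G1": 1}, {"G1"}, target=lambda r: r["V"] >= 140)
print(virtual)
```

Copying rather than editing in place keeps the actual shaping data available as the baseline scenario for comparison.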
Predictions are made from the virtual shaping information stored in the virtual shaping information storage unit 123 by the same processing as that of the onset probability prediction unit 108 described in Example 5, and the prediction results are stored in the prediction result storage unit 119. The prediction results are visualized by the same processing as that of the visualization unit 112 described in Example 5.
By obtaining multiple prediction results in which the content of the health guidance implementation is varied, and visualizing and comparing them in graphs, the future situation corresponding to each way of implementing health guidance can be grasped.
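Comparing scenarios can be sketched as follows. The document predicts onset with the graphical model; here, as a simplifying assumption, the onset node's conditional probability table is taken to depend only on one guidance item `G1` and a binarized test value `V` (threshold 140), and all probabilities and the cost per case are invented illustrative numbers. Expected cases are the sum of per-subject onset probabilities, and expected cost is expected cases times a hypothetical cost per case.

```python
def onset_probability(row, cpt, v_threshold=140):
    """Look up P(onset) for one subject from the onset node's CPT, keyed by (G1, V >= threshold)."""
    return cpt[(row["G1"], int(row["V"] >= v_threshold))]


def summarize(rows, cpt, cost_per_case):
    """Aggregate per-subject probabilities into population-level indicators."""
    probs = [onset_probability(r, cpt) for r in rows]
    expected_cases = sum(probs)
    return {
        "mean_probability": expected_cases / len(rows),
        "expected_cases": expected_cases,
        "expected_cost": expected_cases * cost_per_case,
    }


def compare_scenarios(scenarios, cpt, cost_per_case):
    """Summarize each named scenario (actual shaping data or a virtual variant)."""
    return {name: summarize(rows, cpt, cost_per_case) for name, rows in scenarios.items()}


# Illustrative CPT: P(onset | G1, V high); values are invented for the example.
CPT = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.04, (1, 1): 0.12}

baseline = [{"G1": 0, "V": 130}, {"G1": 0, "V": 150}]
with_guidance = [{"G1": 1, "V": 130}, {"G1": 1, "V": 150}]

results = compare_scenarios({"baseline": baseline, "G1 for all": with_guidance}, CPT, 1_000_000)
for name, summary in results.items():
    print(name, round(summary["expected_cases"], 2), round(summary["expected_cost"]))
```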
As described above, the healthcare data analysis apparatus according to this example can extract health guidance that is effective for the target disease, and can further predict and visualize the future conditions that result from implementing health guidance, such as the disease onset probability, the number of onset cases, and medical costs.
The present invention is not limited to the examples described above and includes various modifications. The above examples have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one having all of the described configurations. Part of the configuration of one example can be replaced with the configuration of another example, and the configuration of another example can be added to the configuration of one example. Furthermore, other configurations can be added to, deleted from, or substituted for part of the configuration of each example.
DESCRIPTION OF SYMBOLS
101 Healthcare data analysis apparatus
102 Input unit
103 Output unit
104 Arithmetic unit
105 Memory
106 Storage medium
107 Data shaping unit
108 Graphical model creation unit
109 Onset probability prediction unit
110 Target disease extraction unit
111 Factor extraction unit
112 Visualization unit
113 High-risk subject selection unit
114 Virtual shaping data creation unit
115 Database
116 Medical information storage unit
117 Shaping information storage unit
118 Graphical model storage unit
119 Prediction result storage unit
120 Disease evaluation index storage unit
121 Target disease storage unit
122 Factor storage unit
123 Virtual shaping information storage unit
124 Health guidance selection unit

Claims (11)

  1.  An analysis system that includes a processor that executes a program, a memory that stores the program, and an input unit that receives input of information, and that analyzes healthcare data by executing the program, wherein
     the analysis system is able to access a database that stores healthcare information, including medical records and test values of subjects, and shaping information in which the healthcare information is compiled for each subject and for each predetermined period,
     the analysis system is able to access a database that stores a graphical model, created based on the shaping information, that is defined by a first node group corresponding to random variables representing disease states, a second node group corresponding to random variables representing factors that affect changes in the disease states, and directed or undirected edges each representing a probabilistic dependency between any two nodes included in the set consisting of the first node group and the second node group, and
     the analysis system comprises:
     an onset probability prediction unit in which the processor predicts the onset probability of a disease based on the graphical model; and
     a target disease extraction unit in which the processor extracts at least one of the disease states as a target disease based on the onset probability.
  2.  The analysis system according to claim 1, further comprising:
     a factor extraction unit in which the processor extracts, as factors of the target disease, disease states or factors associated with the extracted target disease, based on the probabilistic dependency between the random variable representing the extracted target disease and the disease states or the factors.
  3.  The analysis system according to claim 1, further comprising:
     a high-risk subject selection unit in which the processor selects high-risk subjects for the target disease based on the onset probability of the target disease and a predetermined threshold.
  4.  The analysis system according to claim 1, wherein
     the target disease extraction unit calculates, for each disease state, a disease evaluation index for evaluating the disease state based on the information received by the input unit and the onset probability, and extracts at least one of the disease states as a target disease based on the calculated disease evaluation index.
  5.  The analysis system according to claim 4, wherein
     the onset probability prediction unit predicts the onset probability of a disease for each subject from the shaping information, and
     the target disease extraction unit calculates the disease evaluation index for each subject and for each disease state based on the information received by the input unit and the onset probability predicted for each subject, and extracts at least one of the disease states as a target disease using, as a new disease evaluation index, the value obtained by aggregating the calculated per-subject disease evaluation indices for each disease state.
  6.  The analysis system according to claim 2, wherein
     the factor extraction unit extracts, as a new factor, a node of the same item as the extracted factor, the node being compiled for a predetermined period different from the predetermined period of the shaping information on which the extracted factor was based.
  7.  The analysis system according to claim 6, wherein
     the factor extraction unit repeats, at least once, the process of extracting, as a new factor, a disease state or factor associated with the extracted target disease, based on the probabilistic dependency between the random variable representing the extracted target disease and the disease states or the factors.
  8.  The analysis system according to claim 1, comprising:
     a visualization unit in which the processor visualizes the created graphical model with information on the extracted target disease added.
  9.  The analysis system according to claim 2, wherein
     the analysis system is able to access a database that stores healthcare information including information on the implementation of health guidance, and
     the processor comprises a health guidance content selection unit that selects, from among the extracted factors, a factor representing a random variable related to the implementation of health guidance as health guidance effective for the target disease.
  10.  An analysis system that includes a processor that executes a program, a memory that stores the program, and an input unit that receives input of information, and that analyzes healthcare data by executing the program, wherein
     the analysis system is able to access a database that stores healthcare information, including medical records and test values of subjects, and shaping information in which the healthcare information is compiled for each subject and for each predetermined period,
     the analysis system is able to access a database that stores a graphical model, created based on the shaping information, that is defined by a first node group corresponding to random variables representing disease states, a second node group corresponding to random variables representing factors that affect changes in the disease states, and directed or undirected edges each representing a probabilistic dependency between any two nodes included in the set consisting of the first node group and the second node group,
     the processor comprises a virtual shaping data creation unit that creates at least one piece of virtual shaping information in which part of the information included in the shaping information is changed, and stores the virtual shaping information in the database,
     the onset probability prediction unit predicts, based on the graphical model, an onset probability corresponding to each of the shaping information and the at least one piece of virtual shaping information, and stores the predicted onset probabilities in the database, and
     the visualization unit displays the graphical model, with its display form changed according to the predicted onset probabilities, as many times as there are sets of predicted onset probabilities.
  11.  The analysis system according to claim 10, wherein
     the visualization unit changes the display form of each node according to the onset probability predicted for that node, and changes the display form of each edge between nodes according to the probabilistic dependency between those nodes.
Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2013/080616 WO2015071968A1 (en) 2013-11-13 2013-11-13 Analysis system

Publications (1)

Publication Number Publication Date
WO2015071968A1 (en) 2015-05-21

Family

ID=53056938

Country Status (1)

Country Link
WO (1) WO2015071968A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13897314

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13897314

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP