CN109087711A

CN109087711A - Medical big data mining method and system

Info

Publication number: CN109087711A
Application number: CN201810684758.XA
Authority: CN
Inventors: 赵杰; 李金博; 李砺锋; 张腾飞; 薛文华; 翟运开; 宋晓琴; 孙东旭; 范智蕊; 沈志博; 朱子家
Original assignee: First Affiliated Hospital of Zhengzhou University
Current assignee: First Affiliated Hospital of Zhengzhou University
Priority date: 2018-06-28
Filing date: 2018-06-28
Publication date: 2018-12-25

Abstract

The invention belongs to the technical field of medical big data, and in particular relates to a medical big data mining method and system. The technical problem to be solved is to provide a medical big data mining method and system that effectively reduce the subjective influence of interestingness measure selection and reduce the false detection rate/error rate. The technical solution adopted is a medical big data mining method comprising: acquiring medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data; converting the medical data of each patient into structured data; establishing a patient-centered relational database; cleaning the database by filling in or filtering out missing values; computing on the cleaned data with different kinds of interestingness measures to obtain different interesting rules; and clustering the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

Description

Medical big data mining method and system

Technical field

The invention belongs to the technical field of medical big data, and in particular relates to a medical big data mining method and system.

Background art

We now live in the era of big data, and applying big data to the medical field has become a research hotspot. Medical big data is of great value: mining the valuable information it contains is of great significance for disease diagnosis, treatment plan determination, epidemic prediction, medical research, drug side-effect analysis and so on. In a sense, medical big data systems play an important role in improving the human living environment, raising quality of life and achieving a higher happiness index.

To apply big data well in the medical field, the accurate application of medical big data association mining methods is particularly important. An unsuitable association mining method may yield erroneous associations between diseases, between diseases and symptoms, between symptom indicators, and among other relationships, causing the final research results to deviate.

However, most existing medical big data association rule mining methods are limited to a single interestingness measure. Most studies focus on the properties and behavior of different measures, but different interestingness measures perform differently in different application scenarios, and this limitation restricts their capability in medical big data association rule mining. At the same time, to make the medical big data obtained more valuable, medical data sources from as many aspects as possible need to be integrated, and a traditional single interestingness measure cannot adequately meet the resulting association rule mining requirements.

Summary of the invention

The present invention overcomes the shortcomings of the prior art. The technical problem to be solved is to provide a medical big data mining method and system that effectively reduce the subjective influence of interestingness measure selection and reduce the false detection rate/error rate.

To solve the above technical problem, the technical solution adopted by the present invention is as follows:

A medical big data mining method, comprising the following steps: acquiring medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data; converting the medical data of each patient into structured data; establishing a patient-centered relational database; cleaning the database by filling in or filtering out missing values; computing on the cleaned data with different kinds of interestingness measures to obtain different interesting rules; and clustering the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

Preferably, converting the medical data of each patient into structured data specifically includes: dividing the medical data of each patient into structured data and unstructured data; and converting the unstructured data into structured data.

Preferably, cleaning the database by filling in or filtering out missing values specifically includes: filling in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum; and directly filtering out data that is severely missing.

Correspondingly, a medical big data mining system comprises: an acquisition module, configured to acquire medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data; a data conversion module, configured to convert the medical data of each patient into structured data; an establishing module, configured to establish a patient-centered relational database; a cleaning module, configured to clean the database by filling in or filtering out missing values; an extraction module, configured to compute on the cleaned data with different kinds of interestingness measures to obtain different interesting rules; and an integration module, configured to cluster the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

Preferably, the data conversion module is configured to: divide the medical data of each patient into structured data and unstructured data; and convert the unstructured data into structured data.

Preferably, the cleaning module specifically includes: a filling module, configured to fill in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum; and a filtering module, configured to directly filter out data that is severely missing.

Compared with the prior art, the present invention has the following beneficial effects:

Based on patient medical data, the present invention computes different kinds of interestingness measures and uses the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure, from which the contribution ranking of each measure is calculated. The whole method is data-driven: it synthesizes or selects the most suitable interestingness measure for a specific medical data mining task, and can therefore effectively reduce the subjective influence of interestingness measure selection and reduce the false detection rate/error rate.

Brief description of the drawings

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of the medical big data mining method provided by Embodiment 1 of the present invention.

Fig. 2 is a schematic structural diagram of the medical big data mining system provided by Embodiment 1 of the present invention.

Fig. 3 is a schematic structural diagram of the medical big data mining system provided by Embodiment 2 of the present invention.

Fig. 4 is a schematic diagram of the storage mode of the establishing module provided by Embodiment 2 of the present invention.

In the figures: 10 is the acquisition module, 20 is the data conversion module, 30 is the establishing module, 40 is the cleaning module, 50 is the extraction module, 60 is the integration module, 401 is the filling module, and 402 is the filtering module.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of the medical big data mining method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the medical big data mining method comprises the following steps: acquiring medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data; converting the medical data of each patient into structured data; establishing a patient-centered relational database; cleaning the database by filling in or filtering out missing values; computing on the cleaned data with different kinds of interestingness measures to obtain different interesting rules; and clustering the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

In Embodiment 1, the medical data of patients may be acquired using different types of medical equipment or systems, such as B-mode ultrasound, CT, magnetic resonance imaging, electrocardiography, electroencephalography, portable wearable devices and hospital information systems, to collect massive amounts of patient-related information. For example: basic patient information is obtained through registration and questionnaires; visit and medication information is obtained by reading in medical records; and cost and insurance information is obtained by connecting to the hospital information management system. Together these form the patient-centered blocks of information.

Specifically, converting the medical data of each patient into structured data includes: dividing the medical data of each patient into structured data and unstructured data; and converting the unstructured data into structured data. In Embodiment 1, the acquired patient medical data include structured data (e.g., basic patient information and various clinical examination indicators) and unstructured data (e.g., medical record files (such as XML), clinical medical images, and examination reports in various text formats). A specific information extraction approach needs to be selected for each type of unstructured data in order to convert it into structured data. For example, images and videos in the patient-related data are structured using deep-learning-based algorithms, and textual data are processed and converted using natural language processing techniques.
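As a rough illustration of the conversion step for textual data, the sketch below pulls a few fields out of a free-text examination report with regular expressions. This is a deliberately simple stand-in for the natural language processing mentioned in the embodiment, not the method of the invention; the report wording, the field names and the `extract_report_fields` helper are all hypothetical.

```python
import re

# Hypothetical field patterns; a real system would use an NLP pipeline instead.
FIELD_PATTERNS = {
    "systolic_bp": re.compile(r"blood pressure\s*(\d+)\s*/\s*\d+", re.IGNORECASE),
    "diastolic_bp": re.compile(r"blood pressure\s*\d+\s*/\s*(\d+)", re.IGNORECASE),
    "heart_rate": re.compile(r"heart rate\s*(\d+)", re.IGNORECASE),
}


def extract_report_fields(text):
    """Return a dict of structured fields; a field that is not found is None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = int(match.group(1)) if match else None
    return record


# Made-up report text used only to exercise the helper.
report = "Patient rested. Blood pressure 128/82 mmHg, heart rate 76 bpm."
structured = extract_report_fields(report)
```

Fields that cannot be found are kept as `None`, so they can later be treated as missing values during cleaning.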

Further, cleaning the database by filling in or filtering out missing values specifically includes: filling in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum; and directly filtering out data that is severely missing.
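The cleaning rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the invention: missing values are represented as `None`, and the `max_missing_ratio` threshold is an assumption, since the text only says that severely missing data is filtered out without giving a number.

```python
from statistics import mean, median, mode


def linear_interpolate(values):
    """Fill interior None gaps linearly between the nearest known neighbours.
    Leading and trailing gaps stay None because one neighbour is missing."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def fill_with_statistic(values, statistic):
    """Replace every None with one summary statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


def clean_column(values, statistic=None, max_missing_ratio=0.5):
    """Return the cleaned column, or None if it should be filtered out.
    max_missing_ratio is an assumed threshold for 'severely missing'."""
    missing = sum(v is None for v in values)
    if missing / len(values) > max_missing_ratio:
        return None
    if statistic is None:
        return linear_interpolate(values)
    return fill_with_statistic(values, statistic)


interpolated = clean_column([1.0, None, 3.0])
mode_filled = clean_column([3, None, 3, 7], statistic=mode)
dropped = clean_column([None, None, None, 4])
```

Any of `mode`, `median`, `mean`, `max` or `min` can be passed as `statistic`; which one suits a column depends on how its values are distributed.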

In the present invention, the patient-centered relational database is established using the patient visit code from the hospital management system (the visit code can serve as the primary key identifying a patient; the primary key uniquely identifies each patient recorded in a database table).

To effectively reduce the subjective influence of interestingness measure selection and reduce the false detection rate/error rate, and because medical data are generated in different medical scenarios, association rules cannot be mined with a single interestingness measure alone.

Below, the expression A → B denotes that medical data appearing in set A is highly likely to also appear in set B. Some of the interestingness measures used here are shown in the following table:
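A patient-centered schema of this kind might look as follows. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`patient`, `clinical_record`, `cost_record`, `visit_code` and so on) and the sample rows are hypothetical and are not taken from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (
    visit_code TEXT PRIMARY KEY,   -- visit code from the hospital management system
    name       TEXT,
    birth_year INTEGER
);
CREATE TABLE clinical_record (
    id         INTEGER PRIMARY KEY,
    visit_code TEXT NOT NULL REFERENCES patient(visit_code),
    indicator  TEXT NOT NULL,
    value      REAL
);
CREATE TABLE cost_record (
    id         INTEGER PRIMARY KEY,
    visit_code TEXT NOT NULL REFERENCES patient(visit_code),
    item       TEXT NOT NULL,
    amount     REAL
);
""")

# Made-up rows, only to show that every table hangs off the patient key.
conn.execute("INSERT INTO patient VALUES (?, ?, ?)", ("V0001", "Patient A", 1970))
conn.execute(
    "INSERT INTO clinical_record (visit_code, indicator, value) VALUES (?, ?, ?)",
    ("V0001", "heart_rate", 76.0),
)
conn.execute(
    "INSERT INTO cost_record (visit_code, item, amount) VALUES (?, ?, ?)",
    ("V0001", "CT scan", 300.0),
)

# Join the blocks of information back together through the primary key.
rows = conn.execute("""
    SELECT p.visit_code, c.indicator, c.value, k.item, k.amount
    FROM patient p
    JOIN clinical_record c ON c.visit_code = p.visit_code
    JOIN cost_record k ON k.visit_code = p.visit_code
""").fetchall()
```

Because `visit_code` is the primary key of `patient`, inserting a second patient with the same code is rejected, which is what makes it usable as the unique patient identifier.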

Table 1: Some of the interestingness measures used in the present invention

Once the pairwise distances between interestingness measures have been obtained, the relative behavior of the different kinds of interestingness measures becomes the focus of the present invention. The fuzzy C-means clustering algorithm is chosen to cluster the different interestingness measures because the membership degrees it produces can measure not only the differences between interestingness measures in different clusters but also the differences between interestingness measures within the same cluster.
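The contents of Table 1 are not reproduced in this text. To make the idea of scoring a rule A → B concrete, the sketch below computes three widely used interestingness measures (support, confidence and lift); whether these particular measures appear in Table 1 is an assumption, and the patient records are made-up toy data.

```python
def rule_measures(transactions, antecedent, consequent):
    """Score the rule antecedent -> consequent over a list of item sets."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a.issubset(t) for t in transactions)          # records containing A
    n_b = sum(b.issubset(t) for t in transactions)          # records containing B
    n_ab = sum((a | b).issubset(t) for t in transactions)   # records containing A and B
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy records: each set lists the diagnoses attached to one patient.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"hypertension"},
    {"diabetes"},
]
scores = rule_measures(records, {"hypertension"}, {"diabetes"})
```

Different measures can rank the same set of rules differently, which is the reason the method goes on to compare the measures with one another rather than relying on a single one.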

The objective function of the fuzzy C-means clustering algorithm is as follows:
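The formula itself did not survive in this text. As a sketch, assuming the standard fuzzy C-means formulation and writing u_{ij} for the membership degree of the i-th interestingness measure in the j-th cluster (a symbol introduced here for illustration), the objective is commonly written as:

```latex
Q = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{\,m} \, \lVert x_i - v_j \rVert^{2},
\qquad \text{subject to } \sum_{j=1}^{c} u_{ij} = 1 \ \text{for every } i, \quad m > 1 .
```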

Here N, c and m are the number of interestingness measures, the number of clusters and the fuzzification factor, respectively; x_i and v_j denote the i-th interestingness measure and the j-th cluster center, respectively. The fuzzy C-means clustering algorithm essentially seeks to minimize the above objective function Q, which can be done by continuous iteration as follows:

The optimized membership degree of each interestingness measure is thus obtained.

The membership degrees in the present invention reflect the relative behavior of the different interestingness measures during rule association and the differences between them. Based on the membership values, the roles played by the different interestingness measures in medical big data association rule mining are analyzed comprehensively, so that for different medical problems a more suitable interestingness measure can be selected or integrated. This reduces the subjective influence of interestingness measure selection and the false detection rate/error rate, and improves the comprehensiveness and accuracy of the data and the efficiency of data processing.
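The iteration formulas are also absent from this text. In the standard fuzzy C-means algorithm, assumed here, with u_{ij} denoting the membership degree of x_i in the j-th cluster (a symbol introduced for illustration), the membership degrees and cluster centers are updated alternately until they stop changing:

```latex
u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}} .
```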
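The clustering step can be sketched in plain Python as below. This is the standard fuzzy C-means iteration, not code from the patent. It assumes that each interestingness measure is represented as a numeric vector (for instance, the scores it assigns to a common set of rules), which the text does not spell out; the function name, its default parameters and the sample vectors are illustrative.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length numeric vectors.
    Returns (memberships, centers); memberships[i][j] is the degree to which
    point i belongs to cluster j, and every row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Membership update from the distances to every center.
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if any(d == 0.0 for d in dists):
                # The point sits on a center: share membership among such centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:  # memberships have stopped moving
            break
    return u, centers


# Hypothetical measure vectors: two pairs of measures that behave alike.
vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
memberships, centers = fuzzy_c_means(vectors, c=2)
```

Each row of `memberships` can then be read as the relative behavior of one measure: measures with similar rows behave alike, and the size of the dominant entry shows how typical a measure is of its cluster.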

Fig. 2 is a schematic structural diagram of the medical big data mining system provided by Embodiment 1 of the present invention. As shown in Fig. 2, the medical big data mining system comprises:

an acquisition module (10), configured to acquire medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data;

a data conversion module (20), configured to convert the medical data of each patient into structured data;

an establishing module (30), configured to establish a patient-centered relational database;

a cleaning module (40), configured to clean the database by filling in or filtering out missing values;

an extraction module (50), configured to compute on the cleaned data with different kinds of interestingness measures to obtain different interesting rules;

an integration module (60), configured to cluster the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.
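One way to picture how the six modules fit together is as a chain of stages, each consuming the output of the previous one. The skeleton below is only an architectural sketch under that assumption; the `MiningPipeline` class and its field names are hypothetical, and the stages are placeholders that simply record the order in which they run.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MiningPipeline:
    acquire: Callable[[], Any]         # acquisition module (10)
    convert: Callable[[Any], Any]      # data conversion module (20)
    establish: Callable[[Any], Any]    # establishing module (30)
    clean: Callable[[Any], Any]        # cleaning module (40)
    extract: Callable[[Any], Any]      # extraction module (50)
    integrate: Callable[[Any], Any]    # integration module (60)

    def run(self):
        """Run the stages in order, passing each result to the next stage."""
        data = self.acquire()
        for stage in (self.convert, self.establish, self.clean,
                      self.extract, self.integrate):
            data = stage(data)
        return data


trace = []


def placeholder(name):
    """Build a stand-in stage that records its name and passes data through."""
    def _stage(data=None):
        trace.append(name)
        return data
    return _stage


stage_names = ("acquire", "convert", "establish", "clean", "extract", "integrate")
pipeline = MiningPipeline(*(placeholder(name) for name in stage_names))
result = pipeline.run()
```

Keeping each module behind a plain callable makes it possible to swap one stage, for example a different cleaning strategy, without touching the others.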

Specifically, the data conversion module (20) is configured to:

divide the medical data of each patient into structured data and unstructured data;

convert the unstructured data into structured data.

Fig. 3 is a schematic structural diagram of the medical big data mining system provided by Embodiment 2 of the present invention. As shown in Fig. 3, on the basis of Embodiment 1, the cleaning module (40) specifically includes: a filling module (401), configured to fill in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum; and a filtering module (402), configured to directly filter out data that is severely missing.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A medical big data mining method, characterized by comprising the following steps:

acquiring medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data;

converting the medical data of each patient into structured data;

establishing a patient-centered relational database;

cleaning the database by filling in or filtering out missing values;

computing on the cleaned data with different kinds of interestingness measures to obtain different interesting rules;

clustering the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

2. The medical big data mining method according to claim 1, characterized in that converting the medical data of each patient into structured data specifically includes:

converting unstructured data into structured data.

3. The medical big data mining method according to claim 1, characterized in that cleaning the database by filling in or filtering out missing values specifically includes:

filling in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum;

directly filtering out data that is severely missing.

4. A medical big data mining system, characterized by comprising:

an acquisition module (10), configured to acquire medical data of patients, wherein the medical data of a patient includes behavioral data, clinical data, cost data and insurance data;

an extraction module (50), configured to compute on the cleaned data with different kinds of interestingness measures to obtain different interesting rules;

an integration module (60), configured to cluster the different interestingness measures using the fuzzy C-means clustering algorithm to obtain the optimized membership degree of each interestingness measure.

5. The medical big data mining system according to claim 4, characterized in that the data conversion module (20) is configured to:

convert unstructured data into structured data.

6. The medical big data mining system according to claim 4, characterized in that the cleaning module (40) specifically includes:

a filling module (401), configured to fill in missing values using a linear interpolation algorithm or, according to the distribution characteristics of the data, with one of the mode, median, mean, maximum and minimum;

a filtering module (402), configured to directly filter out data that is severely missing.