CN104794340A - Intelligent processing system for traditional Chinese medicine information - Google Patents

Publication number
CN104794340A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510186317.3A
Other languages
Chinese (zh)
Inventor
吴骏
谢隽
彭岳
汤兆亮
李宁
王崇骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201510186317.3A
Publication of CN104794340A

Abstract

The invention discloses an intelligent processing system for traditional Chinese medicine information. The system is a comprehensive Chinese medicine formulae data mining platform and comprises three modules which include the data preprocessing interface module, the frequently-used data mining function interface module and the disease-medicine relation mining interface module. The data preprocessing interface module is used for converting an excel data source frequently used in the field of traditional Chinese medicine into weka-excel middleware to be used in the follow-up process. The frequently-used data mining function interface module includes the four data analysis means frequently used in the field of traditional Chinese medicine based on secondary development of weka source codes, wherein the four data analysis means include frequent item sets, association rules, clustering and hierarchical clustering, and it is allowed to set corresponding mining parameters. The disease-medicine relation mining interface module is based on an improved algorithm of Apriori frequent item mining.

Description

An intelligent traditional Chinese medicine information processing system
Technical field:
The present invention is an intelligent information processing system for traditional Chinese medicine (TCM). Specifically, it is a platform that combines commonly used data mining techniques, such as frequent itemsets, clustering, hierarchical clustering and association rules, with a disease-medicine relation discovery technique.
Background:
Traditional Chinese medicine has been one of the treasures of the Chinese nation for thousands of years, and its efficacy is affirmed by a growing number of people in China and abroad. However, because traditional Chinese medicine is rooted in traditional Chinese culture, it is metaphysical, fuzzy, holistic and strongly speculative, and it does not map well onto the mainstream system of thought in the West today. As a result, some people regard traditional Chinese medicine as "unscientific", and its globalization has hit a bottleneck. At the same time, traditional Chinese medicine has its own internal need to keep developing in the present era.
First, theories for making TCM objective and standardized keep emerging. In particular, the research methods of Western pathology and pharmacology have long been accepted by TCM researchers, and thousands of experimental data items are generated every day.
Second, the transmission of traditional Chinese medicine from ancient times to the present has, in the final analysis, relied largely on formula (prescription) books and materia medica literature. Whether in the numerous and complicated case records, the medical treatises of contending schools, or even the four great classics, everything ultimately comes back to one point: the medicine must actually cure the disease. Yet from the Eastern Han dynasty to modern times, far more than ten thousand prescriptions have been used, and in the TCM field a single prescription often treats several conditions.
Faced with such a large volume of data, studying it purely by hand is impossible, while simple statistical tools or data analysis software are not enough to capture the complexity of traditional Chinese medicine. The invention provides techniques commonly used in TCM research, such as frequent itemsets and association rules, and also proposes an original technique for the relation between symptoms and medicines, a problem the TCM field urgently needs solved.
Summary of the invention:
The object of the invention is to propose an intelligent TCM information processing system: a platform that combines commonly used data mining techniques, such as frequent itemsets, clustering, hierarchical clustering and association rules, with a disease-medicine relation discovery technique. In response to the urgent wish of TCM universities, other TCM research institutions and individuals to apply data mining in TCM research, it provides a simple, user-friendly method with adjustable parameters (a Java application) while ensuring accuracy.
The technical scheme adopted by the invention is as follows. The intelligent TCM information processing system is a comprehensive data mining platform for Chinese medicine formulae and comprises three modules. The first module is the data preprocessing interface module, which converts the Excel data sources commonly used in the TCM field into a weka-excel middleware file for subsequent use. The second module is the common data mining function interface module, which is based on secondary development of the Weka source code, covers four analysis methods commonly used in the TCM field (frequent itemsets, association rules, clustering and hierarchical clustering) and allows the corresponding mining parameters to be set. The third module is the disease-medicine relation mining interface module, which is based on an improved algorithm for Apriori frequent-item mining;
The data preprocessing interface provided by the first module contains an import button that lets the user select any Excel file on the Windows system as the input source. The file is read with the open-source package jxl.jar and checked against the fixed format in which the first column is the formula name and the second column is the constituent medicines. The data are then converted into DMsource.txt, the weka-excel middleware; the purpose of introducing this middleware is that it is more readable than an arff file and can be inspected by the user;
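The format check and conversion described above can be sketched as follows. This is a minimal illustration in Python, not the patented Java implementation: the function names `parse_rows` and `to_middleware`, the whitespace-separated medicine list, and the tab-and-comma text layout are all assumptions, since the source does not specify the DMsource.txt layout or how the Excel cells are delimited.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, List[str]]


def parse_rows(rows: Iterable[Sequence[str]]) -> List[Record]:
    """Check each row against the two-column format (formula name, medicines).

    Assumption: medicines in the second cell are separated by whitespace.
    """
    records: List[Record] = []
    for line_no, row in enumerate(rows, start=1):
        if len(row) != 2:
            raise ValueError(f"row {line_no}: expected 2 columns, got {len(row)}")
        name, medicines = row[0].strip(), row[1].strip()
        if not name or not medicines:
            raise ValueError(f"row {line_no}: empty formula name or medicine list")
        records.append((name, medicines.split()))
    return records


def to_middleware(records: List[Record]) -> str:
    """Render records as readable text, one formula per line.

    Hypothetical layout: "<formula name><TAB><medicine>,<medicine>,...".
    """
    return "\n".join(f"{name}\t{','.join(meds)}" for name, meds in records)
```

Rows that do not match the two-column layout raise `ValueError`, which mirrors the format check the text mentions; how the actual system reports such rows is not stated in the source.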
The data mining interface provided by the second module covers four main methods: frequent itemsets, association rules, clustering and hierarchical clustering;
The parameters selectable for frequent itemsets include the mining mode:
0) Apriori on the medicine set,
1) Apriori on the symptom set,
2) derive the corresponding symptom set from the medicine set,
3) derive the corresponding medicine set from the symptom set,
4) FP-growth on the medicine set;
The parameters also include: the minimum support; the maximum frequent itemset length;
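The two parameters listed above (minimum support and maximum frequent itemset length) can be illustrated with a textbook Apriori pass, as used in modes 0) and 1). This is a minimal sketch in Python under stated assumptions: the function name `apriori` and the representation of each record as a set of strings are illustrative, and the code is the standard level-wise algorithm, not the modified Weka-based Java version the patent describes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float,
            max_len: int) -> Dict[FrozenSet[str], float]:
    """Return every itemset with support >= min_support and size <= max_len."""
    n = len(transactions)
    if n == 0:
        return {}

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of records that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    level: Dict[FrozenSet[str], float] = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level and k <= max_len:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent
```

Raising `min_support` shrinks the result, and `max_len` caps how many medicines (or symptoms) may appear in one reported combination.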
Association rules are generated with FP-growth; the parameters are the minimum confidence and the minimum support;
Clustering uses a clustering algorithm based on FP itemsets; the parameters are the minimum support and the maximum frequent itemset length;
Hierarchical clustering uses an algorithm that clusters the frequent itemsets of the previous layer, layer by layer;
The disease-medicine relation mining interface provided by the third module uses a modified version of the Apriori algorithm. Because support obtained by simple counting is not enough to reflect the characteristics of TCM data, the algorithm adopts a new measure, itemset importance, so that the mining results better reflect objective regularities.
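To show how the minimum support and minimum confidence parameters interact, here is a minimal Python sketch. Note the substitution: the source generates rules with FP-growth, whereas this sketch counts itemsets by direct enumeration, which gives the same rules on small data but does not scale. The function name `association_rules` and the `max_len` cap are assumptions made for the illustration.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def association_rules(transactions: List[Set[str]], min_support: float,
                      min_confidence: float, max_len: int = 3) -> List[Rule]:
    """Return rules (lhs, rhs, support, confidence) meeting both thresholds."""
    n = len(transactions)
    counts: Counter = Counter()
    # Brute-force counting of every itemset up to max_len items
    # (a stand-in for FP-growth, acceptable only for small examples).
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1

    rules: List[Rule] = []
    for itemset, c in counts.items():
        if len(itemset) < 2 or c / n < min_support:
            continue
        # Split the frequent itemset into every non-empty lhs -> rhs pair.
        for k in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), k):
                lhs = frozenset(lhs_items)
                confidence = c / counts[lhs]  # support(lhs + rhs) / support(lhs)
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, c / n, confidence))
    return rules
```

Support filters on how often the whole combination occurs, while confidence filters on how reliably the right-hand side follows the left-hand side.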
Mining mode 2) of the frequent itemset function provided by the third module, deriving the corresponding symptom set from the medicine set, can be regarded as one method of obtaining disease-medicine relations. First, according to the configured support and maximum frequent itemset length, mining mode 0) Apriori is used to obtain the frequent itemsets that contain only medicines. Then, for any frequent itemset I among them, the symptom information of every record in the full data set C whose medicine composition contains I is added to a new data set Cz. With the configured parameters, mining mode 1) Apriori is applied to Cz to obtain the frequent symptom itemsets Z conditioned on the medicine set I. These conditional itemsets are the resulting disease-medicine relation pairs.
For example, for the frequent medicine combination "ginseng, pinellia tuber", the symptom set of every prescription in the full database C whose composition contains "ginseng, pinellia tuber" is added to Cz as one record. Mode 1) Apriori mining is then run on Cz, and the result is the frequent symptom combinations corresponding to "ginseng, pinellia tuber";
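The construction of the conditional data set Cz in mode 2) can be sketched as below. This is a minimal Python illustration under stated assumptions: each record is modeled as a pair (medicine set, symptom set), the names `frequent_sets` and `symptoms_given_medicines` are hypothetical, and frequent sets are counted by direct enumeration instead of the level-wise Apriori search, which yields the same sets on small data. The sketch takes one medicine set I as given; in the described method I would come from a prior mode 0) run.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[Set[str], Set[str]]  # (medicines, symptoms)


def frequent_sets(transactions: List[Set[str]], min_support: float,
                  max_len: int) -> Dict[FrozenSet[str], float]:
    """Frequent itemsets by direct counting (stand-in for Apriori)."""
    n = len(transactions)
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def symptoms_given_medicines(records: List[Record], medicine_set: Iterable[str],
                             min_support: float,
                             max_len: int) -> Dict[FrozenSet[str], float]:
    """Build Cz from records whose medicines contain I, then mine symptoms."""
    target = frozenset(medicine_set)
    cz = [symptoms for medicines, symptoms in records if target <= medicines]
    # Support here is relative to Cz, i.e. conditional on the medicine set.
    return frequent_sets(cz, min_support, max_len)
```

Mode 3) would follow the same pattern with the roles swapped: filter records by a frequent symptom set and mine the medicines of the matching records.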
Mining mode 3) of the frequent itemset function, deriving the corresponding medicine set from the symptom set, can also be regarded as a method of obtaining disease-medicine relations. First, according to the configured support and maximum frequent itemset length, mining mode 1) Apriori is used to obtain the frequent itemsets that contain only symptoms. Then, for any frequent itemset I' among them, the medicine information of every record in the full data set C whose symptoms contain I' is added to a new data set Cy. With the configured parameters, mining mode 0) Apriori is applied to Cy to obtain the frequent medicine itemsets Y conditioned on the symptom set I'. These conditional itemsets are the resulting disease-medicine relation pairs.
For example, for the frequent symptom combination "cough and asthma, fever", the medicine set of every record in the full database C whose symptoms contain "cough and asthma, fever" is added to Cy as one record. Mode 0) Apriori mining is then run on Cy, and the result is the frequent medicine combinations corresponding to "cough and asthma, fever".
Beneficial effects: the invention is an intelligent TCM information processing system, specifically a platform that combines commonly used data mining techniques, such as frequent itemsets, clustering, hierarchical clustering and association rules, with a disease-medicine relation discovery technique. Compared with the prior art, its notable advantages are:
(1) Simple to implement: the implementation is simple and clearly structured, and the system is easy to learn and operate, which makes it suitable for TCM practitioners who want to carry out data mining.
(2) Good portability: the invention is written in Java and runs in any environment with a Java runtime (JRE) installed, so it can be used more widely than ordinary Win32 applications.
(3) Good stability: the program includes a certain amount of error detection and correction, so it keeps running stably and correctly despite a certain number of manual entry errors.
(4) Good extensibility: applications in the TCM field are usually developed for one specific function, whereas the invention is a comprehensive platform for TCM applications to which other functions can easily be added.
(5) High accuracy: compared with general data mining tools, the invention makes many special modifications to the data mining algorithms for the characteristics of the TCM field, so that the results better match the objective regularities of TCM and are more credible.
Brief description of the drawings:
Fig. 1 is the overall framework diagram of the system;
Fig. 2 is the data flow of the invention.
Detailed description of the embodiments:
The invention responds to the urgent wish of TCM universities, other TCM research institutions and individuals to apply data mining in TCM research, and to the fact that current TCM informatization applications exist only as small single-function tools. It proposes an intelligent TCM information processing system, specifically a platform that combines commonly used data mining techniques, such as frequent itemsets, clustering, hierarchical clustering and association rules, with a disease-medicine relation discovery technique. The analysis algorithms are implemented on the open-source toolkit provided by Weka, with secondary development to fit TCM theory, which greatly improves the accuracy of the data mining algorithms when applied to the TCM field.
The implementation of the whole system is described in detail below with reference to the drawings. Its main interface comprises three parts. The first is the data preprocessing interface, which converts the Excel data sources commonly used in the TCM field into the weka-excel middleware for subsequent use. The second is the common data mining function interface; these functions are based on secondary development of the Weka source code, cover four analysis methods commonly used in the TCM field (frequent itemsets, association rules, clustering and hierarchical clustering) and allow the corresponding mining parameters to be set. The third is the disease-medicine relation mining interface; the technique in this module was developed independently, is based on an improved Apriori frequent-item mining algorithm, and uses a measure that better matches the cognitive patterns of TCM.
Referring to Fig. 1, the overall framework of the invention is:
(1) Data preprocessing module:
1. The data preprocessing module converts the Excel-format data sources generally used by TCM departments into the DMsource middleware, which is then converted into the arff format that the data analysis module and the disease-medicine relation discovery module can read. The middleware is closer to natural language and can be read by people.
2. The module has only one button on its interface: select a file and submit it for preprocessing.
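The second conversion step, from the middleware records to arff, might look like the following Python sketch. The ARFF keywords (`@relation`, `@attribute`, `@data`) are part of Weka's file format, but the attribute layout is an assumption: the source does not say how the system encodes formulae, so this sketch uses one nominal `{0,1}` attribute per medicine. The function name `to_arff` is hypothetical.

```python
from typing import List, Set


def to_arff(relation: str, transactions: List[Set[str]]) -> str:
    """Encode item sets as ARFF text with one {0,1} attribute per item.

    Assumption: item names contain no quote characters, so wrapping them
    in single quotes is enough to keep names with spaces valid.
    """
    items = sorted({item for t in transactions for item in t})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute '{item}' {{0,1}}" for item in items]
    lines += ["", "@data"]
    for t in transactions:
        # One row per record: 1 if the record contains the item, else 0.
        lines.append(",".join("1" if item in t else "0" for item in items))
    return "\n".join(lines) + "\n"
```

A row-per-formula, column-per-medicine table of this kind is much harder to read than a list of names, which is consistent with the stated reason for keeping a separate human-readable middleware file.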
(2) Data mining module:
1. The data mining module mainly comprises classical data mining algorithms such as frequent itemsets, association rules and clustering. These algorithms are versions adapted to the TCM field, built on the open-source toolkit provided by Weka. In terms of mining results, their accuracy is considerably higher than that of ordinary mining algorithms.
2. The module's interface has two parts: the first is the selection among the data mining functions; the second is the setting of the mining parameters. This layout offers rich and flexible mining options, at the cost of being less concise.
(3) Disease-medicine relation discovery module:
1. The disease-medicine relation module is an entirely self-developed implementation of a data mining algorithm suited to discovering the intrinsic links between diseases and medicines in TCM. Its results improve considerably on general association rule or frequent itemset methods in both accuracy and coverage.
2. The module has only one button on its interface: select the middleware file DMsource.txt exported by the preprocessing module, mine it, and return the results.
For the data flow of the invention, refer to Fig. 2.

Claims (1)

1. An intelligent traditional Chinese medicine (TCM) information processing system, characterized in that the system is a comprehensive data mining platform for Chinese medicine formulae and comprises three modules; the first module is a data preprocessing interface module, which converts the Excel data sources commonly used in the TCM field into a weka-excel middleware file for subsequent use; the second module is a common data mining function interface module, which is based on secondary development of the Weka source code, covers four analysis methods commonly used in the TCM field, namely frequent itemsets, association rules, clustering and hierarchical clustering, and allows the corresponding mining parameters to be set; the third module is a disease-medicine relation mining interface module, which is based on an improved algorithm for Apriori frequent-item mining;
The data preprocessing interface provided by the first module contains an import button that lets the user select any Excel file on the Windows system as the input source; the file is read with the open-source package jxl.jar and checked against the fixed format in which the first column is the formula name and the second column is the constituent medicines; the data are then converted into DMsource.txt, the weka-excel middleware, the purpose of which is to be more readable than an arff file and available for inspection;
The data mining interface provided by the second module covers four main methods: frequent itemsets, association rules, clustering and hierarchical clustering;
The parameters selectable for frequent itemsets include the mining mode:
0) Apriori on the medicine set,
1) Apriori on the symptom set,
2) derive the corresponding symptom set from the medicine set,
3) derive the corresponding medicine set from the symptom set,
4) FP-growth on the medicine set;
The parameters also include: the minimum support; the maximum frequent itemset length;
Association rules are generated with FP-growth; the parameters are the minimum confidence and the minimum support;
Clustering uses a clustering algorithm based on FP itemsets; the parameters are the minimum support and the maximum frequent itemset length;
Hierarchical clustering uses an algorithm that clusters the frequent itemsets of the previous layer, layer by layer;
The disease-medicine relation mining interface provided by the third module uses a modified version of the Apriori algorithm; because support obtained by simple counting is not enough to reflect the characteristics of TCM data, the algorithm adopts a new measure, itemset importance, so that the mining results better reflect objective regularities.
Mining mode 2) of the frequent itemset function provided by the third module, deriving the corresponding symptom set from the medicine set, can be regarded as one method of obtaining disease-medicine relations: first, according to the configured support and maximum frequent itemset length, mining mode 0) Apriori is used to obtain the frequent itemsets that contain only medicines; then, for any frequent itemset I among them, the symptom information of every record in the full data set C whose medicine composition contains I is added to a new data set Cz; with the configured parameters, mining mode 1) Apriori is applied to Cz to obtain the frequent symptom itemsets Z conditioned on the medicine set I; these conditional itemsets are the resulting disease-medicine relation pairs;
Mining mode 3) of the frequent itemset function, deriving the corresponding medicine set from the symptom set, can also be regarded as a method of obtaining disease-medicine relations: first, according to the configured support and maximum frequent itemset length, mining mode 1) Apriori is used to obtain the frequent itemsets that contain only symptoms; then, for any frequent itemset I' among them, the medicine information of every record in the full data set C whose symptoms contain I' is added to a new data set Cy; with the configured parameters, mining mode 0) Apriori is applied to Cy to obtain the frequent medicine itemsets Y conditioned on the symptom set I'; these conditional itemsets are the resulting disease-medicine relation pairs.
CN201510186317.3A 2015-04-17 2015-04-17 Intelligent processing system for traditional Chinese medicine information Pending CN104794340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510186317.3A CN104794340A (en) 2015-04-17 2015-04-17 Intelligent processing system for traditional Chinese medicine information

Publications (1)

Publication Number Publication Date
CN104794340A true CN104794340A (en) 2015-07-22

Family

ID=53559131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510186317.3A Pending CN104794340A (en) 2015-04-17 2015-04-17 Intelligent processing system for traditional Chinese medicine information

Country Status (1)

Country Link
CN (1) CN104794340A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202887195U (en) * 2012-07-30 2013-04-17 成都中医药大学 Traditional Chinese medicine decoction digitization system
CN104318082A (en) * 2014-10-10 2015-01-28 北京嘉和美康信息技术有限公司 Follow-up data processing device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DEFU ZHANG等: "《International Symposium on Fuzzy Systems,Knowledge Discovery and Natural Computation(FSKD 2014)》", 2 September 2014 *
周婕: "数据挖掘若干方法研究及中医药数据库中的应用", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *
晏峻峰: "基于WEKA的中医证候病理数据挖掘实验系统构建研究", 《2012年朱文锋学术思想研讨会暨中医诊断师资班30周年纪念大会》 *
随风: "善用weka数据预处理功能:使用weka将excel数据转化为weka用arff格式数据或UCI数据", 《豆瓣网网页公开(HTTPS://WWW.DOUBAN.COM/NOTE/270231778/)》 *
马丽伟: "关联规则算法研究及其在中医药数据挖掘中的应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407650A (en) * 2016-08-29 2017-02-15 首都医科大学附属北京中医医院 Traditional Chinese medicine data processing device and method
CN106407650B (en) * 2016-08-29 2018-10-19 首都医科大学附属北京中医医院 A kind of Chinese medicine data processing equipment and method
CN106650225A (en) * 2016-10-25 2017-05-10 康美药业股份有限公司 FP growth algorithm model-based traditional Chinese medicine formula data mining method and system
CN108550381A (en) * 2018-03-20 2018-09-18 昆明理工大学 A kind of drug recommendation method based on FP-growth
CN109887604A (en) * 2019-02-26 2019-06-14 上海中医药大学 A kind of quantization decision-making system of names of disease of tcm similarity

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150722