CN104794340A - Intelligent processing system for traditional Chinese medicine information - Google Patents
- Publication number
- CN104794340A (application CN201510186317.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- medicine
- mining
- frequent
- collection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an intelligent processing system for traditional Chinese medicine (TCM) information. The system is a comprehensive data mining platform for TCM formulae and comprises three modules: a data preprocessing interface module, a common data mining function interface module, and a disease-medicine relation mining interface module. The data preprocessing interface module converts the Excel data sources commonly used in the TCM field into a Weka-Excel middleware file for subsequent processing. The common data mining function interface module is built by secondary development of the Weka source code and provides the four analysis methods most frequently used in the TCM field, namely frequent itemsets, association rules, clustering, and hierarchical clustering, with configurable mining parameters. The disease-medicine relation mining interface module is based on an improved Apriori frequent itemset mining algorithm.
Description
Technical field:
The present invention is an intelligent information processing system for traditional Chinese medicine, specifically a platform that integrates common data mining techniques, such as frequent itemsets, clustering, hierarchical clustering, and association rules, together with a technique for discovering disease-medicine relations.
Background art:
Traditional Chinese medicine has been one of the treasures of the Chinese nation for thousands of years, and its curative effect has won recognition from a growing number of people in China and abroad. However, because TCM is rooted in traditional Chinese culture, it is abstract, fuzzy, holistic, and heavily reliant on speculative reasoning, and it does not map well onto the prevailing Western scientific paradigm. As a result, some people regard TCM as "unscientific", and its globalization has hit a bottleneck. At the same time, TCM itself needs to keep developing in the modern era.
First, theories for objectifying and standardizing TCM keep emerging. In particular, the research paradigms of Western pathology and pharmacology have long been adopted by TCM researchers, and thousands of experimental data records are generated every day.
Second, the transmission of TCM from ancient times to the present has relied largely on formularies and materia medica literature. However diverse the clinical cases, however many competing schools, and however varied the classical medical texts, up to and including the four great classics, everything ultimately comes down to which medicines cure which diseases. From the Eastern Han Dynasty to modern times, far more than ten thousand prescriptions have been recorded, and in the TCM field a single prescription often treats several different disease conditions.
Faced with such a huge volume of data, studying it purely by hand is impossible, while simple statistical tools or general data analysis software cannot capture the complexity of TCM. The invention provides the common techniques used in TCM research, such as frequent itemsets and association rules, and also proposes an original technique for a problem the TCM field urgently needs to address: the relationship between symptoms and medicines.
Summary of the invention:
The purpose of the invention is to provide an intelligent TCM information processing system: a platform that integrates common data mining techniques, including frequent itemsets, clustering, hierarchical clustering, and association rules, together with a technique for discovering disease-medicine relations. TCM universities, other TCM research institutions, and individual researchers urgently want to apply data mining in TCM research; while ensuring accuracy, the invention gives them a simple, user-friendly method with adjustable parameters, implemented as a Java application.
The technical solution adopted by the invention is as follows. The intelligent TCM information processing system is a comprehensive data mining platform for TCM formulae comprising three modules. The first module is the data preprocessing interface, which converts the Excel data sources commonly used in the TCM field into a Weka-Excel middleware file for subsequent processing. The second module is the common data mining function interface, which is built by secondary development of the Weka source code, covers the four analysis methods most frequently used in the TCM field (frequent itemsets, association rules, clustering, and hierarchical clustering), and allows the corresponding mining parameters to be set. The third module is the disease-medicine relation mining interface, which is based on an improved Apriori frequent itemset mining algorithm.
The data preprocessing interface provided by the first module has an import button that lets the user select any Excel file on a Windows system as the input source. The file is read with the open-source package jxl.jar, and the system checks whether it follows the fixed format in which the first column is the prescription name and the second column lists the constituent medicines. The data are then converted into the Weka-Excel middleware file DMsource.txt. This middleware is introduced because it is more readable than an ARFF file and can therefore be inspected by the user.
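The format check and conversion described above can be sketched as follows. This is a minimal Python illustration, not the system's Java code: it assumes the spreadsheet rows have already been read into (name, medicines) tuples (the invention uses jxl.jar for that step), and the medicine separators and the tab-separated layout of the middleware text are hypothetical, since the actual DMsource.txt layout is not specified.

```python
import re

# Assumed separators between medicine names: ASCII comma, full-width comma,
# enumeration comma, or whitespace. The real DMsource.txt rules are not given.
_SEP = re.compile(r"[,，、\s]+")


def parse_rows(rows):
    """Check the fixed two-column format: column 1 = prescription name,
    column 2 = constituent medicines. Returns (records, bad_row_numbers)."""
    records, bad_rows = [], []
    for row_no, row in enumerate(rows, start=1):
        cells = [str(c).strip() for c in row]
        if len(cells) < 2 or not cells[0] or not cells[1]:
            bad_rows.append(row_no)  # skip malformed rows instead of aborting
            continue
        medicines = [m for m in _SEP.split(cells[1]) if m]
        records.append((cells[0], medicines))
    return records, bad_rows


def to_middleware_text(records):
    """Hypothetical human-readable layout: name, tab, space-separated medicines."""
    return "\n".join(f"{name}\t{' '.join(meds)}" for name, meds in records)
```

Malformed rows are collected rather than aborting the import, in the spirit of the error tolerance claimed for manual data entry; the returned text could then be written to DMsource.txt.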
The data mining interface provided by the second module includes the major methods of frequent itemsets, association rules, clustering, and hierarchical clustering.
The frequent itemset function offers the following mining modes:
0) Apriori on medicine sets;
1) Apriori on symptom sets;
2) deriving the corresponding symptom sets from medicine sets;
3) deriving the corresponding medicine sets from symptom sets;
4) FP-growth on medicine sets.
Its parameters are the minimum support and the maximum frequent itemset length.
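Mining modes 0 and 1 both run Apriori with the two parameters above; only the input differs (medicine sets or symptom sets). Below is a minimal sketch of textbook level-wise Apriori in Python, assuming the minimum support is given as a fraction of records and that `max_len` is at least 1; it is not the Weka-based implementation of the invention.

```python
from itertools import combinations


def apriori(transactions, min_support, max_len):
    """Level-wise Apriori. Returns {frozenset(itemset): support_count} for every
    itemset with relative support >= min_support and size <= max_len (max_len >= 1)."""
    tx = [frozenset(t) for t in transactions]
    min_count = min_support * len(tx)

    # Level 1: count individual items (single medicines or single symptoms).
    counts = {}
    for t in tx:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current and k <= max_len:
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in tx:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

Mode 0 would pass the medicine sets of the prescriptions and mode 1 their symptom sets; the maximum length simply stops the level-wise loop early.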
Association rules are generated with FP-growth; the parameters are the minimum confidence and the minimum support.
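Whatever miner produces the frequent itemsets, rule generation then applies the two thresholds above. To keep the example short, the sketch below uses a hypothetical helper `count_itemsets` that counts itemsets by brute-force enumeration in place of FP-growth. Only the rule-generation step, which splits each frequent itemset into an antecedent and a consequent and filters by confidence, corresponds to the text.

```python
from itertools import combinations


def count_itemsets(transactions, max_len):
    """Brute-force support counts for every itemset of size 1..max_len.
    A stand-in for FP-growth (which the source uses); suitable for small data only."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def association_rules(transactions, min_support, min_confidence, max_len=3):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    counts = count_itemsets(transactions, max_len)
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    rules = []
    for itemset, count in frequent.items():
        # Split every frequent itemset of size >= 2 into antecedent -> consequent.
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # Downward closure: a subset of a frequent itemset is frequent too.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, count / n, confidence))
    return rules
```

Confidence is the support count of the whole itemset divided by that of the antecedent, so raising the minimum confidence keeps only the most reliable medicine co-occurrences.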
Clustering uses a clustering algorithm based on frequent pattern (FP) itemsets; the parameters are the minimum support and the maximum frequent itemset length.
Hierarchical clustering uses an algorithm that clusters the frequent itemsets of the previous level, level by level.
The disease-medicine relation mining interface provided by the third module uses a modified Apriori algorithm. Because support obtained by simple counting is not enough to reflect the characteristics of TCM data, the algorithm introduces a new measure, itemset importance, so that the mining results better reflect objective patterns.
Mining mode 2, which the third module provides, derives the corresponding symptom sets from medicine sets and can be regarded as one way of obtaining disease-medicine relations. First, using the configured support and maximum frequent itemset length, the frequent itemsets containing only medicines are obtained with mining mode 0 (Apriori). Then, for any frequent itemset I among them, every record in the full data set C whose medicine composition contains I contributes its symptom information to a new data set Cz. Finally, using the configured parameters, mining mode 1 (Apriori) yields the frequent itemsets of symptoms Z conditioned on the medicines I. These conditional itemsets are the resulting disease-medicine relation pairs.
For example, for the frequent medicine combination "ginseng + pinellia tuber", the symptom set of every prescription in the full database C whose medicines include "ginseng + pinellia tuber" is added to Cz as one record. Apriori mining in mode 1 is then run on Cz, and the result is the set of frequent symptom combinations corresponding to "ginseng + pinellia tuber".
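Mode 2, and mode 3 by swapping the roles of medicines and symptoms, can be sketched as conditional mining. In this Python illustration each record is assumed to be a (medicines, symptoms) pair, a brute-force counter stands in for Apriori, and the same minimum support and maximum length are reused for the conditional step, with support measured relative to the size of Cz. The itemset importance measure of the third module is not modelled because the source does not define it.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len):
    """Brute-force frequent itemset miner (stand-in for Apriori modes 0 and 1).
    Support is relative to len(transactions)."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c / n >= min_support}


def conditional_relations(records, min_support, max_len):
    """Mode 2 sketch. `records` holds (first, second) pairs, e.g. (medicines, symptoms).
    Returns {frequent itemset I of `first`: {itemset Z of `second`: count in Cz}}."""
    first_sets = [first for first, _ in records]
    relations = {}
    for itemset_i in frequent_itemsets(first_sets, min_support, max_len):
        # Cz: the `second` sets of every record whose `first` set contains I.
        cz = [second for first, second in records if itemset_i <= set(first)]
        # Frequent itemsets of `second` conditioned on I.
        conditional = frequent_itemsets(cz, min_support, max_len)
        if conditional:
            relations[itemset_i] = conditional
    return relations
```

Calling `conditional_relations([(s, m) for m, s in records], ...)` gives the mode 3 result, where each key is a frequent symptom itemset I′ and each value holds the frequent medicine itemsets mined from Cy.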
Mining mode 3 derives the corresponding medicine sets from symptom sets and can likewise be regarded as a way of obtaining disease-medicine relations. First, using the configured support and maximum frequent itemset length, the frequent itemsets containing only symptoms are obtained with mining mode 1 (Apriori). Then, for any frequent itemset I′ among them, every record in the full data set C whose symptoms contain I′ contributes its medicine information to a new data set Cy. Finally, using the configured parameters, mining mode 0 (Apriori) yields the frequent itemsets of medicines Y conditioned on the symptoms I′. These conditional itemsets are the resulting disease-medicine relation pairs.
For example, for the frequent symptom combination "cough, asthma, fever", the medicine set of every record in the full database C whose symptoms include "cough, asthma, fever" is added to Cy as one record. Apriori mining in mode 0 is then run on Cy, and the result is the set of frequent medicine combinations corresponding to "cough, asthma, fever".
Beneficial effects: compared with the prior art, the TCM intelligent information processing system of the invention, a platform integrating common data mining techniques (frequent itemsets, clustering, hierarchical clustering, association rules) and a disease-medicine relation discovery technique, has the following notable advantages:
(1) Simple to implement: the implementation is simple and clearly structured, and the system is easy to learn and operate, making it suitable for TCM practitioners carrying out data mining.
(2) Good portability: the invention is written in Java and runs in any environment with a Java Runtime Environment (JRE) installed, so it can be used more widely than typical Win32 applications.
(3) Good stability: the program includes error detection and handling, so it keeps running correctly without interruption in the presence of a certain amount of manual data-entry errors.
(4) Good extensibility: TCM software is usually developed for one specific function, whereas the invention is a comprehensive platform for TCM applications to which other functions can easily be added.
(5) High accuracy: compared with general-purpose data mining tools, the invention makes many modifications to the data mining algorithms tailored to the characteristics of the TCM field, so the results better match the objective patterns of TCM and are more reliable.
Description of the drawings:
Fig. 1 is the overall architecture diagram of the system of the invention;
Fig. 2 shows the data flow of the invention.
Embodiment:
The invention addresses two needs: TCM universities, other TCM research institutions, and individual researchers urgently want to apply data mining in TCM research, and current TCM informatics applications exist mostly as small single-function tools. It proposes a TCM intelligent information processing system, specifically a platform integrating common data mining techniques (frequent itemsets, clustering, hierarchical clustering, association rules) and a disease-medicine relation discovery technique. The analysis algorithms are implemented on top of the open-source toolkit provided by Weka, with secondary development in line with TCM theory, thereby greatly improving the accuracy of the data mining algorithms when applied in the TCM field.
The implementation of the whole system is described in detail below with reference to the drawings. Its main interface consists of three parts. The first is the data preprocessing interface, which converts the Excel data sources commonly used in the TCM field into a Weka-Excel middleware file for subsequent processing. The second is the common data mining function interface, built by secondary development of the Weka source code; it covers the four analysis methods most frequently used in the TCM field (frequent itemsets, association rules, clustering, and hierarchical clustering) and allows the corresponding mining parameters to be set. The third is the disease-medicine relation mining interface, whose technique was developed independently; it is based on an improved Apriori frequent itemset algorithm whose measures better match the cognitive patterns of TCM.
Referring to Fig. 1, the overall architecture of the invention is as follows:
(1) Data preprocessing module:
1. The data preprocessing module converts the Excel-format data sources generally used by TCM departments into the DMsource middleware, which is then converted into the ARFF format recognized by the data analysis module and the disease-medicine relation discovery module. The middleware is close to natural-language expression and can be read by people.
2. The module has only one button on its interface, used to select a file and submit it for preprocessing.
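The ARFF step can be sketched as below, assuming the common Weka convention for transaction data in which each medicine is a nominal attribute with the single value `t` and absence is written as the missing value `?`. The relation name `DMsource` and the (name, medicines) record layout are assumptions for illustration, not the invention's actual conversion code.

```python
def to_arff(records, relation="DMsource"):
    """records: list of (prescription_name, [medicine, ...]) pairs.
    Emits one nominal attribute per distinct medicine (hypothetical layout)."""
    items = sorted({m for _, medicines in records for m in medicines})
    lines = [f"@relation {relation}", ""]
    for item in items:
        # Quote names so medicine names containing spaces stay valid ARFF.
        escaped = item.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"@attribute '{escaped}' {{t}}")
    lines += ["", "@data"]
    for _, medicines in records:
        present = set(medicines)
        # 't' marks a medicine used in the prescription; '?' (missing) marks absence.
        lines.append(",".join("t" if item in present else "?" for item in items))
    return "\n".join(lines) + "\n"
```

Encoding absence as missing rather than as a second value keeps Weka's association learners from reporting rules about medicines that are *not* prescribed.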
(2) Data mining module:
1. The data mining module mainly comprises classical data mining algorithms such as frequent itemsets, association rules, and clustering, modified for the TCM field on the basis of the open-source toolkit provided by Weka. In terms of results, their accuracy is considerably higher than that of the unmodified mining algorithms.
2. On the interface, the module has two parts: the selection of the various data mining functions, and the setting of the mining parameters. This design offers rich and flexible mining options at the cost of some simplicity.
(3) Disease-medicine relation discovery module:
1. The disease-medicine relation module implements a data mining algorithm, researched and developed entirely in-house, for discovering the intrinsic links between diseases and medicines in TCM. Both the accuracy and the coverage of its results are substantially better than those of general association rule or frequent itemset methods.
2. The module has only one button on its interface, used to select the middleware output DMsource.txt exported by the preprocessing module, run the mining, and return the results.
For the data flow of the invention, refer to Fig. 2.
Claims (1)
1. An intelligent traditional Chinese medicine information processing system, characterized in that the system is a comprehensive data mining platform for TCM formulae comprising three modules; the first module is the data preprocessing interface, which converts the Excel data sources commonly used in the TCM field into a Weka-Excel middleware file for subsequent processing; the second module is the common data mining function interface, which is built by secondary development of the Weka source code, covers the four analysis methods most frequently used in the TCM field (frequent itemsets, association rules, clustering, and hierarchical clustering), and allows the corresponding mining parameters to be set; the third module is the disease-medicine relation mining interface, which is based on an improved Apriori frequent itemset mining algorithm;
the data preprocessing interface provided by the first module has an import button that lets the user select any Excel file on a Windows system as the input source; the file is read with the open-source package jxl.jar, and the system checks whether it follows the fixed format in which the first column is the prescription name and the second column lists the constituent medicines; the data are then converted into the Weka-Excel middleware file DMsource.txt, which is introduced because it is more readable than an ARFF file and can be inspected by the user;
the data mining interface provided by the second module includes the major methods of frequent itemsets, association rules, clustering, and hierarchical clustering;
the frequent itemset function offers the following mining modes:
0) Apriori on medicine sets,
1) Apriori on symptom sets,
2) deriving the corresponding symptom sets from medicine sets,
3) deriving the corresponding medicine sets from symptom sets,
4) FP-growth on medicine sets;
its parameters are the minimum support and the maximum frequent itemset length;
association rules are generated with FP-growth, and the parameters are the minimum confidence and the minimum support;
clustering uses a clustering algorithm based on frequent pattern (FP) itemsets, and the parameters are the minimum support and the maximum frequent itemset length;
hierarchical clustering uses an algorithm that clusters the frequent itemsets of the previous level, level by level;
the disease-medicine relation mining interface provided by the third module uses a modified Apriori algorithm; because support obtained by simple counting is not enough to reflect the characteristics of TCM data, the algorithm introduces a new measure, itemset importance, so that the mining results better reflect objective patterns;
mining mode 2), which the third module provides, derives the corresponding symptom sets from medicine sets and can be regarded as one way of obtaining disease-medicine relations: first, using the configured support and maximum frequent itemset length, the frequent itemsets containing only medicines are obtained with mining mode 0) Apriori; then, for any frequent itemset I among them, every record in the full data set C whose medicine composition contains I contributes its symptom information to a new data set Cz; finally, using the configured parameters, mining mode 1) Apriori yields the frequent itemsets of symptoms Z conditioned on the medicines I; these conditional itemsets are the resulting disease-medicine relation pairs;
mining mode 3) derives the corresponding medicine sets from symptom sets and can likewise be regarded as a way of obtaining disease-medicine relations: first, using the configured support and maximum frequent itemset length, the frequent itemsets containing only symptoms are obtained with mining mode 1) Apriori; then, for any frequent itemset I′ among them, every record in the full data set C whose symptoms contain I′ contributes its medicine information to a new data set Cy; finally, using the configured parameters, mining mode 0) Apriori yields the frequent itemsets of medicines Y conditioned on the symptoms I′; these conditional itemsets are the resulting disease-medicine relation pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510186317.3A CN104794340A (en) | 2015-04-17 | 2015-04-17 | Intelligent processing system for traditional Chinese medicine information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510186317.3A CN104794340A (en) | 2015-04-17 | 2015-04-17 | Intelligent processing system for traditional Chinese medicine information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104794340A | 2015-07-22 |
Family ID: 53559131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510186317.3A Pending CN104794340A (en) | 2015-04-17 | 2015-04-17 | Intelligent processing system for traditional Chinese medicine information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104794340A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407650A * | 2016-08-29 | 2017-02-15 | 首都医科大学附属北京中医医院 | Traditional Chinese medicine data processing device and method |
CN106650225A * | 2016-10-25 | 2017-05-10 | 康美药业股份有限公司 | FP growth algorithm model-based traditional Chinese medicine formula data mining method and system |
CN108550381A * | 2018-03-20 | 2018-09-18 | 昆明理工大学 | Drug recommendation method based on FP-growth |
CN109887604A * | 2019-02-26 | 2019-06-14 | 上海中医药大学 | Quantitative decision-making system for the similarity of TCM disease names |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202887195U * | 2012-07-30 | 2013-04-17 | 成都中医药大学 | Traditional Chinese medicine decoction digitization system |
CN104318082A * | 2014-10-10 | 2015-01-28 | 北京嘉和美康信息技术有限公司 | Follow-up data processing device |
Non-Patent Citations (5)
Title |
---|
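DEFU ZHANG et al.: International Symposium on Fuzzy Systems, Knowledge Discovery and Natural Computation (FSKD 2014), 2 September 2014 * |
周婕: "Research on several data mining methods and their application in TCM databases", China Master's Theses Full-text Database, Information Science and Technology * |
晏峻峰: "Research on building a WEKA-based experimental data mining system for TCM syndrome pathology", 2012 Zhu Wenfeng Academic Thought Symposium and 30th Anniversary Conference of the TCM Diagnostics Teacher Training Class * |
随风: "Making good use of Weka's data preprocessing: converting Excel data into Weka ARFF format or UCI data with Weka", Douban note (HTTPS://WWW.DOUBAN.COM/NOTE/270231778/) * |
马丽伟: "Research on association rule algorithms and their application in TCM data mining", China Master's Theses Full-text Database, Information Science and Technology * |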
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407650A * | 2016-08-29 | 2017-02-15 | 首都医科大学附属北京中医医院 | Traditional Chinese medicine data processing device and method |
CN106407650B * | 2016-08-29 | 2018-10-19 | 首都医科大学附属北京中医医院 | Chinese medicine data processing device and method |
CN106650225A * | 2016-10-25 | 2017-05-10 | 康美药业股份有限公司 | FP growth algorithm model-based traditional Chinese medicine formula data mining method and system |
CN108550381A * | 2018-03-20 | 2018-09-18 | 昆明理工大学 | Drug recommendation method based on FP-growth |
CN109887604A * | 2019-02-26 | 2019-06-14 | 上海中医药大学 | Quantitative decision-making system for the similarity of TCM disease names |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | EXSB | Decision made by SIPO to initiate substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20150722 |