CN104794340A - Intelligent processing system for traditional Chinese medicine information

Info

Publication number: CN104794340A
Application number: CN201510186317.3A
Authority: CN
Inventors: 吴骏; 谢隽; 彭岳; 汤兆亮; 李宁; 王崇骏
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2015-04-17
Filing date: 2015-04-17
Publication date: 2015-07-22

Abstract

The invention discloses an intelligent processing system for traditional Chinese medicine (TCM) information. The system is a comprehensive data mining platform for Chinese medicine formulae and comprises three modules: a data preprocessing interface module, a common data mining function interface module, and a disease-medicine relation mining interface module. The data preprocessing interface module converts the Excel data sources commonly used in the TCM field into a weka-excel middleware file for use in subsequent steps. The common data mining function interface module is based on secondary development of the Weka source code and provides the four data analysis methods commonly used in the TCM field, namely frequent item sets, association rules, clustering and hierarchical clustering, and allows the corresponding mining parameters to be set. The disease-medicine relation mining interface module is based on an improved version of the Apriori frequent item set mining algorithm.

Description

An intelligent processing system for traditional Chinese medicine information

Technical field:

The present invention relates to an intelligent processing system for traditional Chinese medicine (TCM) information, specifically a platform that comprises common data mining techniques such as frequent item sets, clustering, hierarchical clustering and association rules, together with a disease-medicine relation discovery technique.

Background art:

Traditional Chinese medicine has been one of the treasures of the Chinese nation for thousands of years, and its curative effect is affirmed by more and more people both in China and abroad. However, TCM is rooted in traditional Chinese culture, and its abstract, fuzzy, holistic and highly speculative character means that it does not fit well with the mainstream Western system of thought. As a result, some people regard TCM as "unscientific", and the globalization of TCM has met a bottleneck. At the same time, TCM has its own internal need to keep developing in the present era.

First, theories for objectifying and standardizing TCM keep emerging. In particular, the research paradigms of Western pathology and pharmacology have long been accepted by TCM researchers, and thousands of experimental data are generated every day.

Second, the transmission of TCM from ancient times to the present has, to a large extent, relied on formulary (prescription) books and herbal literature. Whether one considers the numerous and complicated case records, the contending and divergent medical treatises, or even ancient texts at the level of the Four Great Classics, everything ultimately comes back to whether actual medicines can cure disease. Yet from the Eastern Han dynasty to modern times, far more than ten thousand prescriptions have been used, and in the TCM field a single prescription often treats several conditions.

Faced with such a huge volume of data, research carried out purely by hand has become an impossible task, while simple statistical tools or data analysis software alone cannot capture the complexity of TCM. The present invention provides the techniques commonly used in TCM research, such as frequent item sets and association rules, and also proposes a pioneering processing technique for the problem that the TCM field most urgently needs solved: the relation between symptoms and medicines.

Summary of the invention:

The object of the present invention is to provide an intelligent processing system for traditional Chinese medicine information: a platform comprising common data mining techniques such as frequent item sets, clustering, hierarchical clustering and association rules, together with a disease-medicine relation discovery technique. TCM universities, other Chinese medicine research institutions and individual researchers urgently wish to apply data mining technology to Chinese medicine research; for them the invention provides, while ensuring accuracy, a simple and user-friendly method with adjustable parameters (a Java application system).

The technical solution adopted by the present invention is as follows. The intelligent traditional Chinese medicine information processing system is a comprehensive data mining platform for Chinese medicinal formulae and comprises three modules. The first module is a data preprocessing interface module, which converts the Excel data sources commonly used in the TCM field into a weka-excel middleware file for subsequent use. The second module is a common data mining function interface module; based on secondary development of the Weka source code, it provides the four data analysis methods commonly used in the TCM field (frequent item sets, association rules, clustering and hierarchical clustering) and allows the corresponding mining parameters to be set. The third module is a disease-medicine relation mining interface module, which is based on an improved version of the Apriori frequent item set mining algorithm.

The data preprocessing interface provided by the first module includes an import button that lets the user select any Excel file on a Windows system as the input source. The file is read with the open-source package jxl.jar and checked against a fixed format in which the first column is the prescription name and the second column is the constituent medicines. The data are then converted into the weka-excel middleware file DMsource.txt. This middleware is introduced because it is more readable than an arff file and can be inspected by the user.
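The format check and conversion described above can be sketched as follows. This is a minimal illustration in Python, not the Java/jxl.jar implementation of the invention. The layout of DMsource.txt is not disclosed in this document, so the tab-separated "name, then comma-separated medicines" line format, the function name to_middleware_lines and the space separator inside the composition column are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence


def to_middleware_lines(rows: Iterable[Sequence[str]], sep: str = " ") -> List[str]:
    """Validate (prescription name, composition) rows and render one text line each.

    Assumed (hypothetical) output layout: "<name>\t<medicine>,<medicine>,...".
    """
    lines = []
    for index, row in enumerate(rows, start=1):
        # The fixed format requires a name column and a composition column.
        if len(row) < 2:
            raise ValueError(f"row {index}: expected name and composition columns")
        name = str(row[0]).strip()
        composition = str(row[1]).strip()
        if not name or not composition:
            raise ValueError(f"row {index}: empty name or composition")
        # Split the composition cell and drop empty fragments from repeated separators.
        medicines = [m for m in composition.split(sep) if m]
        lines.append(f"{name}\t{','.join(medicines)}")
    return lines


rows = [
    ("Formula A", "ginseng pinellia licorice"),
    ("Formula B", "ginseng  ginger"),
]
middleware = to_middleware_lines(rows)
```

Rows that do not match the two-column format raise an error here; a production tool could instead collect and report them, which would be closer to the error tolerance claimed later in this document.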

The data mining interface provided by the second module comprises four main methods: frequent item sets, association rules, clustering and hierarchical clustering.

The selectable parameters for frequent item set mining include the mining mode:

0) Apriori on medicine sets,

1) Apriori on symptom sets,

2) deriving the corresponding symptom sets from medicine sets,

3) deriving the corresponding medicine sets from symptom sets,

4) FP-growth on medicine sets;

The other parameters are the minimum support and the maximum frequent item set length.
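To make the two parameters concrete, the following is a minimal sketch of textbook Apriori (as in mining modes 0 and 1) with a minimum support and a maximum item set length. It is written in Python for brevity and is not the Weka-based implementation of the invention; the function name apriori, the use of relative support (a fraction of all records) and the sample prescriptions are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float,
            max_len: int) -> Dict[FrozenSet[str], float]:
    """Return every item set with relative support >= min_support and size <= max_len."""
    records = [frozenset(t) for t in transactions]
    n = len(records)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in records if itemset <= t) / n

    # Level 1 candidates: every single item that occurs in the data.
    current = {frozenset([item]) for t in records for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current and k <= max_len:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-item sets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k + 1}
        # Prune step: keep a candidate only if all of its k-item subsets are frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent


prescriptions = [
    {"ginseng", "pinellia", "licorice"},
    {"ginseng", "pinellia"},
    {"ginseng", "ginger"},
    {"pinellia", "licorice"},
]
frequent = apriori(prescriptions, min_support=0.5, max_len=2)
```

With these sample records, the pair of ginseng and pinellia is kept (it appears in two of four prescriptions), while ginger is dropped at the first level. The max_len cut-off simply stops the level-wise search early.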

Association rules are generated with FP-growth; the parameters are the minimum confidence and the minimum support.
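The role of the two thresholds can be illustrated with a short sketch. Note that it does not implement FP-growth: for brevity it enumerates item sets by brute force, which yields the same rules on small data but does not scale. The function name association_rules, the relative-support convention and the sample data are assumptions made for the example.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def association_rules(transactions: Iterable[Iterable[str]],
                      min_support: float,
                      min_confidence: float) -> List[Rule]:
    """Return (antecedent, consequent, support, confidence) for every qualifying rule."""
    records = [frozenset(t) for t in transactions]
    n = len(records)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in records if itemset <= t) / n

    items = sorted({item for t in records for item in t})
    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Split the frequent item set into every antecedent/consequent pair.
            for r in range(1, size):
                for antecedent in combinations(combo, r):
                    a = frozenset(antecedent)
                    confidence = s / support(a)  # conf(A -> B) = supp(A and B) / supp(A)
                    if confidence >= min_confidence:
                        rules.append((a, itemset - a, s, confidence))
    return rules


prescriptions = [
    {"ginseng", "pinellia", "licorice"},
    {"ginseng", "pinellia"},
    {"ginseng", "ginger"},
    {"pinellia", "licorice"},
]
rules = association_rules(prescriptions, min_support=0.5, min_confidence=0.7)
```

On this sample, only the rule "licorice implies pinellia" survives: every prescription containing licorice also contains pinellia, whereas the rules between ginseng and pinellia reach a confidence of only two thirds.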

Clustering uses a clustering algorithm based on frequent item sets (FP item sets); the parameters are the minimum support and the maximum frequent item set length.

Hierarchical clustering uses an algorithm that clusters the frequent item sets of the previous layer, layer by layer.

The disease-medicine relation mining interface provided by the third module uses a modified version of the Apriori algorithm. A support value obtained by simple counting is not sufficient to reflect the characteristics of TCM data, so the module adopts a new measure, item set importance, so that the mining results better reflect objective regularities.

Mining mode 2) of the frequent item sets provided by the third module, deriving the corresponding symptom sets from medicine sets, can be regarded as one method of obtaining disease-medicine relations. First, according to the configured support and maximum frequent item set length, the frequent item sets containing only medicines are obtained using mining mode 0) (Apriori). Then, for any frequent item set I among them, the symptom information of every record in the full data set C whose medicine composition contains I is added to a new data set Cz. According to the configured parameters, the frequent symptom item sets Z conditional on the medicine set I are then obtained from Cz using mining mode 1) (Apriori). These conditional item sets are the resulting disease-medicine relation pairs.

For example, for the frequent medicine combination "ginseng, pinellia tuber", the symptom set of every prescription in the full database C whose composition contains "ginseng, pinellia tuber" is added to Cz as one record. Apriori mining in mode 1) is then performed on Cz, and the result is the set of frequent symptom combinations corresponding to "ginseng, pinellia tuber".

Mining mode 3) of the frequent item sets, deriving the corresponding medicine sets from symptom sets, can likewise be regarded as a method of obtaining disease-medicine relations. First, according to the configured support and maximum frequent item set length, the frequent item sets containing only symptoms are obtained using mining mode 1) (Apriori). Then, for any frequent item set I' among them, the medicine information of every record in the full data set C whose symptoms contain I' is added to a new data set Cy. According to the configured parameters, the frequent medicine item sets Y conditional on the symptom set I' are then obtained from Cy using mining mode 0) (Apriori). These conditional item sets are the resulting disease-medicine relation pairs.

For example, for the frequent symptom combination "cough and asthma, fever", the medicine set of every record in the full database C whose symptoms contain "cough and asthma, fever" is added to Cy as one record. Apriori mining in mode 0) is then performed on Cy, and the result is the set of frequent medicine combinations corresponding to "cough and asthma, fever".
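The two conditional mining modes described above are symmetric, so both can be sketched with one routine. The Python sketch below is an illustration only. It uses a brute-force frequent item set search as a stand-in for Apriori, and it applies the same thresholds to the conditional data set (Cz or Cy) as to the full data set, which the text does not specify. The names frequent_itemsets and conditional_pairs, the record layout with "medicines" and "symptoms" fields, and the sample records are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemsets = Dict[FrozenSet[str], float]


def frequent_itemsets(transactions: Iterable[Iterable[str]],
                      min_support: float, max_len: int) -> Itemsets:
    """Brute-force stand-in for Apriori: same output, no candidate pruning."""
    records = [frozenset(t) for t in transactions]
    n = len(records)
    if n == 0:
        return {}
    items = sorted({item for t in records for item in t})
    result: Itemsets = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            s = sum(1 for t in records if set(combo) <= t) / n
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def conditional_pairs(records: List[Dict[str, Set[str]]], source: str, target: str,
                      min_support: float, max_len: int) -> Dict[FrozenSet[str], Itemsets]:
    """Map each frequent item set I of `source` to the frequent `target` item sets
    found among the records whose `source` field contains I."""
    pairs: Dict[FrozenSet[str], Itemsets] = {}
    for itemset in frequent_itemsets([r[source] for r in records], min_support, max_len):
        # Conditional data set (Cz or Cy): target field of every record containing I.
        conditional = [r[target] for r in records if itemset <= set(r[source])]
        pairs[itemset] = frequent_itemsets(conditional, min_support, max_len)
    return pairs


records = [
    {"medicines": {"ginseng", "pinellia"}, "symptoms": {"cough", "fever"}},
    {"medicines": {"ginseng", "pinellia", "licorice"}, "symptoms": {"cough"}},
    {"medicines": {"ginger"}, "symptoms": {"fever"}},
]
mode2 = conditional_pairs(records, "medicines", "symptoms", min_support=0.5, max_len=2)
mode3 = conditional_pairs(records, "symptoms", "medicines", min_support=0.5, max_len=2)
```

Here mode2 maps the medicine combination of ginseng and pinellia to the symptom sets that are frequent among the two records containing both medicines, and mode3 does the reverse starting from symptoms. The item set importance measure of the third module is not defined in this document, so it is not modelled here.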

Beneficial effects of the present invention: the invention is an intelligent processing system for traditional Chinese medicine information, specifically a platform comprising common data mining techniques such as frequent item sets, clustering, hierarchical clustering and association rules, together with a disease-medicine relation discovery technique. Compared with the prior art, its notable advantages are:

(1) Simple to implement: the implementation of the invention is simple and its structure is clear. It is easy to operate and easy to learn, which makes it particularly suitable for the large number of TCM practitioners who wish to carry out data mining.

(2) Good portability: the invention is written in Java and can run in any environment in which a Java runtime (JRE) is installed, giving it a wider range of use than an ordinary Win32 application.

(3) Good stability: the program includes a certain amount of error detection and correction, so that it continues to run stably and correctly in the presence of a certain amount of manual data entry errors.

(4) Good extensibility: applications in the TCM field are usually developed for one specific function, whereas the present invention is a comprehensive platform for TCM applications to which other functions can conveniently be added.

(5) High accuracy: compared with general-purpose data mining tools, the invention makes a number of modifications to the data mining algorithms specifically for the characteristics of the TCM field, so that the mining results better conform to the objective regularities of TCM and are more credible.

Brief description of the drawings:

Fig. 1 is the overall framework diagram of the system of the present invention;

Fig. 2 is the data flow diagram of the present invention.

Detailed description of the embodiments:

The present invention addresses the urgent wish of TCM universities, other Chinese medicine research institutions and individual researchers to apply data mining technology in Chinese medicine research, at a time when TCM informatics applications exist mainly as small single-function tools. It proposes an intelligent processing system for traditional Chinese medicine information, specifically a platform comprising common data mining techniques such as frequent item sets, clustering, hierarchical clustering and association rules, together with a disease-medicine relation discovery technique. The analysis algorithms are implemented on top of the open-source toolkit provided by Weka, with secondary development to conform to TCM theory, which greatly improves the accuracy of the data mining algorithms when applied in the TCM field.

The implementation of the whole system is described in detail below with reference to the accompanying drawings. Its main interface comprises three parts. The first is the data preprocessing interface, whose function is to convert the Excel data sources commonly used in the TCM field into the weka-excel middleware for subsequent use. The second is the common data mining function interface; these functions are based on secondary development of the Weka source code, cover the four data analysis methods commonly used in the TCM field (frequent item sets, association rules, clustering and hierarchical clustering), and allow the corresponding mining parameters to be set. The third is the disease-medicine relation mining interface; the technique in this module was developed independently, is based on an improved version of the Apriori frequent item set mining algorithm, and uses a measure that better matches the cognitive regularities of TCM.

Referring to Fig. 1, the overall framework of the present invention is:

(1) Data preprocessing module:

1. The data preprocessing module converts the Excel-format data sources generally used by TCM departments into the DMsource middleware, and then converts the middleware into the arff format that the data analysis module and the disease-medicine relation discovery module can read. The middleware is closer to natural-language expression and can be inspected by people.

2. On the interface, this module has only one button, which selects a file and submits it for preprocessing.
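The second conversion step of the preprocessing module, from middleware to arff, might look like the sketch below. It is an illustration under stated assumptions rather than the conversion used by the invention. The middleware line layout ("name, tab, comma-separated medicines") is hypothetical, and the arff layout chosen here, one single-valued nominal attribute {t} per medicine with "?" for absence, is a convention commonly used for market-basket data in Weka, not something this document specifies. The function name to_arff is likewise hypothetical.

```python
from typing import Iterable


def to_arff(lines: Iterable[str], relation: str = "prescriptions") -> str:
    """Render middleware lines as ARFF text with one {t} attribute per medicine."""
    baskets = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Hypothetical layout: "<name>\t<medicine>,<medicine>,..."
        _, composition = line.split("\t", 1)
        baskets.append({m.strip() for m in composition.split(",") if m.strip()})
    items = sorted({item for basket in baskets for item in basket})

    out = [f"@relation {relation}", ""]
    # Attribute names are single-quoted so that names containing spaces stay valid;
    # names that themselves contain quotes would need escaping, which is omitted here.
    out += [f"@attribute '{item}' {{t}}" for item in items]
    out += ["", "@data"]
    for basket in baskets:
        # "t" marks a medicine present in the prescription, "?" marks absence.
        out.append(",".join("t" if item in basket else "?" for item in items))
    return "\n".join(out)


arff = to_arff(["Formula A\tginseng,pinellia", "Formula B\tginseng"])
```

Keeping the readable middleware as the hand-checked artifact and generating the arff file from it mechanically matches the stated purpose of the middleware, which is to give users something easier to inspect than arff.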

(2) Data mining module:

1. The data mining module comprises classical data mining algorithms such as frequent item sets, association rules and clustering. These algorithms are modified versions, adapted to the TCM field, built on the open-source toolkit provided by Weka. In terms of mining results, their accuracy is considerably higher than that of the ordinary mining algorithms.

2. On the interface, this module is divided into two parts: the first is the selection among the various data mining functions; the second is the setting of the mining parameters. This layout offers rich and flexible mining modes while remaining concise.

(3) Disease-medicine relation discovery module:

1. The disease-medicine relation module is an entirely independently developed implementation of a data mining algorithm suited to discovering the inner links between diseases and medicines in TCM. Both the accuracy and the coverage of its results are considerably better than those of general association rule or frequent item set methods.

2. On the interface, this module has only one button, which selects the middleware file DMsource.txt exported by the preprocessing module, mines it, and returns the results.

For the data flow of the present invention, refer to Fig. 2.

Claims

1. An intelligent processing system for traditional Chinese medicine information, characterized in that the system is a comprehensive data mining platform for Chinese medicinal formulae and comprises three modules; the first module is a data preprocessing interface module, which converts the Excel data sources commonly used in the TCM field into a weka-excel middleware file for subsequent use; the second module is a common data mining function interface module which, based on secondary development of the Weka source code, provides the four data analysis methods commonly used in the TCM field, namely frequent item sets, association rules, clustering and hierarchical clustering, and allows the corresponding mining parameters to be set; the third module is a disease-medicine relation mining interface module, which is based on an improved version of the Apriori frequent item set mining algorithm;

0) Apriori on medicine sets,

1) Apriori on symptom sets,

2) deriving the corresponding symptom sets from medicine sets,

3) deriving the corresponding medicine sets from symptom sets,

4) FP-growth on medicine sets;

Mining mode 2) of the frequent item sets provided by the third module, deriving the corresponding symptom sets from medicine sets, can be regarded as a method of obtaining disease-medicine relations: first, according to the configured support and maximum frequent item set length, the frequent item sets containing only medicines are obtained using mining mode 0) (Apriori); then, for any frequent item set I among them, the symptom information of every record in the full data set C whose medicine composition contains I is added to a new data set Cz; according to the configured parameters, the frequent symptom item sets Z conditional on the medicine set I are obtained using mining mode 1) (Apriori); these conditional item sets are the resulting disease-medicine relation pairs;

Mining mode 3) of the frequent item sets, deriving the corresponding medicine sets from symptom sets, can likewise be regarded as a method of obtaining disease-medicine relations: first, according to the configured support and maximum frequent item set length, the frequent item sets containing only symptoms are obtained using mining mode 1) (Apriori); then, for any frequent item set I' among them, the medicine information of every record in the full data set C whose symptoms contain I' is added to a new data set Cy; according to the configured parameters, the frequent medicine item sets Y conditional on the symptom set I' are obtained using mining mode 0) (Apriori); these conditional item sets are the resulting disease-medicine relation pairs.