CN109801687A

CN109801687A - Method and system for constructing a medicine-oriented causality knowledge base

Info

Publication number: CN109801687A
Application number: CN201910034537.2A
Authority: CN
Inventors: 杨矫云; 江思源; 吉品; 安宁
Original assignee: Hefei University of Technology
Current assignee: Hangzhou Zhilan Health Co ltd
Priority date: 2019-01-15
Filing date: 2019-01-15
Publication date: 2019-05-24
Anticipated expiration: 2039-01-15
Also published as: CN112151130A; CN109801687B; CN112233736B; CN112151130B; CN112233736A

Abstract

The present invention relates to a method for constructing a medicine-oriented causality knowledge base. In the method, a literature unit obtains a large body of relevant literature covering multiple diseases and classifies it into several literature unit bodies to build an original literature database; a data unit obtains main feature parameters from the literature unit bodies and builds a data set from those parameters; a causal unit builds a Bayesian network from the main feature parameters and the data set in order to analyze the average causal effect between diseases in a data-driven manner; and a knowledge unit builds the knowledge base from the relevant literature by establishing the correspondence between the diseases and the average causal effect between them.

Description

Method and system for constructing a medicine-oriented causality knowledge base

Technical field

The present invention relates to the technical field of medical information, and in particular to a method and system for constructing a medicine-oriented causality knowledge base.

Background art

In scientific research, the literature is the most convincing and authoritative carrier for recording scientific results; most of the content of research activity is recorded in it. A large body of literature describes associations between diseases, but it is very difficult for a doctor to consult that much literature in order to study the causal relationships between diseases. In medicine, for example, complications and comorbidities are one or more further diseases that occur after a first disease has occurred. Both are complex clinical concepts. A complication is another disease or symptom that a disease causes in the course of its development; a comorbidity is one or more other diseases, related to the first disease, that a patient who has that disease also develops during diagnosis, treatment, and nursing. Medical research emphasizes causal relationships rather than associations. A complication is causally related to the primary disease, whereas a comorbidity is not causally related to the principal disease. How a doctor can determine from a large body of literature whether the relationship between diseases is causal is therefore a technical problem to be solved.

For example, Chinese patent publication CN107145712A discloses a medical-record statistical analysis system for complications and comorbidities. The system comprises a diagnosis code maintenance unit, a medical record counting unit, a chi-square test unit for 2 × 2 cross-classified data, and a report generation unit. It maintains the mapping between each ID and diagnosis and establishes a diagnosis-ID statistics table, whose fields correspond to the other diagnoses in the discharge diagnosis on the front page of the medical record. K historical medical records of a hospital are then imported through a data interface; according to the diagnosis-ID statistics table, the main diagnosis and the other diagnoses in each discharge diagnosis are converted into ID diagnoses; the chi-square value of the main diagnosis against each other diagnosis is then computed with the chi-square test for 2 × 2 cross-classified data, and the values are ranked. In this way a computer can rapidly identify complications and comorbidities that may be causally related to the selected main diagnosis.

For example, Chinese patent publication CN107799182A discloses a method and electronic device for evaluating impact factors of complications and comorbidities. The method uses an optimization algorithm to find an impact factor vector such that the inner product of the diagnosis vector and the impact factor vector is as close as possible to a resource consumption level parameter. This yields, within one diagnosis-related group, the weight with which the presence or severity of each complication and comorbidity affects the level of medical resource consumption. The weights allow that effect to be estimated accurately.

For example, Chinese patent publication CN106407686A discloses a modeling method for assessing chronic disease costs. The method first performs sample screening and feature selection and then uses a regression model to obtain the influence of each impact factor on chronic disease costs. It can directly quantify how strongly complications and comorbidities affect the cost of a chronic disease, providing a basis for chronic disease management and medical cost control.

For example, Chinese patent publication CN105046406A discloses a method for evaluating the quality of inpatient medical management. The method comprises: screening historical data for modeling; data identification and cleaning; diagnosis-related group (DRG) and model classification; classification and aggregation of ICD comorbidities and complications at admission and of other variables; statistical testing and screening of the admission comorbidity and complication variables; establishment and quality verification of the mathematical model; screening of current data and calculation of predicted values; and calculation of each patient's admission risk prediction values, giving risk predictions of mortality, length of stay, and medical cost for each inpatient. Using big-data analysis, mathematical statistics, and machine learning, that invention converts medical data into solutions and realizes the value of the data. It addresses the lack of comparability between medical data, enabling quality-of-care evaluation across diseases as well as evaluation of the performance and reasonableness of inpatient disease management between doctors, between hospital departments, and between hospitals.

For example, publication CN102542153B discloses a method for introducing the influence of radiosensitivity parameters on normal tissue complication probability (NTCP). Based on a simple normal tissue organ model, the method calculates the dose distribution and the biologically effective dose (BED) distribution of the organ model, processes the BED distribution to calculate the total survival fraction SF, uses SF to calculate the equivalent uniform dose (EUD) of the organ model, and applies it together with the generalized EUD corresponding to 50% complication probability in the NTCP LKB model. Because the EUD model includes several radiosensitivity parameters, the influence of radiosensitivity parameters on normal tissue complication probability is thereby introduced.

For example, Chinese patent publication CN106295187A discloses a method and system for constructing a knowledge base for an intelligent clinical decision support system. The method comprises: obtaining input information; performing word segmentation, part-of-speech tagging, and syntactic analysis on the input to obtain a relation dependency tree; extracting the concepts, entities, and entity modifiers in the tree; deriving the relations between the entities in the tree from the concepts, entities, and entity modifiers according to relation semantic definition rules; and defining extended triples that store those relations, thereby completing the knowledge base. That invention suits settings with many clinical cases and many features, and can be flexibly extended to different ways of representing medical records. However, it extracts relations between entities only by recognizing phrases that express semantic relations, and those relations are mainly correlations. Because correlation is not necessarily causation, the causal relationships obtained by that patent are unreliable.

For example, Chinese patent publication CN106667443A discloses a method and system for predicting complications of congenital cataract surgery. The method obtains predictive factors from clinical information, applies the naive Bayes algorithm to the predictive factors to obtain a prediction result, presents the result, and obtains the corresponding follow-up information according to the result, so that the occurrence of complications can be predicted accurately.

The above survey shows that the prior art falls well short of determining whether diseases stand in a complication or comorbidity relationship. Only CN107145712A addresses the question, and it relies on simple mathematical statistics and an independence test, which can show only that diseases are associated, not that one causes the other. This is not sufficient to quantify the causal relationship between diseases.

Summary of the invention

In view of the deficiencies of the prior art, the present invention provides a method for constructing a medicine-oriented causality knowledge base, namely a literature-retrieval-based method for constructing a knowledge base of whether multiple diseases stand in a complication or comorbidity relationship. The steps comprise: a literature unit builds an original literature database; a data unit builds a data set; a causal unit analyzes the causal relationships between diseases; and a knowledge unit stores the original literature database, the data set, and/or the average causal effect to build a knowledge base that can be read and/or displayed, and provides it in quantified form to medical workers for reference, study, and/or decision-making. The literature unit obtains a large body of relevant literature covering multiple diseases and classifies it into several literature unit bodies to build the original literature database, so that the data unit can obtain main feature parameters from the literature unit bodies and build the data set from those parameters. The causal unit builds a Bayesian network from the main feature parameters and the data set in order to analyze the average causal effect between diseases in a data-driven manner, so that the knowledge unit can build the knowledge base from the relevant literature by establishing the correspondence between the diseases and the average causal effect between them. The present invention forms a knowledge base from the original data recorded in the literature and does not involve methods for diagnosing or treating disease.

According to a preferred embodiment, the relevant literature is classified as follows: the literature unit counts the frequency of the words/phrases in each document and obtains their joint occurrence probability under the independence assumption; the literature unit calculates the association strength of the words/phrases and, based on the association strength, corrects the joint occurrence probability to obtain the association reduced coordinates of the document; and the literature unit constructs the association reduced coordinates of the document and, using a classification function built from the association reduced coordinates of all the relevant literature and the association strength, classifies the relevant literature with an iterative algorithm to form the several literature unit bodies. The classification function can undergo deep learning based on the sample size of the relevant literature, to improve the precision of the literature unit.

According to a preferred embodiment, once the data unit has obtained the literature unit bodies, it obtains the data set by pairing diseases. For each relevant document, the data unit uses syntactic analysis from natural language processing to extract the relationship between the disease pairs it contains, thereby establishing a relation knowledge base of disease pairs; the relationship between a disease pair is a forward relation, a reverse relation, or an independence relation. Based on the relation knowledge table, the data unit also retrieves, within the literature unit bodies, the documents that contain a disease pair and fuses them to obtain the relationship confidence values of the pair, thereby establishing a relationship confidence value library of disease pairs; the confidence values comprise a forward relation confidence value, a reverse relation confidence value, and an independence relation confidence value. The data unit then builds the data set from the relation knowledge base and the relationship confidence value library established over all pairwise combinations of diseases.

According to a preferred embodiment, the causal unit 3 builds the Bayesian network as follows. S31: a Bayesian network scoring function is constructed based on the relation knowledge base:

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

S32: an undirected graph structure constraint is constructed based on the relation knowledge base. For the data set D and any disease pair L_m and L_n in D, the disease-pair number of L_m and L_n is obtained by looking up the pair's attributes in the relation knowledge base; that number is then used to look up, in the literature relationship confidence value table for the pair, the confidence value of L_m→L_n and the confidence value of L_n→L_m.

S33: the Bayesian network is built from the Bayesian network scoring function and the undirected graph structure constraint.

According to a preferred embodiment, the causal unit calculates the average causal effect between each disease pair based on the Bayesian network and Pearl's principle. When the average causal effect exceeds a set causal effect threshold, the diseases stand in a complication relationship; when it does not exceed the threshold, they stand in a comorbidity relationship.
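As a minimal sketch of the thresholding rule, the following Python example computes an average causal effect (ACE) as the difference of two interventional probabilities and labels the disease pair. It assumes, purely for illustration, a single binary confounder Z that satisfies the backdoor criterion; the function names, the probability tables, and the threshold of 0.2 are hypothetical and are not taken from the invention.

```python
def interventional_probability(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) by backdoor adjustment: sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())


def average_causal_effect(p_y_given_xz, p_z):
    """ACE of X on Y: P(Y=1 | do(X=1)) - P(Y=1 | do(X=0))."""
    return (interventional_probability(p_y_given_xz, p_z, 1)
            - interventional_probability(p_y_given_xz, p_z, 0))


def classify_pair(ace, threshold):
    """Label a disease pair by comparing its ACE with the set threshold."""
    return "complication" if ace > threshold else "comorbidity"


# Hypothetical numbers: X is the first disease, Y the second, Z one binary confounder.
p_z = {0: 0.7, 1: 0.3}
p_y_given_xz = {(1, 0): 0.6, (1, 1): 0.8, (0, 0): 0.1, (0, 1): 0.3}

ace = average_causal_effect(p_y_given_xz, p_z)
label = classify_pair(ace, threshold=0.2)
```

In a full system the conditional probabilities would be read from the learned Bayesian network rather than typed in by hand.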

According to a preferred embodiment, for a disease L_m, the nodes connected to L_m are obtained by traversal under the undirected graph structure constraint and form its node set. The correlation between each node and L_m is then calculated in turn, the node with the greatest correlation is selected for an independence test, and the nodes that are independent of L_m given the data set D are deleted.

The independence between disease L_n and disease L_m is measured by mutual information.

When the mutual information exceeds the mutual information threshold, L_n and L_m are correlated and not independent; when the mutual information does not exceed the threshold, L_n and L_m are uncorrelated and independent.
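The mutual-information check can be illustrated with a short Python sketch. It uses the standard empirical definition I(X; Y) = Σ p(x, y) log[p(x, y) / (p(x) p(y))] on paired observations of two diseases; whether the invention uses exactly this estimator is an assumption, and the function names and any threshold value are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two equally long label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def is_dependent(xs, ys, threshold):
    """True when the mutual information exceeds the chosen threshold."""
    return mutual_information(xs, ys) > threshold


# Hypothetical presence/absence records of two diseases across four patients.
perfectly_linked = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Nodes whose mutual information with L_m stays at or below the threshold would be the ones pruned from its node set.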

According to a preferred embodiment, the invention also discloses a system for constructing a medicine-oriented causality knowledge base. The system comprises: a literature unit for building an original literature database; a data unit for building a data set; a causal unit for determining the average causal effect between diseases; and a knowledge unit for storing the original literature database, the data set, and/or the average causal effect to build a knowledge base that can be read and/or displayed, provided in quantified form to medical workers for reference, study, and/or decision-making. In response to a user-defined request, the literature unit obtains a large body of relevant literature covering multiple diseases and classifies it into several literature unit bodies to build the original literature database, so that the data unit can obtain main feature parameters from the literature unit bodies and build the data set from those parameters. This reduces the interference of the many feature parameters formed by the literature with the causal relationships between diseases and raises the utility of the original literature database. The causal unit builds a Bayesian network from the main feature parameters and the data set in order to mine the average causal effect between diseases in a data-driven manner, so that whether the diseases stand in a complication or comorbidity relationship can be determined from the average causal effect.

According to a preferred embodiment, the literature unit counts the frequency of the words/phrases in each document and obtains their joint occurrence probability under the independence assumption; the literature unit calculates the association strength of the words/phrases and, based on the association strength, corrects the joint occurrence probability to obtain the association reduced coordinates of the document; and the literature unit constructs the association reduced coordinates of the document and, using a classification function built from the association reduced coordinates of all the relevant literature and the association strength, classifies the relevant literature with an iterative algorithm to form the several literature unit bodies. The classification function can undergo deep learning based on the sample size of the relevant literature, to improve the precision of the literature unit.

According to a preferred embodiment, once the data unit has obtained the literature unit bodies, it obtains the data set by pairing diseases. For each relevant document, the data unit uses syntactic analysis from natural language processing to extract the relationship between the disease pairs it contains, thereby establishing a relation knowledge base of disease pairs; the relationship between a disease pair is a forward relation, a reverse relation, or an independence relation. Based on the relation knowledge table, the data unit also retrieves, within the literature unit bodies, the documents that contain a disease pair and fuses them to obtain the relationship confidence values of the pair, thereby establishing a relationship confidence value library of disease pairs; the confidence values comprise a forward relation confidence value, a reverse relation confidence value, and an independence relation confidence value. The data unit then builds the data set from the relation knowledge base and the relationship confidence value library established over all pairwise combinations of diseases.

According to a preferred embodiment, the causal unit builds the Bayesian network as follows. S31: a Bayesian network scoring function is constructed based on the relation knowledge base:

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

S32: an undirected graph structure constraint is constructed based on the relation knowledge base. For the data set D and any disease pair L_m and L_n in D, the disease-pair number of L_m and L_n is obtained by looking up the pair's attributes in the relation knowledge base; that number is then used to look up, in the literature relationship confidence value table for the pair, the confidence value of L_m→L_n and the confidence value of L_n→L_m.

The present invention provides a system for constructing a medicine-oriented causality knowledge base. Based on the existing literature in the field of complications and comorbidities, the invention builds an original literature database, designs a Bayesian network scoring function, constructs an undirected graph constraint for the Bayesian network, and then builds the Bayesian network to perform causal analysis and inference, thereby analyzing the causal relationships between diseases.

Brief description of the drawings

Fig. 1 is a preferred flow diagram of the construction method provided by the invention; and

Fig. 2 is a preferred module diagram of the construction system provided by the invention.

Reference signs list

1: literature unit 2: data unit

3: causal unit 4: knowledge unit

Detailed description of the embodiments

The invention is described in detail below with reference to Figs. 1 and 2.

In the description of the invention, the terms "first", "second", and "third" are used for description only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature qualified by "first", "second", or "third" may therefore explicitly or implicitly include one or more such features. In the description of the invention, "a plurality of" means two or more, unless specifically defined otherwise.

Embodiment 1

In scientific research, the literature is the most convincing and authoritative carrier for recording scientific results; most of the content of research activity is recorded in it. A large body of literature describes associations between diseases, but it is very difficult for a doctor to consult that much literature in order to study the causal relationships between diseases. In medicine, for example, comorbid diseases are not causally related, whereas a complication is causally related to the disease that gave rise to it. On this basis, the invention collects and classifies literature covering multiple diseases, extracts parameters, and makes a determination, thereby providing, in quantified form, a method for constructing a knowledge base of complications and comorbidities. The knowledge base gives doctors a strong reference when deciding on treatment, by "letting the data speak". The invention only studies, from a large and scattered body of literature, the causal relationships between diseases mentioned in that literature, and forms a knowledge base by retrieval. The construction system is not a method for diagnosing and/or treating disease.

This embodiment discloses a method for constructing a medicine-oriented causality knowledge base. Where no conflict or contradiction arises, the whole and/or part of the preferred implementations of other embodiments may supplement this embodiment. Preferably, the method can be implemented by the system of the invention and/or other alternative modules.

A literature-retrieval-based method for constructing a knowledge base of whether multiple diseases stand in a complication or comorbidity relationship comprises, as shown in Fig. 1, the following steps:

S1: the literature unit 1 builds an original literature database;

S2: the data unit 2 builds a data set;

S3: the causal unit 3 analyzes the causal relationships between diseases;

S4: the knowledge unit 4 stores the original literature database, the data set, and/or the average causal effect to build a knowledge base that can be read and/or displayed. The information provided by the knowledge unit 4 can thus be supplied in quantified form to medical workers for reference, study, and/or decision-making.

To reduce the interference of the many feature parameters formed by a large body of relevant literature with the causal relationships between disease pairs, and to raise the utility of the original literature database, the literature unit preferably obtains a large body of relevant literature covering multiple diseases and classifies it into several literature unit bodies to build the original literature database, so that the data unit can obtain main feature parameters from the literature unit bodies and build the data set from those parameters.

Preferably, the causal unit builds a Bayesian network from the main feature parameters and the data set in order to analyze the average causal effect between diseases in a data-driven manner, so that the knowledge unit can build the knowledge base from the relevant literature by establishing the correspondence between the diseases and the average causal effect between them. For example, the average causal effect between diseases can indicate whether they stand in a complication or a comorbidity relationship.

Preferably, the literature unit 1 obtains a large body of relevant literature covering multiple diseases and classifies it into several literature unit bodies to build the original literature database. The relevant literature includes medical records, research reports, conference papers, journal articles, books, academic theses, and patents. With this much literature, classification must follow a definite method. The literature is classified so that associations between diseases can be observed effectively and the load on the system is reduced. For example, it can be classified by disease group, such as digestive tract diseases, cardiovascular diseases, and neurological diseases, or by discipline, such as rehabilitation and psychology. Given the sheer volume of literature, however, whether classification is accurate and efficient directly affects the distinction between complications and comorbidities. Preferably, the literature is classified with a Bayesian method, an SVM method, or a k-NN method.

Preferably, the relevant literature is classified as follows. S11: the literature unit 1 counts the frequency of the words/phrases in each document and obtains their joint occurrence probability under the independence assumption. For example, for a specific document, the joint occurrence probability distribution can be calculated with the naive Bayes method.
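Step S11 can be sketched in Python as follows. The example counts relative term frequencies in one tokenized document and, under the independence assumption, obtains the joint log probability of a group of terms as the sum of their individual log probabilities. The function names, the sample sentence, and the small probability floor for unseen terms are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency of each word/phrase in one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def joint_log_probability(terms, freqs, floor=1e-9):
    """log P(t1, ..., tk) under independence: the sum of log P(ti)."""
    return sum(math.log(freqs.get(t, floor)) for t in terms)


# Hypothetical tokenized sentence from one document.
doc = "diabetes causes nephropathy and diabetes causes retinopathy".split()
freqs = term_frequencies(doc)
joint = joint_log_probability(["diabetes", "causes"], freqs)
```

Working in log space avoids numerical underflow when many terms are multiplied together.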

S12: the literature unit 1 calculates the association strength of the words/phrases. The association strength reflects how the words/phrases are related and is well suited to classifying literature. Preferably, for classification, N is defined as the set of document samples, V as the set of document categories, and V_i as the subset for the i-th document category; W is the set of words/phrases and W_i the subset for the i-th word/phrase. V_i contains S_j samples, and the association reduced coordinate T_p of its p-th sample is an n-dimensional array:

where k_i (i = 1, 2, 3, …, n) is the number of occurrences of the i-th word, scaled by a normalization coefficient.

The association vector of V_i is the average of the association reduced coordinates of all samples in V_i; it reflects the association strength of the words/phrases in the literature, i.e.:

S13: the literature unit 1 obtains the association reduced coordinates of each document and, using a classification function built from the association reduced coordinates of all the relevant literature, classifies the relevant literature with an iterative algorithm to form the several literature unit bodies. Preferably, for any document, the association reduced coordinate is:

where q_i is the number of occurrences of the i-th word in the document. During classification, the distance between the document to be classified and the support point (b₁, b₂, …, b_n) of each document category V_i is denoted as:

The document classification function is constructed from the association strength:

where γ_i is related to the association strength.
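As a rough illustration of S12 and S13, the Python sketch below assumes an L2-normalized count vector for the association reduced coordinate, the per-category mean as the support point, and a γ-weighted Euclidean distance as the classification rule. These choices, the category names, and all function names are assumptions made for illustration and are not the formulas of the invention.

```python
import math


def reduced_coordinate(counts):
    """Scale a word-count vector to unit length (assumed L2 normalization)."""
    norm = math.sqrt(sum(k * k for k in counts))
    return [k / norm for k in counts] if norm else [0.0] * len(counts)


def association_vector(samples):
    """Average the reduced coordinates of all samples in one category."""
    coords = [reduced_coordinate(s) for s in samples]
    return [sum(c[i] for c in coords) / len(coords) for i in range(len(coords[0]))]


def classify(counts, support_points, gamma=None):
    """Return the category whose support point is nearest to the document."""
    t = reduced_coordinate(counts)
    best, best_distance = None, float("inf")
    for name, b in support_points.items():
        weights = gamma[name] if gamma else [1.0] * len(b)
        d = math.sqrt(sum(w * (ti - bi) ** 2 for w, ti, bi in zip(weights, t, b)))
        if d < best_distance:
            best, best_distance = name, d
    return best


# Hypothetical word counts over a three-word vocabulary for two categories.
support = {
    "cardiovascular": association_vector([[3, 0, 1], [4, 1, 0]]),
    "neurological": association_vector([[0, 5, 1], [1, 3, 0]]),
}
predicted = classify([5, 0, 1], support)
```

An iterative scheme would recompute the support points after each assignment pass until the categories stop changing.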

Preferably, the iterative algorithm may be a minimization iterative algorithm, a minimization-optimization iterative algorithm, or an expectation-maximization iterative algorithm. Preferably, the classification function can undergo deep learning based on the sample size of the relevant literature, to improve the precision of the literature unit 1.

Preferably, the data unit 2 obtains main feature parameters from the literature unit bodies and builds the data set from those parameters. This reduces the interference of the many feature parameters formed by the literature with the causal relationships between diseases and raises the utility of the original literature database. Preferably, once the data unit 2 has obtained the literature unit bodies, it obtains the data set by pairing diseases. For each relevant document, the data unit 2 uses syntactic analysis from natural language processing to extract the relationship between the disease pairs it contains, thereby establishing a relation knowledge base of disease pairs; the relationship between a disease pair is a forward relation, a reverse relation, or an independence relation. Based on the relation knowledge table, the data unit 2 also retrieves, within the literature unit bodies, the documents that contain a disease pair and fuses them to obtain the relationship confidence values of the pair, thereby establishing a relationship confidence value library of disease pairs; the confidence values comprise a forward relation confidence value, a reverse relation confidence value, and an independence relation confidence value. The data unit 2 then builds the data set from the relation knowledge base and the relationship confidence value library established over all pairwise combinations of diseases.

For example, diseases L₁ and L₂ are found in the relevant literature. Their relationship may be a forward relation, in which L₁ influences L₂, denoted L₁→L₂; a reverse relation, in which L₂ influences L₁, denoted L₂→L₁; or an independence relation, in which L₁ and L₂ do not influence each other, denoted L₁⊥L₂. Because complications and comorbidities are many, further diseases such as L₃ and L₄ may also be involved. Following the same scheme, a relation knowledge base can be built for L₁ and L₃, for L₂ and L₃, and so on. Within each literature unit body, the relationship confidence value library is then built from these relation knowledge bases according to the content of the individual documents. Preferably, the forward, reverse, and independence relation confidence values are normalized so that the three sum to one: within the literature unit body, all documents are traversed and queried, and the three confidence values are weighted according to frequency. The data unit 2 feeds the data set built from the relation knowledge base and the relationship confidence value library into the causal unit 3 for the next step.
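The frequency-based weighting of the three relations can be sketched in Python as follows. The example assumes that each document mentioning a disease pair has already been reduced to one relation label; the label strings and the function name are hypothetical. A pair with no supporting document falls back to the independence relation.

```python
from collections import Counter

RELATIONS = ("forward", "reverse", "independent")


def relation_confidences(extracted):
    """Normalize per-relation document counts so the three values sum to 1."""
    counts = Counter(extracted)
    total = sum(counts[r] for r in RELATIONS)
    if total == 0:
        # No document mentions the pair: treat the two diseases as independent.
        return {"forward": 0.0, "reverse": 0.0, "independent": 1.0}
    return {r: counts[r] / total for r in RELATIONS}


# Hypothetical labels extracted from four documents about the pair (L1, L2).
confidences = relation_confidences(["forward", "forward", "reverse", "independent"])
```

Here the forward relation L₁→L₂ receives half of the weight because two of the four documents assert it.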

Preferably, for journal literature, the positive relationship certainty value of L₁→L₂ may also be defined as follows:

Here C(Xi) is the confidence of document Xi, given by C(Xi) = (IFi + 1) × (CIi + 1), where Xi denotes the i-th document, IFi is the standardized impact factor of the journal in which Xi was published, and CIi is the standardized citation count. If no document describes a relationship between L₁ and L₂, then KL(L₁→L₂) = 0, KL(L₂→L₁) = 0, and KL(L₁⊥L₂) = 1. Other types of document can be handled in the same way: for example, medical records can be weighted by the authority of the physician, and conference papers by the authority of the conference.
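The confidence weighting described above can be sketched in Python. The per-document confidence C(Xi) = (IFi + 1) × (CIi + 1) and the fallback values for an illness pair with no documents come from the text; the way the confidences are aggregated (a normalized, confidence-weighted share per relationship) is an assumption made for illustration, as are the function names, the dictionary keys, and the premise that the standardized impact factor and citation count are non-negative.

```python
RELATIONS = ("positive", "inverse", "vertical")


def document_confidence(impact_factor, citations):
    """C(Xi) = (IFi + 1) * (CIi + 1), with both inputs already standardized."""
    return (impact_factor + 1.0) * (citations + 1.0)


def certainty_values(documents):
    """Return normalized certainty values for one illness pair.

    `documents` is a list of dicts with hypothetical keys "relation",
    "impact_factor" and "citations". The aggregation rule (share of total
    confidence per relationship) is an assumption, not the source's formula.
    """
    if not documents:
        # No document mentions the pair: treat the illnesses as independent.
        return {"positive": 0.0, "inverse": 0.0, "vertical": 1.0}
    totals = dict.fromkeys(RELATIONS, 0.0)
    for doc in documents:
        totals[doc["relation"]] += document_confidence(
            doc["impact_factor"], doc["citations"]
        )
    grand_total = sum(totals.values())
    return {relation: totals[relation] / grand_total for relation in RELATIONS}
```

With this normalization the three values always sum to 1, which matches the normalization requirement stated for the relationship certainty value library.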

Preferably, cause and effect unit 3 constructs a Bayesian network based on the main characteristic parameters and the data set. Preferably, the main characteristic parameters include the positive relationship certainty value, the inverse relationship certainty value, and the vertical relationship certainty value. Cause and effect unit 3 constructs the Bayesian network as follows:

S31: preferably, define the data set D = (D₁, D₂, …, D_i) as several groups of illnesses, and L = (L₁, L₂, …, L_n) as the specific set of illnesses in a given group. A Bayesian network evaluation function is constructed based on the relational knowledge base:

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

In the formula, G is the Bayesian network; it ranges over the directed acyclic graphs whose nodes are the specific illnesses L = (L₁, L₂, …, L_n) of a given group. P(G) is the prior distribution. From existing knowledge, maximizing log P(G) + log P(D | G) is equivalent to maximizing log P(G | D), and log P(G | D) can be scored with the Bayesian information criterion (BIC). In the formula,

where, if any edge in structure G is written L_m→L_n, then KL(L_m→L_n) is its relationship certainty value. The summation in the formula runs over the literature knowledge confidence of the positive relationships corresponding to all directed edges in structure G.

S32: an undirected graph structural constraint is constructed based on the relational knowledge base. For a given data set D and any illness pair L_m and L_n in D, the illness-pair number of L_m and L_n is obtained by searching the relational knowledge base of illness pairs; using that number, the relationship certainty value of L_m→L_n and the relationship certainty value of L_n→L_m are retrieved from the relationship certainty value table. If L₁ influences L₂, L₁ is connected to L₂ by a directed edge pointing to L₂, and the edge is assigned the positive relationship certainty value. If L₂ influences L₁, L₂ is connected to L₁ by a directed edge pointing to L₁, and the edge is assigned the inverse relationship certainty value. If L₁ and L₂ do not influence each other, no edge is drawn between them, and the pair is assigned the vertical relationship certainty value.

S33: the Bayesian network is constructed based on the Bayesian network evaluation function and the undirected graph structural constraint. Once the undirected graph structural constraint of the Bayesian network has been determined, a heuristic search algorithm such as the K2 algorithm can be executed to find the network structure that optimizes the score function. The general steps are as follows. The search starts from an initial model. At each step, search operators make local modifications to the current model to produce a series of candidate models; the score of each candidate model is calculated, and the best candidate is compared with the current model. If the best candidate scores higher, it becomes the next current model and the search continues; otherwise the search stops and the current model is returned. According to the Bayesian principle, the candidate model with the highest score is taken as the Bayesian network. Preferably, the Bayesian network evaluation function is constructed according to the established Bayesian network and Bayes' rule. The Bayesian network evaluation function can be constructed in accordance with classical heuristic structure learning algorithms, such as the K2 algorithm, the Max-Min Parents and Children algorithm, and Markov chain Monte Carlo search.
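The search loop in S33 can be illustrated with a minimal Python sketch. It is plain greedy hill climbing over single-edge additions and removals, not the K2 algorithm itself. Several parts are assumptions made for illustration: `data_score` is a caller-supplied stand-in for the BIC term, the knowledge term is taken to be the sum of the logarithms of the edge certainty values, and `allowed_pairs` stands for the illness pairs that the undirected graph structural constraint permits.

```python
import math


def is_acyclic(nodes, edges):
    """Check acyclicity with Kahn's topological ordering."""
    indegree = {node: 0 for node in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == len(nodes)


def knowledge_score(edges, certainty, floor=1e-6):
    """Assumed knowledge term: sum of log certainty values over directed edges."""
    return sum(math.log(max(certainty.get(edge, 0.0), floor)) for edge in edges)


def hill_climb(nodes, allowed_pairs, certainty, data_score):
    """Greedy search: toggle one directed edge per step while the score improves."""
    def total(edges):
        return data_score(edges) + knowledge_score(edges, certainty)

    current = frozenset()
    current_score = total(current)
    while True:
        best, best_score = None, -math.inf
        for a, b in allowed_pairs:
            for edge in ((a, b), (b, a)):
                candidate = current ^ {edge}  # add the edge, or remove it if present
                if not is_acyclic(nodes, candidate):
                    continue
                score = total(candidate)
                if score > best_score:
                    best, best_score = candidate, score
        if best is None or best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score


# Toy inputs: the data term cannot tell the two directions apart,
# so the literature certainty values decide the orientation.
gain = {("A", "B"): 2.0, ("B", "A"): 2.0, ("B", "C"): 1.5, ("C", "B"): 1.5}
certainty = {("A", "B"): 0.8, ("B", "A"): 0.1, ("B", "C"): 0.7, ("C", "B"): 0.2}
structure, score = hill_climb(
    ["A", "B", "C"],
    [("A", "B"), ("B", "C")],
    certainty,
    lambda edges: sum(gain[edge] for edge in edges),
)
```

Restricting candidate edges to `allowed_pairs` is what keeps the search space small: pairs that the literature marks as independent are never proposed.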

Cause and effect unit 3 mines the average cause-effect between illnesses through data patterns, so that whether illnesses constitute a complication or a comorbidity can be determined from the average cause-effect. To obtain the average cause-effect, cause and effect unit 3 uses Pearl's principle and the Bayesian network structure. According to Pearl, to explore whether event X is a cause of event Y, X must be intervened on and E(Y | do(X)) calculated; if the average change in event Y under intervention on X exceeds the significance level, X is considered a cause of Y. Specifically, in a given data set D or Di, the illnesses to be studied are first screened out; these include the target illness and the other illnesses that influence it. For example, to study whether illness L1 is a complication of illness L2, all edges pointing into L1 are cut and the average cause-effect between illness L1 and illness L2 is then observed. If this change exceeds the set cause-effect threshold, illness L1 and illness L2 are considered to constitute a complication; otherwise they constitute a comorbidity.

When cause and effect unit 3 mines the average cause-effect between illnesses through data patterns, the very large number of documents makes the Bayesian network very large; the average cause-effect is therefore calculated using the back-door criterion. The back-door criterion is as follows: the Bayesian network G is a directed acyclic graph and (L_m, L_n) is a pair of nodes of G; a node set Z is a back door for (L_m, L_n) if no node in Z is a descendant of L_m and Z blocks every path between L_m and L_n that contains an arrow pointing into L_m. The causal relationship of the illness pair L_m and L_n can therefore be inferred by the back-door criterion.
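As a minimal sketch of how a back-door set is used, the following Python code applies the standard adjustment formula P(Y = 1 | do(X = x)) = Σ_z P(Y = 1 | X = x, Z = z) P(Z = z) to observed samples. The setting is an assumption made for illustration: X and Y are binary illness indicators, Z is a single discrete variable that satisfies the back-door criterion, samples are (x, y, z) tuples, and the threshold value is supplied by the caller.

```python
from collections import Counter


def backdoor_probability(samples, x_value):
    """Estimate P(Y=1 | do(X=x_value)) by adjusting for the back-door variable Z."""
    n = len(samples)
    z_counts = Counter(z for _, _, z in samples)
    probability = 0.0
    for z, count in z_counts.items():
        stratum = [y for x, y, zz in samples if x == x_value and zz == z]
        if not stratum:
            # Positivity is violated: X=x_value never occurs together with this Z.
            raise ValueError(f"no samples with X={x_value} and Z={z}")
        probability += (sum(stratum) / len(stratum)) * (count / n)
    return probability


def average_cause_effect(samples):
    """Difference between the two interventional probabilities."""
    return backdoor_probability(samples, 1) - backdoor_probability(samples, 0)


def classify(samples, threshold):
    """Apply the thresholding rule from the text to the estimated effect."""
    if average_cause_effect(samples) > threshold:
        return "complication"
    return "comorbidity"
```

Adjusting for Z only requires the conditional frequencies within each stratum of Z, so the full joint distribution of a large network never has to be evaluated.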

Embodiment 2

This embodiment supplements Embodiment 1. Without affecting the causal relationships between illness pairs, cause and effect unit 3 simplifies the undirected graph constraint by means of independence tests. For example, the chi-square test of independence may be used.
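A chi-square test of independence for one illness pair can be sketched as follows. The sketch assumes a contingency table of co-occurrence counts in which every row and column total is non-zero; the function names are hypothetical, and the constant is the standard 5% critical value of the chi-square distribution with one degree of freedom, so the decision helper applies to 2×2 tables only.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# 5% critical value for one degree of freedom (2x2 table).
CRITICAL_VALUE_DF1 = 3.841


def looks_independent(table):
    """True when independence cannot be rejected at the 5% level (2x2 only)."""
    return chi_square_statistic(table) < CRITICAL_VALUE_DF1
```

An edge whose pair passes `looks_independent` would be a candidate for removal from the undirected graph constraint.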

In the present invention, the independence test may also be carried out as follows:

For illness L_m, the nodes connected to L_m are obtained by traversing the constructed undirected graph, and they form its node set. The correlation between each node and illness L_m is then calculated in turn, the node with the greatest correlation is selected for the independence hypothesis, and the nodes that are independent of L_m given the data set D are deleted. In the present invention, entropy is used to measure the uncertainty of a random variable L_m. Given the random variable L_m, the uncertainty of the random variable L_n can be measured by the conditional entropy as follows:

The degree of correlation between the random variables L_n and L_m can be measured by mutual information:

If the mutual information exceeds the mutual information threshold, L_n and L_m are considered correlated. If the mutual information does not exceed the threshold, L_n and L_m are considered uncorrelated.
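The entropy-based test can be sketched in Python using the standard definitions H(L_n | L_m) = H(L_n, L_m) − H(L_m) and I(L_n; L_m) = H(L_n) − H(L_n | L_m), estimated from observed frequencies. The data layout (one list of observations per illness, aligned by patient), the function names, and the threshold are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def mutual_information(a, b):
    """I(a; b) = H(a) - H(a | b)."""
    return entropy(a) - conditional_entropy(a, b)


def prune_neighbours(data, node, neighbours, threshold):
    """Keep only the neighbours whose mutual information with `node` exceeds the threshold."""
    return [
        neighbour
        for neighbour in neighbours
        if mutual_information(data[neighbour], data[node]) > threshold
    ]
```

Neighbours dropped by `prune_neighbours` correspond to the nodes that the text treats as independent of L_m and deletes from its node set.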

Embodiment 3

This embodiment discloses a construction system for a medicine-oriented causality knowledge base. Provided that no conflict or contradiction results, the preferred implementations of the other embodiments may, in whole and/or in part, supplement this embodiment. Preferably, the system can be implemented by the method of the present invention and/or by other alternative modules.

As shown in Fig. 2, the system mainly comprises literary unit 1, data cell 2, cause and effect unit 3, and blocks of knowledge 4. Literary unit 1 is configured to construct the original library. Data cell 2 is configured to construct the data set. Cause and effect unit 3 is configured to calculate the average cause-effect between illnesses. Blocks of knowledge 4 is configured to store the original library, the data set, and/or the average cause-effect so as to construct a knowledge base that can be read and/or displayed, and to provide it in quantified data form to medical workers for reference, study, and/or decision-making. Preferably, literary unit 1 can, based on a user-defined request, obtain numerous pertinent documents covering various diseases and classify them into several literary unit bodies to construct the original library, so that data cell 2 can obtain main characteristic parameters from the literary unit bodies and construct the data set based on them. This reduces the interference that the many characteristic parameters arising from numerous pertinent documents cause when judging causality between illnesses, and raises the utility of the original library. Cause and effect unit 3 constructs a Bayesian network based on the main characteristic parameters and the data set in order to mine the average cause-effect between illnesses through data patterns, so that whether illnesses constitute a complication or a comorbidity can be determined from the average cause-effect.

Preferably, literary unit 1 counts the frequency of the words/phrases in each document and, under an independence assumption, obtains the joint occurrence probability of the words/phrases. Literary unit 1 calculates the association strength of the words/phrases and corrects the joint occurrence probability based on the association strength to obtain the association reduced coordinate of the document. Using a classification function built from the association reduced coordinates of all pertinent documents, literary unit 1 classifies the pertinent documents iteratively to form several literary unit bodies. The classification function can undergo deep learning based on the sample size of the pertinent documents, to improve the precision of literary unit 1.
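The first two quantities in this paragraph can be illustrated with a short Python sketch. The source does not define the association strength or the association reduced coordinate, so the measure used here (the ratio of the observed co-occurrence probability to the probability expected under independence) is purely an assumption, as are the function names and the choice of the sentence as the co-occurrence window.

```python
from collections import Counter


def term_probabilities(tokens):
    """Relative frequency of each word/phrase in one document."""
    counts = Counter(tokens)
    n = len(tokens)
    return {term: count / n for term, count in counts.items()}


def independent_joint_probability(probabilities, a, b):
    """Joint occurrence probability of two terms under the independence assumption."""
    return probabilities[a] * probabilities[b]


def association_strength(sentences, a, b):
    """Assumed measure: observed co-occurrence divided by the independent expectation.

    `sentences` is a list of sets of terms; values above 1 mean the two terms
    appear together more often than independence would predict.
    """
    n = len(sentences)
    p_a = sum(a in sentence for sentence in sentences) / n
    p_b = sum(b in sentence for sentence in sentences) / n
    p_ab = sum(a in sentence and b in sentence for sentence in sentences) / n
    if p_a * p_b == 0:
        return 0.0
    return p_ab / (p_a * p_b)
```

Multiplying the independent joint probability by such a ratio is one simple way to correct it toward the observed co-occurrence; the classification step itself is not sketched because the source gives no detail on the classification function.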

Preferably, once data cell 2 has obtained the literary unit bodies, it builds the data set by pairing illnesses. Data cell 2 applies natural-language-processing syntactic analysis to each pertinent document to extract the relationship between each illness pair, thereby establishing the relational knowledge base of illness pairs; the relationship between an illness pair is a positive relationship, an inverse relationship, or a vertical relationship. In addition, using the relationship knowledge table, data cell 2 retrieves the documents in the literary unit bodies that contain the illness pair and fuses the results to obtain the relationship certainty values of the pair, thereby establishing the relationship certainty value library; these values comprise a positive relationship certainty value, an inverse relationship certainty value, and a vertical relationship certainty value. Data cell 2 then constructs the data set from the relational knowledge base and the relationship certainty value library established by pairing all illnesses two by two.

The cause and effect unit constructs the Bayesian network as follows. S31: a Bayesian network evaluation function is constructed based on the relational knowledge base:

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

S32: an undirected graph structural constraint is constructed based on the relational knowledge base. For the data set D and any illness pair L_m and L_n in D, the illness-pair number of L_m and L_n is obtained by searching the relational knowledge base of illness pairs; using that number, the relationship certainty value of L_m→L_n and the relationship certainty value of L_n→L_m are retrieved from the relationship certainty value table.

S33: the Bayesian network is constructed based on the Bayesian network evaluation function and the undirected graph structural constraint.

Preferably, in the present invention, literary unit 1, data cell 2, cause and effect unit 3, and blocks of knowledge 4 are each a microprocessor with computing capability. For example, the literary unit 1 used in the present invention is a server that has a search engine and computing capability. Data cell 2 is a data server with computing capability. Cause and effect unit 3 is a data server with computing capability. Blocks of knowledge 4 is a memory with access capability, such as at least one of RAM, ROM, a disk, and a cloud disk. Literary unit 1, data cell 2, cause and effect unit 3, and blocks of knowledge 4 are connected to one another by wired or wireless communication such as optical fiber, data cable, Bluetooth, Wi-Fi, and/or 4G.

It should be noted that the above specific embodiments are exemplary. Those skilled in the art, inspired by the present disclosure, can devise various solutions, and these solutions also belong to the scope of the disclosure and fall within the protection scope of the present invention. Those skilled in the art will understand that the description and its drawings are illustrative and do not limit the claims. The protection scope of the present invention is defined by the claims and their equivalents.

Claims

1. A construction system for a medicine-oriented causality knowledge base, comprising:

a literary unit (1) for constructing an original library;

a data cell (2) for constructing a data set;

a cause and effect unit (3) for calculating the average cause-effect between illnesses;

blocks of knowledge (4) for storing the original library, the data set, and/or the average cause-effect so as to construct a knowledge base that can be read and/or displayed,

characterized in that

the literary unit (1) is able to obtain numerous pertinent documents covering various diseases and classify them into several literary unit bodies to construct the original library, so that the data cell (2) is able to obtain main characteristic parameters based on the literary unit bodies and construct the data set based on the main characteristic parameters,

the cause and effect unit (3) constructs a Bayesian network based on the main characteristic parameters and the data set in order to analyze the average cause-effect between illnesses through data patterns, so that the blocks of knowledge (4) are able to construct the knowledge base by forming, based on the pertinent documents, the correspondence between the illnesses and the average cause-effect between the illnesses.

2. The construction system according to claim 1, characterized in that the literary unit (1) is configured to count the frequency of the words/phrases in each document and to obtain the joint occurrence probability of the words/phrases under an independence assumption;

the literary unit (1) calculates the association strength of the words/phrases and corrects the joint occurrence probability based on the association strength to obtain the association reduced coordinate of the document;

the literary unit (1) constructs the association reduced coordinate of the document and, using a classification function built from the association reduced coordinates of all the pertinent documents and the association strength, classifies the pertinent documents iteratively to form several literary unit bodies;

wherein the classification function can undergo deep learning based on the sample size of the pertinent documents, to improve the precision of the literary unit (1).

3. The construction system according to claim 1 or 2, characterized in that, once the data cell (2) has obtained the literary unit bodies, the data cell (2) is configured to obtain the data set by pairing illnesses:

the data cell (2) applies natural-language-processing syntactic analysis to each pertinent document to extract the relationship between the illness pair therein, thereby establishing the relational knowledge base of the illness pair, the relationship between the illness pair being a positive relationship, an inverse relationship, or a vertical relationship;

in addition, using the relationship knowledge table, the data cell (2) retrieves the documents in the literary unit bodies that contain the illness pair and fuses the results to obtain the relationship certainty values of the illness pair, thereby establishing the relationship certainty value library of the illness pair, the relationship certainty values comprising a positive relationship certainty value, an inverse relationship certainty value, and a vertical relationship certainty value;

the data cell (2) then constructs the data set from the relational knowledge base and the relationship certainty value library established by pairing all illnesses two by two.

4. The construction system according to one of the preceding claims, characterized in that the cause and effect unit (3) is configured to construct the Bayesian network as follows,

S31: a Bayesian network evaluation function is constructed based on the relational knowledge base:

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

S32: an undirected graph structural constraint is constructed based on the relational knowledge base; for the data set D and any illness pair L_m and L_n in the data set D, the illness-pair number of L_m and L_n is obtained by searching the relational knowledge base of illness pairs, and, using the illness-pair number, the relationship certainty value of L_m→L_n and the relationship certainty value of L_n→L_m are retrieved from the relationship certainty value table,

5. The construction system according to one of the preceding claims, characterized in that the cause and effect unit (3) is configured to calculate the average cause-effect between each illness pair based on the Bayesian network and Pearl's principle,

when the average cause-effect exceeds the set cause-effect threshold, the illnesses constitute a complication; when the average cause-effect does not exceed the set cause-effect threshold, the illnesses constitute a comorbidity.

6. The construction system according to one of the preceding claims, characterized in that, for illness L_m, the nodes connected to illness L_m are obtained by traversal based on the undirected graph structural constraint and form its node set; the correlation between each node and illness L_m is calculated in turn, the node with the greatest correlation is selected for the independence hypothesis, and the nodes that are independent of L_m given the data set D are deleted;

when the mutual information exceeds the mutual information threshold, illness L_n and illness L_m are correlated and not independent; when the mutual information does not exceed the mutual information threshold, illness L_n and illness L_m are uncorrelated and independent.

7. A construction method for a medicine-oriented causality knowledge base, relating to a method of constructing a knowledge base that, based on literature retrieval, mines whether multiple illnesses constitute complications or comorbidities, the steps comprising:

a literary unit (1) constructs an original library;

a data cell (2) constructs a data set;

a cause and effect unit (3) calculates the average cause-effect between illnesses;

blocks of knowledge (4) store the original library, the data set, and/or the average cause-effect so as to construct a knowledge base that can be read and/or displayed;

characterized in that

the cause and effect unit (3) constructs a Bayesian network based on the main characteristic parameters and the data set in order to analyze the average cause-effect between illnesses through data patterns, so that the blocks of knowledge (4) are able to construct the knowledge base by forming, based on the pertinent documents, the correspondence between the illnesses and the average cause-effect between the illnesses.

8. The construction method according to claim 7, characterized in that the pertinent documents are classified as follows:

S11: the literary unit (1) counts the frequency of the words/phrases in each document and obtains the joint occurrence probability of the words/phrases under an independence assumption;

S12: the literary unit (1) calculates the association strength of the words/phrases;

S13: the literary unit (1) constructs the association reduced coordinate of the document and, using a classification function built from the association reduced coordinates of all the pertinent documents and the association strength, classifies the pertinent documents iteratively to form several literary unit bodies.

9. The construction method according to claim 7 or 8, characterized in that, once the data cell (2) has obtained the literary unit bodies, the data cell (2) obtains the data set by pairing illnesses to form the illness pairs;

the data cell (2) applies natural-language-processing syntactic analysis to each pertinent document to extract the relationship between the illness pair, thereby establishing the relational knowledge base of the illness pair, the relationship between the illness pair being a positive relationship, an inverse relationship, or a vertical relationship;

the data cell (2) then constructs the data set D from the relational knowledge base and the relationship certainty value library established by pairing all illnesses two by two.

10. The construction method according to one of claims 7 to 9, characterized in that the cause and effect unit (3) constructs the Bayesian network as follows,

log P(G, D, K_L) = log P(G) + log P(D | G) + log P(K_L | G)

S32: an undirected graph structural constraint is constructed based on the relational knowledge base; for the data set D and any illness pair L_m and L_n in the data set D, the illness-pair number of L_m and L_n is obtained by searching the relational knowledge base of illness pairs, and, using the illness-pair number, the relationship certainty value of L_m→L_n and the relationship certainty value of L_n→L_m are retrieved from the relationship certainty value table,