CN106874643A - Method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment - Google Patents

Info

Publication number
CN106874643A
Authority
CN
China
Prior art keywords
disease
correlation factor
correlation
dictionary
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611222893.XA
Other languages
Chinese (zh)
Other versions
CN106874643B (en)
Inventor
张文生
牛景昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201611222893.XA
Publication of CN106874643A
Application granted
Publication of CN106874643B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present invention relates to a method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment. The method may include: obtaining a patient description; performing keyword matching on the patient description using an expanded disease/disease-related-factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description; detecting whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary; based on the detection result, calculating the score of each disease in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded dictionary; ranking the disease scores; and determining the disease according to the ranking result. The invention thereby solves the technical problem of how to make predictions from a patient's colloquial description of their condition.

Description

Method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment
Technical field
Embodiments of the present invention relate to the technical field of data processing, and more particularly to a method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment.
Background
With the rapid development of online doctor-patient Q&A websites and mobile application services in the internet medical field, massive numbers of colloquial descriptions of patients' conditions and other general information, together with the corresponding diagnosis results, form question-answer pairs and make up a valuable consultation knowledge base. Because these records are mostly unstructured data and contain a large amount of non-standard medical terminology caused by colloquial description, using the data directly poses many challenges. At the same time, online consultation cases involve a great deal of repetitive work, which wastes the valuable time of doctors. If an artificial intelligence algorithm could make a preliminary diagnosis in place of the doctor, consultation efficiency would improve greatly. The task can be summarized as follows: given a newly entered patient description covering general information such as sex, age, symptoms, and medical history, use sentence analysis and related algorithms, combined with a domain knowledge graph built in advance, to return a predicted diagnosis for the patient.
Existing technical solutions mainly fall into two kinds. 1. Search the Q&A database for the question most similar to the patient description and return the corresponding diagnosis. The main problem with this kind of method is that it does not actually analyze the disease information appearing in the patient description; textual similarity cannot fully reflect the similarity of patients' conditions, so matching accuracy is poor. 2. Have the patient select information related to the condition, such as symptoms and affected body parts, accumulate the disease scores that experts have assigned in advance to those information labels, and finally return a probability ranking of possible diseases. The problem with this kind of method is that manual scoring is highly unstable and subjective, and that it consumes a great deal of labor and time when many diseases need to be labeled. In addition, the diagnostic system cannot analyze or use information outside the selectable symptoms.
In view of this, the present invention is proposed.
Summary of the invention
In order to solve the above problem in the prior art, that is, the technical problem of how to make predictions from a patient's colloquial description of their condition, an embodiment of the present invention provides a method for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment. An embodiment of the present invention also provides a system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment.
To achieve the above object, according to one aspect of the present invention, the following technical solution is provided:
A method for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment, the method including:
obtaining a patient description;
performing keyword matching on the patient description using an expanded disease/disease-related-factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description;
detecting whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary;
based on the detection result, calculating the score of each disease in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded disease/disease-related-factor dictionary;
ranking the disease scores;
determining the disease according to the ranking result.
Further, the expanded disease/disease-related-factor dictionary may be built as follows:
training, on medical information, a distributed word-vector embedding model covering diseases and disease-related factors;
based on the distributed word-vector embedding model, expanding the standard disease/disease-related-factor dictionary using a distance metric to build the expanded disease/disease-related-factor dictionary.
Further, training the distributed word-vector embedding model on medical information may specifically include:
obtaining a medical information training corpus;
cleaning the medical information training corpus;
counting the high-frequency expressions that occur in the Q&A database records, increasing the weight of those expressions in the word segmentation model, and performing Chinese word segmentation to obtain the training text;
training on the training text to generate the distributed word-vector embedding model.
Further, the relevance score of a disease-related factor with respect to a disease may be determined as follows:
based on the distributed word-vector embedding model, expanding the standard disease/disease-related-factor dictionary using a distance metric and building a replacement vocabulary;
using the expanded disease/disease-related-factor dictionary and the replacement vocabulary, matching the diseases and disease-related factors in the medical information, and calculating the relevance score of each disease-related factor with respect to each disease.
Further, using the expanded dictionary and the replacement vocabulary to match the diseases and disease-related factors in the medical information and calculate the relevance scores may specifically include:
performing keyword matching on doctor-patient Q&A records using the expanded disease/disease-related-factor dictionary, and extracting the medically relevant words and expressions in those records;
detecting whether the words and expressions extracted from the doctor-patient Q&A records are in the standard disease/disease-related-factor dictionary;
if not, normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement vocabulary;
based on the standard expressions, counting how often each disease co-occurs with its related factors to obtain a co-occurrence frequency matrix of disease-related factors and diseases;
based on the co-occurrence frequency matrix, applying a non-linear transformation to obtain the relevance score of each disease-related factor with respect to each disease.
Further, the method may also include:
based on the distributed word-vector embedding model, expanding the standard disease/disease-related-factor dictionary using a distance metric and building a replacement vocabulary;
detecting whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary specifically includes:
if they are not, normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement vocabulary to obtain standardized disease-related factors;
calculating the score of each disease based on the detection result, in combination with the relevance scores obtained according to the expanded dictionary, specifically includes:
calculating the score of each disease based on the standardized disease-related factors, in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded disease/disease-related-factor dictionary.
Further, the relevance score of a disease-related factor with respect to a disease may be determined by the following formula:
where Score(i, j) denotes the relevance score of a disease-related factor with respect to a disease; P(D_i | F_j) denotes the conditional probability of having the disease; D_i denotes a disease; F_j denotes a disease-related factor; N_i denotes the disease frequency, N_i = Σ_j N_ij; and N_ij denotes the record frequency.
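The quantities defined above can be illustrated with a small sketch. The formula for Score(i, j) itself does not appear in this text, so the code below only computes the inputs it names. It assumes, as a labeled assumption, that P(D_i | F_j) is estimated as N_ij divided by the total count of factor F_j over all diseases; the function names and the toy counts are hypothetical.

```python
def conditional_probabilities(counts):
    """counts[i][j] is N_ij, the record frequency of disease i with factor j.

    Returns P[i][j], an estimate of P(D_i | F_j). Assumption: the estimate is
    N_ij divided by the column total of factor j (not stated in the source).
    """
    n_diseases = len(counts)
    n_factors = len(counts[0])
    column_totals = [
        sum(counts[i][j] for i in range(n_diseases)) for j in range(n_factors)
    ]
    return [
        [
            counts[i][j] / column_totals[j] if column_totals[j] else 0.0
            for j in range(n_factors)
        ]
        for i in range(n_diseases)
    ]


def disease_frequencies(counts):
    """N_i = sum over j of N_ij, as defined in the text."""
    return [sum(row) for row in counts]


# Toy co-occurrence counts: 2 diseases (rows) x 2 factors (columns).
counts = [[8, 1], [2, 3]]
probs = conditional_probabilities(counts)
freqs = disease_frequencies(counts)
```

A non-linear transformation of these values would then yield Score(i, j); its exact form is left open here because the source does not reproduce it.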
Further, the score of a disease may be obtained by the following formula:
where DS(D_i) denotes the score of the disease; D_i denotes a disease; W(F_j) denotes the disease-category mapping weight; and Score(i, j) denotes the relevance score of a disease-related factor with respect to a disease.
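As a rough illustration of how these symbols might combine, the sketch below assumes that DS(D_i) is a weighted sum of Score(i, j) over the factors found in the patient description, with W(F_j) as the weight. That form is an assumption, since the formula does not appear in this text; the function name, the default weight of 1.0, and all numbers are hypothetical.

```python
def disease_scores(patient_factors, score, weight):
    """Accumulate a score per disease over the factors present for a patient.

    score maps (disease, factor) -> Score(i, j); weight maps factor -> W(F_j).
    Assumption: DS(D_i) = sum_j W(F_j) * Score(i, j) over matched factors.
    """
    totals = {}
    for (disease, factor), value in score.items():
        if factor in patient_factors:
            w = weight.get(factor, 1.0)  # hypothetical default weight
            totals[disease] = totals.get(disease, 0.0) + w * value
    return totals


# Made-up relevance scores and weights for illustration only.
score = {
    ("influenza", "fever"): 0.6,
    ("influenza", "cough"): 0.5,
    ("diabetes", "thirst"): 0.7,
    ("diabetes", "fever"): 0.1,
}
weight = {"fever": 1.0, "cough": 1.0, "thirst": 2.0}

result = disease_scores({"fever", "cough"}, score, weight)
# Rank diseases from highest to lowest score.
ranked = sorted(result, key=result.get, reverse=True)
```

Sorting the resulting scores corresponds to the ranking step of the method, and the top-ranked entry would be the disease reported first.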
To achieve the above object, according to another aspect of the present invention, the following technical solution is also provided:
A system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment, which may include:
an acquisition module, for obtaining a patient description;
an extraction module, for performing keyword matching on the patient description using an expanded disease/disease-related-factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description;
a detection module, for detecting whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary;
a computing module, for calculating, based on the detection result, the score of each disease in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded disease/disease-related-factor dictionary;
a ranking module, for ranking the disease scores;
a determining module, for determining the disease according to the ranking result.
Further, the extraction module may also specifically include:
a word-vector model building unit, for training, on medical information, a distributed word-vector embedding model covering diseases and disease-related factors;
an expanded-dictionary building unit, for expanding the standard disease/disease-related-factor dictionary using a distance metric, based on the distributed word-vector embedding model, to build the expanded disease/disease-related-factor dictionary.
Further, the word-vector model building unit may specifically include:
an acquiring unit, for obtaining a medical information training corpus;
a cleaning unit, for cleaning the medical information training corpus;
a first statistics unit, for counting the high-frequency expressions that occur in the Q&A database records, increasing the weight of those expressions in the word segmentation model, and performing Chinese word segmentation to obtain the training text;
a generation unit, for training on the training text to generate the distributed word-vector embedding model.
Further, the computing module may also specifically include:
a first replacement-vocabulary building unit, for expanding the standard disease/disease-related-factor dictionary using a distance metric, based on the distributed word-vector embedding model, and building a replacement vocabulary;
a relevance-score computing unit, for using the expanded disease/disease-related-factor dictionary and the replacement vocabulary to match the diseases and disease-related factors in the medical information and calculate the relevance score of each disease-related factor with respect to each disease.
Further, the relevance-score computing unit may specifically include:
an extraction unit, for performing keyword matching on doctor-patient Q&A records using the expanded disease/disease-related-factor dictionary, and extracting the medically relevant words and expressions in those records;
a detection unit, for detecting whether the words and expressions extracted from the doctor-patient Q&A records are in the standard disease/disease-related-factor dictionary;
a first normalization unit, for normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement vocabulary when they are not in the standard disease/disease-related-factor dictionary;
a second statistics unit, for counting, based on the standard expressions, how often each disease co-occurs with its related factors to obtain a co-occurrence frequency matrix of disease-related factors and diseases;
a non-linear transformation unit, for applying a non-linear transformation to the co-occurrence frequency matrix to obtain the relevance score of each disease-related factor with respect to each disease.
Further, the system includes:
a second replacement-vocabulary building unit, for expanding the standard disease/disease-related-factor dictionary using a distance metric, based on the distributed word-vector embedding model, and building a replacement vocabulary;
the above detection module may specifically include:
a second normalization unit, for normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement vocabulary when they are not in the standard disease/disease-related-factor dictionary, to obtain standardized disease-related factors;
the above computing module may specifically include:
a disease-score computing unit, for calculating the score of each disease based on the standardized disease-related factors, in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded disease/disease-related-factor dictionary.
Embodiments of the present invention provide a method and system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment. The method may include: obtaining a patient description; performing keyword matching on the patient description using an expanded disease/disease-related-factor dictionary built from word vectors, and extracting the medically relevant words and expressions in the patient description; detecting whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary; based on the detection result, calculating the score of each disease in combination with the relevance scores of disease-related factors with respect to diseases, obtained according to the expanded dictionary; ranking the disease scores; and determining the disease according to the ranking result. The embodiments use a distributed word-vector representation trained for the medical domain to build an expanded keyword dictionary of diseases and disease-related factors. They can learn from multi-source medical information, including general medical data and colloquial online doctor-patient Q&A records, to build a disease knowledge graph, and can analyze and process non-standardized, colloquial descriptions of patients' conditions. The invention thereby solves the technical problem of how to make predictions from a patient's colloquial description of their condition.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment, according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment, according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the invention and are not intended to limit its scope of protection.
The basic idea of the embodiments is to use word-vector embedding to generate distributed representations of the diseases and disease-related factors found in general medical information, colloquial online doctor-patient Q&A records, and patient case databases, to build a knowledge graph of diseases and disease-related factors automatically, and thereby to realize assisted diagnosis from a patient's colloquial description of their condition.
The terms that need explanation are defined as follows:
Disease-related factor: any factor that may cause a disease, help to judge it, or carry information about it, such as symptoms, medical history, age, signs of onset, and sex.
Word-vector embedding: a "distributed representation" method that represents a word (or phrase) as a continuous real-valued vector of low dimension (for example, fewer than 1000 dimensions), so that these vectors can be used to distinguish or represent the words and to handle natural language processing tasks such as text classification and relation extraction.
Co-occurrence frequency: when several words or concepts appear together in one paragraph or document, this counts as one co-occurrence; the number of such co-occurrences counted over a representative set of documents is the co-occurrence frequency.
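The definition of co-occurrence frequency can be made concrete with a minimal sketch. It assumes each record has already been reduced to a list of standard terms, and it counts one co-occurrence whenever a disease and a factor appear in the same record. The function name and the toy records are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(records, diseases, factors):
    """Count, per (disease, factor) pair, the records in which both appear."""
    counts = Counter()
    for terms in records:
        present = set(terms)  # a pair is counted at most once per record
        for disease, factor in product(present & diseases, present & factors):
            counts[(disease, factor)] += 1
    return counts


# Toy records, each already normalized to standard terms.
records = [
    ["influenza", "fever", "cough"],
    ["influenza", "fever"],
    ["diabetes", "thirst"],
]
counts = cooccurrence_counts(
    records,
    diseases={"influenza", "diabetes"},
    factors={"fever", "cough", "thirst"},
)
```

Arranging these counts with diseases as rows and factors as columns gives a co-occurrence frequency matrix of the kind the method works with.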
An embodiment of the present invention provides a method for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment. As shown in Fig. 1, the method may include:
S100: Obtain a patient description.
S110: Perform keyword matching on the patient description using the expanded disease/disease-related-factor dictionary built from word vectors, and extract the medically relevant words and expressions in the patient description.
The expanded disease/disease-related-factor dictionary is built through steps S112 to S114.
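Keyword matching in step S110 could look roughly like the sketch below. The text does not specify a matching algorithm, so this assumes a simple longest-first substring match against a lowercase expanded dictionary; the function name and sample data are hypothetical.

```python
def extract_terms(description, expanded_dictionary):
    """Return dictionary terms found in the description, longest terms first.

    Assumption: plain substring matching on lowercased text. A matched term is
    blanked out so that shorter terms contained in it are not matched again.
    """
    found = []
    remaining = description.lower()
    for term in sorted(expanded_dictionary, key=len, reverse=True):
        if term in remaining:
            found.append(term)
            remaining = remaining.replace(term, " ")
    return found


expanded_dictionary = {"fever", "high fever", "cough", "insomnia"}
matches = extract_terms(
    "I have had a high fever and a bad cough since Monday",
    expanded_dictionary,
)
```

Matching longer terms first keeps "high fever" from also being reported as plain "fever", which matters when the two map to different entries.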
S112: Train, on medical information, a distributed word-vector embedding model covering diseases and disease-related factors.
The medical information includes, but is not limited to, general medical information, doctor-patient Q&A records, patient cases, and text data about diseases and disease-related factors. General medical information includes, but is not limited to, medical literature (for example, medical papers and medical patent documents), textbooks (especially medical textbooks), and medical theses.
Preferably, the doctor-patient Q&A records are colloquial online doctor-patient Q&A records.
Specifically, step S112 may include:
S1121: Obtain a medical information training corpus.
The medical information training corpus may include, but is not limited to, Q&A databases, medical textbooks, and case databases.
S1122: Clean the medical information training corpus.
The purpose of this step is to remove meaningless characters.
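A cleaning pass of this kind might be sketched as follows. The text does not say which characters count as meaningless, so the rules here (strip HTML tags, drop symbols other than letters, digits, whitespace, and basic punctuation, collapse whitespace) are assumptions, and `clean_text` is a hypothetical name.

```python
import re


def clean_text(raw):
    """Remove markup and stray symbols from one raw corpus record."""
    # Drop HTML-like tags left over from web pages.
    text = re.sub(r"<[^>]+>", " ", raw)
    # Keep word characters (including Chinese), whitespace, and basic
    # Western and Chinese punctuation; replace everything else with a space.
    text = re.sub(r"[^\w\s.,;:?!。，？！]", " ", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


cleaned = clean_text("<b>Fever</b> for 3 days!!  ###")
```

In practice the set of characters to keep would be tuned to the corpus, for example to preserve units or dosage notation.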
S1123: Count the high-frequency expressions that occur in the Q&A database records, increase the weight of those expressions in the word segmentation model, and perform Chinese word segmentation to obtain the training text.
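The counting part of step S1123 can be sketched with character n-grams. This is only one possible way to find high-frequency expressions and is an assumption, as are the function name and thresholds. How the weights are then raised depends on the segmenter in use and is not shown.

```python
from collections import Counter


def frequent_expressions(records, n=2, min_count=2):
    """Return character n-grams that occur at least min_count times."""
    counts = Counter()
    for text in records:
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy Q&A snippets; "发高烧" (to run a high fever) appears in two of them.
records = ["发高烧三天", "孩子发高烧", "高烧不退"]
high_frequency = frequent_expressions(records, n=3, min_count=2)
```

Expressions found this way could then be given extra weight in the segmentation model so that they are kept together as single tokens in the training text.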
S1124: Train on the training text to generate the distributed word-vector embedding model.
During training, the training corpus that can be used includes, but is not limited to, online doctor-patient Q&A records, patient medical histories, and textbooks. The embodiment uses, but is not limited to, the open-source word2vec tool proposed by Tomas Mikolov et al. (https://github.com/danielfrg/word2vec) to train and generate the word-vector embedding model for the medical domain, which is stored in the knowledge base. Neural networks or other training algorithms can be used in training. For methods of training word vectors, see the documents with application numbers 201610179115.0 and 201510096570.X, which are incorporated herein by reference. Experiments reported in papers on representation learning show that a larger training corpus yields better word vectors.
For example, in practical applications, segmented and cleaned text data can be used to train 300-dimensional word vectors for several hundred thousand expressions. Some of the high-frequency words are represented, for example, as:
<Tumour: 0.176907, 0.470268, -0.008468, ... (300 dimensions in total)>
<Blood sugar: 0.149234, 0.278761, -0.474681, ... (300 dimensions in total)>
<Fever: 0.184283, 0.046142, -0.107758, ... (300 dimensions in total)>
<High fever: 0.204092, 0.089622, 0.0057266, ... (300 dimensions in total)>
<Hyperpyrexia: 0.366153, 0.314256, 0.073571, ... (300 dimensions in total)>
In some optional implementations, the step of training on the training text may also include: training low-dimensional real-valued vector representations of the high-frequency words in the text.
The low dimension can be set according to actual conditions, for example to fewer than 1000 dimensions.
S114: Based on the distributed word-vector embedding model, expand the standard disease/disease-related-factor dictionary using a distance metric to build the expanded disease/disease-related-factor dictionary.
Those skilled in the art will understand that a replacement vocabulary can be built while the expanded disease/disease-related-factor dictionary is being built, that is, by expanding the standard disease/disease-related-factor dictionary using a distance metric based on the distributed word-vector embedding model.
Here, the standard disease/disease-related-factor dictionary is constructed and maintained by medical experts for the specific prediction task. It is a collection of diseases and disease factors drawn up, corrected, and maintained by medical experts with reference to authoritative textbooks and normative standards. It is the set of standard terms for diseases and disease-related factors, and it needs to be compiled and maintained in light of the specific diseases to be predicted and related information such as symptoms, medical history, age, signs of onset, and sex. For example, heart disease and depression can be two elements of the standard disease dictionary (set), while insomnia and a history of diabetes can be two elements of the standard disease-related-factor dictionary (set).
Distance metrics include, but are not limited to, cosine distance, Euclidean distance, and other distance measures.
For each element in the standard disease/disease-related-factor dictionary, a distance metric is used to find the k nearest words or phrases in the word-vector vocabulary, which are recorded as replacements for that element. This builds the replacement vocabulary that maps heterogeneous expressions to standard expressions and, at the same time, builds the expanded disease/disease-related-factor dictionary in the knowledge base: each replaceable element is added to the original standard dictionary to form the expanded dictionary. Here k is a parameter that can be tuned for the specific task and data.
The process of obtaining the expanded disease/disease-related-factor dictionary and the replacement vocabulary is described in detail below through a preferred embodiment, taking the standard symptom-related factor "fever" as an example. It specifically includes steps A1 to A3.
Step A1: Use cosine distance to find the words or phrases nearest to "fever", obtaining "high fever" and "hyperpyrexia". Here the parameter k is 2.
Step A2: Add "high fever" and "hyperpyrexia" to the expanded disease/disease-related-factor dictionary, and record them in the replacement vocabulary of the standard disease-related factor "fever".
Step A3: Using the trained distributed word-vector embedding model for the medical domain, perform the same operation on each element of the standard disease/disease-related-factor dictionary to obtain the expanded disease/disease-related-factor dictionary and the replacement vocabulary.
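Steps A1 to A3 can be sketched in a few lines. The vectors below are made-up three-dimensional values for illustration, not the trained 300-dimensional embeddings, and the function names are hypothetical. The sketch assumes cosine similarity as the measure and considers only words outside the standard dictionary as replacement candidates.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(standard_terms, vectors, k):
    """Return (expanded dictionary, replacement vocabulary) for the given k."""
    expanded = set(standard_terms)
    replacement = {}  # maps a non-standard expression to its standard term
    for term in standard_terms:
        candidates = [w for w in vectors if w not in standard_terms]
        nearest = sorted(
            candidates,
            key=lambda w: cosine_similarity(vectors[term], vectors[w]),
            reverse=True,
        )[:k]
        for word in nearest:
            expanded.add(word)
            # If a word is near several standard terms, the last one wins here.
            replacement[word] = term
    return expanded, replacement


# Toy vectors, invented for this example.
vectors = {
    "fever": [1.0, 0.1, 0.0],
    "high fever": [0.9, 0.2, 0.0],
    "hyperpyrexia": [0.8, 0.1, 0.1],
    "insomnia": [0.0, 1.0, 0.2],
}
expanded, replacement = expand_dictionary({"fever"}, vectors, k=2)
```

With k = 2 the two nearest neighbours of "fever" are added, mirroring the example in steps A1 and A2. A larger k widens coverage of colloquial expressions but risks pulling in unrelated words.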
S120: Detect whether the extracted words and expressions are in the standard disease/disease-related-factor dictionary.
In this step, if the extracted words and expressions are detected in the standard disease/disease-related-factor dictionary, no processing is performed; if they are not, they are normalized to the corresponding standard expressions according to the replacement vocabulary to obtain standardized disease-related factors. The replacement vocabulary is built by expanding the standard disease/disease-related-factor dictionary with a distance metric, based on the distributed word-vector embedding model.
"No processing" above means that the standardized disease-related factors already in the standard disease/disease-related-factor dictionary are used directly in subsequent processing.
S130: Based on the detection result, and in combination with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary, calculate the score of each disease.
In this embodiment, when an extracted word or expression is not in the standard disease and disease-related factor dictionary, it is normalized to the corresponding standard expression according to the replacement table to obtain a standardized disease-related factor; the disease score is then calculated from that standardized factor together with the relevance scores obtained from the expanded dictionary. When the extracted word or expression is already in the standard dictionary, the standardized disease-related factor in that dictionary is used directly, again together with the relevance scores obtained from the expanded dictionary, to calculate the disease score.
The relevance score of a disease-related factor with respect to a disease is determined by steps S132 to S134.
S132: Based on the distributed word-vector embedding representation model, expand the standard disease and disease-related factor dictionary using a distance metric and build the replacement table.
S134: Using the expanded disease and disease-related factor dictionary and the replacement table, match the diseases and disease-related factors in the medical information, and calculate the relevance score of each disease-related factor with respect to each disease.
Specifically, step S134 may include:
S1341: Using the expanded disease and disease-related factor dictionary, perform keyword matching on the doctor-patient question-and-answer records and extract the medically relevant words and expressions from them.
In a preferred embodiment, this step uses the expanded dictionary to perform keyword matching on the descriptions of conditions and the diagnostic results in the doctor-patient question-and-answer database, extracting the medically relevant words and expressions from the records.
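The keyword matching in step S1341 is not specified further in the text. A minimal sketch, assuming plain longest-first substring matching against the expanded dictionary (the function name and the sample sentence are hypothetical), could look like this:

```python
def extract_terms(text, expanded_dictionary):
    """Return dictionary terms found in text, in order of appearance.

    Longer terms are matched first and their span is blanked out, so that
    "fever" is not reported again inside an already matched "high fever".
    """
    remaining = text
    hits = []
    for term in sorted(expanded_dictionary, key=lambda t: (-len(t), t)):
        start = remaining.find(term)
        while start != -1:
            hits.append((start, term))
            # Mask the matched span so shorter terms cannot match inside it.
            remaining = (
                remaining[:start] + " " * len(term) + remaining[start + len(term):]
            )
            start = remaining.find(term)
    return [term for _, term in sorted(hits)]


dictionary = {"fever", "high fever", "hyperpyrexia", "smoking habit"}
description = "I have had a high fever for days and have a smoking habit"
print(extract_terms(description, dictionary))
```

Substring matching suits unsegmented Chinese text reasonably well; a production system would more likely reuse the word segmentation model mentioned elsewhere in the description.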
S1342: Detect whether the medically relevant words and expressions extracted from the doctor-patient question-and-answer records are in the standard disease and disease-related factor dictionary. If so, perform step S1343; otherwise, perform step S1344.
This step checks the extracted words and expressions one by one: those in the standard disease and disease-related factor dictionary need no special processing; those not in it are normalized to the corresponding standard expression according to the replacement table.
S1343: No processing is performed.
In this step, the standard expression in the standard disease and disease-related factor dictionary is used directly in subsequent processing.
S1344: According to the replacement table, normalize the medically relevant words and expressions extracted from the doctor-patient question-and-answer records to the corresponding standard expressions.
Step S1344 may further include: when a word or expression corresponds to multiple standard diseases or disease-related factors, standardizing that medically relevant word or expression.
Specifically, when an expression corresponds to multiple standard diseases or disease-related factors, the standard factor closest to the expression is determined and used to replace it, yielding the standardized disease-related factors corresponding to the patient description.
For example, when a word or expression corresponds to more than one standard disease or disease-related factor, a distance such as, but not limited to, cosine distance or Euclidean distance is used to find the closest standard concept, which then replaces the current expression. After this operation, the input for the patient consists of Q standardized disease-related factors: $\{F_1, F_2, \ldots, F_j, \ldots, F_Q\}$.
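The normalization in steps S1342 to S1344, including the tie-break when an expression maps to several standard terms, can be sketched as below. The function names, the replacement-table layout (expression mapped to a list of standard terms), and the toy vectors are assumptions made for illustration; cosine distance is one of the metrics the text allows.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def normalize(expression, standard_terms, replacement, vectors):
    """Map an extracted expression to a standard term, or None if unknown."""
    if expression in standard_terms:
        return expression  # already standard: no processing needed
    candidates = replacement.get(expression, [])
    if not candidates:
        return None
    # Several candidates: keep the standard term closest in embedding space.
    return min(
        candidates,
        key=lambda term: cosine_distance(vectors[expression], vectors[term]),
    )


# Hypothetical data, invented for illustration.
standard_terms = {"fever", "chills"}
replacement = {"feverish chills": ["fever", "chills"], "hyperpyrexia": ["fever"]}
vectors = {
    "fever": [1.0, 0.0],
    "chills": [0.0, 1.0],
    "feverish chills": [0.8, 0.6],
    "hyperpyrexia": [0.9, 0.1],
}

print(normalize("feverish chills", standard_terms, replacement, vectors))
print(normalize("hyperpyrexia", standard_terms, replacement, vectors))
```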
S1345: Based on the standard expressions, count the co-occurrence frequency of each disease and its related factors to obtain the co-occurrence frequency matrix of disease-related factors and diseases.
The standard disease and disease-related factor dictionary contains two kinds of elements: diseases and disease-related factors. For example, the m diseases are denoted $\{D_1, D_2, \ldots, D_i, \ldots, D_m\}$ and the n disease-related factors are denoted $\{F_1, F_2, \ldots, F_j, \ldots, F_n\}$. Each $N_{ij}$ is initialized to zero. For the P question-and-answer records $\{R_1, R_2, \ldots, R_s, \ldots, R_P\}$, whenever $D_i$ and $F_j$ both appear in $R_s$, $N_{ij}$ is incremented by 1; that is, one co-occurrence of that disease and that disease-related factor is recorded. Counting over all P records yields the $m \times n$ co-occurrence frequency matrix of diseases and disease-related factors.
Here P is the number of records in the question-and-answer database, $R_1, R_2, \ldots, R_s, \ldots, R_P$ are the records, and $N_{ij}$ is the recorded co-occurrence frequency.
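The counting rule for $N_{ij}$ can be sketched as follows, assuming each record has already been reduced to the set of standardized terms it contains. The function name and the second record are hypothetical; the first record corresponds to the sore-throat example used later in the description.

```python
def cooccurrence_matrix(records, diseases, factors):
    """Build the m x n matrix N, where N[i][j] counts the records that
    contain both disease i and disease-related factor j."""
    counts = [[0] * len(factors) for _ in diseases]
    for record in records:
        for i, disease in enumerate(diseases):
            if disease not in record:
                continue
            for j, factor in enumerate(factors):
                if factor in record:
                    counts[i][j] += 1
    return counts


diseases = ["cold"]
factors = ["sore and swollen throat", "fever", "stuffy and runny nose"]
records = [
    {"sore and swollen throat", "fever", "stuffy and runny nose", "cold"},
    {"fever", "cold"},  # hypothetical second record
]

counts = cooccurrence_matrix(records, diseases, factors)
print(counts)
```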
S1346: Based on the co-occurrence frequency matrix of disease-related factors and diseases, apply a nonlinear transformation to obtain the relevance score of each disease-related factor with respect to each disease.
In a specific implementation, this step takes the following into account. Given that disease-related factor $F_j$ appears in a record, the conditional probability of having disease $D_i$ is denoted $P(D_i \mid F_j)$. Although this conditional probability reflects, to some extent, how strongly a disease-related factor points to a disease, it is easily affected by the accumulation effect of high-frequency common diseases: common diseases that occur often in the records obtain excessively high conditional probabilities. The final scoring function should therefore also include a control term related to $N_i = \sum_j N_{ij}$. This is similar to the inverse document frequency idea used in document classification.
Preferably, the relevance score of a disease-related factor with respect to a disease can be determined by the following formula:
$$\mathrm{Score}(i,j) = P(D_i \mid F_j) \times \log\left(1 + \left(\frac{1}{N_i}\right)\right)^{-1}$$
Here $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease; $P(D_i \mid F_j)$ is the conditional probability of having the disease; $D_i$ is the disease; $F_j$ is the disease-related factor; $N_i$ is the disease frequency, $N_i = \sum_j N_{ij}$; and $N_{ij}$ is the recorded co-occurrence frequency.
The formula combines the conditional probability with a nonlinear transformation of the reciprocal of the disease frequency. In the end, each disease-related factor corresponds to at least one related disease, and the corresponding score is given by $\mathrm{Score}(i,j)$.
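A hedged sketch of step S1346 follows. Two points are assumptions rather than statements from the source: the conditional probability is estimated as $N_{ij} / \sum_i N_{ij}$ (the usual count-based estimate), and the frequency-dependent term is passed in as a function, because the exact way the transform is applied to $N_i$ is not certain from the extracted text. The usage lines plug in $1 / \log(1 + 1/N_i)$ as one possible reading; other readings can be swapped in without changing the rest of the code.

```python
import math


def relevance_scores(counts, frequency_term):
    """Score[i][j] = P(D_i | F_j) * frequency_term(N_i).

    counts is the m x n co-occurrence matrix N; P(D_i | F_j) is estimated
    as N[i][j] divided by the column total (an assumption of this sketch).
    """
    m = len(counts)
    n = len(counts[0]) if m else 0
    factor_totals = [sum(counts[i][j] for i in range(m)) for j in range(n)]
    disease_totals = [sum(row) for row in counts]  # N_i = sum_j N_ij
    scores = [[0.0] * n for _ in range(m)]
    for i in range(m):
        if disease_totals[i] == 0:
            continue  # disease never observed: leave its scores at zero
        term = frequency_term(disease_totals[i])
        for j in range(n):
            if factor_totals[j]:
                scores[i][j] = counts[i][j] / factor_totals[j] * term
    return scores


counts = [[3, 1], [1, 1]]  # hypothetical 2 diseases x 2 factors

# Conditional probabilities only (frequency term fixed at 1).
plain = relevance_scores(counts, lambda n_i: 1.0)

# One possible reading of the frequency term; this is an assumption.
damped = relevance_scores(counts, lambda n_i: 1.0 / math.log(1.0 + 1.0 / n_i))

print(plain)
print(damped)
```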
Through the above steps, the expanded disease and disease-related factor dictionary is used to match diseases and disease-related factors in the medical information, and the computed relevance scores of disease-related factors with respect to diseases are stored in a knowledge graph, so that a knowledge graph for disease prediction can be learned and built automatically.
In a preferred embodiment, the following may also be performed after step S1346: periodically test the scoring function by A/B testing, and update the relevance scores of disease-related factors with respect to diseases.
This step takes into account that both the quality and the quantity of the data in the original question-and-answer database affect the relevance scores of disease-related factors, and that an online medical consultation platform produces a large number of new records every day. The scoring functions for disease-related factors are therefore stored in an offline knowledge base, and online A/B tests are run periodically to select the better-performing version of the scoring function to put online.
The training data of each version separately forms a versioned set of relevance scores of disease-related factors with respect to diseases. Because this relevance score is not fully equivalent to the prior probability relating a disease factor to a disease, medical experts' evaluation of the scores is for reference only; whether a version improves the accuracy and user-friendliness of disease determination serves as the final evaluation criterion and as the basis for deciding whether to switch to another version.
During knowledge-base construction, in combination with the existing knowledge base, the description of the condition and the basic information entered by a patient can be analyzed, and the diseases the patient may have can be given.
The process of obtaining the relevance scores of disease-related factors with respect to diseases is described in detail below with a preferred embodiment, in which "sore and swollen throat", "cold", and "stuffy and runny nose" are in the standard dictionary. The process may include steps B1 to B5.
Step B1: Obtain from the original question-and-answer database a question-and-answer pair consisting of "My throat is sore and swollen, I have had a high fever for the past few days, and I have a stuffy and runny nose. Doctor, what illness do I have?" and "You may have a cold".
Step B2: Process the question-and-answer pair and match "sore and swollen throat", "high fever", "stuffy and runny nose", and "cold".
Step B3: According to steps S121 and S122, use the replacement table to replace "high fever" with "fever".
Step B4: Match the 3 disease-related factors against the 1 disease one by one, count the frequency with which the disease and each factor co-occur, and obtain the co-occurrence frequency matrix of disease-related factors and diseases.
Step B5: Determine the relevance score of each disease-related factor with respect to the disease according to the following formula:
$$\mathrm{Score}(i,j) = P(D_i \mid F_j) \times \log\left(1 + \left(\frac{1}{N_i}\right)\right)^{-1}$$
Here $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease; $P(D_i \mid F_j)$ is the conditional probability of having the disease; $D_i$ is the disease; $F_j$ is the disease-related factor; $N_i$ is the disease frequency, $N_i = \sum_j N_{ij}$; and $N_{ij}$ is the recorded co-occurrence frequency.
In a preferred embodiment, the disease score can be obtained by the following formula:
$$DS(D_i) = \sum_{j=1}^{Q} W(F_j) \times \mathrm{Score}(i,j)$$
Here $DS(D_i)$ is the score of the disease; $D_i$ is the disease; $W(F_j)$ is the category mapping weight of the disease-related factor; and $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease.
For example, for each factor $F_j$ in $\{F_1, F_2, \ldots, F_j, \ldots, F_Q\}$ that is related to the patient description, the score is accumulated on each associated disease $D_i$ according to the category of the standardized disease-related factor, using the following formula:
$$DS(D_i) = \sum_{j=1}^{Q} W(F_j) \times \mathrm{Score}(i,j)$$
Here $DS(D_i)$ is the score of the disease; $D_i$ is the disease; $W(F_j)$ is the category mapping weight of the disease-related factor; and $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease.
In the above formula, because different factors carry different confidence for disease prediction, different category mapping weights are assigned according to the category of the factor. The mapping can be formulated by experts according to category attributes. For example, "smoking habit" is a disease-related factor of the lifestyle category, whereas "fever" is a disease-related factor of the symptom category; in the calculation, the category of each factor is determined and the corresponding category mapping weight is used.
S140: Rank the disease scores.
S150: Determine the disease according to the ranking result.
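The weighted accumulation of step S130 followed by the ranking of steps S140 and S150 can be sketched as follows. All names, categories, weights, and score values are hypothetical and chosen only so the arithmetic is easy to follow; the relevance scores are stored here as a mapping from factor to disease scores, which is one possible layout rather than the one used in the source.

```python
def rank_diseases(patient_factors, score, category_of, category_weight):
    """Accumulate DS(D_i) = sum_j W(F_j) * Score(i, j) over the patient's
    standardized factors, then rank diseases by descending score."""
    totals = {}
    for factor in patient_factors:
        weight = category_weight[category_of[factor]]
        for disease, relevance in score.get(factor, {}).items():
            totals[disease] = totals.get(disease, 0.0) + weight * relevance
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical relevance scores: factor -> {disease: Score(i, j)}.
score = {
    "fever": {"acute pharyngitis": 0.5, "cold": 0.25},
    "smoking habit": {"acute pharyngitis": 0.25},
}
category_of = {"fever": "symptom", "smoking habit": "lifestyle"}
category_weight = {"symptom": 1.0, "lifestyle": 0.5}  # assumed expert-set weights

ranking = rank_diseases(
    ["fever", "smoking habit"], score, category_of, category_weight
)
print(ranking[:3])  # top candidates, highest score first
```

Keeping the weights in a separate category table reflects the text's point that experts assign them per category rather than per factor.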
The process of obtaining a ranked scoring of suspected diseases using an embodiment of the present invention is described in detail below with a preferred embodiment. The patient's description of the condition is "I have had hyperpyrexia continuously for the past few days and have a smoking habit. What illness is this?" The scoring and ranking process may include steps C1 to C7.
Step C1: Using the expanded disease and disease-related factor dictionary, perform keyword matching on "I have had hyperpyrexia continuously for the past few days and have a smoking habit. What illness is this?" and extract "hyperpyrexia" and "smoking habit".
Step C2: Detect that "hyperpyrexia" is in the expanded disease and disease-related factor dictionary but not in the standard disease and disease-related factor dictionary.
Step C3: According to the replacement table, replace "hyperpyrexia" with "fever".
Step C4: Determine the mapping weights according to the categories to which "fever" and "smoking habit" respectively belong.
Step C5: Determine the patient's score for each disease according to the following formula:
$$DS(D_i) = \sum_{j=1}^{Q} W(F_j) \times \mathrm{Score}(i,j)$$
Step C6: Rank the scores of the diseases.
Step C7: Output the top three diseases: <acute pharyngitis: 0.143531>, <acute tonsillar enlargement: 0.129281>, <tracheal disease: 0.062088>.
Although the steps in the above embodiments are described in the order given, those skilled in the art will understand that, to achieve the effect of this embodiment, the steps need not be performed in that order; they may be performed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.
Based on the same technical concept as the above method embodiment, an embodiment of the present invention provides a system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment. The system can perform the above method embodiment. As shown in Fig. 2, the system 20 may include an acquisition module 21, an extraction module 22, a detection module 23, a calculation module 24, a ranking module 25, and a determination module 26. The acquisition module 21 is used to obtain a patient description. The extraction module 22 is used to perform keyword matching on the patient description using the expanded disease and disease-related factor dictionary built based on word vectors, and to extract the medically relevant words and expressions in the patient description. The detection module 23 is used to detect whether the extracted words and expressions are in the standard disease and disease-related factor dictionary. The calculation module 24 is used to calculate disease scores based on the detection result, in combination with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary. The ranking module 25 is used to rank the disease scores. The determination module 26 is used to determine the disease according to the ranking result.
In a preferred embodiment, the extraction module may specifically include a word-vector model building unit and an expanded-dictionary building unit. The word-vector model building unit is used to train, using medical information, a distributed word-vector embedding representation model of diseases and disease-related factors. The expanded-dictionary building unit is used to expand the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, and to build the expanded disease and disease-related factor dictionary.
In a preferred embodiment, the word-vector model building unit may specifically include an acquisition unit, a cleaning unit, a first statistics unit, and a generation unit. The acquisition unit is used to obtain a medical information training corpus. The cleaning unit is used to clean the medical information training corpus. The first statistics unit is used to count the high-frequency expressions that occur in the question-and-answer database records, increase the weight of the high-frequency expressions in the word segmentation model, and perform Chinese word segmentation to obtain training text. The generation unit is used to train on the training text and generate the distributed word-vector embedding representation model.
In a preferred embodiment, the calculation module may specifically include a first replacement-table building unit and a relevance score calculation unit. The first replacement-table building unit is used to expand the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, and to build the replacement table. The relevance score calculation unit is used to match the diseases and disease-related factors in the medical information using the expanded disease and disease-related factor dictionary and the replacement table, and to calculate the relevance score of each disease-related factor with respect to each disease.
In a preferred embodiment, the relevance score calculation unit may specifically include an extraction unit, a detection unit, a first normalization unit, a second statistics unit, and a nonlinear transformation unit. The extraction unit is used to perform keyword matching on the doctor-patient question-and-answer records using the expanded disease and disease-related factor dictionary, and to extract the medically relevant words and expressions from the records. The detection unit is used to detect whether the extracted medically relevant words and expressions are in the standard disease and disease-related factor dictionary. The first normalization unit is used, when a word or expression is not in the standard disease and disease-related factor dictionary, to normalize it to the corresponding standard expression according to the replacement table. The second statistics unit is used to count, based on the standard expressions, the co-occurrence frequency of each disease and its related factors, and to obtain the co-occurrence frequency matrix of disease-related factors and diseases. The nonlinear transformation unit is used to apply a nonlinear transformation to the co-occurrence frequency matrix to obtain the relevance score of each disease-related factor with respect to each disease.
In a preferred embodiment, the system may further include a second replacement-table building unit, which is used to expand the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, and to build the replacement table. The detection module may specifically include a second normalization unit, which is used, when an extracted word or expression is not in the standard disease and disease-related factor dictionary, to normalize it to the corresponding standard expression according to the replacement table and obtain a standardized disease-related factor. The calculation module may specifically include a disease score calculation unit, which is used to calculate disease scores based on the standardized disease-related factors, in combination with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary.
For the specific working process of the system described above and the related explanations, reference may be made to the corresponding process in the foregoing method embodiment; they are not repeated here.
Those skilled in the art will understand that the above system for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment may also include other well-known structures, such as a processor, a controller, a memory, and a bus. The memory includes, but is not limited to, random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, or registers. The processor includes, but is not limited to, a single-core processor, a multi-core processor, an x86-based processor, a CPLD/FPGA, a DSP, an ARM processor, or a MIPS processor. The bus may include a data bus, an address bus, and a control bus. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 2. It should also be noted that the number of each module in Fig. 2 is only schematic; according to actual needs, there may be any number of each module.
It should be noted that the above division into modules is only an example; in practical applications, other divisions are possible. In addition, each module may be further decomposed into other modules, which is not described in detail here. Each module may be implemented in hardware, in software, or in a combination of software and hardware. In practical applications, the modules may be implemented by, for example, a central processing unit, a microprocessor, a digital signal processor, or a field-programmable gate array. Exemplary hardware platforms for implementing the modules may include, for example, Intel x86-based platforms with a compatible operating system, Mac platforms, Mac OS, iOS, Android OS, and the like.
It should be noted that terms such as "first" and "second" used herein should not be construed as limiting the scope of the present invention in any form.
The specific embodiments and experimental examples described above explain the technical solution, implementation details, and algorithm effectiveness of the present invention in detail. It should be noted that the foregoing is only a specific embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A method for automatically building a knowledge base based on word vectors to realize assisted diagnosis and treatment, characterized in that the method comprises:
obtaining a patient description;
performing keyword matching on the patient description using an expanded disease and disease-related factor dictionary built based on the word vectors, and extracting the medically relevant words and expressions in the patient description;
detecting whether the extracted words and expressions are in a standard disease and disease-related factor dictionary;
calculating disease scores based on the detection result, in combination with relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary;
ranking the disease scores;
determining the disease according to the ranking result.
2. The method according to claim 1, characterized in that the expanded disease and disease-related factor dictionary is built in the following manner:
training, using medical information, a distributed word-vector embedding representation model of diseases and disease-related factors;
expanding the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, to build the expanded disease and disease-related factor dictionary.
3. The method according to claim 2, characterized in that training, using medical information, the distributed word-vector embedding representation model of diseases and disease-related factors specifically comprises:
obtaining a medical information training corpus;
cleaning the medical information training corpus;
counting the high-frequency expressions that occur in the question-and-answer database records, increasing the weight of the high-frequency expressions in the word segmentation model, and performing Chinese word segmentation to obtain training text;
training on the training text to generate the distributed word-vector embedding representation model.
4. The method according to claim 2, characterized in that the relevance score of the disease-related factor with respect to the disease is determined in the following manner:
expanding the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, to build a replacement table;
matching the diseases and disease-related factors in the medical information using the expanded disease and disease-related factor dictionary and the replacement table, and calculating the relevance score of the disease-related factor with respect to the disease.
5. The method according to claim 4, characterized in that matching the diseases and disease-related factors in the medical information using the expanded disease and disease-related factor dictionary and the replacement table, and calculating the relevance score of the disease-related factor with respect to the disease, specifically comprises:
performing keyword matching on doctor-patient question-and-answer records using the expanded disease and disease-related factor dictionary, and extracting the medically relevant words and expressions in the doctor-patient question-and-answer records;
detecting whether the extracted medically relevant words and expressions in the doctor-patient question-and-answer records are in the standard disease and disease-related factor dictionary;
if not, normalizing the extracted medically relevant words and expressions in the doctor-patient question-and-answer records to the corresponding standard expressions according to the replacement table;
counting, based on the standard expressions, the co-occurrence frequency of each disease and its related factors to obtain a co-occurrence frequency matrix of disease-related factors and diseases;
applying a nonlinear transformation to the co-occurrence frequency matrix of disease-related factors and diseases to obtain the relevance score of the disease-related factor with respect to the disease.
6. The method according to claim 2, characterized in that the method further comprises:
expanding the standard disease and disease-related factor dictionary using a distance metric, based on the distributed word-vector embedding representation model, to build a replacement table;
wherein detecting whether the extracted words and expressions are in the standard disease and disease-related factor dictionary specifically comprises:
if they are not detected, normalizing the extracted words and expressions to the corresponding standard expressions according to the replacement table to obtain standardized disease-related factors;
and wherein calculating disease scores based on the detection result, in combination with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary, specifically comprises:
calculating disease scores based on the standardized disease-related factors, in combination with the relevance scores of disease-related factors with respect to diseases obtained from the expanded disease and disease-related factor dictionary.
7. The method according to claim 5, characterized in that the relevance score of the disease-related factor with respect to the disease is determined by the following formula:
$$\mathrm{Score}(i,j) = P(D_i \mid F_j) \times \log\left(1 + \left(\frac{1}{N_i}\right)\right)^{-1};$$
wherein $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease; $P(D_i \mid F_j)$ is the conditional probability of having the disease; $D_i$ is the disease; $F_j$ is the disease-related factor; $N_i$ is the disease frequency, $N_i = \sum_j N_{ij}$; and $N_{ij}$ is the recorded co-occurrence frequency.
8. The method according to claim 6, characterized in that the disease score is obtained by the following formula:
$$DS(D_i) = \sum_{j=1}^{Q} W(F_j) \times \mathrm{Score}(i,j);$$
wherein $DS(D_i)$ is the score of the disease; $D_i$ is the disease; $W(F_j)$ is the category mapping weight of the disease-related factor; and $\mathrm{Score}(i,j)$ is the relevance score of the disease-related factor with respect to the disease.
9. A system for automatically building a knowledge base based on term vectors to realize assisted diagnosis and treatment, characterized in that the system comprises:
An acquisition module, for obtaining a patient description;
An extraction module, for performing keyword matching on the patient description using the expanded disease-disease correlation factor dictionary built on the basis of the term vectors, and extracting the medically related words and expressions in the patient description;
A detection module, for detecting whether the extracted words and expressions are in the standard disease-disease correlation factor dictionary;
A computing module, for calculating the score of each disease based on the detection result, in combination with the correlation scores of the disease correlation factors with respect to diseases obtained from the expanded disease-disease correlation factor dictionary;
A sorting module, for sorting the scores of the diseases;
A determining module, for determining the disease according to the sorting result.
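The module chain of claim 9 can be sketched end to end in Python. This is only an illustration under stated assumptions: keyword matching is done by plain substring search, a replacement table (borrowed from the replacement vocabulary of the later claims) maps non-standard expressions to standard ones, and every dictionary entry, score and weight below is an invented toy value.

```python
# Hypothetical toy knowledge base (not taken from the patent)
STANDARD = {"headache", "fever", "cough"}
REPLACEMENT = {"running a temperature": "fever", "head hurts": "headache"}
EXPANDED = STANDARD | set(REPLACEMENT)

SCORE = {"flu": {"fever": 0.6, "cough": 0.5, "headache": 0.3},
         "migraine": {"headache": 0.8}}
WEIGHT = {"fever": 1.0, "cough": 1.0, "headache": 1.0}


def extract(description):
    # Extraction module: keyword matching against the expanded dictionary
    text = description.lower()
    return sorted(e for e in EXPANDED if e in text)


def detect(expressions):
    # Detection module: flag whether each expression is a standard entry
    return [(e, e in STANDARD) for e in expressions]


def compute(detected):
    # Computing module: map to standard factors, then take the weighted sum
    factors = {e if is_std else REPLACEMENT[e] for e, is_std in detected}
    return {d: sum(WEIGHT[f] * s.get(f, 0.0) for f in factors)
            for d, s in SCORE.items()}


def diagnose(description):
    scores = compute(detect(extract(description)))
    # Sorting module, then determining module (top-ranked disease)
    ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranking[0][0], ranking


top, ranking = diagnose("My head hurts and I am running a temperature")
```

Substring matching is the simplest possible stand-in for the keyword matching step; a production system would need to handle overlapping and negated expressions.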
10. The system according to claim 9, characterized in that the extraction module specifically comprises:
A term vector model building unit, for training, using medical information, a term vector embedding distributed representation model for diseases and disease correlation factors;
An expanded dictionary building unit, for expanding the standard disease-disease correlation factor dictionary using a distance metric method based on the term vector embedding distributed representation model, thereby building the expanded disease-disease correlation factor dictionary.
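A minimal Python sketch of the dictionary expansion step follows. The claim only says "distance metric method"; cosine similarity with a fixed threshold is an assumption made here, as are the function names and the two-dimensional toy vectors, which stand in for a trained term vector model.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(standard_terms, vectors, threshold=0.9):
    """Attach each non-standard word to its nearest standard term.

    A word is added only if its cosine similarity to that term reaches
    the (assumed) threshold. Returns the expanded dictionary and a
    replacement table mapping new words to standard terms.
    """
    known = [t for t in standard_terms if t in vectors]
    replacement = {}
    for word, vec in vectors.items():
        if word in standard_terms or not known:
            continue
        best = max(known, key=lambda t: cosine(vec, vectors[t]))
        if cosine(vec, vectors[best]) >= threshold:
            replacement[word] = best
    return set(standard_terms) | set(replacement), replacement


# Toy vectors standing in for a trained embedding model
toy_vectors = {
    "fever": [1.0, 0.0],
    "feverish": [0.95, 0.1],
    "headache": [0.0, 1.0],
    "weather": [0.7, 0.7],
}
expanded, replacement = expand_dictionary({"fever", "headache"}, toy_vectors)
```

The replacement table produced alongside the expanded dictionary plays the role of the replacement vocabulary used for normalization in the later claims.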
11. The method according to claim 10, characterized in that the term vector model building unit specifically comprises:
An acquiring unit, for obtaining a medical information training corpus;
A cleaning unit, for cleaning the medical information training corpus;
A first statistics unit, for counting the high-frequency expressions that occur in the question-and-answer database records, increasing the weight of the high-frequency expressions in the word segmentation model, and performing Chinese word segmentation to obtain training text;
A generation unit, for training on the training text to generate the term vector embedding distributed representation model.
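The corpus preparation steps of claim 11 can be sketched in plain Python. This is a simplified stand-in, not the claimed implementation: high-frequency expressions are approximated by frequent character bigrams, "increasing their weight in the segmentation model" is replaced by adding them to the lexicon of a forward maximum matching segmenter, and the final term vector training step is not shown. All function names and the three sample records are hypothetical.

```python
import re
from collections import Counter


def clean(record):
    # Cleaning unit: drop punctuation, whitespace and underscores
    return re.sub(r"[\W_]+", "", record)


def high_frequency_expressions(records, n=2, min_count=2):
    # First statistics unit (approximation): frequent character n-grams
    counts = Counter(r[i:i + n] for r in records
                     for i in range(len(r) - n + 1))
    return {gram for gram, c in counts.items() if c >= min_count}


def segment(text, lexicon):
    # Forward maximum matching; unknown characters become single tokens
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


raw_records = ["我头疼，还发烧", "头疼怎么办？", "发烧头疼三天"]
cleaned = [clean(r) for r in raw_records]
lexicon = high_frequency_expressions(cleaned)
training_text = [segment(r, lexicon) for r in cleaned]
```

The token lists in `training_text` are the kind of input that a word-embedding trainer would consume in the generation unit.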
12. The method according to claim 10, characterized in that the computing module specifically comprises:
A first replacement vocabulary building unit, for expanding the standard disease-disease correlation factor dictionary using a distance metric method based on the term vector embedding distributed representation model, and building a replacement vocabulary;
A correlation score computing unit, for matching the disease-disease correlation factors in the medical information using the expanded disease-disease correlation factor dictionary and the replacement vocabulary, and calculating the correlation score of the disease correlation factor with respect to the disease.
13. The system according to claim 12, characterized in that the correlation score computing unit specifically comprises:
An extraction unit, for performing keyword matching on doctor-patient question records using the expanded disease-disease correlation factor dictionary, and extracting the medically related words and expressions in the doctor-patient question records;
A detection unit, for detecting whether the medically related words and expressions extracted from the doctor-patient question records are in the standard disease-disease correlation factor dictionary;
A first normalization unit, for normalizing, according to the replacement vocabulary, the medically related words and expressions extracted from the doctor-patient question records to the corresponding standard expressions when the words and expressions are not in the standard disease-disease correlation factor dictionary;
A second statistics unit, for counting, based on the standard expressions, the frequency with which diseases and their correlation factors co-occur, to obtain a co-occurrence frequency record matrix of disease correlation factors and diseases;
A non-linear conversion unit, for obtaining the correlation score of the disease correlation factor with respect to the disease by applying a non-linear transformation method to the co-occurrence frequency record matrix of disease correlation factors and diseases.
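The normalization and counting units of claim 13 can be illustrated with the Python sketch below, which builds the co-occurrence frequency record matrix as a nested dictionary. It assumes each record has already been labelled with a disease and reduced to a list of extracted expressions; the record format, function name and toy data are hypothetical. The non-linear transformation itself is left out, since this claim does not specify it.

```python
from collections import defaultdict


def cooccurrence_matrix(records, standard, replacement):
    """records: iterable of (disease, [extracted expressions]).

    Returns {disease: {standard factor: co-occurrence count}}.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for disease, expressions in records:
        for expr in expressions:
            # Normalization: keep standard expressions, map the others
            factor = expr if expr in standard else replacement.get(expr)
            if factor is None:
                continue  # no standard expression known: skip
            matrix[disease][factor] += 1
    return {d: dict(f) for d, f in matrix.items()}


toy_standard = {"fever", "cough", "headache"}
toy_replacement = {"feverish": "fever", "head hurts": "headache"}
toy_records = [
    ("flu", ["fever", "cough"]),
    ("flu", ["feverish"]),
    ("migraine", ["head hurts", "unknown phrase"]),
]
toy_matrix = cooccurrence_matrix(toy_records, toy_standard, toy_replacement)
```

Each count in the resulting matrix corresponds to one N_ij entry that a subsequent non-linear transformation would turn into a correlation score.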
14. The system according to claim 10, characterized in that the system comprises:
A second replacement vocabulary building unit, for expanding the standard disease-disease correlation factor dictionary using a distance metric method based on the term vector embedding distributed representation model, and building a replacement vocabulary;
The detection module specifically comprises:
A second normalization unit, for normalizing, according to the replacement vocabulary, the extracted words and expressions to the corresponding standard expressions when the extracted words and expressions are not in the standard disease-disease correlation factor dictionary, to obtain standardized disease correlation factors;
The computing module specifically comprises:
A disease score computing unit, for calculating the score of the disease based on the standardized disease correlation factors, in combination with the correlation scores of the disease correlation factors with respect to diseases obtained from the expanded disease-disease correlation factor dictionary.
CN201611222893.XA 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors Active CN106874643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611222893.XA CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611222893.XA CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Publications (2)

Publication Number Publication Date
CN106874643A (en) 2017-06-20
CN106874643B (en) 2020-02-28

Family

ID=59165041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611222893.XA Active CN106874643B (en) 2016-12-27 2016-12-27 Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors

Country Status (1)

Country Link
CN (1) CN106874643B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070288304A1 (en) * 2006-06-08 2007-12-13 Adknowledge, Inc. System and method for behaviorally targeted electronic communications
CN101158969A (en) * 2007-11-23 2008-04-09 腾讯科技(深圳)有限公司 Whole sentence generating method and device
JP2011180746A (en) * 2010-02-26 2011-09-15 National Institute Of Information & Communication Technology Relational information expansion device, relational information expansion method and program
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
CN104572624A (en) * 2015-01-20 2015-04-29 浙江大学 Method for discovering treatment relation between single medicine and disease based on term vector
CN104965992A (en) * 2015-07-13 2015-10-07 南开大学 Text mining method based on online medical question and answer information
CN105069123A (en) * 2015-08-13 2015-11-18 易保互联医疗信息科技(北京)有限公司 Automatic coding method and system for Chinese surgical operation information
CN105138829A (en) * 2015-08-13 2015-12-09 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese diagnosis and treatment information
CN105426358A (en) * 2015-11-09 2016-03-23 中国农业大学 Automatic disease noun identification method
CN105740612A (en) * 2016-01-27 2016-07-06 北京国医精诚科技有限公司 Traditional Chinese medicine clinical medical record based disease diagnose and treatment method and system
CN105786782A (en) * 2016-03-25 2016-07-20 北京搜狗科技发展有限公司 Word vector training method and device
CN106096273A (en) * 2016-06-08 2016-11-09 江苏华康信息技术有限公司 A kind of disease symptoms derivation method based on TF IDF innovatory algorithm
CN106156272A (en) * 2016-06-21 2016-11-23 北京工业大学 A kind of information retrieval method based on multi-source semantic analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAVID AFONSO等: "An Ultrasonographic Risk Score For Detecting Symptomatic Carotid Atherosclerotic Plaques", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS》 *
常鹏 等: "高效的短文本主题词抽取方法", 《计算机工程与应用》 *
梁耀波: "智能医疗诊断系统的研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358315A (en) * 2017-06-26 2017-11-17 深圳市金立通信设备有限公司 A kind of information forecasting method and terminal
CN110019826B (en) * 2017-07-27 2023-02-28 北大医疗信息技术有限公司 Construction method, construction device, equipment and storage medium of medical knowledge map
CN110019826A (en) * 2017-07-27 2019-07-16 北大医疗信息技术有限公司 Construction method, construction device, equipment and the storage medium of medical knowledge map
CN107633882B (en) * 2017-09-11 2019-05-14 合肥工业大学 Mix the minimally invasive medical service system and its aid decision-making method under cloud framework
CN107633882A (en) * 2017-09-11 2018-01-26 合肥工业大学 Mix the minimally invasive medical service system and its aid decision-making method under cloud framework
CN107863147A (en) * 2017-10-24 2018-03-30 清华大学 The method of medical diagnosis based on depth convolutional neural networks
CN107863147B (en) * 2017-10-24 2021-03-16 清华大学 Medical diagnosis method based on deep convolutional neural network
CN107610779A (en) * 2017-10-25 2018-01-19 医渡云(北京)技术有限公司 Disease Assessment Scale and risk appraisal procedure and device
CN107833629A (en) * 2017-10-25 2018-03-23 厦门大学 Aided diagnosis method and system based on deep learning
CN108182972A (en) * 2017-12-15 2018-06-19 上海长江科技发展有限公司 The intelligent coding method and system of Chinese medical diagnosis on disease based on participle network
TWI665684B (en) * 2017-12-27 2019-07-11 瑞友資訊股份有限公司 Care system capable of drawing up intelligent care plan and using method thereof
CN108182973A (en) * 2017-12-29 2018-06-19 湖南大学 A kind of Intelligent Diagnosis Technology of knowledge based collection of illustrative plates reasoning
CN110164544A (en) * 2018-02-11 2019-08-23 深圳欧德蒙科技有限公司 A kind of method, apparatus and terminal device of illness information processing
CN109243599A (en) * 2018-03-16 2019-01-18 申朴信息技术(上海)股份有限公司 A kind of disease based on various dimensions information retrieval is to code method
CN108614885A (en) * 2018-05-03 2018-10-02 杭州认识科技有限公司 Knowledge mapping analysis method based on medical information and device
CN110459287B (en) * 2018-05-08 2024-03-22 西门子医疗有限公司 Structured report data from medical text reports
CN109240258A (en) * 2018-07-09 2019-01-18 上海万行信息科技有限公司 Vehicle failure intelligent auxiliary diagnosis method and system based on term vector
CN109473169A (en) * 2018-10-18 2019-03-15 安吉康尔(深圳)科技有限公司 A kind of methods for the diagnosis of diseases, device and terminal device
CN109684445A (en) * 2018-11-13 2019-04-26 中国科学院自动化研究所 Colloquial style medical treatment answering method and system
CN109684445B (en) * 2018-11-13 2021-05-28 中国科学院自动化研究所 Spoken medical question-answering method and spoken medical question-answering system
CN109817330A (en) * 2019-01-25 2019-05-28 华院数据技术(上海)有限公司 A kind of disease forecasting device
CN111798941A (en) * 2019-04-04 2020-10-20 Iqvia 有限公司 Predictive system for generating clinical queries
CN111798941B (en) * 2019-04-04 2023-10-13 Iqvia 有限公司 Predictive system for generating clinical queries
US11615148B2 (en) 2019-04-04 2023-03-28 Iqvia Inc. Predictive system for generating clinical queries
CN110276749A (en) * 2019-06-14 2019-09-24 辽宁万象联合医疗科技有限公司 Children penetrate the quality control artificial intelligence system and its quality control method of piece and diagnosis
CN110276749B (en) * 2019-06-14 2022-04-01 辽宁万象联合医疗科技有限公司 Quality control artificial intelligence system and quality control method for children radiation shooting and diagnosis
CN110867228A (en) * 2019-11-15 2020-03-06 北京大学人民医院(北京大学第二临床医学院) Intelligent information grabbing and evaluating method and system for wound severity of wound inpatient
CN111599489A (en) * 2020-05-19 2020-08-28 万达信息股份有限公司 Disease information acquisition method, terminal equipment and storage medium
CN111985246A (en) * 2020-08-27 2020-11-24 武汉东湖大数据交易中心股份有限公司 Disease cognitive system based on main symptoms and accompanying symptom words
CN111985246B (en) * 2020-08-27 2023-08-15 武汉东湖大数据交易中心股份有限公司 Disease cognitive system based on main symptoms and accompanying symptom words
CN112017773A (en) * 2020-08-31 2020-12-01 吾征智能技术(北京)有限公司 Disease cognition model construction method based on nightmare and disease cognition system
CN112017773B (en) * 2020-08-31 2024-03-26 吾征智能技术(北京)有限公司 Disease cognitive model construction method and disease cognitive system based on nightmare
CN111968740A (en) * 2020-09-03 2020-11-20 卫宁健康科技集团股份有限公司 Diagnostic label recommendation method and device, storage medium and electronic equipment
CN112364055A (en) * 2020-10-29 2021-02-12 上海德衡数据科技有限公司 Service management software system and method
CN112364055B (en) * 2020-10-29 2023-11-03 上海德衡数据科技有限公司 Service management software system and method
CN112331355A (en) * 2020-11-26 2021-02-05 微医云(杭州)控股有限公司 Generation method and device of disease category evaluation table, electronic equipment and storage medium
CN112331355B (en) * 2020-11-26 2024-03-19 微医云(杭州)控股有限公司 Disease type evaluation table generation method and device, electronic equipment and storage medium
CN112988953A (en) * 2021-04-26 2021-06-18 成都索贝数码科技股份有限公司 Adaptive broadcast television news keyword standardization method
CN113505236A (en) * 2021-06-29 2021-10-15 医智泉(杭州)医疗科技有限公司 Construction method, device and equipment of medical knowledge graph and computer readable medium
CN113505236B (en) * 2021-06-29 2023-08-04 朱一帆 Medical knowledge graph construction method, device, equipment and computer readable medium
CN113793668A (en) * 2021-09-17 2021-12-14 平安科技(深圳)有限公司 Symptom standardization method and device based on artificial intelligence, electronic equipment and medium
CN114628012A (en) * 2022-03-21 2022-06-14 中国人民解放军西部战区总医院 Emergency department's preliminary examination go-no-go system
CN114628012B (en) * 2022-03-21 2023-09-05 中国人民解放军西部战区总医院 Emergency department's preliminary examination sorting system

Also Published As

Publication number Publication date
CN106874643B (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN106874643A (en) Build the method and system that knowledge base realizes assisting in diagnosis and treatment automatically based on term vector
CN109460473B (en) Electronic medical record multi-label classification method based on symptom extraction and feature representation
CN111274806B (en) Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN110459282B (en) Sequence labeling model training method, electronic medical record processing method and related device
CN109599185B (en) Disease data processing method and device, electronic equipment and computer readable medium
CN110472229B (en) Sequence labeling model training method, electronic medical record processing method and related device
CN109670179B (en) Medical record text named entity identification method based on iterative expansion convolutional neural network
CN110069779B (en) Symptom entity identification method of medical text and related device
CN110838368B (en) Active inquiry robot based on traditional Chinese medicine clinical knowledge map
CN110705293A (en) Electronic medical record text named entity recognition method based on pre-training language model
US20190057773A1 (en) Method and system for performing triage
Matci et al. Address standardization using the natural language process for improving geocoding results
CN112002411A (en) Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
CN109166608A (en) Electronic health record information extracting method, device and equipment
CN110931128B (en) Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts
CN110337645A (en) The processing component that can be adapted to
CN111680089A (en) Text structuring method, device and system and non-volatile storage medium
CN109378066A (en) A kind of control method and control device for realizing disease forecasting based on feature vector
CN112541066B (en) Text-structured-based medical and technical report detection method and related equipment
CN111477320B (en) Treatment effect prediction model construction system, treatment effect prediction system and terminal
CN108231146A (en) A kind of medical records model building method, system and device based on deep learning
CN104063579A (en) Health dynamic prediction method and equipment based on multivariate medical consumption data
Whitney Bootstrapping via graph propagation
CN109299467A (en) Medicine text recognition method and device, sentence identification model training method and device
CN112349367B (en) Method, device, electronic equipment and storage medium for generating simulated medical record

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant