CN107680689A - Method, system, and readable storage medium for inferring potential diseases from medical text

Info

Publication number
CN107680689A
Authority
CN
China
Prior art keywords: medical, disease, text, vocabulary, medical text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710313520.1A
Other languages
Chinese (zh)
Inventor
赵清源
韦邕
吕梓燊
徐亮
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201710313520.1A
Publication of CN107680689A
Priority to PCT/CN2018/076149 (WO2018201772A1)

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00: Subject matter not provided for in other main groups of this subclass
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method, a system, and a readable storage medium for inferring potential diseases from medical text. The method includes: segmenting a received medical text into words, matching each resulting word against a predetermined medical-domain lexicon, and extracting the medical terms among those words; determining, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, where the database contains mapping relations between different types of diseases and medical terms; and outputting the determined disease as the inferred potential disease of the medical text. The invention infers the potential disease of a medical text accurately and efficiently.

Description

Method, system, and readable storage medium for inferring potential diseases from medical text
Technical field
The present invention relates to the field of computer technology, and in particular to a method, a system, and a readable storage medium for inferring potential diseases from medical text.
Background
In general, the first step in processing a medical text is to infer the potential disease; only then can diagnostic recommendations be made. In the prior art, the potential disease in a medical text can only be inferred manually from a doctor's personal experience. This is inefficient and cannot make effective use of existing medical data resources.
Summary of the invention
The main object of the present invention is to provide a method, a system, and a readable storage medium for inferring potential diseases from medical text, with the aim of inferring the potential disease of a medical text accurately and efficiently.
To achieve the above object, the present invention provides a method for inferring potential diseases from medical text, the method comprising the following steps:
A. segmenting the received medical text into words, matching each resulting word against a predetermined medical-domain lexicon, and extracting the medical terms among those words;
B. determining, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, where the database contains mapping relations between different types of diseases and medical terms;
C. outputting the determined disease as the inferred potential disease of the medical text.
Preferably, before step A, the method further includes:
obtaining medical data from a predetermined data source, finding in the medical data one or more medical terms corresponding to each disease, and establishing the medical professional database according to the mapping relations between different types of diseases and medical terms.
Preferably, the medical terms include:
medical terms in the profile information, symptom information, complication information, therapeutic drug information, or treating department information corresponding to a disease.
Preferably, the medical professional database further includes a weight for each medical term corresponding to a disease, and step B includes:
based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the determined disease corresponding to the medical text.
Preferably, the step of segmenting the received medical text includes:
matching the medical text against the predetermined medical-domain lexicon using the forward maximum matching method to obtain a first matching result, which contains a first quantity of first phrases and a third quantity of single characters;
matching the medical text against the predetermined medical-domain lexicon using the backward maximum matching method to obtain a second matching result, which contains a second quantity of second phrases and a fourth quantity of single characters;
if the first quantity equals the second quantity and the third quantity is less than or equal to the fourth quantity, using the first matching result as the segmentation result of the medical text;
if the first quantity equals the second quantity and the third quantity is greater than the fourth quantity, using the second matching result as the segmentation result of the medical text;
if the first quantity and the second quantity are unequal and the first quantity is greater than the second quantity, using the second matching result as the segmentation result of the medical text;
if the first quantity and the second quantity are unequal and the first quantity is less than the second quantity, using the first matching result as the segmentation result of the medical text.
In addition, to achieve the above object, the present invention also provides a potential disease inference system for medical text, the system including:
a word segmentation and extraction module, configured to segment the received medical text into words, match each resulting word against a predetermined medical-domain lexicon, and extract the medical terms among those words;
a determining module, configured to determine, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, where the database contains mapping relations between different types of diseases and medical terms;
an output module, configured to output the determined disease as the inferred potential disease of the medical text.
Preferably, the system further includes:
an establishing module, configured to obtain medical data from a predetermined data source, find in the medical data one or more medical terms corresponding to each disease, and establish the medical professional database according to the mapping relations between different types of diseases and medical terms.
Preferably, the medical terms include:
medical terms in the profile information, symptom information, complication information, therapeutic drug information, or treating department information corresponding to a disease.
Preferably, the medical professional database further includes a weight for each medical term corresponding to a disease, and the determining module is further configured to:
based on the pre-built medical professional database, find the diseases corresponding to each medical term in the medical text, calculate for each disease the sum of the weights of its corresponding medical terms, and select the disease with the highest weight sum as the determined disease corresponding to the medical text.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium that stores a potential disease inference system for medical text. The system can be executed by at least one processor, so that the at least one processor performs the steps of the method for inferring potential diseases from medical text described above.
With the method, system, and readable storage medium proposed by the present invention, the received medical text is segmented into words and the medical terms among those words are extracted; then, based on a pre-built medical professional database containing mapping relations between diseases and medical terms, the disease corresponding to the medical terms in the medical text is determined and taken as the inferred potential disease of the medical text. Because the mapping relations between diseases and medical terms can be built from a variety of medical data resources, and the diseases mapped to the medical terms in a medical text can be looked up from them, the approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the method for inferring potential diseases from medical text of the present invention;
Fig. 2 is a schematic flowchart of a second embodiment of the method for inferring potential diseases from medical text of the present invention;
Fig. 3 is a schematic diagram of the operating environment of a preferred embodiment of the potential disease inference system 10 for medical text of the present invention;
Fig. 4 is a functional block diagram of a first embodiment of the potential disease inference system for medical text of the present invention;
Fig. 5 is a functional block diagram of a second embodiment of the potential disease inference system for medical text of the present invention.
The realization of the object, the functional characteristics, and the advantages of the present invention are further described with reference to the drawings in conjunction with the embodiments.
Detailed description of the embodiments
To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The present invention provides a method for inferring potential diseases from medical text.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of one embodiment of the method for inferring potential diseases from medical text of the present invention.
In one embodiment, the method for inferring potential diseases from medical text includes:
Step S10: segment the received medical text into words, match each resulting word against a predetermined medical-domain lexicon, and extract the medical terms among those words.
A medical text to be diagnosed is received, for example one sent by a user through a browser or a client app. In this embodiment, after the medical text is received, it is first segmented into words. For example, the medical text can be split at punctuation marks into complete sentences, and each sentence is then segmented. One option is string-matching segmentation: the forward maximum matching method, which segments the character string of a sentence from left to right; the backward maximum matching method, which segments it from right to left; the shortest-path segmentation method, which requires that the character string of a sentence be cut into the smallest number of words; or the bidirectional maximum matching method, which performs forward and backward matching at the same time. Another option is semantic segmentation, a method based on machine judgment of language that uses syntactic and semantic information to resolve ambiguity. A third option is statistical segmentation: based on phrase statistics drawn from the historical search records of the current user or of the general public, certain pairs of adjacent characters may be found to co-occur frequently, in which case each such pair can be treated as a phrase during segmentation.
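The sentence splitting and forward maximum matching described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the punctuation set, and the tiny lexicon are assumptions made for the example.

```python
import re


def split_sentences(text):
    """Split text into sentences at common Chinese and Western end punctuation.

    The punctuation set is an assumption chosen for illustration.
    """
    parts = re.split(r"[。！？；!?;]", text)
    return [p.strip() for p in parts if p.strip()]


def forward_max_match(sentence, lexicon, max_len=None):
    """Scan left to right, taking the longest lexicon entry at each position.

    A character that starts no lexicon entry is emitted as a single character.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon: "headache", "fever", "cough".
lexicon = {"头痛", "发热", "咳嗽"}
for sentence in split_sentences("患者头痛发热。伴有咳嗽！"):
    print(forward_max_match(sentence, lexicon))
```

Backward maximum matching follows the same pattern but scans from the end of the sentence toward its start.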
After the medical text has been segmented, each resulting word is matched against the predetermined medical-domain lexicon. The lexicon may include the medical terms in a general medical dictionary, as well as medical terms extracted from a large number of medical texts (such as open-source medical data on the internet) covering the profile information, symptom information, complication information, therapeutic drug information, or treating department information of various diseases, and so on. The medical-domain lexicon may be fixed, or its medical terms may be updated periodically according to the latest open-source medical data on the internet. The words that match the lexicon are extracted as medical terms; they are the information in the medical text most strongly correlated with its potential disease.
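As a rough sketch of this matching step, the segmented words can be filtered against the lexicon with a set lookup. The function name is hypothetical, and dropping repeated terms is an assumption of this example, not something the text specifies.

```python
def extract_medical_terms(tokens, medical_lexicon):
    """Keep the tokens found in the lexicon, in order of first appearance.

    Repeated terms are dropped here; that is an assumption of this sketch.
    """
    seen = set()
    terms = []
    for tok in tokens:
        if tok in medical_lexicon and tok not in seen:
            seen.add(tok)
            terms.append(tok)
    return terms


# Hypothetical segmentation output and lexicon.
tokens = ["patient", "has", "fever", "and", "cough", "fever"]
lexicon = {"fever", "cough", "headache"}
print(extract_medical_terms(tokens, lexicon))
```

Holding the lexicon in a set keeps each lookup constant time on average, which matters if the lexicon is refreshed regularly and grows large.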
Step S20: based on a pre-built medical professional database, determine the disease corresponding to the medical terms in the medical text, where the database contains mapping relations between different types of diseases and medical terms.
After the medical terms most strongly correlated with the potential disease have been extracted from the words of the medical text, the disease corresponding to those medical terms is determined based on the pre-built medical professional database. The database contains mapping relations between different types of diseases and medical terms (such as terms for symptoms, drugs, examinations, and departments extracted from a large number of medical texts). For example, the database can be built from open-source online data and texts and can contain diseases together with their corresponding professional information, such as profiles, symptoms, complications, therapeutic drugs, and common examinations. Based on the constructed mapping relations between diseases and medical terms, the diseases mapped to the medical terms extracted from the medical text can be found.
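One way to picture this lookup is to invert the disease-to-term mapping into a term-to-disease index and collect every disease that any extracted term points to. The data layout, function names, and sample entries below are assumptions for illustration only.

```python
from collections import defaultdict


def build_term_index(disease_to_terms):
    """Invert {disease: set of terms} into {term: set of diseases}."""
    index = defaultdict(set)
    for disease, terms in disease_to_terms.items():
        for term in terms:
            index[term].add(disease)
    return index


def candidate_diseases(terms, term_index):
    """Return every disease mapped to at least one of the given terms."""
    found = set()
    for term in terms:
        found |= term_index.get(term, set())
    return found


# Hypothetical database contents.
database = {
    "common cold": {"fever", "cough"},
    "migraine": {"headache"},
}
index = build_term_index(database)
print(candidate_diseases(["fever", "headache"], index))
```

Because one term can map to several diseases, this step may return more than one candidate; the text later describes a weighting scheme for choosing among them.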
Step S30: output the determined disease as the inferred potential disease of the medical text.
After the disease corresponding to the extracted medical terms has been determined, it is output as the inferred potential disease of the medical text, so that subsequent diagnostic recommendations can be based on it. According to statistics on potential disease inference for medical texts in practical applications, the disease labels produced by the method of this embodiment reach an accuracy of about 85% (judged by manual review as having no obvious error), which effectively improves the accuracy of potential disease inference for medical texts.
In this embodiment, the received medical text is segmented into words and the medical terms among those words are extracted; then, based on a pre-built medical professional database containing mapping relations between diseases and medical terms, the disease corresponding to the medical terms in the medical text is determined and taken as the inferred potential disease of the medical text. Because the mapping relations between diseases and medical terms can be built from a variety of medical data resources, and the diseases mapped to the medical terms in a medical text can be looked up from them, the approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
As shown in Fig. 2, a second embodiment of the present invention proposes a method for inferring potential diseases from medical text which, on the basis of the above embodiment, further includes before step S10:
Step S40: obtain medical data from a predetermined data source, find in the medical data one or more medical terms corresponding to each disease, and establish the medical professional database according to the mapping relations between different types of diseases and medical terms.
In this embodiment, before potential disease inference is performed on a medical text, medical data is first obtained from a predetermined data source, and the medical professional database is established from the mapping relations between the different types of diseases and the medical terms in that data. The medical data can be authoritative descriptions of various diseases obtained from existing medical databases, including professional information such as the corresponding profiles, symptoms, complications, therapeutic drugs, and common examinations, or medical information about various drugs, such as the types of disease a drug is indicated for. The medical data can also be specific types of information (for example, the treatment plans, drugs, departments, and clinical manifestations corresponding to various diseases) obtained in real time or at scheduled intervals by tools such as web crawlers from open-source medical data sources on the internet (for example, questions, answers, and discussions about diseases on various forums, or the latest medical cases and medical question-and-answer texts). Once one or more medical terms corresponding to each disease have been found in the obtained medical data, the medical professional database can be established according to the mapping relations between each disease and its medical terms, so that potential diseases can subsequently be inferred from it.
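A minimal sketch of building such a database from collected records might look like the following. The record schema (a `disease` key plus list-valued fields such as `symptoms` and `drugs`) and the function name are hypothetical; the text does not specify a data format.

```python
def build_database(records,
                   fields=("symptoms", "complications", "drugs", "departments")):
    """Merge records into {disease: set of medical terms}.

    Each record is assumed to be a dict with a "disease" key and optional
    list-valued fields; records for the same disease are merged.
    """
    database = {}
    for rec in records:
        terms = database.setdefault(rec["disease"], set())
        for field in fields:
            terms.update(rec.get(field, []))
    return database


# Hypothetical records, e.g. gathered from an existing medical database.
records = [
    {"disease": "common cold", "symptoms": ["fever", "cough"], "drugs": ["paracetamol"]},
    {"disease": "common cold", "departments": ["internal medicine"]},
    {"disease": "migraine", "symptoms": ["headache"]},
]
print(build_database(records))
```

Merging by disease name means that data arriving later, for instance from a scheduled crawl, only adds terms to an existing entry rather than replacing it.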
Further, in other embodiments, the medical professional database also includes a weight for each medical term corresponding to a disease, and step S20 can include:
based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the determined disease corresponding to the medical text.
In this embodiment, it is taken into account that one disease may correspond to one or more medical terms and that one medical term may correspond to one or more diseases. For example, the same symptom may map to several diseases, and the same drug may treat several diseases. Therefore, in the constructed medical professional database, different medical terms are assigned different weights. When the medical terms in the medical text map to multiple diseases in the database, the sum of the weights of the medical terms corresponding to each disease is calculated, and the disease with the highest weight sum is selected as the determined disease corresponding to the medical text. For example, the weight sum obtained for a disease can be used as the confidence of inferring that disease, and the disease with the highest confidence is selected as the final result, which further improves the accuracy of potential disease inference for medical texts.
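The weighted selection can be sketched as below. The nested-dict layout, the function name, and the integer weights are assumptions made for the example; the text does not say how weights are stored or assigned.

```python
def infer_disease(terms, weights):
    """Pick the disease whose matched terms have the highest weight sum.

    weights is assumed to be {disease: {term: weight}}. Returns a
    (disease, score) pair, or None if no term matches any disease.
    """
    scores = {}
    for disease, term_weights in weights.items():
        total = sum(term_weights.get(t, 0) for t in set(terms))
        if total > 0:
            scores[disease] = total
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical weights.
weights = {
    "common cold": {"fever": 3, "cough": 5},
    "migraine": {"headache": 9},
    "influenza": {"fever": 6, "headache": 2},
}
print(infer_disease(["fever", "headache"], weights))  # ('migraine', 9)
```

In this example "influenza" scores 6 + 2 = 8 and "common cold" scores 3, so "migraine" wins with 9. When two diseases tie, this sketch returns whichever comes first in the dictionary; a real system would need an explicit tie-breaking rule.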
Further, in other embodiments, the step in step S10 of segmenting the received medical text includes:
matching the character string to be processed in the medical text against the predetermined medical-domain lexicon (for example, a general medical dictionary or an extensible, learning-type medical dictionary) using the forward maximum matching method, to obtain a first matching result;
matching the character string to be processed in the medical text against the predetermined medical-domain lexicon (for example, a general medical dictionary or an extensible, learning-type medical dictionary) using the backward maximum matching method, to obtain a second matching result. The first matching result contains a first quantity of first phrases, and the second matching result contains a second quantity of second phrases; the first matching result contains a third quantity of single characters, and the second matching result contains a fourth quantity of single characters.
If the first quantity equals the second quantity and the third quantity is less than or equal to the fourth quantity, the first matching result (including phrases and single characters) is output as the segmentation result of the medical text;
if the first quantity equals the second quantity and the third quantity is greater than the fourth quantity, the second matching result (including phrases and single characters) is output as the segmentation result of the medical text;
if the first quantity and the second quantity are unequal and the first quantity is greater than the second quantity, the second matching result (including phrases and single characters) is output as the segmentation result of the medical text;
if the first quantity and the second quantity are unequal and the first quantity is less than the second quantity, the first matching result (including phrases and single characters) is output as the segmentation result of the medical text.
In this embodiment, the medical text is segmented with a bidirectional matching method: forward and backward matching are performed at the same time to analyze how strongly adjacent content in the character string to be processed holds together. Phrases are generally more likely than single characters to carry core information, so a phrase is more likely to be a medical term in the medical text. Therefore, by running forward and backward matching at the same time, the matching result that has fewer single characters and favors phrases is found and used as the segmentation result of the medical text, which improves segmentation accuracy so that the medical terms in the medical text can be extracted more accurately.
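The four selection rules can be sketched in code as follows. The function names are hypothetical, and a "phrase" is taken here to mean any token longer than one character, which is an assumption of this sketch.

```python
def forward_max_match(text, lexicon, max_len):
    """Left-to-right longest match; unmatched characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len):
    """Right-to-left longest match, returned in reading order."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def count_phrases_and_singles(tokens):
    """Return (number of multi-character tokens, number of single characters)."""
    phrases = sum(1 for t in tokens if len(t) > 1)
    return phrases, len(tokens) - phrases


def bidirectional_segment(text, lexicon):
    """Choose between the two matching results using the four rules in the text."""
    max_len = max((len(w) for w in lexicon), default=1)
    first = forward_max_match(text, lexicon, max_len)
    second = backward_max_match(text, lexicon, max_len)
    n1, n3 = count_phrases_and_singles(first)   # first and third quantities
    n2, n4 = count_phrases_and_singles(second)  # second and fourth quantities
    if n1 == n2:
        return first if n3 <= n4 else second
    return second if n1 > n2 else first


# Placeholder strings: forward gives AB/C/D, backward gives A/BCD.
print(bidirectional_segment("ABCD", {"AB", "BCD"}))  # ['A', 'BCD']
```

In the example both directions find one phrase, but the backward result leaves one single character instead of two, so it is chosen. The unequal-count branches follow the rules exactly as stated in the text, which select the result with the smaller phrase count.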
The present invention further provides a potential disease inference system for medical text. Fig. 3 is a schematic diagram of the operating environment of a preferred embodiment of the potential disease inference system 10 for medical text of the present invention.
In this embodiment, the potential disease inference system 10 for medical text is installed and runs in an electronic device 1. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. Fig. 3 shows only the electronic device 1 with components 11-13; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
In some embodiments, the memory 11 can be an internal storage unit of the electronic device 1, such as its hard disk or internal memory. In other embodiments, the memory 11 can be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 can include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 is used to store the application software installed on the electronic device 1 and various kinds of data, such as the program code of the potential disease inference system 10 for medical text. The memory 11 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 can be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 11, for example to execute the potential disease inference system 10 for medical text.
In some embodiments, the display 13 can be an LED display, a liquid crystal display, a touch-control liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display 13 is used to show the information processed in the electronic device 1 and to present a visual user interface, for example the medical terms extracted from the medical text and the inferred potential disease of the medical text. The components 11-13 of the electronic device 1 communicate with one another over a system bus.
Fig. 4 is a functional block diagram of the first embodiment of the potential disease inference system 10 for medical text of the present invention. In this embodiment, the potential disease inference system 10 for medical text can be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. For example, in Fig. 4, the system 10 can be divided into a word segmentation and extraction module 01, a determining module 02, and an output module 03. A module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program for describing the execution of the potential disease inference system 10 for medical text in the electronic device 1. The functions of the word segmentation and extraction module 01, the determining module 02, and the output module 03 are described in detail below.
Extraction module 01 is segmented, for being segmented to the medical text received, and will be each corresponding to the medical text Participle is matched with the special lexicon of predetermined medical field, is extracted in each participle corresponding to the medical text Medical vocabulary;
Receive medical text to be diagnosed, can such as receive user by browser, client end AP P transmissions it is to be diagnosed Medical text.In the present embodiment, after medical text is received, word segmentation processing is carried out to the medical text received first.For example, can According to punctuation mark by medical text dividing into the complete sentence of a rule, then word segmentation processing is carried out to the sentence of each cutting, Word segmentation processing such as is carried out to the sentence of each cutting using the segmenting method of string matching, such as Forward Maximum Method method, Character string in the sentence of one cutting segments from left to right;Or reverse maximum matching method, in the sentence a cutting Character string segment from right to left;Or shortest path participle method, require to cut inside the character string in the sentence of a cutting The word number gone out is minimum;Or two-way maximum matching method, it is forward and reverse while carry out participle matching.Also segmented using the meaning of a word Method carries out word segmentation processing to the sentence of each cutting, and meaning of a word participle method is the segmenting method that a kind of machine talk judges, utilizes sentence Method information and semantic information handle Ambiguity to segment.Also the sentence of each cutting is divided using statistical morphology Word processing, from the historical search record of the historical search record of active user or public users, according to the statistics of phrase, it can unite The frequency occurred in respect of a little two adjacent words is more, then can be segmented using the two adjacent words as phrase.
After word segmentation of the medical text is completed, each segmented word of the medical text is matched against the predetermined medical-domain lexicon. The predetermined medical-domain lexicon may include the entries of a general medical dictionary, as well as medical vocabulary extracted from a large volume of medical texts (such as open-source medical data on the Internet), for example vocabulary in the profile information, symptom information, complication information, treatment drug information, or treating department information of various diseases. The medical-domain lexicon may be fixed, or its medical vocabulary may be updated periodically according to the latest open-source medical data on the Internet. By extracting the segmented words of the medical text that match the predetermined medical-domain lexicon, the information in the medical text that is most relevant to its potential disease is obtained, namely the extracted medical vocabulary.
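As a rough sketch of the matching step above, the extraction can be read as a membership filter over the segmented words. The function names and sample data are hypothetical, and dropping repeated words is a design choice made here for illustration; the text does not say how repeats are handled.

```python
def extract_medical_vocabulary(segmented_words, medical_lexicon):
    """Keep the segmented words found in the lexicon, in first-seen order."""
    extracted = []
    for word in segmented_words:
        # Assumption: each matching word is kept only once.
        if word in medical_lexicon and word not in extracted:
            extracted.append(word)
    return extracted


def refresh_lexicon(medical_lexicon, new_terms):
    """Return an updated lexicon, as in the periodically updated variant."""
    return set(medical_lexicon) | set(new_terms)


lexicon = {"头痛", "发热", "咳嗽"}
words = ["患者", "头痛", "三", "天", "发热", "头痛"]
print(extract_medical_vocabulary(words, lexicon))  # ['头痛', '发热']
```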
Determining module 02, configured to determine, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text; wherein the medical professional database contains mapping relations between different types of diseases and medical vocabulary;
After the medical vocabulary that is highly relevant to the potential disease is extracted from the segmented words of the medical text, the disease corresponding to the medical vocabulary in the medical text is determined based on the pre-built medical professional database. The medical professional database contains mapping relations between different types of diseases and medical vocabulary (such as vocabulary on symptoms, drugs, examinations, and departments extracted from a large volume of medical texts). For example, the medical professional database may be built from open-source online data and texts, and may contain diseases together with their corresponding professional information such as profile, symptoms, complications, treatment drugs, and common examinations. Based on the constructed mapping relations between different diseases and medical vocabulary, the disease mapped to the medical vocabulary extracted from the medical text can be found.
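One way to picture the lookup described above is a dictionary from each medical vocabulary item to the diseases it maps to. The table `VOCAB_TO_DISEASES` and both function names are illustrative assumptions with toy data, not the content or schema of the database in the embodiment.

```python
# Hypothetical mapping from a medical vocabulary item to the diseases it maps to.
VOCAB_TO_DISEASES = {
    "fever": {"influenza", "pneumonia"},
    "cough": {"influenza", "pneumonia"},
    "headache": {"migraine", "influenza"},
}


def lookup_diseases(medical_vocabulary, vocab_to_diseases=VOCAB_TO_DISEASES):
    """Return {word: mapped diseases} for every word present in the mapping."""
    return {
        word: set(vocab_to_diseases[word])
        for word in medical_vocabulary
        if word in vocab_to_diseases
    }


def candidate_diseases(medical_vocabulary, vocab_to_diseases=VOCAB_TO_DISEASES):
    """Union of all diseases mapped from the extracted vocabulary."""
    candidates = set()
    for diseases in lookup_diseases(medical_vocabulary, vocab_to_diseases).values():
        candidates |= diseases
    return candidates
```

Words that are absent from the mapping are simply skipped, so an unknown word such as "rash" in this toy table contributes no candidate disease.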
Output module 03, configured to output the determined disease as the inferred potential disease of the medical text.
After the corresponding disease is determined from the medical vocabulary extracted from the medical text, the determined disease is output as the inferred potential disease of the medical text, so that subsequent diagnostic recommendations can be made based on it. According to statistics on potential disease inference for medical texts in practical application, the accuracy of the disease labels obtained by the potential disease inference method of this embodiment (that is, labels in which manual review finds no obvious error) can reach about 85%, which effectively improves the accuracy of inferring the potential disease of a medical text.
In this embodiment, the received medical text is segmented and the medical vocabulary in the segmented words of the medical text is extracted; then, based on the pre-built medical professional database containing mapping relations between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined and taken as the inferred potential disease of the medical text. Because the mapping relations between different diseases and medical vocabulary can be built from a variety of medical data resources, and the disease mapped to the medical vocabulary in a medical text can be found from them, this approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
As shown in Fig. 5, a second embodiment of the present invention provides a potential disease inference system for medical text which, on the basis of the above embodiment, further includes:
Establishing module 04, configured to obtain medical data from a predetermined data source, find the one or more medical vocabulary items corresponding to each disease from the medical data, and establish the medical professional database according to the mapping relations between different types of diseases and medical vocabulary.
In this embodiment, before potential disease inference is performed on a medical text, medical data is first obtained from a predetermined data source, and the medical professional database is established from the mapping relations between the different types of diseases and the medical vocabulary in the medical data. The medical data may be authoritative descriptions of various diseases obtained from existing medical databases, including professional information such as the corresponding profile, symptoms, complications, treatment drugs, and common examinations; or it may be medical information corresponding to various drugs, such as the types of disease a drug is indicated for. The medical data may also be specific types of information (for example, the treatment plans, drugs, departments, and clinical manifestations corresponding to different diseases) obtained in real time or at scheduled times, by tools such as web crawlers, from open-source medical data sources on the Internet (for example, questions, answers, and discussions about different diseases on various forums, or the latest medical cases and medical question-and-answer texts). Once the one or more medical vocabulary items corresponding to each disease are found in the obtained medical data, the medical professional database can be established according to the mapping relations between each disease and its one or more medical vocabulary items, so that potential diseases can subsequently be inferred based on the established medical professional database.
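The database-building step above might look like the following sketch. It assumes that each item of obtained medical data has already been parsed into a record with a disease name and lists of vocabulary per information type; the record layout, the field names in `FIELDS`, and the function name are hypothetical, and the sample records are toy data rather than clinical reference material.

```python
# Hypothetical information types kept as medical vocabulary for each disease.
FIELDS = ("symptoms", "complications", "drugs", "departments")


def build_medical_database(records, fields=FIELDS):
    """Merge parsed records into a {disease: set of medical vocabulary} mapping."""
    database = {}
    for record in records:
        vocabulary = database.setdefault(record["disease"], set())
        for field in fields:
            # A record may lack some information types; treat them as empty.
            vocabulary.update(record.get(field, []))
    return database


records = [
    {"disease": "influenza", "symptoms": ["fever", "cough"], "drugs": ["oseltamivir"]},
    {"disease": "influenza", "departments": ["respiratory medicine"]},
    {"disease": "migraine", "symptoms": ["headache"], "departments": ["neurology"]},
]
database = build_medical_database(records)
print(sorted(database["influenza"]))
```

Records for the same disease coming from different sources are merged into one vocabulary set, which matches the idea of combining existing medical databases with crawled data.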
Further, in other embodiments, the medical professional database also contains a weight for each medical vocabulary item corresponding to a disease, and the above determining module 02 may also be configured to:
Based on the pre-built medical professional database, find the disease corresponding to each medical vocabulary item in the medical text, calculate the sum of the weights of the medical vocabulary items corresponding to each disease, and select the disease with the highest weight sum as the determined disease corresponding to the medical text.
In this embodiment, it is considered that one disease may correspond to one or more medical vocabulary items, and one medical vocabulary item may likewise correspond to one or more diseases; for example, the same symptom may map to several diseases, and the same drug may treat several diseases. Therefore, in the constructed medical professional database, different medical vocabulary items are also assigned different weights. When the medical vocabulary items in the medical text are found, based on the constructed medical professional database, to correspond to multiple diseases, the sum of the weights of the medical vocabulary items corresponding to each disease is calculated, and the disease with the highest weight sum is selected as the determined disease corresponding to the medical text. For example, the weight sum obtained for a disease may be taken as the confidence of inferring that disease, and the disease with the highest confidence is selected as the final result, thereby further improving the accuracy of inferring the potential disease of a medical text.
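The weighted selection described above can be sketched as a per-disease sum followed by an arg-max. The weight table, its values, and the function name `infer_disease` are assumptions for illustration only; the text does not specify how weights are assigned or how ties are resolved.

```python
from collections import defaultdict

# Hypothetical weights: vocabulary item -> {disease: weight of that item for the disease}.
WEIGHTS = {
    "fever": {"influenza": 0.4, "pneumonia": 0.3},
    "cough": {"influenza": 0.3, "pneumonia": 0.4},
    "headache": {"influenza": 0.2, "migraine": 0.8},
}


def infer_disease(medical_vocabulary, weights=WEIGHTS):
    """Sum the weights per disease and return (best disease, all weight sums)."""
    scores = defaultdict(float)
    for word in medical_vocabulary:
        for disease, weight in weights.get(word, {}).items():
            scores[disease] += weight
    if not scores:
        return None, {}
    # On a tie, max() keeps the first disease encountered (an arbitrary choice).
    best = max(scores, key=scores.get)
    return best, dict(scores)


best, scores = infer_disease(["fever", "cough", "headache"])
print(best)  # influenza
```

With these toy weights the sums are roughly 0.9 for influenza, 0.8 for migraine, and 0.7 for pneumonia, so influenza is selected and its sum can be read as the confidence of the inference.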
Further, in other embodiments, the above word segmentation and extraction module 01 is also configured to:
Match the character string to be processed in the medical text against the predetermined medical-domain lexicon (for example, the medical-domain lexicon may be a general medical professional dictionary or an extensible, learning-type medical dictionary) according to the forward maximum matching method, to obtain a first matching result;
Match the character string to be processed in the medical text against the predetermined medical-domain lexicon (for example, the medical-domain lexicon may be a general medical professional dictionary or an extensible, learning-type medical dictionary) according to the reverse maximum matching method, to obtain a second matching result. The first matching result contains a first quantity of first phrases, and the second matching result contains a second quantity of second phrases; the first matching result contains a third quantity of single characters, and the second matching result contains a fourth quantity of single characters.
If the first quantity is equal to the second quantity, and the third quantity is less than or equal to the fourth quantity, output the first matching result (including phrases and single characters) corresponding to the medical text;
If the first quantity is equal to the second quantity, and the third quantity is greater than the fourth quantity, output the second matching result (including phrases and single characters) corresponding to the medical text;
If the first quantity is not equal to the second quantity, and the first quantity is greater than the second quantity, output the second matching result (including phrases and single characters) corresponding to the medical text;
If the first quantity is not equal to the second quantity, and the first quantity is less than the second quantity, output the first matching result (including phrases and single characters) corresponding to the medical text.
In this embodiment, word segmentation is performed on the medical text with the bidirectional matching method: forward and reverse segmentation matching are performed at the same time to analyze how tightly adjacent content in the character string to be processed is bound together. Under normal circumstances a phrase is more likely than a single character to carry core information, that is, a phrase is more likely to be medical vocabulary in the medical text. Therefore, by performing forward and reverse segmentation matching at the same time, the matching result with fewer single characters and more phrases is found and taken as the word segmentation result of the medical text, which improves the accuracy of word segmentation so that the medical vocabulary in the medical text can be extracted more accurately.
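The selection between the forward and reverse results can be sketched by transcribing the four conditions listed for the first to fourth quantities. The function names are hypothetical, and treating any token longer than one character as a "phrase" is an assumption; the sketch only mirrors the stated conditions and takes the two matching results as ready-made lists.

```python
def count_phrases_and_singles(words):
    """Return (number of multi-character phrases, number of single characters)."""
    phrases = sum(1 for word in words if len(word) > 1)
    return phrases, len(words) - phrases


def choose_segmentation(first_result, second_result):
    """Pick a result using the four stated conditions.

    first_result:  output of forward maximum matching
    second_result: output of reverse maximum matching
    """
    first_qty, third_qty = count_phrases_and_singles(first_result)
    second_qty, fourth_qty = count_phrases_and_singles(second_result)
    if first_qty == second_qty:
        # Equal phrase counts: prefer the result with fewer single characters,
        # keeping the forward result when the counts are also equal.
        return first_result if third_qty <= fourth_qty else second_result
    # Unequal phrase counts: more first phrases -> second result, otherwise first.
    return second_result if first_qty > second_qty else first_result
```

Keeping the comparison separate from the matching functions makes it easy to test each condition with hand-written token lists.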
In addition, the present invention also provides a computer-readable storage medium storing a potential disease inference system for medical text. The potential disease inference system for medical text can be executed by at least one processor, so that the at least one processor performs the steps of the potential disease inference method for medical text of the above embodiments. The specific implementation of steps such as S10, S20, and S30 of the method is as described above and is not repeated here.
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings; this does not limit the scope of rights of the present invention. The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Those skilled in the art can implement the present invention in many variant ways without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to obtain a further embodiment. Any modification, equivalent replacement, or improvement made within the technical concept of the present invention shall fall within the scope of rights of the present invention.

Claims (10)

1. A potential disease inference method for medical text, characterized in that the method comprises the following steps:
A. segmenting the received medical text into words, matching each segmented word of the medical text against a predetermined medical-domain lexicon, and extracting the medical vocabulary from the segmented words of the medical text;
B. determining, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text; wherein the medical professional database contains mapping relations between different types of diseases and medical vocabulary;
C. outputting the determined disease as the inferred potential disease of the medical text.
2. The potential disease inference method for medical text as claimed in claim 1, characterized in that, before step A, the method further comprises:
obtaining medical data from a predetermined data source, finding the one or more medical vocabulary items corresponding to each disease from the medical data, and establishing the medical professional database according to the mapping relations between different types of diseases and medical vocabulary.
3. The potential disease inference method for medical text as claimed in claim 1, characterized in that the medical vocabulary comprises:
medical vocabulary in the profile information, symptom information, complication information, treatment drug information, or treating department information corresponding to a disease.
4. The potential disease inference method for medical text as claimed in any one of claims 1-3, characterized in that the medical professional database also contains a weight for each medical vocabulary item corresponding to a disease, and step B comprises:
based on the pre-built medical professional database, finding the disease corresponding to each medical vocabulary item in the medical text, calculating the sum of the weights of the medical vocabulary items corresponding to each disease, and selecting the disease with the highest weight sum as the determined disease corresponding to the medical text.
5. The potential disease inference method for medical text as claimed in any one of claims 1-3, characterized in that the step of performing word segmentation on the received medical text comprises:
matching the medical text against the predetermined medical-domain lexicon according to the forward maximum matching method to obtain a first matching result, the first matching result containing a first quantity of first phrases and a third quantity of single characters;
matching the medical text against the predetermined medical-domain lexicon according to the reverse maximum matching method to obtain a second matching result, the second matching result containing a second quantity of second phrases and a fourth quantity of single characters;
if the first quantity is equal to the second quantity, and the third quantity is less than or equal to the fourth quantity, taking the first matching result as the word segmentation result of the medical text;
if the first quantity is equal to the second quantity, and the third quantity is greater than the fourth quantity, taking the second matching result as the word segmentation result of the medical text;
if the first quantity is not equal to the second quantity, and the first quantity is greater than the second quantity, taking the second matching result as the word segmentation result of the medical text;
if the first quantity is not equal to the second quantity, and the first quantity is less than the second quantity, taking the first matching result as the word segmentation result of the medical text.
6. An electronic device, characterized in that the electronic device comprises a memory, a processor, and a potential disease inference system for medical text that is stored in the memory and executable on the processor, wherein the potential disease inference system for medical text, when executed by the processor, implements the following steps:
A. segmenting the received medical text into words, matching each segmented word of the medical text against a predetermined medical-domain lexicon, and extracting the medical vocabulary from the segmented words of the medical text;
B. determining, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text; wherein the medical professional database contains mapping relations between different types of diseases and medical vocabulary;
C. outputting the determined disease as the inferred potential disease of the medical text.
7. The electronic device as claimed in claim 6, characterized in that, before step A, the processor is further configured to execute the potential disease inference system for medical text to implement the following step:
obtaining medical data from a predetermined data source, finding the one or more medical vocabulary items corresponding to each disease from the medical data, and establishing the medical professional database according to the mapping relations between different types of diseases and medical vocabulary.
8. The electronic device as claimed in claim 6, characterized in that the medical vocabulary comprises:
medical vocabulary in the profile information, symptom information, complication information, treatment drug information, or treating department information corresponding to a disease.
9. The electronic device as claimed in any one of claims 6-8, characterized in that the medical professional database also contains a weight for each medical vocabulary item corresponding to a disease, and step B comprises:
based on the pre-built medical professional database, finding the disease corresponding to each medical vocabulary item in the medical text, calculating the sum of the weights of the medical vocabulary items corresponding to each disease, and selecting the disease with the highest weight sum as the determined disease corresponding to the medical text.
10. A computer-readable storage medium, the computer-readable storage medium storing a potential disease inference system for medical text, wherein the potential disease inference system for medical text can be executed by at least one processor, so that the at least one processor performs the steps of the potential disease inference method for medical text as claimed in any one of claims 1-5.
CN201710313520.1A 2017-05-05 2017-05-05 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text Pending CN107680689A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710313520.1A CN107680689A (en) 2017-05-05 2017-05-05 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text
PCT/CN2018/076149 WO2018201772A1 (en) 2017-05-05 2018-02-10 Method and system for inferring potential disease from medical text, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710313520.1A CN107680689A (en) 2017-05-05 2017-05-05 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text

Publications (1)

Publication Number Publication Date
CN107680689A true CN107680689A (en) 2018-02-09

Family

ID=61134116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710313520.1A Pending CN107680689A (en) 2017-05-05 2017-05-05 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text

Country Status (2)

Country Link
CN (1) CN107680689A (en)
WO (1) WO2018201772A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018201772A1 (en) * 2017-05-05 2018-11-08 平安科技(深圳)有限公司 Method and system for inferring potential disease from medical text, and readable storage medium
CN109036506A (en) * 2018-07-25 2018-12-18 平安科技(深圳)有限公司 Monitoring and managing method, electronic device and the readable storage medium storing program for executing of internet medical treatment interrogation
CN109192321A (en) * 2018-09-26 2019-01-11 北京理工大学 The construction method and calculating storage device of drug knowledge mapping
CN109192300A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Intelligent way of inquisition, system, computer equipment and storage medium
CN109215754A (en) * 2018-09-10 2019-01-15 平安科技(深圳)有限公司 Medical record data processing method, device, computer equipment and storage medium
CN109616165A (en) * 2018-11-07 2019-04-12 平安科技(深圳)有限公司 Medical information methods of exhibiting and device
CN109698018A (en) * 2018-12-24 2019-04-30 广州天鹏计算机科技有限公司 Medical text handling method, device, computer equipment and storage medium
WO2020034810A1 (en) * 2018-08-14 2020-02-20 平安医疗健康管理股份有限公司 Search method and apparatus, computer device and storage medium
WO2020103469A1 (en) * 2018-05-29 2020-05-28 平安医疗健康管理股份有限公司 Method and device for establishing medical mapping database, computer apparatus, and storage medium
WO2020177230A1 (en) * 2019-03-07 2020-09-10 平安科技(深圳)有限公司 Medical data classification method and apparatus based on machine learning, and computer device and storage medium
CN112002416A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Disease symptom prediction system based on urine character self-learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915299A (en) * 2012-10-23 2013-02-06 海信集团有限公司 Word segmentation method and device
CN104102816A (en) * 2014-06-20 2014-10-15 周晋 Symptom match and machine learning-based automatic diagnosis system and method
CN104484845A (en) * 2014-12-30 2015-04-01 天津迈沃医药技术有限公司 Disease self-analysis method based on medical ontology database
CN104915413A (en) * 2015-06-05 2015-09-16 广东顺德中山大学卡内基梅隆大学国际联合研究院 Health monitoring method and health monitoring system
CN105139237A (en) * 2015-09-25 2015-12-09 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN106372439A (en) * 2016-09-21 2017-02-01 北京大学 Method for acquiring and processing disease symptoms and weight knowledge thereof based on case library
CN106557653A (en) * 2016-11-15 2017-04-05 合肥工业大学 A kind of portable medical intelligent medical guide system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8145664B2 (en) * 2008-08-15 2012-03-27 Siemens Aktiengesellschaft Disease oriented user interfaces
CN105138829B (en) * 2015-08-13 2018-01-12 易保互联医疗信息科技(北京)有限公司 A kind of natural language processing method and system of Chinese medical information
CN105095665B (en) * 2015-08-13 2018-07-06 易保互联医疗信息科技(北京)有限公司 A kind of natural language processing method and system of Chinese medical diagnosis on disease information
CN106095913A (en) * 2016-06-08 2016-11-09 广州同构医疗科技有限公司 A kind of electronic health record text structure method
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915299A (en) * 2012-10-23 2013-02-06 海信集团有限公司 Word segmentation method and device
CN104765724A (en) * 2012-10-23 2015-07-08 海信集团有限公司 Word segmenting method and device
CN104102816A (en) * 2014-06-20 2014-10-15 周晋 Symptom match and machine learning-based automatic diagnosis system and method
CN104484845A (en) * 2014-12-30 2015-04-01 天津迈沃医药技术有限公司 Disease self-analysis method based on medical ontology database
CN104915413A (en) * 2015-06-05 2015-09-16 广东顺德中山大学卡内基梅隆大学国际联合研究院 Health monitoring method and health monitoring system
CN105139237A (en) * 2015-09-25 2015-12-09 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN106372439A (en) * 2016-09-21 2017-02-01 北京大学 Method for acquiring and processing disease symptoms and weight knowledge thereof based on case library
CN106557653A (en) * 2016-11-15 2017-04-05 合肥工业大学 A kind of portable medical intelligent medical guide system and method

Also Published As

Publication number Publication date
WO2018201772A1 (en) 2018-11-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209