WO2018201772A1 - Method and system for inferring potential disease from medical text, and readable storage medium


Publication number
WO2018201772A1
WO2018201772A1 PCT/CN2018/076149 CN2018076149W WO2018201772A1 WO 2018201772 A1 WO2018201772 A1 WO 2018201772A1 CN 2018076149 W CN2018076149 W CN 2018076149W WO 2018201772 A1 WO2018201772 A1 WO 2018201772A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
disease
text
vocabulary
medical text
Prior art date
Application number
PCT/CN2018/076149
Other languages
French (fr)
Chinese (zh)
Inventor
赵清源
韦邕
吕梓燊
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2018201772A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00 Subject matter not provided for in other main groups of this subclass
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present application relates to the field of computer technology, and in particular, to a potential disease inference method, system, and readable storage medium for medical text.
  • the first step in processing a medical text is to infer potential diseases, so that subsequent diagnostic recommendations can be made.
  • at present, potential diseases can only be inferred from a medical text manually, relying on a doctor's personal experience; this is inefficient and cannot make effective use of existing medical data resources.
  • the main purpose of the present application is to provide a potential disease inference method, system and readable storage medium for medical texts, which aim to accurately and efficiently infer potential diseases of medical texts.
  • a first aspect of the present application provides a method for inferring a potential disease of a medical text, the method comprising the following steps:
  • the second aspect of the present application further provides a potential disease inference system for a medical text, where the potential disease inference system of the medical text includes:
  • a word segmentation module configured to segment the received medical text, and match each word segment corresponding to the medical text with a predetermined medical field-specific vocabulary to extract a medical vocabulary in each word segment corresponding to the medical text;
  • a determining module configured to determine a disease corresponding to the medical vocabulary in the medical text based on a pre-built medical professional database; wherein the medical professional database includes a mapping relationship between different types of diseases and medical vocabulary;
  • An output module for outputting the determined disease as an inferred potential disease of the medical text.
  • a third aspect of the present application further provides a computer readable storage medium storing a potential disease inference system of medical text, the potential disease inference system of the medical text being executable by at least one processor to cause the at least one processor to perform the steps of the potential disease inference method of the medical text as described above.
  • the potential disease inference method, system and readable storage medium for medical text proposed by the present application segment the received medical text and extract the medical vocabulary from the word segments corresponding to the medical text; then, based on a pre-built medical professional database containing mapping relationships between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined as the inferred potential disease of the medical text. Because the mapping relationships between different diseases and medical vocabulary can be constructed from various medical data resources, and the disease mapped to the medical vocabulary in the medical text can be looked up, this is more efficient and more accurate than manual inference based on a doctor's personal experience.
  • FIG. 1 is a schematic flow chart of a first embodiment of a method for inferring a potential disease of a medical text according to the present application;
  • FIG. 2 is a schematic flow chart of a second embodiment of a method for inferring a potential disease of a medical text according to the present application;
  • FIG. 3 is a schematic diagram of an operating environment of a preferred embodiment of the potential disease inference system 10 of the medical text of the present application;
  • FIG. 4 is a schematic diagram of functional modules of a first embodiment of a potential disease inference system for medical text of the present application;
  • FIG. 5 is a schematic diagram of functional modules of a second embodiment of a potential disease inference system for medical text of the present application.
  • the present application provides a method for inferring a potential disease of a medical text.
  • FIG. 1 is a schematic flow chart of an embodiment of a method for estimating a potential disease of a medical text according to the present application.
  • the potential disease inference method of the medical text includes:
  • Step S10: segment the received medical text, match each word segment corresponding to the medical text against a predetermined medical field-specific vocabulary, and extract the medical vocabulary from the word segments corresponding to the medical text.
  • the medical text to be diagnosed is received, for example a medical text sent by the user through a browser, a client app, or the like.
  • the received medical text is first subjected to word segmentation processing.
  • the medical text can be split into complete sentences at punctuation marks, and word segmentation is then performed on each sentence. For example, a string-matching segmentation method can be applied to each sentence, such as: the forward maximum matching method, which segments the string in a sentence from left to right; the reverse maximum matching method, which segments the string in a sentence from right to left; the shortest-path segmentation method, which requires the number of words cut from the string in a sentence to be minimal; or the two-way maximum matching method, which performs segmentation matching in the forward and reverse directions simultaneously.
  • Word-meaning (semantic) segmentation can also be used to segment each sentence. It is a segmentation method based on machine judgment that uses syntactic and semantic information to resolve ambiguity when segmenting words. Statistical segmentation can also be used to segment each sentence.
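As a rough illustration of the shortest-path idea mentioned above (cut a string into as few pieces as possible), the following is a minimal dynamic-programming sketch. The function name, the toy lexicon, and the rule that any single character is always an allowed piece are assumptions made for this example; the source does not describe an implementation.

```python
def shortest_path_segment(text, lexicon):
    """Cut `text` into the fewest pieces, where each piece is either a
    lexicon entry or a single character (assumed fallback)."""
    n = len(text)
    # best[i] = minimum number of pieces needed to cover text[:i]
    best = [0] + [n + 1] * n
    # back[i] = start index of the last piece in an optimal cut of text[:i]
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            allowed = (end - start == 1) or (piece in lexicon)
            if allowed and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    # Walk the back-pointers from the end of the string to recover the pieces.
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]


# Placeholder characters stand in for a real character string.
toy_lexicon = {"ab", "bcd"}
print(shortest_path_segment("abcd", toy_lexicon))
```

With this toy lexicon a greedy left-to-right pass would take "ab" first and then fall back to two single characters, whereas the minimum-piece cut uses only two pieces.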
  • the respective word segments corresponding to the medical text are matched with the predetermined medical field-specific vocabulary, and the predetermined medical field-specific vocabulary may include the medical lexicon in the general medical dictionary, according to a large number.
  • the medical field-specific vocabulary can be fixed, or its medical vocabulary can be updated regularly based on the latest open-source medical data on the Internet.
  • the word segments of the medical text that match the predetermined medical field-specific vocabulary are extracted; the extracted medical vocabulary is the vocabulary in the medical text that is related to the potential disease.
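Step S10 can be sketched roughly as follows: split the text into sentences at punctuation, segment each sentence by forward maximum matching, and keep the segments found in the lexicon. This is only an illustration under stated assumptions: the lexicon contents, the function names, the `max_len` limit, and the use of whitespace-delimited English tokens (rather than the character strings of the original Chinese text) are all hypothetical.

```python
import re

# Hypothetical toy lexicon standing in for the medical field-specific vocabulary.
MEDICAL_LEXICON = {"chest pain", "shortness of breath", "fever", "cough"}


def split_sentences(text):
    """Split text into sentences at common Western and Chinese punctuation."""
    return [s.strip() for s in re.split(r"[.!?;。！？；]", text) if s.strip()]


def forward_max_match(tokens, lexicon, max_len=4):
    """Greedy left-to-right matching: at each position take the longest
    run of tokens that is a lexicon entry, else a single token."""
    segments, i = [], 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if size == 1 or candidate in lexicon:
                segments.append(candidate)
                i += size
                break
    return segments


def extract_medical_vocabulary(text, lexicon=MEDICAL_LEXICON):
    """Return the lexicon entries found in the text, in order of first appearance."""
    found = []
    for sentence in split_sentences(text.lower()):
        tokens = re.findall(r"\w+", sentence)
        for segment in forward_max_match(tokens, lexicon):
            if segment in lexicon and segment not in found:
                found.append(segment)
    return found


print(extract_medical_vocabulary(
    "Patient reports chest pain and shortness of breath. "
    "Mild fever and cough since Monday."
))
```

The sketch does nothing about negation or context; it only shows the lexicon-matching step described in the text.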
  • Step S20 Determine a disease corresponding to the medical vocabulary in the medical text based on the pre-built medical professional database; wherein the medical professional database includes a mapping relationship between different types of diseases and medical vocabulary.
  • the disease corresponding to the medical vocabulary in the medical text is determined based on the pre-built medical professional database.
  • the medical professional database contains mapping relationships between different types of diseases and medical vocabulary (such as symptoms, drugs, examinations, departments and other informational vocabulary extracted from a large number of medical texts). For example, a medical professional database can be built from open-source data and texts, containing professional information such as diseases and their corresponding profiles, symptoms, complications, treatments, and common tests. Based on the constructed mapping relationships between different diseases and medical vocabulary, the disease mapped to the medical vocabulary in the medical text can be found.
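The lookup in step S20 can be pictured as an inverted index from medical terms to diseases. The sketch below assumes the database is held in memory as a dictionary; the entries are illustrative placeholders rather than clinical data, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical disease -> medical vocabulary mapping (placeholder entries).
DISEASE_TO_TERMS = {
    "influenza": {"fever", "cough"},
    "angina": {"chest pain", "shortness of breath"},
}


def build_term_index(disease_to_terms):
    """Invert the mapping so each medical term points to its diseases."""
    index = defaultdict(set)
    for disease, terms in disease_to_terms.items():
        for term in terms:
            index[term].add(disease)
    return index


def diseases_for_terms(terms, index):
    """Return every disease mapped to at least one of the extracted terms."""
    matched = set()
    for term in terms:
        matched |= index.get(term, set())
    return sorted(matched)


index = build_term_index(DISEASE_TO_TERMS)
print(diseases_for_terms(["fever", "chest pain"], index))
```

Because one term may map to several diseases, this lookup can return more than one candidate.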
  • Step S30: the determined disease is output as the inferred potential disease of the medical text.
  • after the corresponding disease has been determined from the medical vocabulary extracted from the medical text, the determined disease can be output as the inferred potential disease of the medical text, so that subsequent diagnostic recommendations can be made based on it.
  • the disease-label accuracy obtained by the potential disease inference method of this embodiment can reach about 85%, which effectively improves the accuracy of potential disease inference for medical text.
  • the received medical text is segmented and the medical vocabulary is extracted from the word segments corresponding to the medical text; then, based on a pre-built medical professional database containing mapping relationships between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined as the inferred potential disease of the medical text. Because the mapping relationships between different diseases and medical vocabulary can be constructed from various medical data resources, and the disease mapped to the medical vocabulary in the medical text can be looked up, this is more efficient and more accurate than manual inference based on a doctor's personal experience.
  • the second embodiment of the present application provides a method for inferring a potential disease of a medical text.
  • before step S10, the method further includes:
  • Step S40 Obtain medical data from a predetermined data source, find one or more medical vocabularies corresponding to each disease from the medical data, and establish a medical professional database according to a mapping relationship between different types of diseases and medical vocabulary.
  • the medical data is first acquired from the predetermined data source to establish a medical professional database according to the mapping relationship between the different types of diseases and the medical vocabulary in the medical data.
  • the medical data may be authoritative explanations of various diseases obtained from an existing medical database, including corresponding information such as profiles, symptoms, complications, therapeutic drugs, and common examinations, or information such as the types of disease that various drugs are used to treat. The medical data may also be specific types of information obtained in real time or periodically, using tools such as web crawlers, from open-source medical data sources on the Internet (for example, questions and answers about different diseases in various forums, the latest medical cases, or medical question-and-answer texts).
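Step S40 can be illustrated with a small builder that merges records from several data sources into one disease-to-vocabulary mapping. The record layout (a `disease` key plus optional `symptoms`, `drugs`, and `examinations` lists) is an assumption made for this sketch; the source does not specify a schema.

```python
def build_medical_database(records):
    """Merge records into a mapping from disease name to a set of medical terms.

    Each record is assumed to be a dict with a "disease" key and optional
    list-valued "symptoms", "drugs", and "examinations" keys.
    """
    database = {}
    for record in records:
        terms = database.setdefault(record["disease"], set())
        for field in ("symptoms", "drugs", "examinations"):
            # Normalize case and whitespace so the same term from
            # different sources collapses into one entry.
            terms.update(t.strip().lower() for t in record.get(field, []))
    return database


# Placeholder records, e.g. one from a reference database and one from crawled text.
records = [
    {"disease": "influenza", "symptoms": ["Fever", "cough "]},
    {"disease": "influenza", "examinations": ["rapid antigen test"]},
]
print(build_medical_database(records))
```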
  • the medical professional database further includes the weight of each medical vocabulary corresponding to the disease
  • the step S20 may include:
  • one disease may correspond to one or more medical terms, and one medical term may correspond to one or more diseases. For example, the same symptom may map to multiple diseases, and the same type of medicine may treat a variety of diseases.
  • therefore, in the constructed medical professional database, different medical terms are given different weights. When multiple medical terms from the medical text are found in the constructed medical professional database, the sum of the weights of the medical terms corresponding to each disease can be calculated, and the disease with the highest weight sum is selected as the disease determined for the medical text. For example, the sum of the weights mapped to a disease can be taken as the confidence of that disease, and the disease with the highest confidence is selected as the final result, which further improves the accuracy of potential disease inference for the medical text.
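The weighted selection described above can be sketched as follows. The weight values and disease entries are placeholders, the function names are hypothetical, and the source does not say how the weights are derived or how ties are broken; here a tie goes to whichever disease appears first in the mapping.

```python
# Hypothetical weights: disease -> {medical term: weight}. Placeholder values only.
WEIGHTS = {
    "influenza": {"fever": 4, "cough": 3},
    "pneumonia": {"fever": 3, "cough": 3, "shortness of breath": 4},
}


def score_diseases(terms, weights):
    """Sum, per disease, the weights of the extracted terms mapped to it."""
    unique_terms = set(terms)
    scores = {}
    for disease, term_weights in weights.items():
        total = sum(term_weights.get(term, 0) for term in unique_terms)
        if total > 0:
            scores[disease] = total
    return scores


def infer_disease(terms, weights):
    """Pick the disease with the highest weight sum, or None if nothing matched."""
    scores = score_diseases(terms, weights)
    return max(scores, key=scores.get) if scores else None


print(score_diseases(["fever", "cough", "shortness of breath"], WEIGHTS))
print(infer_disease(["fever", "cough", "shortness of breath"], WEIGHTS))
```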
  • the step of performing word segmentation processing on the received medical text in the above step S10 includes:
  • the character string to be processed in the medical text is matched in the forward direction against the predetermined medical field-specific vocabulary (for example, a general medical professional lexicon, or an extensible, learnable medical lexicon) to obtain the first matching result;
  • the character string to be processed in the medical text is matched in the reverse direction against the same predetermined medical field-specific vocabulary to obtain the second matching result;
  • the first matching result includes a first number of first phrases;
  • the second matching result includes a second number of second phrases;
  • the first matching result includes a third number of single words;
  • the second matching result includes a fourth number of single words.
  • if the first number is equal to the second number and the third number is less than or equal to the fourth number, the first matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is equal to the second number and the third number is greater than the fourth number, the second matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is not equal to the second number and the first number is greater than the second number, the second matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is not equal to the second number and the first number is less than the second number, the first matching result (including phrases and single words) corresponding to the medical text is output.
  • the two-way matching method is used to perform word segmentation on the medical text: segmentation matching is performed in the forward and reverse directions simultaneously to analyze the stickiness of the combined content in the character string to be processed. Since a phrase is, under normal circumstances, more likely to represent core viewpoint information, that is, a phrase is more likely to be medical vocabulary in the medical text, forward and reverse segmentation matching is used together to find the matching result with fewer single words and more phrases as the word segmentation result of the medical text. This improves the accuracy of the word segmentation, so that the medical vocabulary in the medical text can be extracted more accurately.
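The four output cases listed above can be transcribed directly into code. The sketch below assumes that the first matching result comes from forward maximum matching and the second from reverse maximum matching, that a "phrase" is any segment longer than one character, and that a "single word" is a one-character segment; the source does not define these counts precisely, and the lexicon and function names are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=3):
    """Left-to-right greedy matching; falls back to single characters."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                out.append(piece)
                i += size
                break
    return out


def reverse_max_match(text, lexicon, max_len=3):
    """Right-to-left greedy matching; falls back to single characters."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                out.append(piece)
                j -= size
                break
    return out[::-1]


def count_phrases_and_singles(result):
    """Assumed counting: phrase = multi-character segment, single = one character."""
    phrases = sum(1 for seg in result if len(seg) > 1)
    singles = sum(1 for seg in result if len(seg) == 1)
    return phrases, singles


def choose_result(first, second):
    """Apply the four cases exactly as stated in the text."""
    first_n, third_n = count_phrases_and_singles(first)
    second_n, fourth_n = count_phrases_and_singles(second)
    if first_n == second_n:
        return first if third_n <= fourth_n else second
    return second if first_n > second_n else first


# Placeholder characters stand in for a real character string.
lexicon = {"abc", "ab", "cd"}
first = forward_max_match("abcd", lexicon)
second = reverse_max_match("abcd", lexicon)
print(first, second, choose_result(first, second))
```

Keeping the counting helper separate from `choose_result` makes it easy to swap in a different definition of "phrase" if the intended one differs from the assumption used here.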
  • the application further provides a potential disease inference system for medical text.
  • FIG. 3 is a schematic diagram of an operating environment of a preferred embodiment of the potential disease inference system 10 of the medical text of the present application.
  • the medical text potential disease inference system 10 is installed and operated in the electronic device 1.
  • the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13.
  • Figure 3 shows only the electronic device 1 with components 11-13, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.
  • the memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk or memory of the electronic device 1.
  • in other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 is used to store application software installed on the electronic device 1 and various types of data, such as the program code of the potential disease inference system 10 of the medical text.
  • the memory 11 can also be used to temporarily store data that has been output or is about to be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip for running program code or processing data stored in the memory 11, for example executing the potential disease inference system 10 of the medical text.
  • in some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
  • the display 13 is used to display information processed in the electronic device 1 and to display a visual user interface, for example the medical vocabulary extracted from the medical text, the inferred potential disease of the medical text, and the like.
  • the components 11-13 of the electronic device 1 communicate with one another via a system bus.
  • FIG. 4 is a functional block diagram of a first embodiment of the potential disease inference system 10 of the medical text of the present application.
  • the potential disease inference system 10 of the medical text may be divided into one or more modules, the one or more modules being stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present application.
  • for example, the potential disease inference system 10 of the medical text may be divided into a word segmentation module 01, a determination module 02, and an output module 03.
  • a module referred to in this application is a series of computer program instruction segments capable of performing a particular function, and is more suitable than a program for describing the execution of the potential disease inference system 10 of the medical text in the electronic device 1.
  • the following description will specifically describe the functions of the word segmentation module 01, the determination module 02, and the output module 03.
  • the word segmentation module 01 is configured to segment the received medical text, match each word segment corresponding to the medical text against a predetermined medical field-specific vocabulary, and extract the medical vocabulary from the word segments corresponding to the medical text;
  • the medical text to be diagnosed is received, for example a medical text sent by the user through a browser, a client app, or the like.
  • the received medical text is first subjected to word segmentation processing.
  • the medical text can be split into complete sentences at punctuation marks, and word segmentation is then performed on each sentence. For example, a string-matching segmentation method can be applied to each sentence, such as: the forward maximum matching method, which segments the string in a sentence from left to right; the reverse maximum matching method, which segments the string in a sentence from right to left; the shortest-path segmentation method, which requires the number of words cut from the string in a sentence to be minimal; or the two-way maximum matching method, which performs segmentation matching in the forward and reverse directions simultaneously.
  • Word-meaning (semantic) segmentation can also be used to segment each sentence. It is a segmentation method based on machine judgment that uses syntactic and semantic information to resolve ambiguity when segmenting words. Statistical segmentation can also be used to segment each sentence.
  • the respective word segments corresponding to the medical text are matched with the predetermined medical field-specific vocabulary, and the predetermined medical field-specific vocabulary may include the medical lexicon in the general medical dictionary, according to a large number.
  • the medical field-specific vocabulary can be fixed, or its medical vocabulary can be updated regularly based on the latest open-source medical data on the Internet.
  • the word segments of the medical text that match the predetermined medical field-specific vocabulary are extracted; the extracted medical vocabulary is the vocabulary in the medical text that is related to the potential disease.
  • a determining module 02 configured to determine, according to a pre-built medical professional database, a disease corresponding to the medical vocabulary in the medical text; wherein the medical professional database includes a mapping relationship between different types of diseases and medical vocabulary;
  • the disease corresponding to the medical vocabulary in the medical text is determined based on the pre-built medical professional database.
  • the medical professional database contains mapping relationships between different types of diseases and medical vocabulary (such as symptoms, drugs, examinations, departments and other informational vocabulary extracted from a large number of medical texts). For example, a medical professional database can be built from open-source data and texts, containing professional information such as diseases and their corresponding profiles, symptoms, complications, treatments, and common tests. Based on the constructed mapping relationships between different diseases and medical vocabulary, the disease mapped to the medical vocabulary in the medical text can be found.
  • the output module 03 is configured to output the determined disease as the inferred potential disease of the medical text.
  • after the corresponding disease has been determined from the medical vocabulary extracted from the medical text, the determined disease can be output as the inferred potential disease of the medical text, so that subsequent diagnostic recommendations can be made based on it.
  • the disease-label accuracy obtained by the potential disease inference method of this embodiment can reach about 85%, which effectively improves the accuracy of potential disease inference for medical text.
  • the received medical text is segmented and the medical vocabulary is extracted from the word segments corresponding to the medical text; then, based on a pre-built medical professional database containing mapping relationships between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined as the inferred potential disease of the medical text. Because the mapping relationships between different diseases and medical vocabulary can be constructed from various medical data resources, and the disease mapped to the medical vocabulary in the medical text can be looked up, this is more efficient and more accurate than manual inference based on a doctor's personal experience.
  • the second embodiment of the present application provides a potential disease inference system for a medical text. Based on the foregoing embodiment, the system further includes:
  • the establishing module 04, configured to obtain medical data from a predetermined data source, find one or more medical vocabularies corresponding to each disease from the medical data, and establish a medical professional database according to the mapping relationship between different types of diseases and medical vocabulary.
  • the medical data is first acquired from a predetermined data source to establish a medical professional database according to the mapping relationship between different types of diseases and medical vocabulary in the medical data.
  • the medical data may be authoritative explanations of various diseases obtained from an existing medical database, including corresponding information such as profiles, symptoms, complications, therapeutic drugs, and common examinations, or information such as the types of disease that various drugs are used to treat. The medical data may also be specific types of information obtained in real time or periodically, using tools such as web crawlers, from open-source medical data sources on the Internet (for example, questions and answers about different diseases in various forums, the latest medical cases, or medical question-and-answer texts).
  • the medical professional database further includes the weight of each medical vocabulary corresponding to the disease
  • the determining module 02 may further be used to:
  • one disease may correspond to one or more medical terms, and one medical term may correspond to one or more diseases. For example, the same symptom may map to multiple diseases, and the same type of medicine may treat a variety of diseases.
  • therefore, in the constructed medical professional database, different medical terms are given different weights. When multiple medical terms from the medical text are found in the constructed medical professional database, the sum of the weights of the medical terms corresponding to each disease can be calculated, and the disease with the highest weight sum is selected as the disease determined for the medical text. For example, the sum of the weights mapped to a disease can be taken as the confidence of that disease, and the disease with the highest confidence is selected as the final result, which further improves the accuracy of potential disease inference for the medical text.
  • the word segmentation module 01 is further configured to:
  • match the character string to be processed in the medical text in the forward direction against the predetermined medical field-specific vocabulary (for example, a general medical professional lexicon, or an extensible, learnable medical lexicon) to obtain the first matching result;
  • match the character string to be processed in the medical text in the reverse direction against the same predetermined medical field-specific vocabulary to obtain the second matching result;
  • the first matching result includes a first number of first phrases;
  • the second matching result includes a second number of second phrases;
  • the first matching result includes a third number of single words;
  • the second matching result includes a fourth number of single words.
  • if the first number is equal to the second number and the third number is less than or equal to the fourth number, the first matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is equal to the second number and the third number is greater than the fourth number, the second matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is not equal to the second number and the first number is greater than the second number, the second matching result (including phrases and single words) corresponding to the medical text is output;
  • if the first number is not equal to the second number and the first number is less than the second number, the first matching result (including phrases and single words) corresponding to the medical text is output.
  • the two-way matching method is used to perform word segmentation on the medical text: segmentation matching is performed in the forward and reverse directions simultaneously to analyze the stickiness of the combined content in the character string to be processed. Since a phrase is, under normal circumstances, more likely to represent core viewpoint information, that is, a phrase is more likely to be medical vocabulary in the medical text, forward and reverse segmentation matching is used together to find the matching result with fewer single words and more phrases as the word segmentation result of the medical text. This improves the accuracy of the word segmentation, so that the medical vocabulary in the medical text can be extracted more accurately.
  • The present application also provides a computer-readable storage medium storing a potential disease inference system for medical text. The system is executable by at least one processor so that the at least one processor performs the steps of the potential disease inference method of the above embodiments. The specific implementation of steps S10, S20, S30, and so on is as described above and is not repeated here.
  • The methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, but in many cases the former is the better implementation.
  • The part of the technical solution of the present application that is essential, or that contributes over the prior art, may be embodied as a software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Provided are a method and system for inferring a potential disease from a medical text, and a readable storage medium. The method comprises: performing segmentation on a received medical text, performing matching between respective words corresponding to the medical text and a pre-determined medical-specific terminology base, and extracting a medical terminology from the respective words corresponding to the medical text (S10); determining, on the basis of a pre-established medical specialty database, a disease corresponding to the medical terminology in the medical text (S20); and outputting the determined disease as a potential disease inferred from the medical text (S30). The method enables accurate and highly efficient inference of a potential disease from a medical text.

Description

Method and System for Inferring Potential Disease from Medical Text, and Readable Storage Medium
Priority Claim
This application claims priority under the Paris Convention to Chinese patent application No. CN2017103135201, entitled "Potential disease inference method, system and readable storage medium for medical text" and filed on May 5, 2017, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method, a system, and a readable storage medium for inferring potential disease from medical text.
Background
In general, the first step in processing a medical text is to infer the potential disease, so that diagnostic advice can then be given. In the prior art, the potential disease in a medical text can only be inferred manually from a doctor's personal experience. This is inefficient, and it cannot draw on existing medical data resources to infer potential diseases effectively.
Summary of the Invention
The main purpose of the present application is to provide a method, a system, and a readable storage medium for inferring potential disease from medical text, with the aim of inferring the potential disease of a medical text accurately and efficiently.
To achieve the above purpose, a first aspect of the present application provides a method for inferring potential disease from medical text, the method comprising the following steps:
A. segmenting a received medical text into words, matching each word of the medical text against a predetermined medical-domain vocabulary, and extracting the medical vocabulary from the words of the medical text;
B. determining, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical vocabulary;
C. outputting the determined disease as the potential disease inferred from the medical text.
In addition, to achieve the above purpose, a second aspect of the present application provides a system for inferring potential disease from medical text, the system comprising:
a word segmentation and extraction module, configured to segment a received medical text into words, match each word of the medical text against a predetermined medical-domain vocabulary, and extract the medical vocabulary from the words of the medical text;
a determination module, configured to determine, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical vocabulary;
an output module, configured to output the determined disease as the potential disease inferred from the medical text.
Further, to achieve the above purpose, a third aspect of the present application provides a computer-readable storage medium storing a potential disease inference system for medical text, the system being executable by at least one processor so that the at least one processor performs the steps of the potential disease inference method described above.
With the method, system, and readable storage medium proposed in the present application, the received medical text is segmented into words and the medical vocabulary among those words is extracted. Then, based on a pre-built medical professional database containing mapping relationships between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined and taken as the potential disease inferred from the medical text. Because the mapping relationships between diseases and medical vocabulary can be built from a variety of medical data resources, and the disease mapped to the medical vocabulary in a medical text can be looked up directly, this approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the potential disease inference method for medical text of the present application;
FIG. 2 is a schematic flowchart of a second embodiment of the potential disease inference method for medical text of the present application;
FIG. 3 is a schematic diagram of the operating environment of a preferred embodiment of the potential disease inference system 10 for medical text of the present application;
FIG. 4 is a schematic diagram of the functional modules of a first embodiment of the potential disease inference system for medical text of the present application;
FIG. 5 is a schematic diagram of the functional modules of a second embodiment of the potential disease inference system for medical text of the present application.
The realization of the purpose, the functional features, and the advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
To make the technical problems to be solved, the technical solutions, and the beneficial effects of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present application and are not intended to limit it.
The present application provides a method for inferring potential disease from medical text.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an embodiment of the potential disease inference method for medical text of the present application.
In one embodiment, the potential disease inference method for medical text includes:
Step S10: segment the received medical text into words, match each word of the medical text against a predetermined medical-domain vocabulary, and extract the medical vocabulary from the words of the medical text.
The medical text to be diagnosed is received; for example, it may be sent by a user through a browser, a client app, or the like. In this embodiment, after the medical text is received, it is first segmented into words. For example, the medical text may be split at punctuation marks into complete sentences, and each sentence is then segmented. One option is a string-matching segmentation method, such as: the forward maximum matching method, which segments the character string of a sentence from left to right; the reverse maximum matching method, which segments it from right to left; the shortest-path method, which requires that the string be cut into the smallest possible number of words; or the bidirectional maximum matching method, which performs forward and reverse matching at the same time. Another option is word-sense segmentation, a segmentation method based on machine speech judgment that uses syntactic and semantic information to resolve ambiguity. A third option is statistical segmentation: phrase statistics are gathered from the historical search records of the current user or of users in general, and when two adjacent characters are found to occur together frequently, those two characters may be treated as a phrase for segmentation.
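The statistical segmentation idea described above can be sketched as follows. This is only a minimal illustration, assuming a small in-memory list of search queries; the function name `frequent_pairs` and the threshold `min_count` are hypothetical and do not come from the application.

```python
from collections import Counter


def frequent_pairs(queries, min_count=2):
    """Return adjacent character pairs that occur at least min_count times.

    queries:   iterable of strings (for example, past search queries).
    min_count: hypothetical threshold above which a pair counts as a phrase.
    """
    counts = Counter()
    for query in queries:
        # Count every pair of neighbouring characters in the query.
        for left, right in zip(query, query[1:]):
            counts[left + right] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


# Toy search history; "头痛" (headache) appears in all three queries.
history = ["头痛怎么办", "头痛吃什么药", "经常头痛"]
phrases = frequent_pairs(history)
```

With this toy history, only the pair "头痛" reaches the threshold, so it would be treated as a phrase during segmentation.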
After the medical text has been segmented, each of its words is matched against the predetermined medical-domain vocabulary. This vocabulary may include the medical lexicon of a general medical dictionary, as well as medical vocabulary drawn from the introduction, symptom, complication, therapeutic drug, or treating department information of various diseases, extracted from a large body of medical text (for example, open-source medical data on the Internet), and so on. The vocabulary may be fixed, or it may be updated periodically from the latest open-source medical data on the Internet. Extracting the words of the medical text that match the predetermined medical-domain vocabulary yields the information in the text that is most relevant to its potential disease, namely the extracted medical vocabulary.
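Step S10 can be sketched roughly as below, using forward maximum matching as the segmentation method. The toy lexicon, the vocabulary entries, the maximum word length of 4, and all function names are assumptions made for illustration; the application does not prescribe them.

```python
import re

# Hypothetical toy data: a medical-domain vocabulary and a general lexicon.
MEDICAL_VOCAB = {"头痛", "发热", "咳嗽", "阿司匹林"}  # headache, fever, cough, aspirin
LEXICON = MEDICAL_VOCAB | {"患者", "三天", "伴有", "服用"}


def split_sentences(text):
    """Split text into sentences at common Chinese and Western punctuation."""
    return [s for s in re.split(r"[。!?;,,.!?;\s]+", text) if s]


def forward_max_match(sentence, lexicon, max_len=4):
    """Segment left to right, taking the longest lexicon word at each position.

    A character that starts no lexicon word is emitted as a single character.
    """
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_medical_terms(text, lexicon=LEXICON, vocab=MEDICAL_VOCAB):
    """Return the segmented words that appear in the medical vocabulary."""
    terms = []
    for sentence in split_sentences(text):
        for token in forward_max_match(sentence, lexicon):
            if token in vocab and token not in terms:
                terms.append(token)
    return terms


terms = extract_medical_terms("患者头痛三天,伴有发热。服用阿司匹林")
# terms == ["头痛", "发热", "阿司匹林"]
```

Keeping the general lexicon separate from the medical vocabulary lets the segmenter recognize ordinary words while only medical vocabulary is extracted.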
Step S20: based on a pre-built medical professional database, determine the disease corresponding to the medical vocabulary in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical vocabulary.
After the medical vocabulary most relevant to the potential disease has been extracted from the words of the medical text, the disease corresponding to that vocabulary is determined based on the pre-built medical professional database. The database contains mapping relationships between different types of diseases and medical vocabulary (such as terms for symptoms, drugs, examinations, and departments extracted from a large body of medical text). For example, the database may be built from open-source online data and text, and may contain each disease together with its introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information. Based on the mapping relationships so built, the disease mapped to the medical vocabulary extracted from the medical text can be found.
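The lookup in step S20 can be illustrated with a plain dictionary standing in for the medical professional database. The entries and the function names below are hypothetical; a real database would be built from medical data sources.

```python
# Hypothetical mapping from a medical vocabulary item to the diseases it maps to.
TERM_TO_DISEASES = {
    "fever": {"influenza", "pneumonia"},
    "cough": {"influenza", "pneumonia", "common cold"},
    "headache": {"migraine"},
}


def diseases_for_terms(terms, mapping=TERM_TO_DISEASES):
    """Return, for each known term, the sorted list of diseases mapped to it."""
    return {term: sorted(mapping[term]) for term in terms if term in mapping}


def candidate_diseases(terms, mapping=TERM_TO_DISEASES):
    """Return every disease mapped to at least one of the given terms."""
    found = set()
    for term in terms:
        found |= mapping.get(term, set())
    return sorted(found)


candidates = candidate_diseases(["fever", "headache", "unknown term"])
# candidates == ["influenza", "migraine", "pneumonia"]
```

Terms that are absent from the mapping are simply ignored in this sketch.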
Step S30: output the determined disease as the potential disease inferred from the medical text.
Once the corresponding disease has been determined from the medical vocabulary extracted from the medical text, it can be output as the potential disease inferred from the medical text, so that subsequent diagnostic advice can be based on it. According to statistics on potential disease inference for medical texts in actual use, the accuracy of the disease labels produced by the method of this embodiment (no obvious errors found on manual review) reaches about 85%, which effectively improves the accuracy of potential disease inference for medical text.
In this embodiment, the received medical text is segmented into words and the medical vocabulary among those words is extracted. Then, based on a pre-built medical professional database containing mapping relationships between different diseases and medical vocabulary, the disease corresponding to the medical vocabulary in the medical text is determined and taken as the potential disease inferred from the medical text. Because the mapping relationships between diseases and medical vocabulary can be built from a variety of medical data resources, and the disease mapped to the medical vocabulary in a medical text can be looked up directly, this approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
As shown in FIG. 2, a second embodiment of the present application provides a potential disease inference method for medical text which, on the basis of the above embodiment, further includes before step S10:
Step S40: obtain medical data from a predetermined data source, find the one or more medical vocabulary items corresponding to each disease in the medical data, and build the medical professional database from the mapping relationships between different types of diseases and medical vocabulary.
In this embodiment, before potential disease inference is performed on a medical text, medical data is first obtained from a predetermined data source, so that the medical professional database can be built from the mapping relationships between different types of diseases and medical vocabulary in that data. The medical data may be authoritative descriptions of various diseases taken from an existing medical database, including each disease's introduction, symptoms, complications, therapeutic drugs, common examinations, and other professional information. It may also be medical information about various drugs, such as the types of disease a drug is mainly used to treat. It may also be information of specific types (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations associated with different diseases) collected in real time or on a schedule, using tools such as web crawlers, from open-source medical data sources on the Internet (for example, questions, answers, and discussions about different diseases on major forums, or the latest medical cases and medical question-and-answer texts). Once the one or more medical vocabulary items corresponding to each disease have been found in the obtained medical data, the medical professional database can be built from the mapping relationships between diseases and those items, for later use in inferring potential diseases.
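One way to picture step S40 is to scan each disease's descriptive text for known vocabulary items and record the ones that occur. The record format `(disease, text)`, the substring test, and the name `build_database` are assumptions of this sketch, not details given in the application.

```python
def build_database(records, vocabulary):
    """Build a {disease: set of vocabulary items} mapping from text records.

    records:    iterable of (disease, text) pairs, e.g. crawled descriptions.
    vocabulary: iterable of medical vocabulary items to look for in each text.
    """
    database = {}
    for disease, text in records:
        # A simple substring test stands in for real term extraction here.
        found = {term for term in vocabulary if term in text}
        database.setdefault(disease, set()).update(found)
    return database


records = [
    ("influenza", "Typical symptoms include fever and cough."),
    ("influenza", "Often treated with oseltamivir."),
    ("migraine", "Recurrent headache, sometimes with nausea."),
]
vocabulary = {"fever", "cough", "oseltamivir", "headache", "nausea"}
database = build_database(records, vocabulary)
# database["influenza"] == {"fever", "cough", "oseltamivir"}
```

Several records for the same disease are merged, which matches the idea of collecting vocabulary for one disease from more than one data source.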
Further, in other embodiments, the medical professional database also contains a weight for each medical vocabulary item corresponding to a disease, and step S20 may include:
based on the pre-built medical professional database, finding the diseases corresponding to each medical vocabulary item in the medical text, computing for each disease the sum of the weights of its corresponding medical vocabulary items, and selecting the disease with the highest weight sum as the disease determined for the medical text.
This embodiment takes into account that one disease may correspond to one or more medical vocabulary items, and that one medical vocabulary item may correspond to one or more diseases; for example, the same symptom may map to several diseases, and the same drug may treat several diseases. Therefore, in the medical professional database, different medical vocabulary items are assigned different weights, so that when the medical vocabulary found in a medical text points to more than one candidate, the weight sum of the medical vocabulary corresponding to each disease can be computed and the disease with the highest sum selected as the disease determined for the medical text. For example, the weight sum obtained for a disease can be treated as the degree of confidence in inferring that disease, and the disease with the highest confidence is chosen as the final result, which further improves the accuracy of potential disease inference for medical text.
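The weighted selection can be sketched as follows. The weight values, the nested-dictionary layout, and the function names are hypothetical; the application does not state how weights are assigned.

```python
from collections import defaultdict

# Hypothetical weights: WEIGHTS[disease][term] is the weight of a term for a disease.
WEIGHTS = {
    "influenza": {"fever": 0.5, "cough": 0.3, "muscle ache": 0.2},
    "common cold": {"cough": 0.4, "runny nose": 0.4, "fever": 0.1},
    "migraine": {"headache": 0.7, "nausea": 0.3},
}


def score_diseases(terms, weights=WEIGHTS):
    """Sum, per disease, the weights of the extracted terms mapped to it."""
    scores = defaultdict(float)
    for disease, term_weights in weights.items():
        for term in set(terms):  # count each extracted term once
            if term in term_weights:
                scores[disease] += term_weights[term]
    return dict(scores)


def infer_disease(terms, weights=WEIGHTS):
    """Return the disease with the highest weight sum, or None if nothing matches."""
    scores = score_diseases(terms, weights)
    if not scores:
        return None
    return max(scores, key=scores.get)


best = infer_disease(["fever", "cough"])
# best == "influenza" (weight sum 0.8 versus 0.5 for "common cold")
```

The weight sum plays the role of the confidence value mentioned above; ties are resolved here by dictionary order, which is an arbitrary choice of this sketch.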
Further, in other embodiments, segmenting the received medical text in step S10 includes:
matching the character string to be processed in the medical text against the predetermined medical-domain vocabulary according to the forward maximum matching method, to obtain a first matching result (the vocabulary may be, for example, a general medical professional lexicon or an expandable, learning medical lexicon);
matching the character string to be processed in the medical text against the predetermined medical-domain vocabulary according to the reverse maximum matching method, to obtain a second matching result. The first matching result contains a first quantity of first phrases, and the second matching result contains a second quantity of second phrases; the first matching result contains a third quantity of single characters, and the second matching result contains a fourth quantity of single characters.
If the first quantity is equal to the second quantity and the third quantity is less than or equal to the fourth quantity, the first matching result (including phrases and single characters) is output for the medical text;
if the first quantity is equal to the second quantity and the third quantity is greater than the fourth quantity, the second matching result (including phrases and single characters) is output for the medical text;
if the first quantity is not equal to the second quantity and the first quantity is greater than the second quantity, the second matching result (including phrases and single characters) is output for the medical text;
if the first quantity is not equal to the second quantity and the first quantity is less than the second quantity, the first matching result (including phrases and single characters) is output for the medical text.
This embodiment segments the medical text with the bidirectional matching method. Performing forward and reverse matching at the same time makes it possible to analyze how strongly adjacent content binds together in the character string to be processed. Since a phrase is, in general, more likely to carry the core information, a phrase is more likely to be medical vocabulary in the medical text. Therefore, forward and reverse matching are performed together to find the matching result with fewer single characters and more phrases, and this result is used as the segmentation result of the medical text. This improves segmentation accuracy, so that the medical vocabulary in the medical text is extracted more accurately.
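The selection between the two matching results can be sketched as below. The code applies the four conditions exactly as they are worded in this embodiment; the lexicon, the `max_len` parameter, and the function names are assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment left to right, preferring the longest lexicon entry."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len=4):
    """Segment right to left, preferring the longest lexicon entry."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def count_phrases_and_singles(tokens):
    """Return (number of multi-character phrases, number of single characters)."""
    phrases = sum(1 for token in tokens if len(token) > 1)
    return phrases, len(tokens) - phrases


def bidirectional_match(text, lexicon, max_len=4):
    """Choose between forward and reverse results using the four stated conditions."""
    first = forward_max_match(text, lexicon, max_len)
    second = backward_max_match(text, lexicon, max_len)
    first_qty, third_qty = count_phrases_and_singles(first)
    second_qty, fourth_qty = count_phrases_and_singles(second)
    if first_qty == second_qty:
        return first if third_qty <= fourth_qty else second
    return second if first_qty > second_qty else first


# Hypothetical lexicon entries: upper respiratory tract, respiratory infection, infection.
lexicon = {"上呼吸道", "呼吸道感染", "感染"}
result = bidirectional_match("上呼吸道感染", lexicon, max_len=5)
# result == ["上", "呼吸道感染"]
```

In this example the forward pass yields two phrases and the reverse pass yields one phrase plus one single character, so the quantities differ and the conditions as worded select the second (reverse) result.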
The present application further provides a potential disease inference system for medical text. Refer to FIG. 3, which is a schematic diagram of the operating environment of a preferred embodiment of the potential disease inference system 10 for medical text of the present application.
In this embodiment, the potential disease inference system 10 is installed and runs in an electronic device 1. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 3 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or internal memory of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 stores the application software installed on the electronic device 1 and various kinds of data, such as the program code of the potential disease inference system 10. The memory 11 may also temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and runs the program code or processes the data stored in the memory 11, for example executing the potential disease inference system 10.
In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display 13 shows the information processed in the electronic device 1 and a visual user interface, for example the medical vocabulary extracted from a medical text and the potential disease inferred from it. The components 11-13 of the electronic device 1 communicate with one another over a system bus.
Refer to FIG. 4, which is a functional module diagram of a first embodiment of the potential disease inference system 10 for medical text of the present application. In this embodiment, the potential disease inference system 10 may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present application. For example, in FIG. 4, the potential disease inference system 10 may be divided into a word segmentation and extraction module 01, a determination module 02, and an output module 03. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing how the potential disease inference system 10 executes in the electronic device 1. The functions of the word segmentation and extraction module 01, the determination module 02, and the output module 03 are described below.
分词提取模块01,用于对收到的医疗文本进行分词,并将该医疗文本对应的各个分词与预先确定的医疗领域专用词汇库进行匹配,提取出该医疗文本对应的各个分词中的医疗词汇;The word segmentation module 01 is configured to segment the received medical text, and match each word segment corresponding to the medical text with a predetermined medical field-specific vocabulary to extract the medical vocabulary in each participle corresponding to the medical text. ;
Medical text to be diagnosed is received, for example medical text sent by a user through a browser, a client APP, or the like. In this embodiment, once the medical text is received, it is first segmented into words. For example, the medical text may be split into complete sentences according to punctuation marks, and each sentence is then segmented. Segmentation may use a string-matching method, such as: the forward maximum matching method, which segments the character string of a sentence from left to right; the reverse maximum matching method, which segments the character string from right to left; the shortest-path segmentation method, which requires that the character string of a sentence be cut into the fewest possible words; or the bidirectional maximum matching method, which performs forward and reverse matching at the same time. Each sentence may also be segmented with a word-sense segmentation method, a segmentation approach based on machine speech judgment that uses syntactic and semantic information to resolve ambiguity. Each sentence may also be segmented with a statistical segmentation method: based on phrase statistics drawn from the historical search records of the current user or of general users, if two adjacent characters are found to occur together frequently, those two adjacent characters may be treated as a phrase for segmentation.
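The string-matching methods described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `max_len` window, the punctuation set, and the sample lexicon are all assumptions introduced here, and the sample sentence is a generic textbook-style example rather than medical text.

```python
import re


def split_sentences(text):
    """Split text into sentences on common Chinese and Western end punctuation."""
    parts = re.split(r"[。!?;.!?;]+", text)
    return [p.strip() for p in parts if p.strip()]


def forward_max_match(sentence, lexicon, max_len=4):
    """Segment left to right, always taking the longest lexicon entry available."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(sentence, lexicon, max_len=4):
    """Segment right to left, always taking the longest lexicon entry available."""
    tokens = []
    j = len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


# Hypothetical lexicon; the two directions disagree on this sentence.
lexicon = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", lexicon))
print(reverse_max_match("研究生命起源", lexicon))
```

The two directions can produce different segmentations of the same string, which is why a rule for choosing between them is needed when both are run.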
After the medical text has been segmented, each resulting word is matched against the predetermined medical-domain lexicon. The predetermined medical-domain lexicon may include the medical word bank of a general-purpose medical dictionary, as well as medical terms found in the profile information, symptom information, complication information, therapeutic drug information, or treatment department information of various diseases, extracted from a large body of medical text (for example, open-source medical data on the Internet), and so on. The medical-domain lexicon may be fixed, or its medical terms may be updated periodically according to the latest open-source medical data on the Internet. Extracting the words that match the predetermined medical-domain lexicon yields the information in the medical text that is most relevant to its potential disease, namely the extracted medical terms.
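The matching step can be illustrated with a short sketch. It assumes the lexicon is held as an in-memory set and that duplicate terms are reported once; both choices, along with the function name and sample data, are assumptions made for illustration.

```python
def extract_medical_terms(tokens, medical_lexicon):
    """Keep the tokens found in the lexicon, in first-seen order, without duplicates."""
    seen = set()
    terms = []
    for tok in tokens:
        if tok in medical_lexicon and tok not in seen:
            seen.add(tok)
            terms.append(tok)
    return terms


# Hypothetical segmented text and lexicon.
tokens = ["patient", "reports", "fever", "and", "cough", "fever"]
lexicon = {"fever", "cough", "aspirin"}
print(extract_medical_terms(tokens, lexicon))
```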
The determining module 02 is configured to determine, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical terms;
After the medical terms most relevant to the potential disease have been extracted from the segmented words of the medical text, the disease corresponding to those medical terms is determined based on the pre-built medical professional database. The medical professional database contains mapping relationships between different types of diseases and medical terms (such as terms for symptoms, drugs, examinations, and departments extracted from a large body of medical text). For example, the medical professional database may be built from open-source data and text on the Internet and contain professional information such as diseases and their corresponding profiles, symptoms, complications, therapeutic drugs, and common examinations. Using the constructed mapping relationships between diseases and medical terms, the diseases mapped to the medical terms extracted from the medical text can be found.
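A lookup against such a mapping might look like the sketch below. The dictionary layout (term to list of diseases), the function name, and the sample entries are assumptions for illustration; the source does not specify how the database is stored.

```python
def diseases_for_terms(terms, term_to_diseases):
    """Return {disease: [matching terms]} for every disease mapped to any term."""
    found = {}
    for term in terms:
        # Terms missing from the mapping contribute nothing.
        for disease in term_to_diseases.get(term, ()):
            found.setdefault(disease, []).append(term)
    return found


# Hypothetical mapping and extracted terms.
term_to_diseases = {
    "fever": ["influenza", "pneumonia"],
    "cough": ["influenza"],
}
print(diseases_for_terms(["fever", "cough", "rash"], term_to_diseases))
```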
The output module 03 is configured to output the determined disease as the inferred potential disease of the medical text.
After the corresponding disease has been determined from the medical terms extracted from the medical text, the determined disease is output as the inferred potential disease of the medical text, so that subsequent diagnostic suggestions can be made on that basis. According to statistics on potential disease inference for medical text in practical use, the disease labels obtained by the inference method of this embodiment reach an accuracy of about 85% (meaning no obvious errors on manual review), which effectively improves the accuracy of potential disease inference for medical text.
This embodiment segments the received medical text and extracts the medical terms among the resulting words, then determines the disease corresponding to those medical terms based on a pre-built medical professional database containing mapping relationships between different diseases and medical terms, and takes that disease as the inferred potential disease of the medical text. Because the mapping relationships between diseases and medical terms can be built from a variety of medical data resources, and the mapped disease can be found from the medical terms in the medical text, this approach is more efficient and more accurate than manual inference based on a doctor's personal experience.
As shown in FIG. 5, the second embodiment of the present application provides a system for inferring a potential disease from medical text which, on the basis of the foregoing embodiment, further includes:
The establishing module 04 is configured to obtain medical data from a predetermined data source, find one or more medical terms corresponding to each disease in the medical data, and build the medical professional database according to the mapping relationships between different types of diseases and medical terms.
In this embodiment, before potential disease inference is performed on medical text, medical data is first obtained from a predetermined data source, so that the medical professional database can be built from the mapping relationships between different types of diseases and medical terms in that data. The medical data may be authoritative descriptions of various diseases obtained from an existing medical database, including professional information such as their profiles, symptoms, complications, therapeutic drugs, and common examinations. It may also be medical information about various drugs, such as the types of disease a drug is mainly used to treat. The medical data may also be specific types of information (for example, the treatment plans, therapeutic drugs, departments, and clinical manifestations corresponding to different diseases) obtained in real time or at scheduled intervals, using tools such as web crawlers, from open-source medical data sources on the Internet (for example, questions, answers, and discussions about different diseases on major forums, or the latest medical cases and medical question-and-answer texts). Once one or more medical terms corresponding to each disease have been found in the obtained medical data, the medical professional database can be built from the mapping relationships between the diseases and those medical terms, for later use in inferring potential diseases.
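One way to picture the database construction is as an inverted index from medical terms to diseases. The sketch below assumes each source record is a dictionary with a `disease` key and optional list-valued fields; the field names, function name, and sample records are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical record fields that carry medical terms.
TERM_FIELDS = ("symptoms", "complications", "drugs", "examinations", "departments")


def build_term_index(records):
    """Build {term: sorted list of diseases} from per-disease records."""
    index = defaultdict(set)
    for record in records:
        for field in TERM_FIELDS:
            for term in record.get(field, []):
                index[term].add(record["disease"])
    return {term: sorted(diseases) for term, diseases in index.items()}


# Illustrative records, not real source data.
records = [
    {"disease": "influenza", "symptoms": ["fever", "cough"], "drugs": ["oseltamivir"]},
    {"disease": "pneumonia", "symptoms": ["fever", "chest pain"],
     "departments": ["respiratory medicine"]},
]
index = build_term_index(records)
print(index["fever"])
```

Rebuilding the index on a schedule would correspond to the periodic updates from crawled data that the embodiment mentions.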
Further, in other embodiments, the medical professional database also contains a weight for each medical term corresponding to a disease, and the determining module 02 may further be configured to:
Based on the pre-built medical professional database, find the diseases corresponding to each medical term in the medical text, calculate for each disease the sum of the weights of its corresponding medical terms, and select the disease with the highest weight sum as the disease determined for the medical text.
In this embodiment, it is recognized that one disease may correspond to one or more medical terms, and one medical term may correspond to one or more diseases. For example, the same symptom may map to several diseases, and the same drug may treat several diseases. Therefore, different medical terms are assigned different weights in the constructed medical professional database, so that when the medical terms in the medical text map to more than one disease, the sum of the weights of the medical terms corresponding to each disease can be calculated, and the disease with the highest weight sum is selected as the disease determined for the medical text. For example, the weight sum obtained for a disease may be taken as the confidence of inferring that disease, and the disease with the highest confidence is selected as the final result, further improving the accuracy of potential disease inference for medical text.
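The weighted selection can be sketched as below. The nested dictionary layout (`{disease: {term: weight}}`), the function name, and the numeric weights are assumptions for illustration; the source does not state how weights are assigned or how ties are broken.

```python
def infer_disease(terms, weights):
    """Sum per-disease weights of the matched terms and return the top disease.

    Returns (best_disease, scores); best_disease is None if nothing matches.
    """
    scores = {}
    for disease, term_weights in weights.items():
        total = sum(term_weights.get(t, 0.0) for t in terms)
        if total > 0:
            scores[disease] = total
    if not scores:
        return None, {}
    # On a tie, max() keeps the first disease encountered (an arbitrary choice here).
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical weights.
weights = {
    "influenza": {"fever": 0.5, "cough": 0.25},
    "pharyngitis": {"sore throat": 0.75, "fever": 0.25},
}
best, scores = infer_disease(["fever", "cough", "sore throat"], weights)
print(best, scores)
```

Here the weight sum plays the role of the confidence value described above: every candidate disease gets a score, and only the highest-scoring one is reported.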
Further, in other embodiments, the word segmentation and extraction module 01 is further configured to:
Match the character string to be processed in the medical text against the predetermined medical-domain lexicon (which may be, for example, a general-purpose medical word bank or an expandable, learning-type medical word bank) using the forward maximum matching method, to obtain a first matching result;
Match the character string to be processed in the medical text against the predetermined medical-domain lexicon (which may be, for example, a general-purpose medical word bank or an expandable, learning-type medical word bank) using the reverse maximum matching method, to obtain a second matching result. The first matching result contains a first number of first phrases, and the second matching result contains a second number of second phrases; the first matching result contains a third number of single characters, and the second matching result contains a fourth number of single characters.
If the first number equals the second number, and the third number is less than or equal to the fourth number, output the first matching result (including phrases and single characters) for the medical text;
If the first number equals the second number, and the third number is greater than the fourth number, output the second matching result (including phrases and single characters) for the medical text;
If the first number does not equal the second number, and the first number is greater than the second number, output the second matching result (including phrases and single characters) for the medical text;
If the first number does not equal the second number, and the first number is less than the second number, output the first matching result (including phrases and single characters) for the medical text.
This embodiment segments the medical text by bidirectional matching: forward and reverse matching are performed at the same time to analyze how strongly adjacent content in the character string to be processed coheres. Because a phrase is generally more likely than a single character to carry core information, a phrase is more likely to be a medical term in the medical text. Forward and reverse matching are therefore used together to find the matching result with fewer single characters and more phrases, which is taken as the segmentation result of the medical text, improving segmentation accuracy so that the medical terms in the medical text are extracted more accurately.
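The four selection rules can be written as a small function. This sketch applies the rules literally as stated, under the assumption that a "phrase" is any token longer than one character and a "single character" is a token of length one; the function name is hypothetical. If the first and second numbers were instead read as total token counts, the same rules would coincide with the common heuristic of preferring the segmentation with fewer tokens.

```python
def choose_segmentation(forward, reverse):
    """Pick between forward and reverse matching results using the stated rules."""

    def counts(tokens):
        phrases = sum(1 for t in tokens if len(t) > 1)   # multi-character tokens
        singles = sum(1 for t in tokens if len(t) == 1)  # single-character tokens
        return phrases, singles

    first, third = counts(forward)    # first number, third number
    second, fourth = counts(reverse)  # second number, fourth number

    if first == second:
        # Equal phrase counts: prefer the result with fewer single characters,
        # keeping the forward result on a tie.
        return forward if third <= fourth else reverse
    # Unequal phrase counts: first > second selects the reverse result,
    # first < second selects the forward result.
    return reverse if first > second else forward


# Abstract tokens standing in for segmented text.
print(choose_segmentation(["AB", "C", "D"], ["A", "BCD"]))
```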
In addition, the present application provides a computer-readable storage medium storing a system for inferring a potential disease from medical text. The system is executable by at least one processor, so that the at least one processor performs the steps of the method for inferring a potential disease from medical text in the foregoing embodiments. The specific implementation of steps S10, S20, S30, and so on of that method is as described above and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the foregoing embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The preferred embodiments of the present application have been described above with reference to the drawings, and this description does not limit the scope of the claims of the present application. The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Those skilled in the art can implement the present application in many variants without departing from its scope and spirit; for example, a feature of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modification, equivalent substitution, or improvement made within the technical concept of the present application shall fall within the scope of the claims of the present application.

Claims (20)

  1. A method for inferring a potential disease from medical text, characterized in that the method comprises the following steps:
    A. segmenting the received medical text into words, matching each resulting word against a predetermined medical-domain lexicon, and extracting the medical terms among those words;
    B. determining, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical terms;
    C. outputting the determined disease as the inferred potential disease of the medical text.
  2. The method for inferring a potential disease from medical text according to claim 1, characterized in that, before step A, the method further comprises:
    obtaining medical data from a predetermined data source, finding one or more medical terms corresponding to each disease in the medical data, and building the medical professional database according to the mapping relationships between different types of diseases and medical terms.
  3. The method for inferring a potential disease from medical text according to claim 1, characterized in that the medical terms comprise:
    medical terms in the profile information, symptom information, complication information, therapeutic drug information, or treatment department information corresponding to a disease.
  4. The method for inferring a potential disease from medical text according to claim 1, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and step B comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  5. The method for inferring a potential disease from medical text according to claim 2, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and step B comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  6. The method for inferring a potential disease from medical text according to claim 3, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and step B comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  7. The method for inferring a potential disease from medical text according to claim 1, characterized in that the step of segmenting the received medical text comprises:
    matching the medical text against the predetermined medical-domain lexicon using the forward maximum matching method to obtain a first matching result, the first matching result containing a first number of first phrases and a third number of single characters;
    matching the medical text against the predetermined medical-domain lexicon using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second number of second phrases and a fourth number of single characters;
    if the first number equals the second number, and the third number is less than or equal to the fourth number, taking the first matching result as the segmentation result of the medical text;
    if the first number equals the second number, and the third number is greater than the fourth number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is greater than the second number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is less than the second number, taking the first matching result as the segmentation result of the medical text.
  8. The method for inferring a potential disease from medical text according to claim 2, characterized in that the step of segmenting the received medical text comprises:
    matching the medical text against the predetermined medical-domain lexicon using the forward maximum matching method to obtain a first matching result, the first matching result containing a first number of first phrases and a third number of single characters;
    matching the medical text against the predetermined medical-domain lexicon using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second number of second phrases and a fourth number of single characters;
    if the first number equals the second number, and the third number is less than or equal to the fourth number, taking the first matching result as the segmentation result of the medical text;
    if the first number equals the second number, and the third number is greater than the fourth number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is greater than the second number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is less than the second number, taking the first matching result as the segmentation result of the medical text.
  9. The method for inferring a potential disease from medical text according to claim 3, characterized in that the step of segmenting the received medical text comprises:
    matching the medical text against the predetermined medical-domain lexicon using the forward maximum matching method to obtain a first matching result, the first matching result containing a first number of first phrases and a third number of single characters;
    matching the medical text against the predetermined medical-domain lexicon using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second number of second phrases and a fourth number of single characters;
    if the first number equals the second number, and the third number is less than or equal to the fourth number, taking the first matching result as the segmentation result of the medical text;
    if the first number equals the second number, and the third number is greater than the fourth number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is greater than the second number, taking the second matching result as the segmentation result of the medical text;
    if the first number does not equal the second number, and the first number is less than the second number, taking the first matching result as the segmentation result of the medical text.
  10. An electronic device, characterized in that the electronic device comprises a memory, a processor, and a system for inferring a potential disease from medical text that is stored in the memory and executable on the processor, wherein the system, when executed by the processor, implements the following steps:
    Step 1: segmenting the received medical text into words, matching each resulting word against a predetermined medical-domain lexicon, and extracting the medical terms among those words;
    Step 2: determining, based on a pre-built medical professional database, the disease corresponding to the medical terms in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical terms;
    Step 3: outputting the determined disease as the inferred potential disease of the medical text.
  11. The electronic device according to claim 10, characterized in that, before Step 1, the processor is further configured to execute the system for inferring a potential disease from medical text to implement the following step:
    obtaining medical data from a predetermined data source, finding one or more medical terms corresponding to each disease in the medical data, and building the medical professional database according to the mapping relationships between different types of diseases and medical terms.
  12. The electronic device according to claim 10, characterized in that the medical terms comprise:
    medical terms in the profile information, symptom information, complication information, therapeutic drug information, or treatment department information corresponding to a disease.
  13. The electronic device according to claim 10, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and Step 2 comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  14. The electronic device according to claim 11, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and Step 2 comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  15. The electronic device according to claim 12, characterized in that the medical professional database further contains a weight for each medical term corresponding to a disease, and Step 2 comprises:
    based on the pre-built medical professional database, finding the diseases corresponding to each medical term in the medical text, calculating for each disease the sum of the weights of its corresponding medical terms, and selecting the disease with the highest weight sum as the disease determined for the medical text.
  16. The electronic device according to claim 10, wherein the step of performing word segmentation on the received medical text comprises:
    matching the medical text against a predetermined medical-domain-specific vocabulary library using the forward maximum matching method to obtain a first matching result, the first matching result containing a first quantity of first phrases and a third quantity of single characters;
    matching the medical text against the predetermined medical-domain-specific vocabulary library using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second quantity of second phrases and a fourth quantity of single characters;
    if the first quantity is equal to the second quantity and the third quantity is less than or equal to the fourth quantity, using the first matching result as the word segmentation result of the medical text;
    if the first quantity is equal to the second quantity and the third quantity is greater than the fourth quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is greater than the second quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is less than the second quantity, using the first matching result as the word segmentation result of the medical text.
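The selection between the forward and reverse maximum matching results in claim 16 can be sketched in Python as follows. The function names, the treatment of any unmatched character as a single character, and the use of the longest vocabulary entry as the matching window are assumptions for illustration; the claim only fixes the four selection rules.

```python
def forward_max_match(text, vocab, max_len):
    """Greedy left-to-right matching; unmatched characters become single characters."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Greedy right-to-left matching; unmatched characters become single characters."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def count_phrases_and_singles(tokens):
    """Return (number of multi-character phrases, number of single characters)."""
    phrases = sum(1 for t in tokens if len(t) > 1)
    return phrases, len(tokens) - phrases


def segment(text, vocab):
    """Pick the forward or reverse result using the four rules of the claim."""
    max_len = max((len(w) for w in vocab), default=1)
    first = forward_max_match(text, vocab, max_len)
    second = backward_max_match(text, vocab, max_len)
    n1, n3 = count_phrases_and_singles(first)   # first and third quantities
    n2, n4 = count_phrases_and_singles(second)  # second and fourth quantities
    if n1 == n2:
        return first if n3 <= n4 else second
    return second if n1 > n2 else first
```

For the toy input `segment("abcd", {"ab", "bcd"})`, forward matching gives `["ab", "c", "d"]` and reverse matching gives `["a", "bcd"]`. Both contain one phrase, and the reverse result has fewer single characters, so the reverse result is returned.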
  17. The electronic device according to claim 11, wherein the step of performing word segmentation on the received medical text comprises:
    matching the medical text against a predetermined medical-domain-specific vocabulary library using the forward maximum matching method to obtain a first matching result, the first matching result containing a first quantity of first phrases and a third quantity of single characters;
    matching the medical text against the predetermined medical-domain-specific vocabulary library using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second quantity of second phrases and a fourth quantity of single characters;
    if the first quantity is equal to the second quantity and the third quantity is less than or equal to the fourth quantity, using the first matching result as the word segmentation result of the medical text;
    if the first quantity is equal to the second quantity and the third quantity is greater than the fourth quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is greater than the second quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is less than the second quantity, using the first matching result as the word segmentation result of the medical text.
  18. The electronic device according to claim 12, wherein the step of performing word segmentation on the received medical text comprises:
    matching the medical text against a predetermined medical-domain-specific vocabulary library using the forward maximum matching method to obtain a first matching result, the first matching result containing a first quantity of first phrases and a third quantity of single characters;
    matching the medical text against the predetermined medical-domain-specific vocabulary library using the reverse maximum matching method to obtain a second matching result, the second matching result containing a second quantity of second phrases and a fourth quantity of single characters;
    if the first quantity is equal to the second quantity and the third quantity is less than or equal to the fourth quantity, using the first matching result as the word segmentation result of the medical text;
    if the first quantity is equal to the second quantity and the third quantity is greater than the fourth quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is greater than the second quantity, using the second matching result as the word segmentation result of the medical text;
    if the first quantity is not equal to the second quantity and the first quantity is less than the second quantity, using the first matching result as the word segmentation result of the medical text.
  19. A computer-readable storage medium storing a system for inferring potential disease from medical text, the system being executable by at least one processor to cause the at least one processor to perform the following steps:
    performing word segmentation on received medical text, matching each segmented word of the medical text against a predetermined medical-domain-specific vocabulary library, and extracting the medical vocabulary from the segmented words of the medical text;
    determining, based on a pre-built medical professional database, the disease corresponding to the medical vocabulary in the medical text, wherein the medical professional database contains mapping relationships between different types of diseases and medical vocabulary;
    outputting the determined disease as the inferred potential disease of the medical text.
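The three steps of claim 19 can be sketched end to end in Python. This is a simplified illustration under stated assumptions: segmentation is taken as already done (the usage example below simply splits an English sentence on whitespace), the database is a plain mapping from disease to a set of vocabulary items, and all names and sample data are hypothetical.

```python
def extract_medical_terms(segments, medical_vocab):
    """Step 1 (after segmentation): keep only segments found in the medical vocabulary."""
    return [s for s in segments if s in medical_vocab]


def diseases_for_terms(terms, disease_to_terms):
    """Step 2: map each disease to the extracted items it is associated with."""
    matched = {}
    for disease, vocab in disease_to_terms.items():
        hits = [t for t in terms if t in vocab]
        if hits:
            matched[disease] = hits
    return matched


def infer_potential_diseases(segments, medical_vocab, disease_to_terms):
    """Step 3: output the determined diseases (sorted here only for stable output)."""
    terms = extract_medical_terms(segments, medical_vocab)
    return sorted(diseases_for_terms(terms, disease_to_terms))


# Illustrative data only; not taken from the patent.
SAMPLE_VOCAB = {"fever", "cough", "rash"}
SAMPLE_DATABASE = {
    "influenza": {"fever", "cough"},
    "measles": {"fever", "rash"},
    "eczema": {"rash"},
}
```

For the segments of "patient reports fever and cough", the extracted items are "fever" and "cough", which map to influenza and measles but not eczema in this sample database.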
  20. The computer-readable storage medium according to claim 19, wherein, before the step of performing word segmentation on the received medical text, matching each segmented word of the medical text against the predetermined medical-domain-specific vocabulary library, and extracting the medical vocabulary from the segmented words of the medical text, the processor is further configured to execute the system for inferring potential disease from medical text to implement the following step:
    obtaining medical data from a predetermined data source, identifying from the medical data one or more medical vocabulary items corresponding to each disease, and building the medical professional database according to the mapping relationships between different types of diseases and medical vocabulary.
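The database construction step of claim 20 might look like the following Python sketch. It assumes the medical data has already been fetched from the data source and reduced to `(disease, segmented terms)` pairs; that record shape, the function name `build_database` and the sample values are hypothetical, since the claim does not describe the data source or its format.

```python
from collections import defaultdict


def build_database(records, medical_vocab):
    """Build a disease -> set of medical vocabulary items mapping.

    records: iterable of (disease, list of segmented terms) pairs, a hypothetical
    shape for the medical data obtained from the predetermined data source.
    """
    database = defaultdict(set)
    for disease, terms in records:
        # Keep only terms that belong to the medical-domain vocabulary.
        database[disease].update(t for t in terms if t in medical_vocab)
    return dict(database)
```

The resulting mapping has the shape assumed by a lookup from medical vocabulary to disease; attaching weights to each item, as in the dependent claims, would require an additional step that is not sketched here.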
PCT/CN2018/076149 2017-05-05 2018-02-10 Method and system for inferring potential disease from medical text, and readable storage medium WO2018201772A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710313520.1 2017-05-05
CN201710313520.1A CN107680689A (en) 2017-05-05 2017-05-05 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text

Publications (1)

Publication Number Publication Date
WO2018201772A1 (en) 2018-11-08

Family

ID=61134116

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076149 WO2018201772A1 (en) 2017-05-05 2018-02-10 Method and system for inferring potential disease from medical text, and readable storage medium

Country Status (2)

Country Link
CN (1) CN107680689A (en)
WO (1) WO2018201772A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text
CN109215771B (en) * 2018-05-29 2024-07-12 深圳平安医疗健康科技服务有限公司 Medical mapping relation library establishment method, device, computer equipment and storage medium
CN109036506B (en) * 2018-07-25 2023-04-18 平安科技(深圳)有限公司 Internet medical inquiry supervision method, electronic device and readable storage medium
CN109215796B (en) * 2018-08-14 2023-04-25 深圳平安医疗健康科技服务有限公司 Searching method, searching device, computer equipment and storage medium
CN109192300A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Intelligent way of inquisition, system, computer equipment and storage medium
CN109215754A (en) * 2018-09-10 2019-01-15 平安科技(深圳)有限公司 Medical record data processing method, device, computer equipment and storage medium
CN109192321A (en) * 2018-09-26 2019-01-11 北京理工大学 The construction method and calculating storage device of drug knowledge mapping
CN109616165A (en) * 2018-11-07 2019-04-12 平安科技(深圳)有限公司 Medical information methods of exhibiting and device
CN109698018A (en) * 2018-12-24 2019-04-30 广州天鹏计算机科技有限公司 Medical text handling method, device, computer equipment and storage medium
CN110021439B (en) * 2019-03-07 2023-01-24 平安科技(深圳)有限公司 Medical data classification method and device based on machine learning and computer equipment
CN112002416A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Disease symptom prediction system based on urine character self-learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100042436A1 (en) * 2008-08-15 2010-02-18 Sultan Haider Disease oriented user interfaces
CN105095665A (en) * 2015-08-13 2015-11-25 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese disease diagnosis information
CN105138829A (en) * 2015-08-13 2015-12-09 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese diagnosis and treatment information
CN106095913A (en) * 2016-06-08 2016-11-09 广州同构医疗科技有限公司 A kind of electronic health record text structure method
CN106557653A (en) * 2016-11-15 2017-04-05 合肥工业大学 A kind of portable medical intelligent medical guide system and method
CN107680689A (en) * 2017-05-05 2018-02-09 平安科技(深圳)有限公司 Potential disease estimating method, system and the readable storage medium storing program for executing of medical text

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915299B (en) * 2012-10-23 2015-04-08 海信集团有限公司 Word segmentation method and device
CN104102816B (en) * 2014-06-20 2017-07-25 周晋 Auto-check system and method with machine learning is matched based on symptom
CN104484845B (en) * 2014-12-30 2019-03-05 天津迈沃医药技术股份有限公司 Disease autoanalysis platform based on medical information ontology database
CN104915413B (en) * 2015-06-05 2018-09-07 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of health detecting method and system
CN105139237A (en) * 2015-09-25 2015-12-09 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN106372439A (en) * 2016-09-21 2017-02-01 北京大学 Method for acquiring and processing disease symptoms and weight knowledge thereof based on case library

Also Published As

Publication number Publication date
CN107680689A (en) 2018-02-09

Similar Documents

Publication Publication Date Title
WO2018201772A1 (en) Method and system for inferring potential disease from medical text, and readable storage medium
WO2021000676A1 (en) Q&a method, q&a device, computer equipment and storage medium
US10162886B2 (en) Embedding-based parsing of search queries on online social networks
WO2019214149A1 (en) Text key information identification method, electronic device, and readable storage medium
US9633006B2 (en) Question answering system and method for structured knowledgebase using deep natural language question analysis
Matci et al. Address standardization using the natural language process for improving geocoding results
WO2021012878A1 (en) Medical domain knowledge graph question and answer processing method, apparatus, device, and storage medium
US11720611B2 (en) Entailment knowledge base in natural language processing systems
CN110442840B (en) Sequence labeling network updating method, electronic medical record processing method and related device
WO2023029513A1 (en) Artificial intelligence-based search intention recognition method and apparatus, device, and medium
CN105210055B (en) According to the hyphenation device across languages phrase table
WO2022160614A1 (en) Method and apparatus for constructing medical entity relationship diagram, method and apparatus for medical order quality control, device, and medium
US11080615B2 (en) Generating chains of entity mentions
CN112599213B (en) Classification code determining method, device, equipment and storage medium
CN113488157B (en) Intelligent diagnosis guiding processing method and device, electronic equipment and storage medium
CN111985241A (en) Medical information query method, device, electronic equipment and medium
CN113724830B (en) Medication risk detection method based on artificial intelligence and related equipment
CA3164921A1 (en) Unsupervised taxonomy extraction from medical clinical trials
CN112149409A (en) Medical word cloud generation method and device, computer equipment and storage medium
WO2023178978A1 (en) Prescription review method and apparatus based on artificial intelligence, and device and medium
WO2023178979A1 (en) Question labeling method and apparatus, electronic device and storage medium
WO2023116572A1 (en) Word or sentence generation method and related device
WO2022227171A1 (en) Method and apparatus for extracting key information, electronic device, and medium
CN118051598A (en) Medicine knowledge question-answering method and device, electronic equipment and storage medium
CN107688594B (en) The identifying system and method for risk case based on social information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18795172

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27/02/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18795172

Country of ref document: EP

Kind code of ref document: A1