WO2021012878A1 - Method, apparatus, device and storage medium for knowledge graph question-answering processing in the medical field - Google Patents


Info

Publication number
WO2021012878A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
sentence
processed
medical
answer
Application number
PCT/CN2020/098534
Other languages
English (en)
French (fr)
Inventor
朱威 (Zhu Wei)
梁欣 (Liang Xin)
倪渊 (Ni Yuan)
谢国彤 (Xie Guotong)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority to SG11202103961QA
Publication of WO2021012878A1


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/28 Databases characterised by their database models, e.g. relational or object models
                • G06F16/284 Relational databases
                  • G06F16/288 Entity relationship models
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/332 Query formulation
                  • G06F16/3329 Natural language query formulation or dialogue systems
              • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
                • G06F16/367 Ontology
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/22 Matching criteria, e.g. proximity measures
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods

Definitions

  • This application relates to natural language processing in the field of artificial intelligence, and in particular to a method, apparatus, device and storage medium for knowledge graph question-answering processing in the medical field.
  • A knowledge graph, also called a scientific knowledge map, is known in library and information science as knowledge domain visualization or knowledge domain mapping. It is a family of graphs that show the development process and the structure of knowledge. Because knowledge graphs provide high-quality structured data, more and more fields use them and the question-answering systems built on them, for applications such as automatic question answering, search engines and information extraction.
  • A typical knowledge graph is expressed as triples of the form (head entity, relation, tail entity), for example (Yao Ming, nationality, China), which encodes the fact that Yao Ming's nationality is Chinese.
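To make the triple form concrete, here is a minimal Python sketch of an in-memory triple store indexed by (head entity, relation). The class and method names are illustrative assumptions, not part of the application.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in (head entity, relation, tail entity) form."""
    head: str
    relation: str
    tail: str


class TripleStore:
    """A minimal in-memory knowledge graph indexed by (head, relation)."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, triple: Triple) -> None:
        self._index[(triple.head, triple.relation)].append(triple.tail)

    def tails(self, head: str, relation: str) -> list[str]:
        # Every tail entity linked to `head` by `relation` (empty if none).
        return list(self._index.get((head, relation), []))


store = TripleStore()
store.add(Triple("Yao Ming", "nationality", "China"))
print(store.tails("Yao Ming", "nationality"))  # ['China']
```

Indexing on the (head, relation) pair makes "what is the X of Y?" lookups a single dictionary access, which is the shape of query a question-answering system issues once the entity and relation are known.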
  • The inventors realized that, in the medical field, an automatic knowledge graph question-answering system can effectively help patients and healthy users with self-management and disease prevention, and can alleviate the shortage of public medical resources in hospitals.
  • Knowledge graph question-answering systems offer relatively high accuracy.
  • However, current knowledge graph question-answering technology is still at the stage of exploration, research and development.
  • Most results and progress so far have been reported in academic papers.
  • In the existing approach, the paper or website corresponding to a user's question is obtained by keyword search in a database.
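  • This application proposes a method, apparatus, device and storage medium for knowledge graph question-answering processing in the medical field, which can improve the efficiency of processing users' questions and meet users' needs.
  • This application proposes a method for knowledge graph question-answering processing in the medical field.
  • The method includes the steps of:
  • obtaining a sentence to be processed, and identifying the medical entities in the sentence to be processed;
  • obtaining the start position and end position of each medical entity in the sentence to be processed;
  • comparing each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph;
  • performing relationship analysis on the sentence to be processed, and obtaining the relationship corresponding to the sentence based on a relationship matching model; and
  • determining the answer corresponding to the sentence to be processed according to that relationship and the node corresponding to the first entity on the knowledge graph, and outputting the answer.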
  • This application also provides an electronic device, which includes:
  • an identification module, configured to obtain the sentence to be processed and identify the medical entities in it;
  • an obtaining module, configured to obtain the start position and end position of each medical entity in the sentence to be processed;
  • a determining module, configured to compare each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph;
  • a processing module, configured to perform relationship analysis on the sentence to be processed and obtain the relationship corresponding to the sentence based on a relationship matching model; and
  • an output module, configured to determine the answer corresponding to the sentence to be processed according to that relationship and the node corresponding to the first entity on the knowledge graph, and output the answer.
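The five modules can be pictured as one pipeline. The Python sketch below is a hypothetical skeleton: the class, its method names and the toy data are invented for illustration, and each stage is a deliberately naive stand-in (a dictionary lookup instead of the NER model, keyword rules instead of the relationship matching model).

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    start: int  # 1-based position of the first character
    end: int    # 1-based position of the last character (inclusive)


class QAPipeline:
    """Hypothetical skeleton of the five modules; every stage is a toy stand-in."""

    def __init__(self, lexicon, kb_aliases, relation_keywords, graph):
        self.lexicon = lexicon                      # stand-in for the trained NER model
        self.kb_aliases = kb_aliases                # mention text -> knowledge-base entities
        self.relation_keywords = relation_keywords  # stand-in for the relationship matching model
        self.graph = graph                          # (node, relation) -> node content

    def identify(self, sentence):
        # Identification module: dictionary lookup in place of NER.
        return [term for term in self.lexicon if term in sentence]

    def locate(self, sentence, entities):
        # Obtaining module: 1-based start/end of the first occurrence of each entity.
        mentions = []
        for term in entities:
            i = sentence.find(term)
            mentions.append(Mention(term, i + 1, i + len(term)))
        return mentions

    def link(self, mentions):
        # Determining module: knowledge-base entities (graph nodes) for each mention.
        return [node for m in mentions for node in self.kb_aliases.get(m.text, [])]

    def match_relation(self, sentence):
        # Processing module: first relation whose keyword occurs in the sentence.
        lowered = sentence.lower()
        for keyword, relation in self.relation_keywords.items():
            if keyword in lowered:
                return relation
        return None

    def answer(self, sentence):
        # Output module: content stored under the matched relation in each linked node.
        mentions = self.locate(sentence, self.identify(sentence))
        relation = self.match_relation(sentence)
        return [self.graph[(node, relation)]
                for node in self.link(mentions)
                if (node, relation) in self.graph]


pipeline = QAPipeline(
    lexicon=["atorvastatin", "coronary heart disease"],
    kb_aliases={"atorvastatin": ["atorvastatin calcium tablets"]},
    relation_keywords={"how should": "usage"},
    graph={("atorvastatin calcium tablets", "usage"): "toy usage text"},
)
print(pipeline.answer("How should atorvastatin be taken to prevent coronary heart disease?"))
```

Keeping each module behind its own method means a trained model can later replace any single stand-in without touching the other stages.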
  • The present application also provides a device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a medical field knowledge graph question-answering processing method that includes, among other steps:
  • comparing each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph; and
  • determining the answer corresponding to the sentence to be processed, and outputting the answer.
  • The present application also provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the medical field knowledge graph question-answering processing method is implemented.
  • The method includes, among other steps:
  • comparing each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph; and
  • determining the answer corresponding to the sentence to be processed, and outputting the answer.
  • The medical field knowledge graph question-answering processing method, apparatus, device and storage medium proposed in this application identify the medical entities in the sentence to be processed; determine, from each medical entity's start position and end position in the sentence, the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph; and then determine and output the answer to the sentence according to the relationship obtained by analyzing the sentence and the node corresponding to the first entity on the knowledge graph.
  • This avoids the prior-art need to manually search papers corresponding to the knowledge graph for data, and therefore improves the efficiency of processing users' questions, meets users' needs and improves the user experience.
  • FIG. 1 is a diagram of an optional application environment of the electronic device according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the hardware architecture of the electronic device according to the first embodiment of the present application;
  • FIG. 3 is a schematic diagram of the program modules of the electronic device according to the first embodiment of the present application;
  • FIG. 4 is a schematic diagram of how node content is displayed on a knowledge graph according to an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of a medical field knowledge graph question-answering processing method according to the first embodiment of the present application;
  • FIG. 6 is a schematic flowchart of a medical field knowledge graph question-answering processing method according to the second embodiment of the present application;
  • FIG. 7 is a schematic flowchart of a medical field knowledge graph question-answering processing method according to the third embodiment of the present application;
  • FIG. 8 is a schematic flowchart of a medical field knowledge graph question-answering processing method according to the fourth embodiment of the present application.
  • FIG. 1 and FIG. 2 are schematic diagrams of an optional application environment of the electronic device 20 of the present application.
  • The electronic device 20 can communicate with the terminal device 11 and the database 30 in a wired or wireless manner.
  • The electronic device 20 obtains the input information of the terminal device 11 through the network interface 23, processes it, retrieves the corresponding knowledge graph data from the database 30, and sends that data through the network interface 23 to the display interface of the terminal device 11, thereby delivering the result of the medical field knowledge graph question-answering processing.
  • The terminal device 11 may be a mobile phone, a tablet, a personal computer, or the like.
  • The database 30 includes at least a data server.
  • FIG. 2 is a schematic diagram of an optional hardware architecture of the electronic device 20 of the present application.
  • The electronic device 20 includes, but is not limited to, a memory 21, a processor 22 and a network interface 23 that can communicate with each other through a system bus.
  • FIG. 2 only shows the electronic device 20 with components 21 to 23, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 21 includes at least one type of readable storage medium.
  • The readable storage medium includes flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and so on.
  • The memory 21 may be an internal storage unit of the electronic device 20, such as its hard disk or internal memory.
  • The memory may also be an external storage device of the electronic device 20, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 20.
  • The memory 21 may also include both an internal storage unit of the electronic device 20 and an external storage device.
  • The memory 21 is generally used to store the operating system and the application software installed on the electronic device 20, such as the program code of the medical field knowledge graph question-answering processing system 24.
  • The memory 21 can also be used to temporarily store data that has been output or is to be output.
  • In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 22 is generally used to control the overall operation of the electronic device 20.
  • The processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the medical field knowledge graph question-answering processing system 24.
  • The network interface 23 may include a wireless network interface or a wired network interface.
  • The network interface 23 is generally used to establish a communication connection between the electronic device 20 and other electronic devices.
  • This application proposes an electronic device 20.
  • FIG. 3 is a schematic diagram of the program modules of the electronic device 20 according to the first embodiment of the present application.
  • The electronic device 20 includes a series of computer program instructions stored in the memory 21.
  • When the computer program instructions are executed by the processor 22, the medical field knowledge graph question-answering processing operations of the embodiments of the application are implemented.
  • The electronic device 20 may be divided into one or more modules according to the specific operations implemented by the parts of the computer program instructions. For example, in FIG. 3, the electronic device 20 is divided into an identification module 201, an obtaining module 202, a determining module 203, a processing module 204 and an output module 205, where:
  • The identification module 201 is adapted to receive sentences to be processed that are sent to the electronic device from outside.
  • The identification module 201 receives the sentence to be processed and identifies the medical entities in it.
  • Obtaining the sentence to be processed includes: receiving a sentence sent by a user and determining whether it is a question; if so, determining that the received sentence is the sentence to be processed.
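The application does not say how a question is recognised. As a purely illustrative assumption, a rule-based filter could look like the Python sketch below; the cue lists are invented, and a production system would more plausibly use a trained classifier.

```python
import re

QUESTION_MARKS = ("?", "？")  # ASCII and full-width question marks
# Hypothetical cue lists; not taken from the application.
ENGLISH_QUESTION_WORDS = {"how", "what", "which", "why", "when", "who",
                          "can", "should", "is", "does", "do"}
CHINESE_QUESTION_CUES = ("怎么", "什么", "哪", "吗", "如何")


def is_question(sentence: str) -> bool:
    """Return True if the sentence looks like a question (heuristic sketch)."""
    text = sentence.strip()
    if not text:
        return False
    if text.endswith(QUESTION_MARKS):
        return True
    # English: the sentence starts with an interrogative or auxiliary word.
    words = re.findall(r"[a-z]+", text.lower())
    if words and words[0] in ENGLISH_QUESTION_WORDS:
        return True
    # Chinese: an interrogative particle or word appears anywhere.
    return any(cue in text for cue in CHINESE_QUESTION_CUES)


def sentence_to_process(sentence: str):
    """Return the sentence if it should be processed, otherwise None."""
    return sentence if is_question(sentence) else None
```

Only sentences that pass this filter would go on to entity recognition; everything else is dropped before any model runs.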
  • One or more medical entities may be identified. Because the method is intended mainly for the medical field, an identified medical entity may be, for example, the name of a disease or the name of a drug.
  • The sentence to be processed is a question sent by the user that needs to be answered.
  • For example, the sentence to be processed is: "How should atorvastatin be taken to prevent coronary heart disease?"
  • The medical entities identified are atorvastatin (drug) and coronary heart disease (disease).
  • In the embodiments of the present application, the identification module 201 identifies the medical entities; specifically, it is configured to obtain the sentence to be processed and use a NER model to identify the medical entities in it, where the medical entities include at least diseases and/or drugs.
  • In a specific implementation, the training data for the named entity recognition (NER) model is formed by manually labeling a collected data set of questions.
  • The NER model uses the widely adopted BiLSTM-CRF architecture.
  • Its inputs are a character-level embedding layer and a radical-level embedding layer based on the radicals of the Chinese characters.
  • Additional features are the part-of-speech tags of the question and the word-category labels produced by segmenting the question with the jieba ("stammer") tool after a medical dictionary has been loaded.
  • The NER model is trained on training samples from the medical field to obtain a mature model, which is then used to identify medical entities. This is a conventional technique for those skilled in the art and is not described further in the embodiments of the present application.
  • The resulting entity recognition method helps identify the entities in the sentence to be processed effectively.
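In a BiLSTM-CRF tagger, the CRF layer picks the best tag sequence by Viterbi decoding over the BiLSTM's per-character emission scores plus learned tag-transition scores. The pure-Python sketch below shows only that decoding step; the BIO tag set and all scores are made-up illustrations, and training (the CRF log-likelihood) is not shown.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at character t (e.g. BiLSTM outputs).
    transitions[i][j]: learned score for moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


TAGS = ["O", "B-DRUG", "I-DRUG"]
# Illustrative transition scores: O -> I-DRUG is strongly penalised.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-DRUG
    [0.0, 0.0, 1.0],    # from I-DRUG
]
EMISSIONS = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.9]]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # ['B-DRUG', 'I-DRUG']
```

Taking the best tag at each position independently would give ['B-DRUG', 'O'] here; the transition scores are what let the CRF prefer a well-formed, contiguous entity.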
  • The obtaining module 202 is used to obtain the start position and end position of each medical entity in the sentence to be processed.
  • For each medical entity, its specific position in the sentence to be processed is obtained; the position consists of a start position and an end position.
  • In the example above (positions count characters of the original Chinese sentence), the recognized medical entity atorvastatin starts at the 1st character and ends at the 5th character.
  • The entity coronary heart disease starts at the 8th character and ends at the 10th character.
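If the NER model emits one BIO tag per character, the start and end positions follow directly from the tag sequence. The sketch below assumes BIO tagging (the application does not state its tag scheme) and uses 阿托伐他汀预防冠心病怎么吃？ as a hypothetical Chinese phrasing of the example question, chosen because it reproduces the positions 1 to 5 and 8 to 10 given above.

```python
def bio_to_spans(text, tags):
    """Convert per-character BIO tags into (entity, type, start, end) tuples.

    Positions are 1-based and inclusive, counting characters of `text`.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((text[start:i], label, start + 1, i))
            start, label = (i, tag[2:]) if starts_new else (None, None)
        # An "I-" tag that continues the current entity needs no action.
    return spans


sentence = "阿托伐他汀预防冠心病怎么吃？"  # hypothetical phrasing of the example question
tags = (["B-DRUG"] + ["I-DRUG"] * 4 + ["O"] * 2
        + ["B-DISEASE"] + ["I-DISEASE"] * 2 + ["O"] * 4)
print(bio_to_spans(sentence, tags))
```

An "I-" tag whose type differs from the open entity is treated as the start of a new entity, a common lenient convention for repairing malformed tag sequences.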
  • The determining module 203 is configured to compare each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph.
  • As another example, suppose the medical entities are cold medicine (drug) and cold (disease). Cold medicines from different manufacturers are taken differently: for example, granular cold medicine is dosed in sachets and capsule cold medicine in capsules. The medical entities therefore have to be compared against the knowledge base
  • to determine the other entities in the knowledge base that correspond to the medical entity cold medicine, that is, the first entities. Suppose the first entities obtained are adult cold medicine and children's cold medicine.
  • Adult cold medicine and children's cold medicine each have a corresponding node on the knowledge graph.
  • The content contained in a node can be retrieved through that node.
  • For example, the node for adult cold medicine may contain traditional Chinese cold medicines and Western cold medicines, together with the name and dosage of each. This is more detailed and comprehensive than the node obtained by looking up cold medicine on the knowledge graph directly.
  • Specifically, the form of each medical entity in the sentence to be processed is determined from the medical entity and its start position and end position; this form is compared with the entities in the knowledge base to determine the second entities corresponding to the medical entity; the similarity between the medical entity and each second entity is determined with a similarity algorithm; and, according to the determined similarity, the first entity matching the medical entity is selected from the second entities.
  • In this way each medical entity in the sentence to be processed is obtained. The knowledge base stores many questions, each of which also contains entities, and analyzing those entities yields their expression forms. Comparing the form of each entity in the sentence to be processed with the entities in the knowledge base shows whether the medical entity also has other names, so that more comprehensive medical entities are obtained. However, the more candidate entities are obtained, the greater the chance of error, so to improve accuracy the similarity between the medical entity and each second entity is evaluated, which yields more accurate information for the medical entity.
  • The similarity value between the medical entity and the second entity is calculated from the edit-distance ratio of the two character strings, and the similarity between them is determined from the magnitude of this value; alternatively, the similarity value between each medical entity and the second entity is calculated from feature vectors, and the similarity is determined from the magnitude of that value.
  • The calculated similarity value can be compared with a preset similarity value.
  • If the similarity value is lower than the preset similarity value, the corresponding second entity is discarded; otherwise, the second entity whose similarity value exceeds the preset similarity value is determined to be a first entity.
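The two similarity measures and the threshold filter described above could be sketched in Python as follows. The exact formulas are assumptions, since the application names the techniques but not their definitions: the edit-distance ratio is taken as 1 minus the Levenshtein distance over the longer string's length, the feature vectors are character-bigram counts, and candidates at or above the threshold are kept.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertion, deletion and substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of character-bigram count vectors (assumed features)."""
    va = Counter(a[i:i + 2] for i in range(len(a) - 1))
    vb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def select_first_entities(mention, second_entities, threshold, similarity=edit_ratio):
    """Keep the second entities whose similarity to the mention reaches the threshold."""
    return [e for e in second_entities if similarity(mention, e) >= threshold]
```

Passing the similarity function as a parameter mirrors the "edit distance or feature vector" alternative in the text: the same filter works with either measure.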
  • By calculating the similarity between each second entity and the medical entity, irrelevant entities can be eliminated from the second entities, yielding more accurate first entities. Understandably, the second entities therefore often outnumber the first entities; of course, if all second entities turn out to be highly related to the medical entity, the first entities and the second entities may be the same set.
  • For example, the second entities corresponding to the medical entity "atorvastatin" include: ("atorvastatin calcium tablets", Drug), ("atorvastatin calcium dispersible tablets", Drug), ("atorvastatin calcium capsules", Drug), ("amlodipine atorvastatin calcium tablets", Drug), ("atorvastatin calcium", ATC) and ("rosuvastatin calcium tablets", Drug). After the similarity values are calculated, the similarity between ("atorvastatin calcium", ATC) and atorvastatin is lower than the preset similarity value, so it is excluded, and the remaining entities are taken as the first entities.
  • Boundary adjustment and multi-dimensional similarity calculation are thus used to identify entities effectively and filter out noise.
  • The processing module 204 is configured to perform relationship analysis on the sentence to be processed and obtain the relationship corresponding to the sentence based on the relationship matching model.
  • The training process of the relationship matching model includes: collecting samples and manually labeling them as positive sample questions and negative sample questions, where a positive sample question has a corresponding relationship in the knowledge graph and a negative sample question has no corresponding relationship in the knowledge graph; training an LSTM network with the positive and negative sample questions; determining the maturity of the relationship matching model from the training output values; and using an LSTM network whose maturity exceeds a preset value as the relationship matching model.
  • In other words, a question is a positive sample if the knowledge graph contains a relationship corresponding to it; otherwise it is a negative sample.
  • The ratio of positive to negative sample questions is determined according to the relationships in the knowledge graph; for example, 1:50.
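One common way to obtain a fixed positive-to-negative ratio for a relation matcher is to pair each labelled question with its gold relation (positive) and with randomly sampled other relations (negatives). The Python sketch below shows that pairwise negative sampling under this assumption; it may differ from how the application labels its negative sample questions, and the function name and data are hypothetical.

```python
import random


def build_relation_pairs(labelled, relations, negatives_per_positive=50, seed=0):
    """Build (question, relation, label) training pairs.

    labelled: (question, gold_relation) pairs from manual annotation.
    relations: every relation name in the knowledge graph.
    Each gold pair gets up to `negatives_per_positive` sampled wrong relations,
    giving roughly a 1:50 positive-to-negative ratio by default.
    """
    rng = random.Random(seed)  # fixed seed keeps the sampled data reproducible
    pairs = []
    for question, gold in labelled:
        pairs.append((question, gold, 1))
        candidates = [r for r in relations if r != gold]
        k = min(negatives_per_positive, len(candidates))
        for relation in rng.sample(candidates, k):
            pairs.append((question, relation, 0))
    return pairs
```

Capping `k` at the number of available wrong relations keeps the sampler valid for small knowledge graphs, where fewer than 50 alternatives may exist.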
  • The relationship matching model adopts a modified Enhanced Sequential Inference Model (ESIM, an enhanced LSTM network). This network structure was originally used mainly for question-pair matching tasks. Specifically, the user's question passes through an embedding layer and is then encoded by a bidirectional LSTM network, and the relationship passes through its own embedding layer.
  • The relationship embedding layer is composed of two parts: (a) a randomly initialized vector; and (b) the relationship name.
  • The output vectors of the LSTM interact through an attention mechanism and then pass through another LSTM network, and the final output passes through a feed-forward network that scores the degree of matching.
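The attention interaction in ESIM is its "local inference" step: each position of one sequence is soft-aligned to the other sequence with dot-product attention, and then enhanced as [a; ã; a - ã; a * ã]. The pure-Python sketch below shows only that step, applied to the encoded question and relation states; the encoders, the second LSTM, pooling and the feed-forward scorer are omitted, and nothing here reflects the application's own modifications.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def soft_align(a, b):
    """For each vector in `a`, the attention-weighted average of the vectors in `b`."""
    aligned = []
    for ai in a:
        weights = softmax([dot(ai, bj) for bj in b])  # e_ij = a_i . b_j
        aligned.append([sum(w * bj[k] for w, bj in zip(weights, b))
                        for k in range(len(b[0]))])
    return aligned


def enhance(a, a_tilde):
    """ESIM enhancement [a; a~; a - a~; a * a~] at every position."""
    return [ai + at + [x - y for x, y in zip(ai, at)] + [x * y for x, y in zip(ai, at)]
            for ai, at in zip(a, a_tilde)]


def local_inference(question_states, relation_states):
    q_tilde = soft_align(question_states, relation_states)
    r_tilde = soft_align(relation_states, question_states)
    return enhance(question_states, q_tilde), enhance(relation_states, r_tilde)
```

Each enhanced vector is four times the width of the encoder state; in ESIM these are what the second LSTM consumes before pooling and scoring.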
  • The output module 205 is configured to determine the answer corresponding to the sentence to be processed according to the relationship corresponding to the sentence and the node corresponding to the first entity on the knowledge graph, and output the answer.
  • Each entity has corresponding node content on the knowledge graph.
  • Specifically, the output module 205 searches the node corresponding to the first entity on the knowledge graph for the content corresponding to the relationship of the sentence to be processed, determines the found content as the answer to the sentence, and outputs the answer.
  • For example, atorvastatin calcium is the main ingredient of Lipitor.
  • Lipitor is available in 10 mg, 20 mg and 40 mg strengths and is indicated for coronary heart disease.
  • Plavix and aspirin are also both indicated for coronary heart disease. Accordingly, the answer can be:
  • "The usual starting dose of Lipitor is 10 mg once a day."
  • The corresponding answers for the other matched entities can also be output, so the output is more comprehensive, and the method can provide relationship matching results with high precision and high recall.
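The lookup performed by the output module can be sketched as a nested dictionary walk: node, then relation, then content. In the Python sketch below the graph layout and the relation names "usage" and "indication" are assumptions; only the Lipitor content is taken from the example above.

```python
# Toy graph: node name -> {relation name: content}. Layout and relation names are assumed.
KNOWLEDGE_GRAPH = {
    "Lipitor": {
        "usage": "The usual starting dose of Lipitor is 10 mg once a day.",
        "indication": "coronary heart disease",
    },
}


def find_answers(first_entities, relation, graph=KNOWLEDGE_GRAPH):
    """Collect the content stored under `relation` in each first entity's node."""
    answers = []
    for entity in first_entities:
        content = graph.get(entity, {}).get(relation)
        if content is not None:
            answers.append(content)
    return answers  # empty when no node holds content for this relation


print(find_answers(["Lipitor"], "usage"))
```

Returning a list rather than a single string lets every matching first entity contribute an answer, in line with the more comprehensive output described above.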
  • The medical field knowledge graph question-answering processing device identifies the medical entities in the sentence to be processed; determines, from each medical entity's start position and end position in the sentence, the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph; and then determines and outputs the answer to the sentence according to the relationship obtained by analyzing the sentence and the node corresponding to the first entity on the knowledge graph. This avoids the prior-art need to manually search papers corresponding to the knowledge graph for data, and therefore improves the efficiency of processing users' questions, meets users' needs and improves the user experience.
  • This application also proposes a knowledge graph question-answering processing method for the medical field.
  • FIG. 5 is a schematic flowchart of the first embodiment of the medical field knowledge graph question-answering processing method according to the present application.
  • The medical field knowledge graph question-answering processing method is applied to the electronic device 20.
  • The execution order of the steps in the flowchart shown in FIG. 5 can be changed, and some steps can be omitted.
  • Step S501: Obtain a sentence to be processed, and identify the medical entities in it.
  • Step S502: Obtain the start position and end position of each medical entity in the sentence to be processed.
  • Step S503: Compare each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph.
  • Step S504: Perform relationship analysis on the sentence to be processed, and obtain the relationship corresponding to the sentence based on the relationship matching model.
  • Step S505: Determine the answer corresponding to the sentence to be processed according to that relationship and the node corresponding to the first entity on the knowledge graph, and output the answer.
  • Step S501 includes: obtaining a sentence to be processed, and using a NER model to identify the medical entities in it, where the medical entities include at least diseases and/or drugs.
  • Obtaining the sentence to be processed includes: receiving a sentence sent by a user and determining whether it is a question; if so, determining that the received sentence is the sentence to be processed.
  • Step S503 includes:
  • S701: Determine the form of each medical entity in the sentence to be processed according to the medical entity and its start position and end position;
  • S702: Compare the form of each medical entity in the sentence with the entities in the knowledge base to determine the second entities corresponding to the medical entity;
  • S703: Determine the similarity between the medical entity and each second entity according to a similarity algorithm;
  • S704: According to the determined similarity, determine the first entity matching the medical entity from the second entities.
  • Step S703 includes: calculating the similarity value between the medical entity and the second entity from the edit-distance ratio of the two character strings, and determining the similarity between them from the magnitude of this value; or calculating the similarity value between each medical entity and the second entity from feature vectors, and determining the similarity from the magnitude of that value.
  • Step S505 includes: searching the node corresponding to the first entity on the knowledge graph for the content corresponding to the relationship of the sentence to be processed; determining the found content as the answer to the sentence, and outputting the answer.
  • The training steps of the relationship matching model include:
  • S804: Use an LSTM network whose maturity exceeds a preset value as the relationship matching model.
  • This application also provides a computer device, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (either an independent server or a server cluster composed of multiple servers).
  • The computer device of this embodiment includes at least, but is not limited to, a memory and a processor that can be communicatively connected to each other through a system bus, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a medical field knowledge graph question-answering processing method that includes, among other steps:
  • comparing each medical entity, together with its start position and end position, with the entities in a preset knowledge base to determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph; and
  • determining the answer corresponding to the sentence to be processed, and outputting the answer.
  • the computer-readable storage medium may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX). Memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory , Magnetic disks, optical disks, servers, App application malls, etc., on which computer programs are stored, and the corresponding functions are realized when the programs are executed by the processor.
  • the computer-readable storage medium of this embodiment is used to store the electronic device 20, and when executed by a processor, realizes the medical field knowledge graph question and answer processing method of the present application, and the method specifically includes the following steps:
  • the medical entity and the corresponding start position and end position compare with the entity in the preset knowledge base to determine the first entity corresponding to the medical entity, and the first entity is on the knowledge graph The corresponding node;
  • the answer corresponding to the sentence to be processed is determined, and the answer is output.
  • the method of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is better. ⁇
  • the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, The optical disc) includes several instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

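A medical-domain knowledge graph question answering processing method, an electronic device, a computer device and a storage medium. The method includes: obtaining a sentence to be processed, and recognizing the medical entities in the sentence to be processed (S501); obtaining the start position and end position of each medical entity in the sentence to be processed (S502); determining a first entity corresponding to the medical entity, and the node corresponding to the first entity on a knowledge graph (S503); performing relation analysis on the sentence to be processed, and obtaining the relation corresponding to the sentence to be processed based on a relation matching model (S504); and determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, the answer corresponding to the sentence to be processed, and outputting the answer (S505). The method, electronic device, computer and storage medium can improve the efficiency of processing the questions raised by users and meet users' needs.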

Description

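Medical-Domain Knowledge Graph Question Answering Processing Method, Apparatus, Device and Storage Medium
This application claims priority to Chinese patent application No. 201910655569.4, filed with the Chinese Patent Office on July 19, 2019 and entitled "Medical-Domain Knowledge Graph Question Answering Processing Method, Apparatus, Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of natural language processing in artificial intelligence, and in particular to a medical-domain knowledge graph question answering processing method, apparatus, device and storage medium.
Background
A knowledge graph, also called a scientific knowledge map and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a family of graphs that display the development process and structural relationships of knowledge. Because knowledge graphs provide high-quality structured data, they, and question answering systems built on them, are used in a growing number of fields, such as automatic question answering, search engines and information extraction. A typical knowledge graph expresses facts as triples of the form (head entity, relation, tail entity); for example, the triple (Yao Ming, nationality, China) expresses the fact that Yao Ming's nationality is China.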
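To make the triple form concrete, the sketch below indexes (head entity, relation, tail entity) triples so that a tail can be looked up from a head and a relation, which is the kind of lookup the question answering steps later rely on. It is a minimal illustration assuming an in-memory store; the `TripleStore` class and its methods are hypothetical and are not described in the application.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (head, relation, tail) triples (hypothetical)."""

    def __init__(self):
        # head entity -> relation -> list of tail entities
        self._index = defaultdict(lambda: defaultdict(list))

    def add(self, head, relation, tail):
        """Record one fact as a triple."""
        self._index[head][relation].append(tail)

    def tails(self, head, relation):
        """Return all tail entities linked to `head` by `relation` (empty if none)."""
        # .get() avoids creating empty entries for unknown heads or relations
        return list(self._index.get(head, {}).get(relation, []))


kg = TripleStore()
kg.add("Yao Ming", "nationality", "China")
print(kg.tails("Yao Ming", "nationality"))  # ['China']
```

A production knowledge graph would normally live in a graph database; the nested dictionary here only illustrates the data model.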
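The inventors realized that, in the medical field, an automatic knowledge graph question answering system can effectively help patients and healthy users with self-management and prevention, and can ease the shortage of public medical resources in hospitals. In practice, however, the particularities of the medical field place high demands on the accuracy of a knowledge question answering system. Knowledge graph question answering technology is still at the exploration and research stage, and most of its results and progress are still academic papers. The current approach is as follows: given a user's question, keyword retrieval in a database returns the corresponding papers or web documents, and the user must then click into specific papers to look for the needed content. This makes the processing of users' questions inefficient and fails to meet users' needs.
Therefore, providing effective medical-domain knowledge graph question answering processing is a technical problem that urgently needs to be solved.
Summary
In view of this, this application proposes a medical-domain knowledge graph question answering processing method, apparatus, device and storage medium that can improve the efficiency of processing the questions raised by users and meet users' needs.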
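First, to achieve the above object, this application proposes a medical-domain knowledge graph question answering processing method, the method comprising the steps of:
obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
obtaining a start position and an end position of each medical entity in the sentence to be processed;
comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.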
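To achieve the above object, this application further provides an electronic device, the device comprising:
a recognition module, configured to obtain a sentence to be processed and recognize medical entities in the sentence to be processed;
an obtaining module, configured to obtain a start position and an end position of each medical entity in the sentence to be processed;
a determination module, configured to compare the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
a processing module, configured to perform relation analysis on the sentence to be processed and obtain a relation corresponding to the sentence to be processed based on a relation matching model;
an output module, configured to determine, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and output the answer.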
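To achieve the above object, this application further provides a device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a medical-domain knowledge graph question answering processing method, the method specifically comprising the following steps:
obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
obtaining a start position and an end position of each medical entity in the sentence to be processed;
comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.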
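In addition, to achieve the above object, this application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a medical-domain knowledge graph question answering processing method, the method specifically comprising the following steps:
obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
obtaining a start position and an end position of each medical entity in the sentence to be processed;
comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.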
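Compared with the prior art, the medical-domain knowledge graph question answering processing method, apparatus, device and storage medium proposed in this application recognize the medical entities in the sentence to be processed. From the start and end positions of each medical entity in the sentence, they determine the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph. They then determine and output the answer corresponding to the sentence from the relation obtained by analyzing the sentence and the node of the first entity on the knowledge graph. This avoids the prior-art need to search manually through the papers associated with the knowledge graph, and therefore improves the efficiency of processing users' questions, meets users' needs, and improves the user experience.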
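Brief Description of the Drawings
FIG. 1 is a diagram of an optional application environment of the electronic device according to an embodiment of this application;
FIG. 2 is a schematic diagram of the hardware architecture of the electronic device according to the first embodiment of this application;
FIG. 3 is a schematic diagram of the program modules of the electronic device according to the first embodiment of this application;
FIG. 4 is a schematic diagram showing how node content is displayed on a knowledge graph according to an embodiment of this application;
FIG. 5 is a schematic flowchart of the medical-domain knowledge graph question answering processing method according to the first embodiment of this application;
FIG. 6 is a schematic flowchart of the medical-domain knowledge graph question answering processing method according to the second embodiment of this application;
FIG. 7 is a schematic flowchart of the medical-domain knowledge graph question answering processing method according to the third embodiment of this application;
FIG. 8 is a schematic flowchart of the medical-domain knowledge graph question answering processing method according to the fourth embodiment of this application.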
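Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
It should be noted that descriptions involving "first", "second" and the like in this application are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination shall be considered not to exist and is not within the protection scope claimed by this application.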
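Refer to FIG. 1 and FIG. 2, which are schematic diagrams of an optional application environment of the electronic device 20 of this application.
In this embodiment, the electronic device 20 can communicate with a terminal device 11 and a database 30 in a wired or wireless manner. The electronic device 20 obtains input information from the terminal device 11 through a network interface 23, processes it, retrieves the corresponding knowledge graph data from the database 30, and sends the data through the network interface 23 to the display interface of the terminal device 11, thereby transmitting the data produced by medical-domain knowledge graph question answering processing. The terminal device 11 includes a mobile phone, a tablet, a personal computer and the like. The database 30 includes at least a data server.
Refer to FIG. 2, which is a schematic diagram of an optional hardware architecture of the electronic device 20 of this application. The electronic device 20 includes, but is not limited to, a memory 21, a processor 22 and a network interface 23 that can communicate with each other via a system bus. FIG. 2 shows only the electronic device 20 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The memory 21 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc and the like. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 20, such as its hard disk or main memory. In other embodiments, the memory may be an external storage device of the electronic device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the electronic device 20. Of course, the memory 21 may include both an internal storage unit and an external storage device of the electronic device 20. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the electronic device 20, such as the program code of the medical-domain knowledge graph question answering processing system 24. In addition, the memory 21 may be used to temporarily store various data that has been or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 20. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the medical-domain knowledge graph question answering processing system 24.
The network interface 23 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the electronic device 20 and other electronic devices.
The hardware structure and functions of the devices related to this application have now been described in detail. The embodiments of this application are presented below on the basis of this description.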
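First, this application proposes an electronic device 20.
Refer to FIG. 3, which is a schematic diagram of the program modules of the electronic device 20 according to the first embodiment of this application.
In this embodiment, the electronic device 20 includes a series of computer program instructions stored in the memory 21; when executed by the processor 22, these instructions implement the medical-domain knowledge graph question answering processing operations of the embodiments of this application. In some embodiments, the electronic device 20 may be divided into one or more modules according to the specific operations implemented by the respective parts of the computer program instructions. For example, in FIG. 3, the electronic device 20 may be divided into a recognition module 201, an obtaining module 202, a determination module 203, a processing module 204 and an output module 205, where:
The recognition module 201 is adapted to receive a sentence to be processed that is sent to the electronic device from outside.
Specifically, the recognition module 201 receives the sentence to be processed and recognizes the medical entities in it.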
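To further improve processing efficiency, sentences sent by the user that do not satisfy the trigger condition for medical-domain knowledge graph question answering are not processed. In a preferred implementation, obtaining the sentence to be processed includes: receiving a sentence sent by the user, and determining whether the sentence is a question; if so, determining the received sentence as the sentence to be processed.
It can be understood that a sentence that is not in question form does not need an answer. Checking directly whether a sentence is a question therefore determines whether it needs to be processed, makes the processing more targeted, avoids processing invalid sentences, and improves the overall efficiency of processing sentences.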
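The application does not specify how a sentence is judged to be a question. A minimal sketch, assuming a simple rule-based check (a trailing question mark, a trailing particle such as 吗 or 呢, or a common interrogative word), could look like the following. The word list and the function names `is_question` and `obtain_sentence_to_process` are hypothetical; a trained classifier could replace the rules.

```python
from typing import Optional

# Half-width and full-width question marks.
QUESTION_MARKS = ("?", "？")
# Hypothetical list of common Chinese interrogative words; extend as needed.
INTERROGATIVES = ("怎么", "什么", "多少", "哪些", "哪个", "为什么", "是否", "能否", "如何")


def is_question(sentence: str) -> bool:
    """Rule-based guess at whether a sentence is a question (assumed heuristic)."""
    text = sentence.strip()
    if not text:
        return False
    if text.endswith(QUESTION_MARKS):
        return True
    # A trailing modal particle such as 吗/呢 often marks a question.
    if text[-1] in "吗呢":
        return True
    return any(word in text for word in INTERROGATIVES)


def obtain_sentence_to_process(sentence: str) -> Optional[str]:
    """Return the sentence if it should be processed, otherwise None."""
    return sentence if is_question(sentence) else None
```

Rules like these are cheap enough to run before the NER model, which matches the stated goal of filtering out sentences that need no answer.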
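Those skilled in the art will understand that one or more medical entities may be recognized. Because the method is mainly applied in the medical field, a recognized medical entity may be, for example, the name of a disease or the name of a drug.
The sentence to be processed is a question sent by the user that needs to be answered. For example, for the sentence to be processed "阿托伐他丁防治冠心病应该怎么服用?" ("How should atorvastatin be taken to prevent and treat coronary heart disease?", where the user wrote 阿托伐他丁, a variant spelling of 阿托伐他汀, atorvastatin), the recognized medical entities are 阿托伐他丁 (drug) and 冠心病 (coronary heart disease, disease).
Specifically, in the embodiments of this application, medical entities are recognized by the recognition module 201, which is configured to obtain the sentence to be processed and recognize the medical entities in it using an NER model, where the medical entities include at least diseases and/or drugs.
It should be noted that the NER (Named Entity Recognition) model is trained on data formed by manually annotating a collected dataset of questions. The NER model adopts the popular BiLSTM-CRF architecture. Its input consists of a character-based embedding layer and a radical-based embedding layer, to which the following features are added: the part-of-speech tags of the question, and the word-category tags obtained by segmenting the question with the jieba word segmentation tool after loading our medical dictionary. The NER model is trained on medical-domain training samples to obtain a mature NER model, which is then used to recognize medical entities. This part is a conventional technique for those skilled in the art and is not described further in the embodiments of this application.
Therefore, the recognition network structure of the embodiments of this application, combined with these features, yields an entity recognition approach that helps recognize the entities in the sentence to be processed effectively.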
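The obtaining module 202 is configured to obtain the start position and end position of each medical entity in the sentence to be processed.
It should be noted that once the medical entities in the sentence to be processed have been obtained, the specific position of each medical entity in the sentence can be obtained, the specific position comprising a start position and an end position.
For example, in the sentence to be processed "阿托伐他丁防治冠心病应该怎么服用?", the recognized medical entity 阿托伐他丁 starts at the 1st character and ends at the 5th character, and the medical entity 冠心病 starts at the 8th character and ends at the 10th character.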
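One way to obtain these positions, assuming the BiLSTM-CRF NER model emits one BIO tag per character (a common tagging convention that the application does not state), is to decode the tag sequence into spans. The sketch below reports 1-based, inclusive positions so that it reproduces the example above; `EntitySpan` and `decode_bio` are hypothetical names.

```python
from typing import List, NamedTuple


class EntitySpan(NamedTuple):
    text: str
    label: str
    start: int  # 1-based position of the first character
    end: int    # 1-based position of the last character (inclusive)


def decode_bio(sentence: str, tags: List[str]) -> List[EntitySpan]:
    """Turn per-character BIO tags (e.g. from a CRF layer) into entity spans."""
    if len(sentence) != len(tags):
        raise ValueError("one tag per character is required")
    spans: List[EntitySpan] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            # Characters start..i-1 (0-based) become positions start+1..i (1-based).
            spans.append(EntitySpan(sentence[start:i], label, start + 1, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]  # open a new entity (orphan I- is tolerated)
    return spans
```

For the example sentence, the tags B-Drug, four I-Drug, two O, B-Disease, two I-Disease and seven O decode to (阿托伐他丁, Drug, 1, 5) and (冠心病, Disease, 8, 10).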
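The determination module 203 is configured to compare the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph.
A drug may have several names, and drugs of the same class that treat a given disease may be produced by several manufacturers. Because of this, using the medical entities recognized by the recognition module 201 directly would be too restrictive and would not yield comprehensive results. For example, for the sentence to be processed "感冒药的服用剂量?" ("What is the dosage of cold medicine?"), the medical entities are 感冒药 (cold medicine, drug) and 感冒 (common cold, disease). Cold medicines from different manufacturers are taken differently: granule cold medicine is dosed in sachets and capsule cold medicine in capsules, for example. The medical entity therefore has to be compared against the knowledge base to determine the other entities corresponding to the medical entity 感冒药 in the knowledge base, namely the first entities. Suppose the first entities obtained are adult cold medicine and children's cold medicine.
It can be understood that the first entity "adult cold medicine" has a corresponding node on the knowledge graph, and so does the first entity "children's cold medicine". Specifically, once a node is found on the knowledge graph, the content contained in the node can be obtained through it. For example, the node for adult cold medicine may correspond to traditional Chinese medicine (TCM) cold medicines and Western cold medicines, each listing the names and dosages of the corresponding cold medicines. This is more detailed and more comprehensive than looking up nodes on the knowledge graph directly with "cold medicine".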
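In a preferred embodiment, the form of each medical entity in the sentence to be processed is determined according to the medical entity and its start and end positions; the form of each medical entity in the sentence is compared with the entities in the knowledge base to determine second entities corresponding to the medical entity; the similarity between the medical entity and the second entities is determined according to a similarity algorithm; and a first entity matching the medical entity is determined from the second entities according to the determined similarity.
It should be noted that analyzing the question yields the surface form of each medical entity in the sentence to be processed. The knowledge base also stores many questions, each containing entities whose surface forms can likewise be obtained by parsing. The surface forms of the entities in the sentence to be processed are then compared with the entities in the knowledge base to determine whether the medical entities in the sentence have other names, so that a more complete set of medical entities is obtained. However, the more entities are obtained, the greater the chance of error. To improve accuracy, the similarity between the medical entity and each second entity is therefore determined, so as to obtain more accurate first entities corresponding to the medical entity.
In a preferred embodiment, those skilled in the art will understand that the similarity can be computed using Chinese-character edit distance, pinyin edit distance, Word2vec features, or Tongyici Cilin (synonym thesaurus) features. Accordingly, in a specific implementation of the embodiments of this application, a similarity value between the medical entity and the second entity is computed according to the edit distance ratio of the character strings, and the similarity between them is determined according to the magnitude of that value; or a similarity value between each medical entity and the second entity is computed according to feature vectors, and the similarity between them is determined according to the magnitude of that value.
In a specific implementation, the computed similarity value can be compared with a preset similarity value: a second entity whose similarity value is below the preset value is discarded, and the second entities whose similarity values exceed the preset value are determined to be first entities.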
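The two similarity routes and the threshold filter can be sketched as follows. This is an illustration under stated assumptions: the edit distance ratio is taken to be 1 minus the Levenshtein distance divided by the length of the longer string, the feature-vector route uses cosine similarity, and the threshold of 0.5 is a placeholder. The application gives neither its exact formula nor its preset similarity value, so the scores here will not necessarily reproduce the atorvastatin example in this section. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (e.g. averaged Word2vec vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def select_first_entities(mention: str, second_entities: List[str],
                          threshold: float = 0.5) -> Dict[str, float]:
    """Keep the second entities whose edit ratio to the mention reaches the threshold."""
    scores = {cand: edit_ratio(mention, cand) for cand in second_entities}
    return {cand: s for cand, s in scores.items() if s >= threshold}
```

Pinyin edit distance could reuse `edit_distance` on pinyin strings produced by a separate transliteration step, which is not shown here.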
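Computing the similarity between the second entities and the medical entity removes irrelevant entities from the second entities, yielding more accurate first entities. It can be understood that the second entities usually contain more entities than the first entities; the calculation may, of course, also show that all second entities are highly relevant to the medical entity, in which case the first entities and the second entities are the same set.
For example, the second entities corresponding to the medical entity 阿托伐他丁 include: ("阿托伐他汀钙片" (atorvastatin calcium tablets), "Drug"), ("阿托伐他汀钙分散片" (atorvastatin calcium dispersible tablets), "Drug"), ("阿托伐他汀钙胶囊" (atorvastatin calcium capsules), "Drug"), ("氨氯地平阿托伐他汀钙片" (amlodipine and atorvastatin calcium tablets), "Drug"), ("阿托伐他汀钙" (atorvastatin calcium), "ATC") and ("瑞舒伐他汀钙片" (rosuvastatin calcium tablets), "Drug"). After the similarity values are computed, the similarity between ("阿托伐他汀钙", "ATC") and 阿托伐他丁 is below the preset similarity value, so that entity is removed, and the remaining second entities are kept as first entities.
Thus, the entity linking stage uses boundary adjustment and multi-dimensional similarity computation to identify entities effectively and filter out noise.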
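The processing module 204 is configured to perform relation analysis on the sentence to be processed and obtain the relation corresponding to the sentence to be processed based on a relation matching model.
For example, the relation of the sentence to be processed "阿托伐他丁防治冠心病应该怎么服用?" is "<药品>防治<疾病>应该怎么服用?" ("How should <drug> be taken to prevent and treat <disease>?"), where <药品> (drug) and <疾病> (disease) each stand for a medical entity.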
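The application does not state how this relation pattern is produced. Assuming it is obtained by replacing each recognized mention with its type label, using the 1-based inclusive positions from step S502, a sketch that reproduces the pattern above looks like this; `to_relation_template` is a hypothetical name.

```python
from typing import Iterable, Tuple


def to_relation_template(sentence: str,
                         spans: Iterable[Tuple[str, int, int]]) -> str:
    """Replace each entity mention with a <type> placeholder.

    `spans` holds (type_label, start, end) with 1-based inclusive positions,
    as produced in step S502. Spans are assumed not to overlap.
    """
    out, cursor = [], 0
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(sentence[cursor:start - 1])  # text before the mention
        out.append(f"<{label}>")
        cursor = end                            # skip over the mention itself
    out.append(sentence[cursor:])               # text after the last mention
    return "".join(out)
```

Masking the mentions this way lets one relation pattern cover every drug and disease pair phrased the same way.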
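In one implementation, training the relation matching model includes: collecting samples and manually labeling them as positive-sample questions and negative-sample questions, where a positive-sample question contains an entity that has a corresponding relation in the knowledge graph and a negative-sample question does not; training an LSTM network with the positive-sample and negative-sample questions; determining the maturity of the relation matching model according to the training output values; and taking an LSTM network whose maturity is greater than a preset value as the relation matching model.
It can be understood that a question is a positive-sample question if it has a corresponding relation in the knowledge graph, and a negative-sample question otherwise. The ratio of positive to negative sample questions is determined by the number of relations in the knowledge graph, for example 1:50. The relation matching model uses a modified ESIM (Enhanced Sequential Inference Model) network, a structure originally used mainly for sentence-pair matching tasks. Specifically, the user's question passes through an embedding layer and is encoded by a bidirectional LSTM network. The relation is represented by a relation embedding that is the sum of two parts: (a) a randomly initialized vector; and (b) the vector output by an LSTM after the relation name passes through an embedding layer. The two encodings interact through an attention mechanism and pass through another LSTM network, and the final output passes through a feed-forward network that outputs a matching score.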
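The 1:50 ratio can be illustrated with a sketch that builds (question, relation, label) training pairs for a question-relation matching model. This reflects one reading of the description, assumed here: each labeled question is paired with its correct relation as a positive sample and with relations drawn at random from the rest of the knowledge graph as negative samples. The ESIM network itself is not shown, and `build_matching_pairs` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str, int]  # (question, relation name, label: 1 positive, 0 negative)


def build_matching_pairs(labeled: Sequence[Tuple[str, str]],
                         relations: Sequence[str],
                         neg_per_pos: int = 50,
                         seed: int = 0) -> List[Pair]:
    """Build training pairs at a 1:neg_per_pos positive-to-negative ratio.

    Each (question, gold_relation) yields one positive pair plus up to
    `neg_per_pos` negatives drawn without replacement from the other relations.
    """
    rng = random.Random(seed)  # fixed seed keeps the sampled negatives reproducible
    pairs: List[Pair] = []
    for question, gold in labeled:
        pairs.append((question, gold, 1))
        others = [r for r in relations if r != gold]
        k = min(neg_per_pos, len(others))  # a small graph caps the achievable ratio
        pairs.extend((question, r, 0) for r in rng.sample(others, k))
    return pairs
```

Capping `k` at the number of available relations reflects the statement that the ratio depends on how many relations the knowledge graph contains.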
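The output module 205 is configured to determine, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, the answer corresponding to the sentence to be processed, and output the answer.
It can be understood that each entity has corresponding node content on the knowledge graph. In a specific implementation, the output module 205 is configured to search the node corresponding to the first entity on the knowledge graph for the content corresponding to the relation of the sentence to be processed, determine the found content as the answer corresponding to the sentence, and output the answer.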
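A minimal sketch of this lookup, assuming each node is stored as a mapping from relation names to content (the application does not specify the storage layout), collects one answer per linked first entity; `find_answers` and the node layout are hypothetical.

```python
from typing import Dict, List, Mapping

# Hypothetical node layout: entity name -> {relation name: node content}.
KnowledgeGraph = Mapping[str, Mapping[str, str]]


def find_answers(kg: KnowledgeGraph, first_entities: List[str],
                 relation: str) -> Dict[str, str]:
    """Look up `relation` in the node of every linked first entity.

    Returns entity -> content for the nodes that contain the relation, so one
    question can yield an answer for each matched product or strength.
    """
    answers: Dict[str, str] = {}
    for entity in first_entities:
        content = kg.get(entity, {}).get(relation)
        if content is not None:  # nodes without this relation are skipped
            answers[entity] = content
    return answers
```

Returning one entry per entity mirrors the idea that a single question may be answered for several linked entities at once.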
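As shown in FIG. 4, atorvastatin calcium is the main ingredient of Lipitor (立普妥), which is available in 10 mg, 20 mg and 40 mg strengths and is indicated for coronary heart disease. The node of the entity 冠心病 (coronary heart disease) also shows that Plavix (波立维) and aspirin are indicated for the symptoms of coronary heart disease. The corresponding answer may therefore be: "The usual starting dose of Lipitor is 10 mg once daily." A corresponding answer can likewise be output for Lipitor in the 20 mg strength. The output answers are thus comprehensive, and the relation matching results have high precision and high recall.
In summary, the medical-domain knowledge graph question answering processing apparatus proposed in this application recognizes the medical entities in the sentence to be processed. From the start and end positions of each medical entity in the sentence, it determines the first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph. It then determines and outputs the answer corresponding to the sentence from the relation obtained by analyzing the sentence and the node of the first entity on the knowledge graph. This avoids the prior-art need to search manually through the papers associated with the knowledge graph, and therefore improves the efficiency of processing users' questions, meets users' needs, and improves the user experience.
In addition, this application further proposes a medical-domain knowledge graph question answering processing method.
Refer to FIG. 5, which is a schematic flowchart of the first embodiment of the medical-domain knowledge graph question answering processing method of this application. The method is applied to the electronic device 20. In this embodiment, depending on requirements, the order in which the steps of the flowchart in FIG. 5 are executed may be changed, and some steps may be omitted.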
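Step S501: obtain a sentence to be processed, and recognize the medical entities in the sentence to be processed.
Step S502: obtain the start position and end position of each medical entity in the sentence to be processed.
Step S503: compare the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and the node corresponding to the first entity on the knowledge graph.
Step S504: perform relation analysis on the sentence to be processed, and obtain the relation corresponding to the sentence to be processed based on a relation matching model.
Step S505: determine, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, the answer corresponding to the sentence to be processed, and output the answer.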
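Steps S501 to S505 can be wired together as in the sketch below. Each component (NER with positions, entity linking, relation matching, node lookup) is injected as a plain function, so the skeleton only shows the data flow between the steps. `QAPipeline` and its field names are hypothetical; the real components would be the trained models and the knowledge graph described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[str, str, int, int]  # (mention, type label, start, end), 1-based inclusive


@dataclass
class QAPipeline:
    """Hypothetical skeleton wiring steps S501-S505; all components are injected."""
    recognize: Callable[[str], List[Span]]            # S501 + S502: NER with positions
    link: Callable[[Span], List[str]]                 # S503: mention -> first entities
    match_relation: Callable[[str, List[Span]], str]  # S504: relation matching model
    lookup: Callable[[str, str], Optional[str]]       # S505: (entity, relation) -> content

    def answer(self, sentence: str) -> Dict[str, str]:
        spans = self.recognize(sentence)
        first_entities = [e for span in spans for e in self.link(span)]
        relation = self.match_relation(sentence, spans)
        answers: Dict[str, str] = {}
        for entity in first_entities:
            content = self.lookup(entity, relation)
            if content is not None:  # only nodes holding the relation produce answers
                answers[entity] = content
        return answers
```

Injecting the components keeps each step independently replaceable, which matches the note that the order of steps may change and some may be omitted.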
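As shown in FIG. 6, step S501 is specifically step S601: obtaining a sentence to be processed, and recognizing the medical entities in the sentence to be processed using an NER model, where the medical entities include at least diseases and/or drugs.
In a preferred implementation, the step of obtaining a sentence to be processed includes: receiving a sentence sent by a user, and determining whether the sentence is a question; if so, determining the received sentence as the sentence to be processed.
Specifically, as shown in FIG. 7, step S503 includes:
S701: determining the form of each medical entity in the sentence to be processed according to the medical entity and its start and end positions;
S702: comparing the form of each medical entity in the sentence to be processed with the entities in the knowledge base, to determine second entities corresponding to the medical entity;
S703: determining the similarity between the medical entity and the second entities according to a similarity algorithm;
S704: determining, from the second entities and according to the determined similarity, a first entity matching the medical entity.
Specifically, step S703 includes: computing a similarity value between the medical entity and the second entity according to the edit distance ratio of the character strings, and determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value; or computing a similarity value between each medical entity and the second entity according to feature vectors, and determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value.
Step S505 includes: searching the node corresponding to the first entity on the knowledge graph for the content corresponding to the relation of the sentence to be processed; determining the found content as the answer corresponding to the sentence to be processed, and outputting the answer.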
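As shown in FIG. 8, the training steps of the relation matching model include:
S801: collecting samples and manually labeling them as positive-sample questions and negative-sample questions, where a positive-sample question contains an entity that has a corresponding relation in the knowledge graph, and a negative-sample question does not;
S802: training an LSTM network with the positive-sample questions and the negative-sample questions;
S803: determining the maturity of the relation matching model according to the training output values;
S804: taking an LSTM network whose maturity is greater than a preset value as the relation matching model.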
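This application further provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server or a server cluster composed of multiple servers). The computer device of this embodiment includes at least, but is not limited to, a memory and a processor that can communicate with each other via a system bus, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the medical-domain knowledge graph question answering processing method, which specifically includes the following steps:
obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
obtaining a start position and an end position of each medical entity in the sentence to be processed;
comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.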
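This embodiment further provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an app store, and the like, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store the electronic device 20 and, when executed by a processor, implements the medical-domain knowledge graph question answering processing method of this application, which specifically includes the following steps:
obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
obtaining a start position and an end position of each medical entity in the sentence to be processed;
comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.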
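The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, and any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of this application.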

Claims (20)

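  1. A medical-domain knowledge graph question answering processing method, wherein the method comprises the steps of:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
    obtaining a start position and an end position of each medical entity in the sentence to be processed;
    comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
    performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
    determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.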
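  2. The medical-domain knowledge graph question answering processing method according to claim 1, wherein the step of obtaining a sentence to be processed and recognizing medical entities in the sentence to be processed comprises:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed using an NER model, wherein the medical entities include at least: diseases and/or drugs.
  3. The medical-domain knowledge graph question answering processing method according to claim 1 or 2, wherein the step of obtaining a sentence to be processed comprises:
    receiving a sentence sent by a user, and determining whether the sentence is a question;
    if so, determining the received sentence as the sentence to be processed.
  4. The medical-domain knowledge graph question answering processing method according to claim 3, wherein the step of comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph comprises:
    determining, according to the medical entity and its start position and end position, a form of each medical entity in the sentence to be processed;
    comparing the form of each medical entity in the sentence to be processed with entities in the knowledge base, to determine second entities corresponding to the medical entity;
    determining a similarity between the medical entity and the second entities according to a similarity algorithm;
    determining, from the second entities and according to the determined similarity, a first entity matching the medical entity.
  5. The medical-domain knowledge graph question answering processing method according to claim 4, wherein the step of determining a similarity between the medical entity and the second entities according to a similarity algorithm comprises:
    computing a similarity value between the medical entity and the second entity according to an edit distance ratio of character strings;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value;
    or,
    computing a similarity value between each medical entity and the second entity according to feature vectors;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value.
  6. The medical-domain knowledge graph question answering processing method according to claim 1, wherein the step of determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed and outputting the answer comprises:
    searching the node corresponding to the first entity on the knowledge graph for content corresponding to the relation of the sentence to be processed;
    determining the found content as the answer corresponding to the sentence to be processed, and outputting the answer.
  7. The medical-domain knowledge graph question answering processing method according to claim 1, wherein training steps of the relation matching model comprise:
    collecting samples, and manually labeling the samples as positive-sample questions and negative-sample questions, wherein a positive-sample question contains an entity having a corresponding relation in the knowledge graph, and a negative-sample question does not contain an entity having a corresponding relation in the knowledge graph;
    training an LSTM network with the positive-sample questions and the negative-sample questions;
    determining a maturity of the relation matching model according to training output values;
    taking an LSTM network whose maturity is greater than a preset value as the relation matching model.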
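  8. An electronic device, wherein the device comprises:
    a recognition module, configured to obtain a sentence to be processed and recognize medical entities in the sentence to be processed;
    an obtaining module, configured to obtain a start position and an end position of each medical entity in the sentence to be processed;
    a determination module, configured to compare the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
    a processing module, configured to perform relation analysis on the sentence to be processed and obtain a relation corresponding to the sentence to be processed based on a relation matching model;
    an output module, configured to determine, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and output the answer.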
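  9. A device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a medical-domain knowledge graph question answering processing method, the method specifically comprising the following steps:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
    obtaining a start position and an end position of each medical entity in the sentence to be processed;
    comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
    performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
    determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.
  10. The device according to claim 9, wherein the step of obtaining a sentence to be processed and recognizing medical entities in the sentence to be processed comprises:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed using an NER model, wherein the medical entities include at least: diseases and/or drugs.
  11. The device according to claim 9 or 10, wherein the step of obtaining a sentence to be processed comprises:
    receiving a sentence sent by a user, and determining whether the sentence is a question;
    if so, determining the received sentence as the sentence to be processed.
  12. The device according to claim 11, wherein the step of comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph comprises:
    determining, according to the medical entity and its start position and end position, a form of each medical entity in the sentence to be processed;
    comparing the form of each medical entity in the sentence to be processed with entities in the knowledge base, to determine second entities corresponding to the medical entity;
    determining a similarity between the medical entity and the second entities according to a similarity algorithm;
    determining, from the second entities and according to the determined similarity, a first entity matching the medical entity.
  13. The device according to claim 12, wherein the step of determining a similarity between the medical entity and the second entities according to a similarity algorithm comprises:
    computing a similarity value between the medical entity and the second entity according to an edit distance ratio of character strings;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value;
    or,
    computing a similarity value between each medical entity and the second entity according to feature vectors;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value.
  14. The device according to claim 9, wherein the step of determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed and outputting the answer comprises:
    searching the node corresponding to the first entity on the knowledge graph for content corresponding to the relation of the sentence to be processed;
    determining the found content as the answer corresponding to the sentence to be processed, and outputting the answer.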
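  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a medical-domain knowledge graph question answering processing method, the method specifically comprising the following steps:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed;
    obtaining a start position and an end position of each medical entity in the sentence to be processed;
    comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph;
    performing relation analysis on the sentence to be processed, and obtaining a relation corresponding to the sentence to be processed based on a relation matching model;
    determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed, and outputting the answer.
  16. The computer-readable storage medium according to claim 15, wherein the step of obtaining a sentence to be processed and recognizing medical entities in the sentence to be processed comprises:
    obtaining a sentence to be processed, and recognizing medical entities in the sentence to be processed using an NER model, wherein the medical entities include at least: diseases and/or drugs.
  17. The computer-readable storage medium according to claim 15 or 16, wherein the step of obtaining a sentence to be processed comprises:
    receiving a sentence sent by a user, and determining whether the sentence is a question;
    if so, determining the received sentence as the sentence to be processed.
  18. The computer-readable storage medium according to claim 17, wherein the step of comparing the medical entity, according to its start position and end position, with entities in a preset knowledge base to determine a first entity corresponding to the medical entity and a node corresponding to the first entity on a knowledge graph comprises:
    determining, according to the medical entity and its start position and end position, a form of each medical entity in the sentence to be processed;
    comparing the form of each medical entity in the sentence to be processed with entities in the knowledge base, to determine second entities corresponding to the medical entity;
    determining a similarity between the medical entity and the second entities according to a similarity algorithm;
    determining, from the second entities and according to the determined similarity, a first entity matching the medical entity.
  19. The computer-readable storage medium according to claim 18, wherein the step of determining a similarity between the medical entity and the second entities according to a similarity algorithm comprises:
    computing a similarity value between the medical entity and the second entity according to an edit distance ratio of character strings;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value;
    or,
    computing a similarity value between each medical entity and the second entity according to feature vectors;
    determining the similarity between the medical entity and the second entity according to the magnitude of the similarity value.
  20. The computer-readable storage medium according to claim 15, wherein the step of determining, according to the relation corresponding to the sentence to be processed and the node corresponding to the first entity on the knowledge graph, an answer corresponding to the sentence to be processed and outputting the answer comprises:
    searching the node corresponding to the first entity on the knowledge graph for content corresponding to the relation of the sentence to be processed;
    determining the found content as the answer corresponding to the sentence to be processed, and outputting the answer.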
PCT/CN2020/098534 2019-07-19 2020-06-28 Medical-domain knowledge graph question answering processing method, apparatus, device and storage medium WO2021012878A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG11202103961QA SG11202103961QA (en) 2019-07-19 2020-06-28 Medical knowledge graph question answering processing method, device, apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910655569.4 2019-07-19
CN201910655569.4A CN110532360A (zh) 2019-07-19 2019-07-19 Medical-domain knowledge graph question answering processing method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2021012878A1 true WO2021012878A1 (zh) 2021-01-28

Family

ID=68660502

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098534 WO2021012878A1 (zh) 2019-07-19 2020-06-28 Medical-domain knowledge graph question answering processing method, apparatus, device and storage medium

Country Status (3)

Country Link
CN (1) CN110532360A (zh)
SG (1) SG11202103961QA (zh)
WO (1) WO2021012878A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
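CN112786144A (zh) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Knowledge graph method, medical order quality control method, apparatus, device and medium
CN113127626A (zh) * 2021-04-22 2021-07-16 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113190645A (zh) * 2021-05-31 2021-07-30 国家电网有限公司大数据中心 Index structure construction method, apparatus, device and storage medium
CN113641807A (zh) * 2021-07-28 2021-11-12 北京百度网讯科技有限公司 Training method, apparatus, device and storage medium for a dialogue recommendation model
CN115510196A (zh) * 2021-06-07 2022-12-23 马上消费金融股份有限公司 Knowledge graph construction method, question answering method, apparatus and storage medium
CN115762813A (zh) * 2023-01-09 2023-03-07 之江实验室 Doctor-patient interaction method and system based on an individual patient knowledge graph
CN115982335A (zh) * 2023-02-14 2023-04-18 智慧眼科技股份有限公司 Proactive AI medical question answering system, method, device and storage medium
CN117407541A (zh) * 2023-12-15 2024-01-16 中国科学技术大学 Knowledge-enhanced knowledge graph question answering method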

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
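CN110532360A (zh) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Medical-domain knowledge graph question answering processing method, apparatus, device and storage medium
CN111191035B (zh) * 2019-12-30 2022-07-19 四川大学华西医院 Method and apparatus for recognizing text entities in a lung cancer clinical database
CN111341456B (zh) * 2020-02-21 2024-02-23 中南大学湘雅医院 Diabetic foot knowledge graph generation method, apparatus and readable storage medium
CN113434627A (zh) * 2020-03-18 2021-09-24 中国电信股份有限公司 Work order processing method, apparatus and computer-readable storage medium
CN112307215A (zh) * 2020-04-20 2021-02-02 北京沃东天骏信息技术有限公司 Data processing method, apparatus and computer-readable storage medium
CN111694942A (zh) * 2020-05-29 2020-09-22 平安科技(深圳)有限公司 Question answering method, apparatus, device and computer-readable storage medium
CN112148884B (zh) * 2020-08-21 2023-09-22 北京阿叟阿巴科技有限公司 System and method for autism intervention
CN112466463B (zh) * 2020-12-10 2023-08-18 求臻医学科技(浙江)有限公司 Intelligent answering system based on a knowledge graph for precision tumor diagnosis and treatment
CN114416941B (zh) * 2021-12-28 2023-09-05 北京百度网讯科技有限公司 Method and apparatus for generating a dialogue knowledge point determination model fused with a knowledge graph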

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180276525A1 (en) * 2015-12-03 2018-09-27 Huawei Technologies Co., Ltd. Method and neural network system for human-computer interaction, and user equipment
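CN109522393A (zh) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Intelligent question answering method, apparatus, computer device and storage medium
CN109543018A (zh) * 2018-11-23 2019-03-29 北京羽扇智信息科技有限公司 Answer generation method, apparatus, electronic device and storage medium
CN110532360A (zh) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Medical-domain knowledge graph question answering processing method, apparatus, device and storage medium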

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
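CN107748757B (zh) * 2017-09-21 2021-05-07 北京航空航天大学 Knowledge-graph-based question answering method
CN109271504B (zh) * 2018-11-07 2021-06-25 爱因互动科技发展(北京)有限公司 Method for knowledge-graph-based reasoning dialogue
CN109710738A (zh) * 2018-12-24 2019-05-03 广州天鹏计算机科技有限公司 Drug inquiry method, apparatus, system, computer device and storage medium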

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
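CN112786144B (zh) * 2021-01-29 2024-04-02 北京百度网讯科技有限公司 Knowledge graph method, medical order quality control method, apparatus, device and medium
CN112786144A (zh) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Knowledge graph method, medical order quality control method, apparatus, device and medium
CN113127626A (zh) * 2021-04-22 2021-07-16 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113127626B (zh) * 2021-04-22 2024-04-30 广联达科技股份有限公司 Knowledge-graph-based recommendation method, apparatus, device and readable storage medium
CN113190645A (zh) * 2021-05-31 2021-07-30 国家电网有限公司大数据中心 Index structure construction method, apparatus, device and storage medium
CN115510196A (zh) * 2021-06-07 2022-12-23 马上消费金融股份有限公司 Knowledge graph construction method, question answering method, apparatus and storage medium
CN113641807A (zh) * 2021-07-28 2021-11-12 北京百度网讯科技有限公司 Training method, apparatus, device and storage medium for a dialogue recommendation model
CN113641807B (zh) * 2021-07-28 2024-05-24 北京百度网讯科技有限公司 Training method, apparatus, device and storage medium for a dialogue recommendation model
CN115762813A (zh) * 2023-01-09 2023-03-07 之江实验室 Doctor-patient interaction method and system based on an individual patient knowledge graph
CN115762813B (zh) * 2023-01-09 2023-04-18 之江实验室 Doctor-patient interaction method and system based on an individual patient knowledge graph
CN115982335A (zh) * 2023-02-14 2023-04-18 智慧眼科技股份有限公司 Proactive AI medical question answering system, method, device and storage medium
CN117407541B (zh) * 2023-12-15 2024-03-29 中国科学技术大学 Knowledge-enhanced knowledge graph question answering method
CN117407541A (zh) * 2023-12-15 2024-01-16 中国科学技术大学 Knowledge-enhanced knowledge graph question answering method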

Also Published As

Publication number Publication date
SG11202103961QA (en) 2021-05-28
CN110532360A (zh) 2019-12-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20844528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20844528

Country of ref document: EP

Kind code of ref document: A1