WO2021213314A1 - Data processing method and device, and computer readable storage medium - Google Patents

Publication number
WO2021213314A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
queried
entity
relationship
sentence
Prior art date
Application number
PCT/CN2021/088090
Other languages
French (fr)
Chinese (zh)
Inventor
冷莹
Original Assignee
北京京东拓先科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京京东拓先科技有限公司
Publication of WO2021213314A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring

Definitions

  • The present disclosure relates to the field of information technology, and in particular to a data processing method, a data processing device, and a computer-readable storage medium.
  • Question answering systems usually integrate different research fields such as natural language processing, information retrieval, and databases. After the user enters a question, the question answering system analyzes and processes it, performs operations such as information retrieval or database queries, and returns the answer that the user needs.
  • Question answering systems include automatic question answering systems based on search engines and community-based question answering systems.
  • A technical problem solved by the present disclosure is how to automatically provide users with scientific and reasonable medical advice and medical recommendations.
  • A computer-executed data processing method is provided, including: parsing a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried; using the medical entity to be queried and the medical relationship to be queried to query a pre-created medical knowledge graph to obtain a target medical entity, where the nodes of the medical knowledge graph are medical entities, the edges of the medical knowledge graph are the medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships include the medical relationship to be queried; and outputting a response sentence according to the target medical entity.
  • Parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: identifying the medical entity to be queried contained in the query sentence; identifying at least one keyword contained in the query sentence; and determining the medical relationship to be queried according to the medical entity to be queried and the at least one keyword.
  • When the query sentence includes multiple keywords, the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords. Determining the medical relationship to be queried then includes: determining the initial medical relationship according to the medical entity to be queried and the first keyword of the query sentence; and determining the association relationships between the keywords according to the keywords of the query sentence.
  • Identifying the medical entity to be queried contained in the query sentence input by the user includes: using a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried contained in the query sentence.
  • Parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: extracting the medical feature vocabulary contained in the query sentence to identify the medical entity to be queried contained in the query sentence.
  • Parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: understanding the semantics expressed by the query sentence through relationship recognition rules and matching the relationship types in the medical knowledge graph, so as to obtain the corresponding medical relationship in the medical knowledge graph.
  • Using the medical entity to be queried and the medical relationship to be queried to query the pre-created medical knowledge graph includes: generating a graph database query statement from the medical entity to be queried and the medical relationship to be queried; and querying the pre-created medical knowledge graph with the graph database query statement.
  • Generating the graph database query statement from the medical entity to be queried and the medical relationship to be queried includes: combining the medical entity to be queried and the medical relationship to be queried into a logical expression of the query sentence; and converting the logical expression, based on the medical knowledge graph, into the graph database query statement corresponding to the query sentence.
  • the query statement of the graph database includes a one-degree query statement or a multiple-degree query statement.
  • the graph database query statement includes a Cypher graph database query statement.
  • Outputting the response sentence according to the target medical entity includes: determining the sentence template of the response sentence according to the medical relationship to be queried; filling the medical entity to be queried and the target medical entity into the sentence template to generate the response sentence; and outputting the response sentence.
  • Filling the medical entity to be queried and the target medical entity into the sentence template to generate the response sentence includes: according to the result of querying the medical knowledge graph, combining the medical knowledge, the sentence template of the response sentence, and natural language that conforms to the user's expression habits to generate the response sentence.
  • the data processing method further includes: using each medical entity and the medical relationship between each medical entity to create a medical knowledge graph.
  • A data processing device is provided, including: a sentence parsing module configured to parse a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried; a knowledge graph query module configured to use the medical entity to be queried and the medical relationship to be queried to query a pre-created medical knowledge graph to obtain a target medical entity, where the nodes of the medical knowledge graph are medical entities, the edges of the medical knowledge graph are the medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships include the medical relationship to be queried; and a sentence output module configured to output a response sentence according to the target medical entity.
  • The sentence parsing module is configured to: identify the medical entity to be queried contained in the query sentence input by the user; identify at least one keyword contained in the query sentence; and determine the medical relationship to be queried according to the medical entity to be queried and the at least one keyword.
  • When the query sentence includes multiple keywords, the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords; the sentence parsing module is configured to: determine the initial medical relationship according to the medical entity to be queried and the first keyword of the query sentence; and determine the association relationships between the keywords according to the keywords of the query sentence.
  • The sentence parsing module is configured to use a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried contained in the query sentence input by the user.
  • The knowledge graph query module is configured to: generate a graph database query statement from the medical entity to be queried and the medical relationship to be queried; and query the pre-created medical knowledge graph with the graph database query statement.
  • The sentence output module is configured to: determine the sentence template of the response sentence according to the medical relationship to be queried; fill the medical entity to be queried and the target medical entity into the sentence template to generate the response sentence; and output the response sentence.
  • The data processing device further includes a graph creation module configured to use the medical entities and the medical relationships between them to create the medical knowledge graph.
  • A data processing device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the aforementioned data processing method based on instructions stored in the memory.
  • a computer-readable storage medium wherein the computer-readable storage medium stores computer instructions, and the instructions are executed by a processor to implement the aforementioned data processing method.
  • The present disclosure uses the knowledge graph as the answer source for automatic question answering, realizing automatic question answering for medical knowledge. It can more accurately understand a user's question described in natural language, analyze the user's true intention, and return more accurate and professional answers, thereby providing users with scientific and reasonable medical advice and medical recommendations.
  • Fig. 1 shows a schematic flowchart of a data processing method according to some embodiments of the present disclosure.
  • Fig. 2 shows a schematic diagram of the process of parsing the query sentence input by the user.
  • Figure 3 shows a network model composed of a bidirectional long short-term memory network and a conditional random field.
  • Fig. 4 exemplarily shows an application example of the automated medical question answering system.
  • Figure 5 shows a schematic structural diagram of a data processing device according to some embodiments of the present disclosure.
  • Fig. 6 shows a schematic structural diagram of a data processing device according to other embodiments of the present disclosure.
  • the inventor has conducted in-depth analysis and research on various automatic question answering systems.
  • the community-based question answering system is mainly based on the community platform on the Internet.
  • On community platforms, users typically post questions, other users provide answers, and others indicate which answer they agree with by voting or liking. The main technique adopted by community-based question answering systems is therefore to match a new question against all existing questions in the community and select the existing question-answer pair with the highest similarity, which contains a list of possible answers to the new question.
  • One disadvantage of community-based question answering systems is that they cannot judge or verify the accuracy of the data, and the answers provided by users are often inaccurate or incomplete.
  • In addition, community-based question answering systems cannot handle long-tail questions: a question that no one has asked before cannot be matched to any possible answer.
  • the present disclosure provides a data processing method, which automatically provides users with scientific and reasonable medical advice and medical recommendations based on the medical knowledge graph.
  • A knowledge graph is a technical method that uses graph models to describe knowledge and to model the relationships between things. It visually displays complex domain knowledge through data mining, information processing, knowledge measurement, and graph drawing, reveals the dynamic development of a knowledge field, and can provide a practical and valuable reference for research in that field. In essence, a knowledge graph systematizes industry knowledge, expresses it as relationships, and displays it visually in the form of a graph.
  • Named entity generally refers to an entity with a specific meaning or strong referentiality in the text, which usually includes the name of a person, place name, organization name, date and time, proper nouns, etc.
  • Named entity recognition is to extract named entities from unstructured input text, and can identify more categories of named entities according to business needs, such as product names, models, prices, etc.
  • Entity alignment is also called entity matching, which refers to finding the same entity in the real world for each entity in the knowledge base of heterogeneous data sources.
  • Knowledge fusion means merging two knowledge graphs, that is, fusing description information about the same entity or concept from multiple sources.
  • each medical entity and the medical relationship between each medical entity can be used to create a medical knowledge graph.
  • The nodes of the medical knowledge graph are the medical entities, and the edges of the medical knowledge graph are the medical relationships between the medical entities. The medical entities include the medical entity to be queried and the target medical entity, and the medical relationships include the medical relationship to be queried.
  • A medical entity can be a specific disease name such as cold, fever, or emphysema; a specific drug name such as Contec; a specific examination name such as B-ultrasound or fluoroscopy; or a specific food name or symptom name.
  • the medical relationship can specifically be the corresponding relationship between the disease and the medicine, the corresponding relationship between the disease and the examination, the corresponding relationship between the disease and the food, the corresponding relationship between the disease and the symptom, and so on.
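  • The node-and-edge structure described above can be sketched as a small in-memory store of (entity, relationship, entity) triples. This is only an illustration: the class name, the relationship labels, and the sample edges are assumptions, and the sample edges are placeholder data rather than medical facts.

```python
from collections import defaultdict

# Hypothetical relationship labels; the text only names the kinds of correspondence.
DISEASE_DRUG = "disease_drug"
DISEASE_EXAM = "disease_examination"


class MedicalKnowledgeGraph:
    """Toy graph: nodes are medical entities, edges are medical relationships."""

    def __init__(self):
        # (head entity, relationship) -> list of tail entities
        self._edges = defaultdict(list)
        self.nodes = set()

    def add(self, head, relation, tail):
        """Add one edge and register both endpoints as nodes."""
        self.nodes.update((head, tail))
        if tail not in self._edges[(head, relation)]:
            self._edges[(head, relation)].append(tail)

    def query(self, entity, relation):
        """Return the target entities reachable from `entity` via `relation`."""
        return list(self._edges.get((entity, relation), []))


graph = MedicalKnowledgeGraph()
graph.add("cold", DISEASE_DRUG, "Kangtaike")    # placeholder edge
graph.add("cold", DISEASE_EXAM, "B-ultrasound")  # placeholder edge

print(graph.query("cold", DISEASE_DRUG))
```

    A production system would more likely keep these triples in a graph database, but the lookup by (entity, relationship) is the same idea.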
  • The medical knowledge graph can be used as an accurate knowledge base that provides business knowledge in the medical field. Compared with related technologies, the medical knowledge graph offers substantial improvements in three respects. First, the results are more accurate: despite linguistic phenomena such as one word having several meanings and several words sharing one meaning, and despite multiple data sources, the knowledge graph ensures the accuracy and comprehensiveness of its own data through entity alignment and knowledge fusion. Second, the data in the knowledge graph is more strongly related: because of the graph data structure, the knowledge graph can easily move from one entity to related entities and can understand more deeply the information users want to search for. Third, the knowledge graph can provide search results that contain complete knowledge and relationships, so users can discover a wealth of previously unknown information through it.
  • Fig. 1 shows a schematic flowchart of a data processing method according to some embodiments of the present disclosure. As shown in Fig. 1, this embodiment includes steps S101 to S103.
  • In step S101, the query sentence input by the user is parsed to obtain the medical entity to be queried and the medical relationship to be queried.
  • For example, if the query sentence entered by the user is "What medicine should you take for a cold", parsing it through entity alignment yields the medical entity to be queried, "cold", and the medical relationship to be queried, "correspondence between diseases and drugs".
  • In step S102, the medical entity to be queried and the medical relationship to be queried are used to query the pre-created medical knowledge graph to obtain the target medical entity.
  • The medical entity to be queried and the medical relationship to be queried can be used to generate a graph database query statement, which is then used to query the pre-created medical knowledge graph.
  • In step S103, a response sentence is output according to the target medical entity.
  • The sentence template of the response sentence may first be determined according to the medical relationship to be queried. The medical entity to be queried and the target medical entity are then filled into the sentence template to generate the response sentence, and finally the response sentence is output.
  • the latter way of outputting response sentences can provide users with a more complete user experience.
  • For example, the preset sentence template "___ take ___" can be determined based on the "correspondence between diseases and drugs". The medical entities are then filled into their corresponding positions in the sentence template, and the response sentence "cold take Kangtaike" (that is, "For a cold, take Kangtaike") is generated and output.
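  • The template-filling step can be sketched as follows. The relationship labels, the English template wording, and the fallback message are assumptions chosen for illustration; the disclosure only states that a template is selected by relationship and filled with the two entities.

```python
# Hypothetical templates keyed by the medical relationship to be queried.
TEMPLATES = {
    "disease_drug": "For {queried}, take {target}.",
    "disease_examination": "For {queried}, a common examination is {target}.",
}


def build_response(relation, queried_entity, target_entities):
    """Pick the template for `relation` and fill in the queried and target entities."""
    template = TEMPLATES.get(relation)
    if template is None or not target_entities:
        # Assumed fallback when no template or no graph result is available.
        return "Sorry, no answer was found."
    return template.format(queried=queried_entity, target=", ".join(target_entities))


print(build_response("disease_drug", "a cold", ["Kangtaike"]))
```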
  • In this embodiment, the knowledge graph is used as the answer source for automatic question answering, which realizes automatic question answering for medical knowledge. The method can more accurately understand a user's question described in natural language, analyze the user's true intention, and return more accurate and professional answers, so as to provide users with scientific and reasonable medical advice and medical recommendations.
  • This embodiment helps to standardize medical question-and-answer behaviors, improve the service efficiency and service quality of medical question-and-answer, and adjust the supply relationship of the medical industry.
  • Fig. 2 shows a schematic diagram of the process of parsing the query sentence input by the user. As shown in FIG. 2, this embodiment includes steps S2011 to S2013.
  • In step S2011, the medical entity to be queried contained in the query sentence input by the user is identified.
  • A bidirectional long short-term memory network and a conditional random field can be used to identify the medical entity to be queried contained in the query sentence input by the user.
  • The long short-term memory network (LSTM for short) is a kind of recurrent neural network, first proposed by Hochreiter and Schmidhuber in 1997. It was designed to solve the long-term dependency problem in recurrent neural networks, making the retention of long-term information the default behavior of the network. The bidirectional long short-term memory network (Bi-LSTM for short) can learn not only from the preceding context but also from the following context.
  • A conditional random field (CRF for short) is a probabilistic model for labeling sequence data.
  • Figure 3 shows a network model composed of a bidirectional long short-term memory network and a conditional random field.
  • Adding a CRF layer on top of the Bi-LSTM serves to obtain the globally optimal output sequence, which is equivalent to reusing the Bi-LSTM information. If only the Bi-LSTM network is used, then for the label sequence of a sentence the network model selects, for each word, the label that occurs most often for that word, without considering whether it is reasonable in context. With a CRF layer as a filter, the network model selects the label with the greatest probability for a word in the given context, so that a higher accuracy can be obtained.
  • the advantage of Bi-LSTM is that it can learn the dependencies between input sequences in both directions.
  • Bi-LSTM can automatically extract the features of the input sequence according to the entity.
  • The advantage of the CRF is that it can model hidden states and learn the characteristics of state sequences. Therefore, adding a CRF layer after the Bi-LSTM obtains the advantages of both, making named entity recognition more accurate and comprehensive, with good robustness and domain openness, and good portability to recognition tasks across multiple fields and different entity categories. The combination of a bidirectional long short-term memory network and a conditional random field can mine deep semantic information and hidden information that the surface of the text does not express, and represent the semantic vector of the text more closely, which greatly improves the performance of named entity recognition.
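  • The difference between per-word labeling and CRF-style sequence decoding can be illustrated with a minimal Viterbi sketch. Everything here is an assumption for illustration: the B/I/O tag set, the hand-written emission scores (standing in for Bi-LSTM outputs), and the hand-set transition scores (standing in for learned CRF parameters).

```python
TAGS = ["O", "B", "I"]   # assumed BIO tagging scheme
FORBIDDEN = -1e4         # large penalty for an invalid transition

# TRANSITIONS[prev][cur]: an entity-inside tag "I" may not follow "O".
TRANSITIONS = {
    "O": {"O": 0.0, "B": 0.0, "I": FORBIDDEN},
    "B": {"O": 0.0, "B": 0.0, "I": 0.0},
    "I": {"O": 0.0, "B": 0.0, "I": 0.0},
}
START = {"O": 0.0, "B": 0.0, "I": FORBIDDEN}  # a sentence may not start with "I"


def greedy_decode(emissions):
    """Pick the best tag for each word independently (Bi-LSTM only)."""
    return [max(TAGS, key=lambda t: scores[t]) for scores in emissions]


def viterbi_decode(emissions):
    """Pick the tag sequence with the best total emission + transition score."""
    if not emissions:
        return []
    best = {t: START[t] + emissions[0][t] for t in TAGS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + TRANSITIONS[p][cur])
            new_best[cur] = best[prev] + TRANSITIONS[prev][cur] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    path = [max(TAGS, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up scores for a two-word entity mention.
emissions = [
    {"O": 1.0, "B": 0.9, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
]
print(greedy_decode(emissions))   # per-word choice, may be an invalid sequence
print(viterbi_decode(emissions))  # sequence-level choice
```

    With these scores the per-word decoder yields the invalid sequence O, I, while the sequence-level decoder yields B, I, which is the kind of contextual correction the CRF layer is described as providing.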
  • the name of the disease itself may appear in the query sentence, or the common name of the disease may appear. Regardless of whether it is the name of the disease itself or the common name of the disease, the identification result of the medical entity can ultimately correspond to the same medical entity in the medical knowledge graph.
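  • Mapping both the formal name and common names to one graph node can be sketched with a simple alias table. The table contents and the function name are hypothetical; a real system would derive such mappings through entity alignment rather than a hand-written dictionary.

```python
# Hypothetical alias table: surface form -> canonical node name in the graph.
ALIASES = {
    "cold": "cold",
    "common cold": "cold",
    "catching a cold": "cold",
}


def normalize_entity(mention):
    """Return the canonical entity for a recognized mention, or None if unknown."""
    key = " ".join(mention.lower().split())  # lowercase and collapse whitespace
    return ALIASES.get(key)


print(normalize_entity("Common  Cold"))
```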
  • In step S2012, at least one keyword contained in the query sentence input by the user is identified.
  • The keywords can be the collective names of medical entities, such as "disease", "food", "examination", "medicine", and "symptom".
  • The number of keywords may be one or more. For example, in the query sentence "What medicine should you take for a cold", the keyword is "medicine" and the number of keywords is one; in the query sentence "How much does the medicine for a cold cost?", the keywords are "medicine" and "money" and the number of keywords is two.
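  • Keyword identification could be approximated by a lexicon lookup, as in the sketch below. The trigger words and the ordering rule (order of first appearance in the sentence) are assumptions; the disclosure does not say how keywords are detected.

```python
# Hypothetical lexicon: keyword -> surface words that signal it.
KEYWORD_TRIGGERS = {
    "medicine": ("medicine", "drug"),
    "money": ("cost", "price"),
    "symptom": ("symptom",),
}


def identify_keywords(sentence):
    """Return the keywords found in `sentence`, ordered by first appearance."""
    text = sentence.lower()
    found = []
    for keyword, triggers in KEYWORD_TRIGGERS.items():
        positions = [text.find(t) for t in triggers if t in text]
        if positions:
            found.append((min(positions), keyword))
    return [keyword for _, keyword in sorted(found)]


print(identify_keywords("What medicine should you take for a cold"))
print(identify_keywords("How much does the medicine for a cold cost?"))
```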
  • In step S2013, the medical relationship to be queried is determined according to the medical entity to be queried and at least one keyword of the query sentence.
  • For example, the keyword to which the medical entity to be queried "cold" belongs is "disease". According to the keyword "disease" to which "cold" belongs and the keyword "medicine" in the query sentence, the medical relationship to be queried can be determined to be the "correspondence between disease and drug".
  • the query sentence includes multiple keywords.
  • the medical relationship to be queried includes the initial medical relationship and the association relationship between each keyword. Therefore, the initial medical relationship to be queried can be determined according to the medical entity to be queried and the first keyword of the query sentence; the association relationship between the keywords can be determined according to the keywords of the query sentence.
  • For example, the keywords are "medicine" and "money", and the number of keywords is two. According to the medical entity to be queried "cold" and the first keyword "medicine", the initial medical relationship to be queried can be determined to be the "correspondence between disease and drug". According to the first keyword "medicine" and the second keyword "money", the association relationship between the keywords can be determined to be the "correspondence between drugs and amount".
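  • The two-stage determination described above can be sketched with lookup tables. The table contents and the function name are assumptions; the relationship names are taken from the example in the text.

```python
# Hypothetical lookup tables for illustration.
ENTITY_CATEGORY = {"cold": "disease"}  # entity -> collective keyword it belongs to
RELATIONS = {
    ("disease", "medicine"): "correspondence between disease and drug",
    ("medicine", "money"): "correspondence between drugs and amount",
}


def determine_relations(entity, keywords):
    """Return (initial relationship, [association relationships between keywords])."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    category = ENTITY_CATEGORY[entity]
    # Initial relationship: entity category paired with the first keyword.
    initial = RELATIONS[(category, keywords[0])]
    # Association relationships: each keyword paired with the next one.
    associations = [RELATIONS[(a, b)] for a, b in zip(keywords, keywords[1:])]
    return initial, associations


print(determine_relations("cold", ["medicine", "money"]))
```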
  • Fig. 4 exemplarily shows an application example of the automated medical question answering system.
  • The automated medical question answering system is mainly composed of three modules: semantic analysis, query sentence construction and conversion, and answer reply rules.
  • The semantic analysis module includes two parts: medical named entity recognition and medical relationship recognition.
  • Medical named entity recognition aims to extract the medical feature vocabulary contained in the user's query sentence, and can extract more than 20 types of medical entities, such as diseases, symptoms, drugs, and examinations. That is, by extracting the medical feature vocabulary contained in the query sentence, the medical entity to be queried contained in the query sentence is identified.
  • Medical relationship recognition uses relationship recognition rules to understand the semantics of the query sentence, match the relationship types in the medical knowledge graph, and obtain the corresponding medical relationship in the medical knowledge graph, that is, the medical relationship to be queried.
  • Through medical named entity recognition and medical relationship recognition, the medical entity and medical relationship information of the user's question is obtained and sent to the query sentence construction and conversion module.
  • The query sentence construction and conversion module takes the medical entity and medical relationship obtained from the user's query sentence, combines them into a logical expression of the user's question, and then converts the logical expression, based on the medical knowledge graph, into the corresponding graph database query statement, such as a Cypher graph database query statement.
  • the established medical knowledge graph is queried through the query sentence of the graph database (specifically including one-degree or multiple-degree query, etc.).
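  • Generating a one-degree or multi-degree Cypher statement from an entity and a chain of relationships might look like the following sketch. The node label `Entity`, the `name` property, and the relationship types `TREATED_BY` and `HAS_PRICE` are assumptions; the disclosure does not specify a graph schema. The function only builds the query text and parameters and does not connect to a database.

```python
import re

# Relationship types are interpolated into the query text, so they are
# validated against a conservative pattern first.
_REL_TYPE = re.compile(r"^[A-Z_]+$")


def build_cypher(entity_name, relation_types):
    """Build a Cypher MATCH statement that follows `relation_types` in order.

    One relationship type gives a one-degree query; several give a multi-degree query.
    Returns (query text, parameters).
    """
    if not relation_types:
        raise ValueError("at least one relationship type is required")
    for rel in relation_types:
        if not _REL_TYPE.match(rel):
            raise ValueError(f"invalid relationship type: {rel!r}")
    pattern = "(n0:Entity {name: $name})"
    for i, rel in enumerate(relation_types, start=1):
        pattern += f"-[:{rel}]->(n{i}:Entity)"
    query = f"MATCH {pattern} RETURN n{len(relation_types)}.name AS target"
    return query, {"name": entity_name}


one_degree, params = build_cypher("cold", ["TREATED_BY"])
multi_degree, _ = build_cypher("cold", ["TREATED_BY", "HAS_PRICE"])
print(one_degree)
print(multi_degree)
```

    Passing the entity name as the `$name` parameter rather than formatting it into the string keeps user input out of the query text.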
  • The medical knowledge graph returns the result of the query, that is, the corresponding medical knowledge, which is sent to the answer reply rules module.
  • The answer reply rules module receives the result of the medical knowledge graph query and combines the medical knowledge, answer templates, and natural language that conforms to the user's expression habits to obtain the final answer and reply to the user. That is, according to the result of querying the medical knowledge graph, the medical knowledge, the sentence template of the response sentence, and natural language that conforms to the user's expression habits are combined to generate the response sentence, which is output to the user.
  • Figure 5 shows a schematic structural diagram of a data processing device according to some embodiments of the present disclosure.
  • The data processing device 50 in this embodiment includes: a sentence parsing module 501 configured to parse the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried; a knowledge graph query module 502 configured to use the medical entity to be queried and the medical relationship to be queried to query the pre-created medical knowledge graph to obtain the target medical entity, where the nodes of the medical knowledge graph are medical entities, the edges of the medical knowledge graph are the medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships include the medical relationship to be queried; and a sentence output module 503 configured to output the response sentence according to the target medical entity.
  • The sentence parsing module 501 is configured to: identify the medical entity to be queried contained in the query sentence input by the user; identify at least one keyword contained in the query sentence; and determine the medical relationship to be queried according to the medical entity to be queried and the at least one keyword.
  • When the query sentence includes multiple keywords, the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords; the sentence parsing module 501 is configured to: determine the initial medical relationship according to the medical entity to be queried and the first keyword of the query sentence; and determine the association relationships between the keywords according to the keywords of the query sentence.
  • The sentence parsing module 501 is configured to use a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried contained in the query sentence input by the user.
  • the knowledge graph query module 502 is configured to: use the medical entity to be queried and the medical relationship to be queried to generate a graph database query statement; use the graph database query statement to query a pre-created medical knowledge graph.
  • the sentence output module 503 is configured to: determine the sentence template of the response sentence according to the medical relationship to be queried; fill the sentence template of the response sentence with the medical entity to be queried and the target medical entity to generate the response sentence; The response sentence is output.
  • the data processing device 50 further includes a graph creating module 500 configured to use each medical entity and the medical relationship between each medical entity to create a medical knowledge graph.
  • In this embodiment, the knowledge graph is used as the answer source for automatic question answering, which realizes automatic question answering for medical knowledge. The device can more accurately understand a user's question described in natural language, analyze the user's true intention, and return more accurate and professional answers, so as to provide users with scientific and reasonable medical advice and medical recommendations.
  • This embodiment helps to standardize medical question-and-answer behaviors, improve the service efficiency and service quality of medical question-and-answer, and adjust the supply relationship of the medical industry.
  • Fig. 6 shows a schematic structural diagram of a data processing device according to other embodiments of the present disclosure.
  • The data processing device 60 of this embodiment includes a memory 610 and a processor 620 coupled to the memory 610. The processor 620 is configured to execute the data processing method in any of the foregoing embodiments based on instructions stored in the memory 610.
  • the memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), and other programs.
  • the data processing device 60 may also include an input and output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, and the memory 610 and the processor 620 may be connected via a bus 660, for example.
  • the input and output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 640 provides a connection interface for various networked devices.
  • the storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
  • the present disclosure also includes a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the data processing method in any of the foregoing embodiments is implemented.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device. The instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.


Abstract

The present invention relates to the field of information technology. Provided are a data processing method and device, and a computer-readable storage medium. The data processing method comprises: parsing a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried; querying a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein the nodes of the medical knowledge graph are the respective medical entities, the edges of the medical knowledge graph are the medical relationships between the respective medical entities, the respective medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the respective medical entities include the medical relationship to be queried; and outputting a response sentence according to the target medical entity. The method of the present invention enables automatic question answering for medical knowledge, thereby automatically providing a user with scientific and reasonable medical advice and medical recommendations.

Description

Data processing method, device, and computer-readable storage medium
Cross-reference to related applications
This application is based on and claims priority to CN application No. 202010311319.1, filed on April 20, 2020, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of information technology, and in particular to a data processing method, a data processing device, and a computer-readable storage medium.
Background
Question answering systems usually integrate several research fields, such as natural language processing, information retrieval, and databases. After a user enters a question, the question answering system analyzes and processes it, performs operations such as information retrieval or database queries, and returns the answer the user needs. Question answering systems include automatic question answering systems based on search engines and community-based question answering systems.
Summary
A technical problem solved by the present disclosure is how to automatically provide users with scientific and reasonable medical advice and medical recommendations.
According to one aspect of the embodiments of the present disclosure, a computer-executed data processing method is provided, including: parsing a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried; querying a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein the nodes of the medical knowledge graph are the respective medical entities, the edges of the medical knowledge graph are the medical relationships between the respective medical entities, the respective medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the respective medical entities include the medical relationship to be queried; and outputting a response sentence according to the target medical entity.
In some embodiments, parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: identifying the medical entity to be queried included in the query sentence input by the user; identifying at least one keyword included in the query sentence input by the user; and determining the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence.
In some embodiments, the query sentence includes multiple keywords, and the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords. Determining the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence includes: determining the initial medical relationship to be queried according to the medical entity to be queried and the first keyword of the query sentence; and determining the association relationships between the keywords according to the keywords of the query sentence.
In some embodiments, identifying the medical entity to be queried included in the query sentence input by the user includes: using a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried included in the query sentence input by the user.
In some embodiments, parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: identifying the medical entity to be queried included in the query sentence by extracting the medical feature vocabulary contained in the query sentence.
In some embodiments, parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried includes: using a rule-based relationship recognizer to understand the semantics expressed by the query sentence and match them against the relationship types in the medical knowledge graph, thereby obtaining the medical relationship corresponding to the medical knowledge graph.
In some embodiments, querying the pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried includes: generating a graph database query statement using the medical entity to be queried and the medical relationship to be queried; and querying the pre-created medical knowledge graph using the graph database query statement.
In some embodiments, generating the graph database query statement using the medical entity to be queried and the medical relationship to be queried includes: combining the medical entity to be queried and the medical relationship to be queried to form a logical expression of the query sentence; and converting the logical expression into a graph database query statement, based on the medical knowledge graph, that corresponds to the query sentence.
In some embodiments, the graph database query statement includes a one-degree query statement or a multi-degree query statement.
In some embodiments, the graph database query statement includes a Cypher graph database query statement.
In some embodiments, outputting the response sentence according to the target medical entity includes: determining a sentence template of the response sentence according to the medical relationship to be queried; filling the medical entity to be queried and the target medical entity into the sentence template of the response sentence to generate the response sentence; and outputting the response sentence.
In some embodiments, filling the medical entity to be queried and the target medical entity into the sentence template of the response sentence to generate the response sentence includes: generating the response sentence according to the result queried from the medical knowledge graph, in combination with medical knowledge, the sentence template of the response sentence, and natural language that conforms to the user's habits of expression.
In some embodiments, the data processing method further includes: creating the medical knowledge graph using the respective medical entities and the medical relationships between the respective medical entities.
According to another aspect of the embodiments of the present disclosure, a data processing device is provided, including: a sentence parsing module configured to parse a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried; a knowledge graph query module configured to query a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein the nodes of the medical knowledge graph are the respective medical entities, the edges of the medical knowledge graph are the medical relationships between the respective medical entities, the respective medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the respective medical entities include the medical relationship to be queried; and a sentence output module configured to output a response sentence according to the target medical entity.
In some embodiments, the sentence parsing module is configured to: identify the medical entity to be queried included in the query sentence input by the user; identify at least one keyword included in the query sentence input by the user; and determine the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence.
In some embodiments, the query sentence includes multiple keywords, and the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords; the sentence parsing module is configured to: determine the initial medical relationship to be queried according to the medical entity to be queried and the first keyword of the query sentence; and determine the association relationships between the keywords according to the keywords of the query sentence.
In some embodiments, the sentence parsing module is configured to use a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried included in the query sentence input by the user.
In some embodiments, the knowledge graph query module is configured to: generate a graph database query statement using the medical entity to be queried and the medical relationship to be queried; and query the pre-created medical knowledge graph using the graph database query statement.
In some embodiments, the sentence output module is configured to: determine a sentence template of the response sentence according to the medical relationship to be queried; fill the medical entity to be queried and the target medical entity into the sentence template of the response sentence to generate the response sentence; and output the response sentence.
In some embodiments, the data processing device further includes a graph creation module configured to create the medical knowledge graph using the respective medical entities and the medical relationships between the respective medical entities.
According to yet another aspect of the embodiments of the present disclosure, a data processing device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the aforementioned data processing method based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the aforementioned data processing method.
The present disclosure uses a knowledge graph as the answer source for automatic question answering, which realizes automatic question answering for medical knowledge. It can more accurately understand user questions described in natural language, analyze the user's true intention, and return more accurate and professional answers, thereby providing users with scientific and reasonable medical advice and medical recommendations.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic flowchart of a data processing method according to some embodiments of the present disclosure.
Fig. 2 shows a schematic flowchart of parsing the query sentence input by the user.
Fig. 3 shows a network model composed of a bidirectional long short-term memory network and a conditional random field.
Fig. 4 shows an application example of the automated medical question answering system.
Fig. 5 shows a schematic structural diagram of a data processing device according to some embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a data processing device according to other embodiments of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present disclosure, without creative effort, fall within the scope of protection of the present disclosure.
The inventor has conducted in-depth analysis and research on various types of automatic question answering systems.
In a question answering system based on a search engine, the user enters keywords or sentences, and the search engine returns a ranked set of related documents. Such systems now struggle to meet user needs, and they have two major shortcomings. First, it is difficult to guarantee the quality of the documents returned to the user: because the search engine controls neither the data sources nor the data itself, it cannot guarantee that the data is accurate and comprehensive. Second, a user who asks a question wants a precise and direct answer, but the search engine returns a series of documents that the user must read carefully to find the answer.
Community-based question answering systems rely mainly on community platforms on the Internet. On a community platform, users post questions, other users answer them, and further users indicate which answer they support by voting or liking. The main technique of a community-based question answering system is therefore to match a new question against all existing questions in the community by similarity and select the most similar existing question-answer pair, which contains a list of possible answers to the new question. One shortcoming of community-based question answering systems is that the accuracy of the data cannot be judged or verified, because user-provided answers are often inaccurate or incomplete. In addition, community-based question answering systems cannot handle long-tail questions: a question that nobody has asked before cannot be matched to a possible answer.
To realize a more intelligent medical question answering system, the present disclosure provides a data processing method that, based on a medical knowledge graph, automatically provides users with scientific and reasonable medical advice and medical recommendations.
Concepts related to knowledge graphs are introduced first.
Knowledge graph: A knowledge graph is a technique that uses a graph model to describe knowledge and to model the relationships between things. Through data mining, information processing, knowledge measurement, and graph drawing, it presents complex domain knowledge visually, reveals how a knowledge domain develops over time, and provides practical and valuable reference for domain research. In essence, a knowledge graph systematizes industry knowledge, expresses it as relationships, and presents it visually in graph form.
Named entity: A named entity generally refers to an entity in text that has a specific meaning or strong referential value, typically including person names, place names, organization names, dates and times, and proper nouns.
Named entity recognition: Named entity recognition extracts named entities from unstructured input text. Additional categories of named entities, such as product names, models, and prices, can be recognized according to business needs.
Entity alignment: Entity alignment, also called entity matching, refers to identifying, among the entities in knowledge bases from heterogeneous data sources, those that refer to the same real-world entity.
Knowledge fusion: Knowledge fusion merges two knowledge graphs, combining descriptive information about the same entity or concept from multiple sources.
Before the data processing method provided by the present disclosure is used to realize automatic question answering for medical knowledge, a medical knowledge graph can be created using the respective medical entities and the medical relationships between them. The nodes of the medical knowledge graph are the respective medical entities, and the edges of the medical knowledge graph are the medical relationships between the respective medical entities. The respective medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the respective medical entities include the medical relationship to be queried.
For example, a medical entity may be a specific disease name such as cold, fever, or emphysema; a specific drug name such as Contac (康泰克); a specific examination name such as B-mode ultrasound or fluoroscopy; or a specific food name or symptom name. A medical relationship may be the correspondence between a disease and a drug, between a disease and an examination, between a disease and a food, between a disease and a symptom, and so on.
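The graph structure described above can be sketched as a set of (entity, relationship, entity) triples. The following is a minimal in-memory illustration, not the disclosed implementation: the relationship names (`TREATED_BY_DRUG`, `HAS_SYMPTOM`) and the helper functions are hypothetical, and the triples are toy data for illustration only, not medical guidance.

```python
from collections import defaultdict

# Toy triples: (head entity, relationship, tail entity).
# Relationship names are hypothetical labels chosen for this sketch.
TRIPLES = [
    ("cold", "TREATED_BY_DRUG", "Contac"),
    ("cold", "HAS_SYMPTOM", "fever"),
]


def build_graph(triples):
    """Build an adjacency map: head entity -> relationship -> list of tail entities."""
    graph = defaultdict(lambda: defaultdict(list))
    for head, relation, tail in triples:
        graph[head][relation].append(tail)
    return graph


def query(graph, entity, relation):
    """Return the target entities reached from `entity` via `relation` (empty if none)."""
    return list(graph.get(entity, {}).get(relation, []))


GRAPH = build_graph(TRIPLES)
print(query(GRAPH, "cold", "TREATED_BY_DRUG"))
```

Each entity is a node and each triple is a directed edge, so a question about a disease and a relationship type reduces to a lookup along one edge. A production system would typically hold the same triples in a graph database rather than in a Python dictionary.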
A medical knowledge graph can serve as an accurate knowledge base that provides business knowledge in the medical field. Compared with related technologies, a medical knowledge graph offers substantial improvement in three respects. First, results are more accurate: to address linguistic phenomena such as polysemy and synonymy, as well as the problem of multiple data sources, the knowledge graph uses entity alignment and knowledge fusion to ensure that its data is accurate and comprehensive. Second, the data in a knowledge graph is more strongly connected: because of the graph data structure, it is easy to move from one entity to related entities, which supports a deeper understanding of the information the user wants. Third, a knowledge graph can return search results that contain complete knowledge and relationships, so users can discover a wealth of previously unknown information through it.
Some embodiments of the data processing method of the present disclosure are described below with reference to Fig. 1.
Fig. 1 shows a schematic flowchart of a data processing method according to some embodiments of the present disclosure. As shown in Fig. 1, this embodiment includes steps S101 to S103.
In step S101, the query sentence input by the user is parsed to obtain the medical entity to be queried and the medical relationship to be queried.
For example, if the query sentence input by the user is "What medicine should I take for a cold?" (感冒吃啥药), the query sentence can be parsed through entity alignment to obtain the medical entity to be queried, "cold", and the medical relationship to be queried, "correspondence between disease and drug".
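The parsing in step S101 can be illustrated with a deliberately simple sketch. It assumes a small entity lexicon and a keyword-to-relationship table, both hypothetical, and uses substring matching as a stand-in for the entity recognition and relationship rules the disclosure describes; it is not the disclosed parser.

```python
# Hypothetical lexicon of known medical entities (stand-in for a trained recognizer).
ENTITY_LEXICON = ("emphysema", "fever", "cold")

# Hypothetical keyword rules mapping a question keyword to a relationship type.
KEYWORD_TO_RELATION = {
    "medicine": "TREATED_BY_DRUG",
    "drug": "TREATED_BY_DRUG",
    "symptom": "HAS_SYMPTOM",
    "examination": "CHECKED_BY",
}


def parse_query(sentence):
    """Return (entity, relation) found in the sentence; either may be None."""
    text = sentence.lower()
    # First lexicon entry that occurs in the sentence, if any.
    entity = next((e for e in ENTITY_LEXICON if e in text), None)
    # First keyword rule that fires, if any.
    relation = next(
        (rel for keyword, rel in KEYWORD_TO_RELATION.items() if keyword in text),
        None,
    )
    return entity, relation


print(parse_query("What medicine should I take for a cold?"))
```

Substring matching is fragile (it cannot handle synonyms or entities outside the lexicon), which is one motivation for the learned sequence-labeling approach described later in this document.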
In step S102, the pre-created medical knowledge graph is queried using the medical entity to be queried and the medical relationship to be queried to obtain the target medical entity.
When querying the medical knowledge graph, a graph database query statement can be generated from the medical entity to be queried and the medical relationship to be queried. The graph database query statement is then used to query the pre-created medical knowledge graph.
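As a rough sketch of the query-generation step, the function below turns a chain of relationship types into a Cypher `MATCH` statement. The node label `MedicalEntity`, the `name` property, and the relationship type names are assumptions made for this example; the disclosure does not specify a schema. One relationship yields a one-degree query, and several relationships yield a multi-degree query along a path.

```python
def build_cypher(relations):
    """Build a parameterized Cypher query that follows `relations` in order.

    The entity name is passed separately as the `$name` query parameter.
    Relationship types cannot be parameterized in Cypher, so they are
    validated as plain identifiers before being placed in the query text.
    """
    if not relations:
        raise ValueError("at least one relationship type is required")
    for relation in relations:
        if not relation.isidentifier():
            raise ValueError(f"invalid relationship type: {relation!r}")
    # One hop per relationship, joined through anonymous intermediate nodes.
    path = "->()-".join(f"[:{relation}]" for relation in relations)
    return (
        f"MATCH (e:MedicalEntity {{name: $name}})-{path}->(t) "
        "RETURN DISTINCT t.name AS target"
    )


# One-degree query: disease -> drug.
print(build_cypher(["TREATED_BY_DRUG"]))
# Multi-degree query: disease -> drug -> a second, hypothetical relationship.
print(build_cypher(["TREATED_BY_DRUG", "CONTRAINDICATED_WITH"]))
```

Passing the entity name as a parameter (for example `{"name": "cold"}`) rather than concatenating it into the query text avoids injection problems when the name comes from user input.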
In step S103, a response sentence is output according to the target medical entity.
The simplest way to output a response sentence is to output the target medical entity directly, for example "Contac" (康泰克). In some embodiments, the sentence template of the response sentence is first determined according to the medical relationship to be queried. The medical entity to be queried and the target medical entity are then filled into the sentence template to generate the response sentence, which is finally output. This second approach gives the user a more complete experience.
For example, the preset sentence template "___吃___" ("for ___, take ___") can be determined from the "correspondence between disease and drug" relationship. The queried medical entity "cold" (感冒) and the target medical entity "Contac" (康泰克) are then filled into the slots of the sentence template that correspond to their entity types, and the response sentence "感冒吃康泰克" ("For a cold, take Contac") is generated and output.
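The template-filling step can be sketched as a lookup keyed by relationship type. The template string follows the "___吃___" example in the text; the relationship key `TREATED_BY_DRUG` and the function name are hypothetical. When no template is registered for a relationship, the sketch falls back to outputting the target entity directly, which mirrors the simpler output mode mentioned above.

```python
# Hypothetical mapping from relationship type to a response sentence template.
TEMPLATES = {
    "TREATED_BY_DRUG": "{entity}吃{target}",
}


def render_answer(relation, entity, target):
    """Fill the template for `relation`; fall back to the bare target entity."""
    template = TEMPLATES.get(relation)
    if template is None:
        return target
    return template.format(entity=entity, target=target)


print(render_answer("TREATED_BY_DRUG", "感冒", "康泰克"))
```

Using named slots rather than positional ones keeps the entity-to-slot correspondence explicit, so a template can place the target before the queried entity without changing the calling code.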
This embodiment uses a knowledge graph as the answer source for automatic question answering, which realizes automatic question answering for medical knowledge. It can more accurately understand user questions described in natural language, analyze the user's true intention, and return more accurate and professional answers, thereby providing users with scientific and reasonable medical advice and medical recommendations. This embodiment helps to standardize medical question answering, improve the service efficiency and service quality of precise medical knowledge question answering, and adjust supply relationships in the medical industry.
How to parse the query sentence input by the user is described below with reference to Fig. 2.
Fig. 2 shows a schematic flowchart of parsing the query sentence input by the user. As shown in Fig. 2, this embodiment includes steps S2011 to S2013.
In step S2011, the medical entity to be queried included in the query sentence input by the user is identified.
In some embodiments, a bidirectional long short-term memory network and a conditional random field can be used to identify the medical entity to be queried included in the query sentence input by the user. The long short-term memory network (LSTM) is a type of recurrent neural network first proposed by Hochreiter and Schmidhuber in 1997. It was designed to solve the long-term dependency problem of recurrent neural networks, so that retaining long-term information becomes the default behavior of the network. A bidirectional long short-term memory network (Bi-LSTM) learns from both the preceding and the following context. A conditional random field (CRF) can be used to construct a model of the conditional probability distribution of one set of output random variables given another set of input random variables.
FIG. 3 shows a network model composed of a bidirectional long short-term memory network and a conditional random field. A CRF layer is added on top of the Bi-LSTM in order to obtain a globally optimal output sequence, which amounts to reusing the information produced by the Bi-LSTM. If only the Bi-LSTM network is used, then for the label sequence of a sentence the network model selects, for each word, the label that is individually most likely, without considering whether the resulting sequence is reasonable in context. With the additional CRF layer, the network model considers which label is most probable for a word in that context, which yields higher accuracy. The advantage of the Bi-LSTM is that it can learn dependencies within the input sequence in both directions; during training, the Bi-LSTM can automatically extract features of the input sequence according to the entities. The advantage of the CRF is that it can model hidden states and thereby learn the characteristics of the state sequence. Adding a CRF layer after the Bi-LSTM therefore combines the advantages of both, making named entity recognition more accurate and comprehensive, with good robustness and domain openness, and good portability to recognition tasks across multiple domains and different entity categories. Combining the bidirectional long short-term memory network with the conditional random field makes it possible to mine deep semantic information and implicit information that cannot be expressed at the surface of the text, to represent the text more faithfully as semantic vectors, and to substantially improve named entity recognition performance.
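The difference between choosing each label independently and choosing a globally optimal label sequence can be illustrated with a minimal sketch. It covers only the decoding step of a CRF layer (Viterbi decoding); the emission scores stand in for Bi-LSTM outputs, and all scores, the B/I/O tag set, and the function names are assumptions made for illustration rather than values from the disclosure.

```python
from typing import Dict, List

TAGS = ["B", "I", "O"]  # begin-entity, inside-entity, outside (assumed tag set)

# Hypothetical per-token scores, standing in for Bi-LSTM outputs.
EMISSIONS: List[Dict[str, float]] = [
    {"B": 1.0, "I": 0.2, "O": 1.2},
    {"B": 0.1, "I": 1.5, "O": 0.3},
    {"B": 0.1, "I": 0.2, "O": 2.0},
]

# Hypothetical transition scores TRANSITIONS[prev][cur]; "O" -> "I" is
# heavily penalized because an entity cannot continue before it begins.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "B": {"B": -1.0, "I": 1.0, "O": 0.0},
    "I": {"B": 0.0, "I": 0.5, "O": 0.0},
    "O": {"B": 0.0, "I": -10.0, "O": 0.5},
}


def greedy_decode(emissions: List[Dict[str, float]], tags: List[str]) -> List[str]:
    """Pick the best tag for each token independently (no CRF layer)."""
    return [max(tags, key=lambda tag: scores[tag]) for scores in emissions]


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: List[str]) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    # best[-1][tag] is the best score of any path ending in `tag` so far.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(greedy_decode(EMISSIONS, TAGS))                 # ['O', 'I', 'O']
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))   # ['B', 'I', 'O']
```

With these made-up scores, independent selection produces the ill-formed sequence O, I, O, whereas sequence-level decoding yields B, I, O, which is the kind of contextual correction the CRF layer is described as providing.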
During medical entity recognition, the query sentence may contain either the formal name of a disease or its common name. In either case, the recognition result can ultimately be mapped to the same medical entity in the medical knowledge graph.
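One simple way to map a formal name and a common name to the same graph entity is an alias table, sketched below. The disclosure does not specify the mapping mechanism; the table contents (including the alias 伤风 for 感冒, "cold") and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical alias table: recognized mention -> canonical entity name
# as stored in the medical knowledge graph.
ALIAS_TO_CANONICAL = {
    "感冒": "感冒",  # formal name maps to itself
    "伤风": "感冒",  # assumed common name for the same disease
}


def normalize_entity(mention: str) -> Optional[str]:
    """Return the canonical graph entity for a mention, or None if unknown."""
    return ALIAS_TO_CANONICAL.get(mention.strip())


print(normalize_entity("伤风"))  # 感冒
```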
In step S2012, at least one keyword contained in the query sentence input by the user is identified.
A keyword may be the collective name of a category of medical entities, for example "disease", "food", "examination", "medicine", "symptom", and so on. Those skilled in the art should understand that the number of keywords may be one or more. For example, in the query sentence "What medicine should I take for a cold" (感冒吃啥药), the keyword is "medicine" (药) and the number of keywords is one; in the query sentence "How much does medicine for a cold cost" (感冒吃药多少钱), the keywords are "medicine" (药) and "money" (钱), and the number of keywords is two.
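Keyword identification of this kind can be sketched as a lexicon lookup over the query sentence. The lexicon below collects the keywords named in the examples; substring matching and the function name are assumptions, since the disclosure does not state how keywords are detected.

```python
from typing import List

# Keyword lexicon taken from the examples in the text:
# disease, food, examination, medicine, symptom, money.
KEYWORDS = ["疾病", "食物", "检查", "药", "症状", "钱"]


def find_keywords(query: str) -> List[str]:
    """Return the lexicon keywords found in the query, in order of first appearance."""
    hits = [(query.find(keyword), keyword) for keyword in KEYWORDS if keyword in query]
    return [keyword for _, keyword in sorted(hits)]


print(find_keywords("感冒吃啥药"))      # ['药']
print(find_keywords("感冒吃药多少钱"))  # ['药', '钱']
```

Returning the keywords in order of appearance matters when a later step treats the first keyword differently from the rest.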
In step S2013, the medical relationship to be queried is determined according to the medical entity to be queried and the at least one keyword of the query sentence.
For example, the keyword to which the medical entity to be queried, "cold" (感冒), belongs is "disease". From the keyword "disease" to which the entity "cold" belongs and the keyword "medicine" in the query sentence, the medical relationship to be queried can be determined to be the "correspondence between diseases and drugs".
In some embodiments, the query sentence contains multiple keywords. In this case, the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords. Accordingly, the initial medical relationship to be queried can be determined according to the medical entity to be queried and the first keyword of the query sentence, and the association relationships between the keywords can be determined according to the keywords of the query sentence.
For example, in the query sentence "How much does medicine for a cold cost" (感冒吃药多少钱), the keywords are "medicine" (药) and "money" (钱), and the number of keywords is two. From the medical entity to be queried, "cold", and the first keyword "medicine", the initial medical relationship to be queried can be determined to be the "correspondence between diseases and drugs"; from the first keyword "medicine" and the second keyword "money", the association relationship between the keywords can be determined to be the "correspondence between drugs and amounts".
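The two examples above can be sketched as table lookups. The lookup tables, the English relation labels, and the choice to pair consecutive keywords are assumptions for illustration; the disclosure only states that the initial relationship comes from the entity and the first keyword, and that the association relationships come from the keywords.

```python
from typing import List, Tuple

# Hypothetical table: medical entity -> category keyword it belongs to.
ENTITY_CATEGORY = {"感冒": "疾病"}  # "cold" belongs to "disease"

# Hypothetical table: (source keyword, target keyword) -> relationship.
RELATIONS = {
    ("疾病", "药"): "correspondence between diseases and drugs",
    ("药", "钱"): "correspondence between drugs and amounts",
}


def determine_relations(entity: str, keywords: List[str]) -> Tuple[str, List[str]]:
    """Return (initial relationship, association relationships between keywords)."""
    category = ENTITY_CATEGORY[entity]
    # The initial relationship uses the entity's category and the first keyword.
    initial = RELATIONS[(category, keywords[0])]
    # Assumption: association relationships link each keyword to the next one.
    associations = [RELATIONS[pair] for pair in zip(keywords, keywords[1:])]
    return initial, associations


print(determine_relations("感冒", ["药"]))
print(determine_relations("感冒", ["药", "钱"]))
```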
An application example of an automated medical question-answering system built with the data processing method provided by the present disclosure is described below with reference to FIG. 4.
FIG. 4 exemplarily shows an application example of the automated medical question-answering system. The system consists mainly of three modules: semantic parsing (including medical named entity recognition and medical relationship recognition), query statement construction and conversion, and an answer reply rule engine.
The semantic parsing module comprises two parts: medical named entity recognition and medical relationship recognition. Medical named entity recognition aims to extract the medical feature vocabulary contained in the user's query sentence and can extract more than twenty types of medical entities, such as diseases, symptoms, drugs, and examinations. That is, the medical entity to be queried contained in the query sentence is identified by extracting the medical feature vocabulary contained in the query sentence.
Medical relationship recognition uses a relationship recognition rule engine to understand the semantics expressed by the query sentence and match it against the relationship types in the medical knowledge graph, thereby obtaining the medical relationship corresponding to the medical knowledge graph, that is, the medical relationship to be queried. Medical named entity recognition and medical relationship recognition yield the medical entity and medical relationship information of the user's question, which is passed to the query statement construction and conversion module.
Based on the medical entity and medical relationship of the user's query sentence obtained by medical entity recognition and medical relationship recognition, the query statement construction and conversion module combines the medical entity and the medical relationship into a logical expression of the user's question, and then converts the logical expression into a graph database query statement, based on the medical knowledge graph, that corresponds to the user's question, for example a Cypher graph database query statement. The established medical knowledge graph is queried with the graph database query statement (specifically including one-degree or multi-degree queries), and the result returned by the medical knowledge graph, that is, the corresponding medical knowledge, is passed to the answer reply rule engine module.
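As a rough sketch of the conversion step, a list of relationships can be turned into a one-degree or multi-degree Cypher `MATCH` pattern. The node labels (`Disease`, `Drug`, `Price`), relationship types (`TREATED_BY`, `HAS_PRICE`), the `name` property, and the function name are hypothetical; the disclosure does not give the graph schema. The entity name is left as the query parameter `$name` so that it is supplied separately when the statement is executed.

```python
from typing import List

# Hypothetical schema: relationship -> (source label, relationship type, target label).
REL_TO_PATTERN = {
    "correspondence between diseases and drugs": ("Disease", "TREATED_BY", "Drug"),
    "correspondence between drugs and amounts": ("Drug", "HAS_PRICE", "Price"),
}


def build_cypher(relations: List[str]) -> str:
    """Build a Cypher statement with one hop per relationship in `relations`."""
    start_label = REL_TO_PATTERN[relations[0]][0]
    pattern = f"(n0:{start_label} {{name: $name}})"
    for hop, relation in enumerate(relations, start=1):
        _, rel_type, target_label = REL_TO_PATTERN[relation]
        pattern += f"-[:{rel_type}]->(n{hop}:{target_label})"
    # Return the node reached by the last hop.
    return f"MATCH {pattern} RETURN n{len(relations)}.name AS answer"


# One-degree query: disease -> drug.
print(build_cypher(["correspondence between diseases and drugs"]))
# Two-degree query: disease -> drug -> price.
print(build_cypher([
    "correspondence between diseases and drugs",
    "correspondence between drugs and amounts",
]))
```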
The answer reply rule engine module receives the result queried from the medical knowledge graph and combines it with medical knowledge, answer templates, and natural language that conforms to the user's expression habits to obtain the final answer, which is returned to the user. That is, according to the result queried from the medical knowledge graph, a response sentence is generated by combining medical knowledge, the sentence template of the response sentence, and natural language that conforms to the user's expression habits, and is output to the user.
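Template-based answer generation might look like the following sketch, in which the template is selected by the queried relationship and filled with the queried entity and the query results. The template wording, the fallback sentence, the placeholder drug names, and the function name are all hypothetical.

```python
from typing import List

# Hypothetical answer templates, keyed by the queried relationship.
TEMPLATES = {
    "correspondence between diseases and drugs":
        "According to the knowledge graph, drugs associated with {entity} include: {targets}.",
}
FALLBACK = "No information about {entity} was found in the knowledge graph."


def render_answer(relation: str, entity: str, targets: List[str]) -> str:
    """Fill the template for `relation` with the queried entity and the results."""
    if not targets:
        return FALLBACK.format(entity=entity)
    return TEMPLATES[relation].format(entity=entity, targets=", ".join(targets))


print(render_answer("correspondence between diseases and drugs", "cold", ["drug A", "drug B"]))
```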
Some embodiments of the data processing device of the present disclosure are described below with reference to FIG. 5.
FIG. 5 shows a schematic structural diagram of a data processing device according to some embodiments of the present disclosure. As shown in FIG. 5, the data processing device 50 in this embodiment includes: a sentence parsing module 501 configured to parse the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried; a knowledge graph query module 502 configured to query a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein the nodes of the medical knowledge graph are medical entities, the edges of the medical knowledge graph are the medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the medical entities include the medical relationship to be queried; and a sentence output module 503 configured to output a response sentence according to the target medical entity.
In some embodiments, the sentence parsing module 501 is configured to: identify the medical entity to be queried contained in the query sentence input by the user; identify at least one keyword contained in the query sentence input by the user; and determine the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence.
In some embodiments, the query sentence contains multiple keywords, and the medical relationship to be queried includes an initial medical relationship and the association relationships between the keywords. The sentence parsing module 501 is configured to: determine the initial medical relationship to be queried according to the medical entity to be queried and the first keyword of the query sentence; and determine the association relationships between the keywords according to the keywords of the query sentence.
In some embodiments, the sentence parsing module 501 is configured to use a bidirectional long short-term memory network and a conditional random field to identify the medical entity to be queried contained in the query sentence input by the user.
In some embodiments, the knowledge graph query module 502 is configured to: generate a graph database query statement using the medical entity to be queried and the medical relationship to be queried; and query the pre-created medical knowledge graph using the graph database query statement.
In some embodiments, the sentence output module 503 is configured to: determine the sentence template of the response sentence according to the medical relationship to be queried; fill the medical entity to be queried and the target medical entity into the sentence template to generate the response sentence; and output the response sentence.
In some embodiments, the data processing device 50 further includes a graph creation module 500 configured to create the medical knowledge graph using the medical entities and the medical relationships between the medical entities.
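The graph structure described here, with medical entities as nodes and medical relationships as edges, can be sketched in memory from (head, relationship, tail) triples. This is an illustration only; an actual system would presumably store the graph in a graph database. The triples, the placeholder drug names, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)
Graph = DefaultDict[str, DefaultDict[str, List[str]]]


def build_graph(triples: Iterable[Triple]) -> Graph:
    """Create an adjacency structure: graph[head][relationship] -> list of tails."""
    graph: Graph = defaultdict(lambda: defaultdict(list))
    for head, relation, tail in triples:
        graph[head][relation].append(tail)
    return graph


def query_graph(graph: Graph, entity: str, relation: str) -> List[str]:
    """Return the target entities reached from `entity` via `relation`."""
    # .get() avoids creating empty entries for unknown entities or relationships.
    return list(graph.get(entity, {}).get(relation, []))


# Hypothetical triples with placeholder drug names.
graph = build_graph([
    ("cold", "correspondence between diseases and drugs", "drug A"),
    ("cold", "correspondence between diseases and drugs", "drug B"),
])
print(query_graph(graph, "cold", "correspondence between diseases and drugs"))  # ['drug A', 'drug B']
```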
In this embodiment, a knowledge graph serves as the answer source for automatic question answering, which enables automatic question answering over medical knowledge. User questions described in natural language can be understood more accurately and the user's true intent can be parsed, so that more accurate and professional answers are returned and the user is provided with scientific and reasonable medical advice and recommendations. This embodiment helps to standardize medical question-answering behavior, improve the service efficiency and service quality of precise medical-knowledge question answering, and adjust the supply relationship in the medical industry.
Other embodiments of the data processing device of the present disclosure are described below with reference to FIG. 6.
FIG. 6 shows a schematic structural diagram of a data processing device according to other embodiments of the present disclosure. As shown in FIG. 6, the data processing device 60 of this embodiment includes a memory 610 and a processor 620 coupled to the memory 610. The processor 620 is configured to execute the data processing method of any of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The data processing device 60 may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, and 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure also includes a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the data processing method of any of the foregoing embodiments.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (16)

  1. A computer-implemented data processing method, comprising:
    parsing a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried;
    querying a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein nodes of the medical knowledge graph are medical entities, edges of the medical knowledge graph are medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the medical entities include the medical relationship to be queried; and
    outputting a response sentence according to the target medical entity.
  2. The data processing method according to claim 1, wherein parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried comprises:
    identifying the medical entity to be queried contained in the query sentence input by the user;
    identifying at least one keyword contained in the query sentence input by the user; and
    determining the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence.
  3. The data processing method according to claim 2, wherein the query sentence contains a plurality of keywords, the medical relationship to be queried includes an initial medical relationship and association relationships between the keywords, and determining the medical relationship to be queried according to the medical entity to be queried and the at least one keyword of the query sentence comprises:
    determining the initial medical relationship to be queried according to the medical entity to be queried and the first keyword of the query sentence; and
    determining the association relationships between the keywords according to the keywords of the query sentence.
  4. The data processing method according to claim 2, wherein identifying the medical entity to be queried contained in the query sentence input by the user comprises:
    identifying the medical entity to be queried contained in the query sentence input by the user by using a bidirectional long short-term memory network and a conditional random field.
  5. The data processing method according to claim 1, wherein parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried comprises:
    identifying the medical entity to be queried contained in the query sentence by extracting the medical feature vocabulary contained in the query sentence.
  6. The data processing method according to claim 1, wherein parsing the query sentence input by the user to obtain the medical entity to be queried and the medical relationship to be queried comprises:
    understanding the semantics expressed by the query sentence by means of a relationship recognition rule engine and matching the relationship types in the medical knowledge graph, thereby obtaining the medical relationship corresponding to the medical knowledge graph.
  7. The data processing method according to claim 1, wherein querying the pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried comprises:
    generating a graph database query statement using the medical entity to be queried and the medical relationship to be queried; and
    querying the pre-created medical knowledge graph using the graph database query statement.
  8. The data processing method according to claim 7, wherein generating the graph database query statement using the medical entity to be queried and the medical relationship to be queried comprises:
    combining the medical entity to be queried and the medical relationship to be queried to form a logical expression of the query sentence; and
    converting the logical expression into a graph database query statement, based on the medical knowledge graph, that corresponds to the query sentence.
  9. The data processing method according to claim 7, wherein the graph database query statement comprises a one-degree query statement or a multi-degree query statement.
  10. The data processing method according to claim 7, wherein the graph database query statement comprises a Cypher graph database query statement.
  11. The data processing method according to claim 1, wherein outputting the response sentence according to the target medical entity comprises:
    determining a sentence template of the response sentence according to the medical relationship to be queried;
    filling the medical entity to be queried and the target medical entity into the sentence template of the response sentence to generate the response sentence; and
    outputting the response sentence.
  12. The data processing method according to claim 11, wherein filling the medical entity to be queried and the target medical entity into the sentence template of the response sentence to generate the response sentence comprises:
    generating the response sentence according to the result queried from the medical knowledge graph, in combination with medical knowledge, the sentence template of the response sentence, and natural language that conforms to the user's expression habits.
  13. The data processing method according to claim 1, further comprising:
    creating the medical knowledge graph using the medical entities and the medical relationships between the medical entities.
  14. A data processing device, comprising:
    a sentence parsing module configured to parse a query sentence input by a user to obtain a medical entity to be queried and a medical relationship to be queried;
    a knowledge graph query module configured to query a pre-created medical knowledge graph using the medical entity to be queried and the medical relationship to be queried to obtain a target medical entity, wherein nodes of the medical knowledge graph are medical entities, edges of the medical knowledge graph are medical relationships between the medical entities, the medical entities include the medical entity to be queried and the target medical entity, and the medical relationships between the medical entities include the medical relationship to be queried; and
    a sentence output module configured to output a response sentence according to the target medical entity.
  15. A data processing device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute the data processing method according to any one of claims 1 to 13 based on instructions stored in the memory.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when executed by a processor, implement the data processing method according to any one of claims 1 to 13.
PCT/CN2021/088090 2020-04-20 2021-04-19 Data processing method and device, and computer readable storage medium WO2021213314A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010311319.1 2020-04-20
CN202010311319.1A CN112307215B (en) 2020-04-20 2020-04-20 Data processing method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021213314A1 true WO2021213314A1 (en) 2021-10-28

Family

ID=74336428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088090 WO2021213314A1 (en) 2020-04-20 2021-04-19 Data processing method and device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112307215B (en)
WO (1) WO2021213314A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610902A (en) * 2022-03-25 2022-06-10 南京市畜牧兽医站(南京市动物疫病预防控制中心) Poultry disease diagnosis system based on knowledge graph
CN114969261A (en) * 2022-05-30 2022-08-30 平安科技(深圳)有限公司 Data query method and device based on artificial intelligence, electronic equipment and medium
CN115826956A (en) * 2023-02-14 2023-03-21 长威信息科技发展股份有限公司 Visual arrangement analysis method and visual builder for knowledge graph business rules
CN115905497A (en) * 2022-12-23 2023-04-04 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for determining reply sentence
CN116108906A (en) * 2023-04-06 2023-05-12 北京亚信数据有限公司 Disease drug relation mapping model training and related recommendation and detection methods and devices
CN116244410A (en) * 2023-02-16 2023-06-09 北京三维天地科技股份有限公司 Index data analysis method and system based on knowledge graph and natural language
CN116303625A (en) * 2023-05-17 2023-06-23 之江实验室 Data query method and device, storage medium and electronic equipment
CN116628167A (en) * 2023-06-08 2023-08-22 四维创智(北京)科技发展有限公司 Response determination method and device, electronic equipment and storage medium
CN117056493A (en) * 2023-09-07 2023-11-14 四川大学 Large language model medical question-answering system based on medical record knowledge graph
CN117556086A (en) * 2023-10-20 2024-02-13 国网智能电网研究院有限公司 Multi-hop path query method, device, computer equipment and storage medium
CN118132681A (en) * 2024-04-30 2024-06-04 支付宝(杭州)信息技术有限公司 Method and device for ordering multiple query results in medical knowledge graph query

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307215B (en) * 2020-04-20 2024-07-19 北京京东拓先科技有限公司 Data processing method, device and computer readable storage medium
CN117171308B (en) * 2023-07-28 2024-09-17 至本医疗科技(上海)有限公司 Method, device and medium for generating scientific research data analysis response information
CN118035394A (en) * 2023-12-08 2024-05-14 重庆邮电大学 Medical question-answering method and system based on multi-source data integration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180108443A1 (en) * 2016-04-29 2018-04-19 Boe Technology Group Co., Ltd. Apparatus and method for analyzing natural language medical text and generating a medical knowledge graph representing the natural language medical text
CN109710738A (en) * 2018-12-24 2019-05-03 广州天鹏计算机科技有限公司 Drug inquiry method, apparatus, system, computer equipment and storage medium
CN110390003A (en) * 2019-06-19 2019-10-29 北京百度网讯科技有限公司 Question and answer processing method and system, computer equipment and readable medium based on medical treatment
CN110532360A (en) * 2019-07-19 2019-12-03 平安科技(深圳)有限公司 Medical field knowledge mapping question and answer processing method, device, equipment and storage medium
CN112307215A (en) * 2020-04-20 2021-02-02 北京沃东天骏信息技术有限公司 Data processing method, device and computer readable storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610902A (en) * 2022-03-25 2022-06-10 南京市畜牧兽医站(南京市动物疫病预防控制中心) Poultry disease diagnosis system based on knowledge graph
CN114969261A (en) * 2022-05-30 2022-08-30 平安科技(深圳)有限公司 Data query method and device based on artificial intelligence, electronic equipment and medium
CN115905497A (en) * 2022-12-23 2023-04-04 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for determining reply sentence
CN115905497B (en) * 2022-12-23 2024-03-19 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for determining reply sentence
CN115826956A (en) * 2023-02-14 2023-03-21 长威信息科技发展股份有限公司 Visual arrangement analysis method and visual builder for knowledge graph business rules
CN116244410B (en) * 2023-02-16 2023-10-20 北京三维天地科技股份有限公司 Index data analysis method and system based on knowledge graph and natural language
CN116244410A (en) * 2023-02-16 2023-06-09 北京三维天地科技股份有限公司 Index data analysis method and system based on knowledge graph and natural language
CN116108906A (en) * 2023-04-06 2023-05-12 北京亚信数据有限公司 Disease drug relation mapping model training and related recommendation and detection methods and devices
CN116303625A (en) * 2023-05-17 2023-06-23 之江实验室 Data query method and device, storage medium and electronic equipment
CN116303625B (en) * 2023-05-17 2023-07-21 之江实验室 Data query method and device, storage medium and electronic equipment
CN116628167A (en) * 2023-06-08 2023-08-22 四维创智(北京)科技发展有限公司 Response determination method and device, electronic equipment and storage medium
CN116628167B (en) * 2023-06-08 2024-04-05 四维创智(北京)科技发展有限公司 Response determination method and device, electronic equipment and storage medium
CN117056493A (en) * 2023-09-07 2023-11-14 四川大学 Large language model medical question-answering system based on medical record knowledge graph
CN117556086A (en) * 2023-10-20 2024-02-13 国网智能电网研究院有限公司 Multi-hop path query method, device, computer equipment and storage medium
CN118132681A (en) * 2024-04-30 2024-06-04 支付宝(杭州)信息技术有限公司 Method and device for ordering multiple query results in medical knowledge graph query

Also Published As

Publication number Publication date
CN112307215B (en) 2024-07-19
CN112307215A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
WO2021213314A1 (en) Data processing method and device, and computer readable storage medium
CN111353310B (en) Named entity identification method and device based on artificial intelligence and electronic equipment
Halevy et al. Principles of dataspace systems
US7076493B2 (en) Defining a data dependency path through a body of related data
US9799040B2 (en) Method and apparatus for computer assisted innovation
Niu et al. Cognition-driven decision support for business intelligence
CN111475623A (en) Case information semantic retrieval method and device based on knowledge graph
CN102663129A (en) Medical field deep question and answer method and medical retrieval system
Khoo et al. Augmenting Dublin core digital library metadata with Dewey decimal classification
Weigl et al. On providing semantic alignment and unified access to music library metadata
CN115309885A (en) Knowledge graph construction, retrieval and visualization method and system for scientific and technological service
Saini et al. Towards queryable and traceable domain models
Sun A natural language interface for querying graph databases
Amer-Yahia et al. INODE: building an end-to-end data exploration system in practice [extended vision]
CN117891923A (en) Legal question-answering system based on intention recognition and knowledge graph
Song et al. Semantic query graph based SPARQL generation from natural language questions
Wang et al. AceMap: Knowledge Discovery through Academic Graph
Pietranik et al. A method for ontology alignment based on semantics of attributes
Jain et al. Schema matching technique for heterogeneous web database
Jagerman Creating, maintaining and applying quality taxonomies
El Moukhi et al. Requirements-driven modeling for decision-making systems
Hettiarachchi et al. A Scenario-based ER Diagram and Query Generation Engine
Виноградов et al. Ontologies in the problems of building a concept domain model
Kang et al. Methodology for refining subject terms and supporting subject indexing with taxonomy: a case study of the APO digital repository
Grabus Historical Subject Representation: An Analysis of Historical Vocabularies for Temporally-Aligned and Contextual Access Points

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21792929; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21792929; Country of ref document: EP; Kind code of ref document: A1)