CN117854737A

CN117854737A - Medical knowledge graph construction method, device and auxiliary decision making system

Info

Publication number: CN117854737A
Application number: CN202211210633.6A
Authority: CN
Inventors: 崔灿; 郑珊珊; 吕明
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2024-04-09

Abstract

The disclosure relates to the technical field of big data processing, in particular to the technical field of knowledge graphs, and specifically to a method and a device for constructing a medical knowledge graph and an auxiliary decision making system. The specific implementation scheme is as follows: acquiring text data to be identified; performing information extraction on the text data to be identified, identifying at least two entities in the text data and the relation between the at least two entities, to obtain triples and conditions corresponding to the text data; and constructing a medical knowledge graph based on the triples and the conditions. By extracting information from medical knowledge, the method and the device obtain triples with conditions and construct the knowledge graph from them. The resulting graph suits more complex medical application scenarios. When it is applied in an auxiliary decision making system, it allows auxiliary decision suggestions to be tailored to the individual characteristics of each patient.

Description

Medical knowledge graph construction method, device and auxiliary decision making system

Technical Field

The disclosure relates to the technical field of big data processing, in particular to the technical field of knowledge graphs, and specifically to a method and a device for constructing a medical knowledge graph and an auxiliary decision making system.

Background

A knowledge graph, also known as knowledge domain visualization or a knowledge domain map, is a family of graphs that show how knowledge develops and how it is structured. It uses visualization techniques to describe knowledge resources and their carriers, and it mines, analyzes, constructs, draws and displays knowledge and the interrelationships within it. In medical applications, relationships between entities are often intricate. As a result, unstructured medical record data must be converted into relational entity triples by entity recognition and relation extraction models.

For the management of medical knowledge, the prior art has focused on simple knowledge descriptions. Taking a symptom entity as an example, it may carry only the attribute "symptom name". Information that receives close attention in clinical work is then missing, such as how long the symptom lasts in a given disease and the conditions under which it occurs. In addition, the prior art relies on domain experts to classify knowledge, according to the knowledge domain covered by the requirements and the target task, in order to construct a knowledge graph. That is, SPO (Subject-Predicate-Object) triple knowledge that conforms to the schema is extracted from natural language text under a given schema set. For example, an ontology layer of relations such as ...-(symptom) and (cause)-induces-(disease) is first established. Knowledge is then extracted from the knowledge sources by a semi-supervised method: entities such as diseases, symptoms and causes, together with their attributes, are extracted to form an instance layer. The complexity of clinical medical service scenarios tends to make both the ontology layer and the instance layer complex. If the schema is not formulated sufficiently in advance, it is difficult to cover the reasoning needs for the thousands of diseases encountered in actual clinical practice. The requirements of clinical intelligent application systems for an application-level medical knowledge graph then cannot be met.

Disclosure of Invention

The present disclosure provides a method, an apparatus, a device and a storage medium for constructing a medical knowledge graph, and an auxiliary decision making system based on the medical knowledge graph.

According to a first aspect of the present disclosure, there is provided a method for constructing a medical knowledge graph, including:

acquiring text data to be identified;

extracting information from the text data to be identified, and identifying at least two entities in the text data to be identified and the relation between the at least two entities to obtain triples and conditions corresponding to the text data to be identified;

and constructing a medical knowledge graph based on the triples and the conditions.

According to a second aspect of the present disclosure, there is provided a medical knowledge graph construction apparatus, including:

the data access module is configured to acquire text data to be identified;

the knowledge extraction module is configured to extract information from the text data to be identified, identify at least two entities in the text data to be identified and the relation between the at least two entities, and obtain triples and conditions corresponding to the text data to be identified;

a knowledge-graph generation module configured to construct a medical knowledge-graph based on the triples and the conditions.

According to a third aspect of the present disclosure, there is provided an auxiliary decision making system based on medical knowledge-graph, comprising:

an acquisition module configured to acquire symptom information of a patient;

the knowledge reasoning module is configured to perform knowledge reasoning on the medical knowledge graph constructed by the method according to any one of claims 1-5, to obtain a disease list and corresponding probabilities for the symptom information, and to generate a disease candidate result based on the disease list and the probabilities;

the feature extraction module is configured to extract individual features of the patient based on the symptom information, and to look up the conditions corresponding to the individual features in the medical knowledge graph;

a decision generation module configured to output a corresponding auxiliary decision suggestion based on the disease candidate result and the condition corresponding to the individual feature.

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above claims.

According to a fifth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the above-mentioned technical solutions.

According to a sixth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above-mentioned technical solutions.

The present disclosure provides a method and an apparatus for constructing a medical knowledge graph, and an auxiliary decision making system based on the medical knowledge graph. Information is extracted from medical knowledge to obtain triples with conditions, and the knowledge graph is constructed from these conditional triples. The graph can therefore serve more complex medical application scenarios. In the auxiliary decision making system, it can produce auxiliary decision suggestions that better fit the individual characteristics of a patient.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a schematic step diagram of a medical knowledge graph construction method in an embodiment of the present disclosure;

FIG. 2 is a schematic block diagram of a first medical knowledge graph construction apparatus in an embodiment of the disclosure;

FIG. 3 is a functional block diagram of a second medical knowledge graph construction apparatus in an embodiment of the disclosure;

FIG. 4 is a schematic block diagram of a first medical knowledge-graph-based decision-aid system in an embodiment of the disclosure;

FIG. 5 is an application interface diagram of an auxiliary decision making system in an embodiment of the present disclosure;

FIG. 6 is an example diagram of a medical knowledge-graph applied to clinical medicine in an embodiment of the present disclosure;

FIG. 7 is a schematic block diagram of a second medical knowledge-graph-based decision-aid system in an embodiment of the disclosure;

FIG. 8 is a schematic block diagram of an example electronic device in an embodiment of the disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Building a knowledge graph from accessed data generally requires knowledge extraction. Data of different structures and types are accessed by connecting to databases or importing documents. Structured SPO triples of the form (entity, attribute, attribute value), which a computer can understand and compute on, are then extracted from these data, converting the data sources into knowledge graph data. For example, penicillin (a drug) is an entity, hemolytic streptococcal infection is an entity, and "indication" is a relation. The statement that hemolytic streptococcal infection is an indication of penicillin yields penicillin-indication-hemolytic streptococcal infection, an example of an (entity, relation, entity) triple. At present, information extraction in the medical informatization field is generally closed-domain: SPO triples that meet the schema requirements are extracted from natural language text under a given schema set. If the schema is not formulated sufficiently in advance, it is difficult to cover the reasoning needs for the thousands of diseases encountered in actual clinical practice. The requirements of clinical intelligent application systems for an application-level medical knowledge graph then cannot be met.
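The triple structure described above can be sketched in Python as a plain in-memory list. This is a minimal illustration only. The names `Triple` and `objects_of` are hypothetical and are not part of the disclosure.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One SPO fact: (entity, relation, entity) or (entity, attribute, attribute value)."""
    subject: str
    predicate: str
    obj: str


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


facts = [
    # The penicillin example from the text: entity - relation - entity.
    Triple("penicillin", "indication", "hemolytic streptococcal infection"),
    # "penicillin (a drug)" expressed as an entity - attribute - value triple.
    Triple("penicillin", "category", "drug"),
]

print(objects_of(facts, "penicillin", "indication"))
```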

In view of the above technical problems, the present disclosure provides a method for constructing a medical knowledge graph which, as shown in FIG. 1, includes:

Step S101: obtain text data to be identified. The first step in building a knowledge graph is data access. Existing data sources fall roughly into two types. The first type is structured data: relational data managed in relational databases (MySQL, Oracle, etc.) and stored in tabular form. Examples are structured databases of disease knowledge, typical cases, symptoms and signs, examinations, surgical operations, medication information, medication questions and answers, medication cases, and traditional Chinese medicine prescriptions. The second type is unstructured data, such as documents or forms from books, literature and guidelines. The text data to be identified can be obtained by database import, manual import from disk, API (Application Programming Interface) access, and the like.
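For the structured-data path, one simple mapping turns each table row into triples: the key column becomes the subject and every other column name becomes a predicate. The sketch below shows this mapping using Python's built-in `sqlite3` as a stand-in for MySQL or Oracle. The table `drug_info` and the helper `table_to_triples` are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def table_to_triples(conn: sqlite3.Connection, table: str, key_column: str) -> List[Tuple[str, str, str]]:
    """Turn each row into (row key, column name, cell value) triples, skipping NULL cells."""
    cur = conn.execute(f"SELECT * FROM {table}")   # table name assumed trusted, not user input
    columns = [d[0] for d in cur.description]
    key_idx = columns.index(key_column)
    triples = []
    for row in cur:
        subject = row[key_idx]
        for col, value in zip(columns, row):
            if col != key_column and value is not None:
                triples.append((subject, col, str(value)))
    return triples


# Hypothetical medication-information table standing in for a real knowledge base.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug_info (drug TEXT, indication TEXT, route TEXT)")
conn.execute("INSERT INTO drug_info VALUES (?, ?, ?)",
             ("penicillin", "hemolytic streptococcal infection", None))
print(table_to_triples(conn, "drug_info", "drug"))
```

Unstructured sources (books, guidelines) cannot be mapped this way; they go through the model-based extraction of step S102.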

Step S102: perform information extraction on the text data to be identified. At least two entities in the text and the relation between them are identified, yielding the triples and conditions corresponding to the text data. Structured triples that a computer can understand and compute on are extracted from the acquired text, converting it into knowledge graph data. In this embodiment, the extracted units are not ordinary SPO triples but SPOC tuples (Subject, Predicate, Object, Condition). Compared with ordinary triples, the present disclosure additionally extracts the conditions stated in the text, which builds a more complete medical knowledge graph for use in complex medical decision systems.

For example, suppose the text data to be identified reads: "To avoid cerebral acidosis and hypernatremia, sodium bicarbonate solution should not be used for ketoacidosis, except when blood pH is less than 7.1 and HCO₃⁻ is less than 12 mmol/L. In that case, 1.4% sodium bicarbonate solution may be given by intravenous drip at 2 mmol/kg, with half the dose given first. It is stopped once blood pH is greater than or equal to 7.2, so as to avoid aggravating cerebral edema by correcting the acidosis too quickly." A model can extract from this text the SPO triple (sodium bicarbonate solution, treatment, ketoacidosis), together with the conditions (blood pH, less than 7.1) and (HCO₃⁻, less than 12 mmol/L).

When the medical knowledge graph is later applied, decision advice better suited to the patient's symptoms and signs can then be given according to the patient's individual characteristics. For example, if the patient's blood pH is 7 and HCO₃⁻ is 11 mmol/L, these individual characteristics match the corresponding conditions. This yields the decision suggestion "give 1.4% sodium bicarbonate solution by intravenous drip at 2 mmol/kg, using half the dose first".
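One way to represent an SPOC tuple, and to match it against a patient's individual characteristics, is sketched below. The assumptions are stated here and in the code: conditions are encoded as (feature, operator, threshold), and all conditions of a tuple must hold together. The classes `Condition` and `SPOC` and the feature keys are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple

# Comparison operators a condition may use (an assumed encoding, not taken from the disclosure).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Condition:
    feature: str   # patient feature the condition constrains, e.g. "blood_pH"
    op: str        # key into OPS
    value: float   # threshold
    unit: str = ""

    def holds(self, features: Dict[str, float]) -> bool:
        # If the patient's value is unknown, the condition is treated as not satisfied.
        if self.feature not in features:
            return False
        return OPS[self.op](features[self.feature], self.value)


@dataclass(frozen=True)
class SPOC:
    subject: str
    predicate: str
    obj: str
    conditions: Tuple[Condition, ...] = ()
    advice: str = ""

    def applies_to(self, features: Dict[str, float]) -> bool:
        # Assumption: conditions are combined with AND ("pH < 7.1 and HCO3- < 12 mmol/L").
        return all(c.holds(features) for c in self.conditions)


bicarbonate = SPOC(
    subject="sodium bicarbonate solution",
    predicate="treatment",
    obj="ketoacidosis",
    conditions=(
        Condition("blood_pH", "<", 7.1),
        Condition("HCO3", "<", 12.0, "mmol/L"),
    ),
    advice="1.4% sodium bicarbonate solution by IV drip at 2 mmol/kg, half the dose first",
)

patient = {"blood_pH": 7.0, "HCO3": 11.0}   # the patient from the example above
if bicarbonate.applies_to(patient):
    print(bicarbonate.advice)
```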

Step S103: construct a medical knowledge graph based on the triples and the conditions. That is, the medical knowledge graph contains the SPO triple (sodium bicarbonate solution, treatment, ketoacidosis) together with the conditions (blood pH, less than 7.1) and (HCO₃⁻, less than 12 mmol/L).
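The disclosure does not specify a storage layout. A minimal in-memory sketch of step S103, under an assumed layout, is shown below: each edge of the graph stores its own conditions, so a query returns a treatment together with the situations in which it applies. `ConditionalKG` and `subjects_for` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Cond = Tuple[str, str, float]   # (feature, operator, threshold), an assumed encoding
Edge = Tuple[str, str, Tuple[Cond, ...]]


class ConditionalKG:
    """Hypothetical in-memory graph store whose edges carry their conditions."""

    def __init__(self) -> None:
        self.out_edges: DefaultDict[str, List[Edge]] = defaultdict(list)  # subject -> (predicate, object, conds)
        self.in_edges: DefaultDict[str, List[Edge]] = defaultdict(list)   # object -> (subject, predicate, conds)

    def add(self, subject: str, predicate: str, obj: str, conditions: Iterable[Cond] = ()) -> None:
        conds = tuple(conditions)
        self.out_edges[subject].append((predicate, obj, conds))
        self.in_edges[obj].append((subject, predicate, conds))

    def subjects_for(self, obj: str, predicate: str) -> List[Tuple[str, Tuple[Cond, ...]]]:
        """E.g. all treatments of a disease, each with the conditions attached to its edge."""
        return [(s, c) for s, p, c in self.in_edges.get(obj, []) if p == predicate]


kg = ConditionalKG()
kg.add("sodium bicarbonate solution", "treatment", "ketoacidosis",
       [("blood pH", "<", 7.1), ("HCO3-", "<", 12.0)])
print(kg.subjects_for("ketoacidosis", "treatment"))
```

Attaching conditions directly to the edge keeps each query to a single lookup. An alternative design is to reify each condition as its own node, which makes conditions shareable and queryable on their own, at the cost of extra hops.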

With the above technical solution, the constructed medical knowledge graph suits more complex medical decision scenarios. It can be queried to recommend examinations, operations and medications relevant to a disease, and to give more personalized decision suggestions for different patients.

As an optional implementation, performing information extraction on the text data to be identified, identifying at least two entities and the relation between them, and obtaining the triples and conditions includes two steps. First, a first entity corresponding to the triple is determined. Second, based on the determined subject, the following are extracted: the second entity corresponding to the triple, the relation between the first entity and the second entity, and the conditions corresponding to the triple. The first entity is the subject of the SPO triple, and the second entity is its object.

Embodiments of the present disclosure introduce a semi-open-domain SPO extraction technique with condition extraction, which extracts the Predicate, Object and Condition given only the Subject. For example, suppose the Subject is a disease symptom, "lower abdominal pain". During extraction, the Subject is fixed to "lower abdominal pain", and the entities and entity relations related to "lower abdominal pain" are extracted. This is semi-open-domain extraction because only the Subject is constrained. With fully open-domain extraction, the model extracts entities without any constraint; it may extract many entities, but accuracy drops. With fully closed-domain extraction, the extractable entities and entity relations are restricted in advance, so complex knowledge application scenarios cannot be served. Extracting conditional SPO triples in a semi-open domain therefore preserves both the diversity and the accuracy of the extracted entities.
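The semi-open-domain contract, in which the Subject is fixed and the Predicate, Object and Condition are left open, can be sketched as a function that takes the subject and a subject-conditioned extractor. The disclosure does not describe its model. `Extractor` is therefore a hypothetical interface, and `toy_extractor` is a keyword rule that only makes the sketch runnable.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class SPOC(NamedTuple):
    subject: str
    predicate: str
    obj: str
    conditions: Tuple[str, ...]


# A subject-conditioned extractor receives the fixed subject and one text and proposes
# (predicate, object, conditions) for that subject only. In practice this would be a
# trained model; the callable type here is a hypothetical interface.
Extractor = Callable[[str, str], Iterable[Tuple[str, str, Tuple[str, ...]]]]


def semi_open_extract(subject: str, texts: Iterable[str], extractor: Extractor) -> List[SPOC]:
    """Semi-open-domain extraction: the Subject is given; Predicate, Object, Condition are open."""
    results = []
    for text in texts:
        if subject not in text:   # simplification: skip texts that never mention the subject
            continue
        for predicate, obj, conditions in extractor(subject, text):
            results.append(SPOC(subject, predicate, obj, tuple(conditions)))
    return results


def toy_extractor(subject: str, text: str):
    # Keyword rule standing in for a model, only to make the sketch runnable.
    if "ketoacidosis" in text:
        yield ("treatment", "ketoacidosis", ("blood pH < 7.1", "HCO3- < 12 mmol/L"))


texts = [
    "Only when blood pH < 7.1 and HCO3- < 12 mmol/L may sodium bicarbonate solution "
    "be given for ketoacidosis.",
    "Penicillin is indicated for hemolytic streptococcal infection.",
]
print(semi_open_extract("sodium bicarbonate solution", texts, toy_extractor))
```

Every returned tuple shares the given subject, which is the constraint that distinguishes this mode from fully open extraction.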

As an optional embodiment, when at least two entities in the text data and the relation between them are identified, the entities and the relation are identified simultaneously. In the prior art, knowledge extraction usually performs entity recognition first, then pairs the recognized entities to predict the relation within each pair. In contrast, this embodiment provides a labeling strategy that identifies entity types and relations simultaneously, which improves the efficiency and accuracy of knowledge extraction.
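The disclosure does not define its tag set. The sketch below assumes a hypothetical joint scheme in which each BIO tag already names the span's relation to the given subject (or `COND` for a condition). Decoding a single tag sequence then yields entity spans and their relations together, with no pairwise relation classification.

```python
from typing import List, Optional, Tuple


def decode_joint_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode "B-<label>" / "I-<label>" / "O" tags into (label, span text) pairs.

    <label> is assumed to be the relation to the given subject (e.g. "treatment")
    or "COND" for a condition, so spans and relations come out of one labeling pass.
    """
    spans: List[Tuple[str, str]] = []
    cur_label: Optional[str] = None
    cur_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and cur_label == tag[2:]:
            cur_tokens.append(token)          # continue the open span
            continue
        if cur_label is not None:             # any other tag closes the open span
            spans.append((cur_label, " ".join(cur_tokens)))
        # "B-x" (or a stray "I-x") opens a new span; "O" leaves nothing open.
        cur_label, cur_tokens = (tag[2:], [token]) if tag != "O" else (None, [])
    if cur_label is not None:
        spans.append((cur_label, " ".join(cur_tokens)))
    return spans


tokens = ["sodium", "bicarbonate", "treats", "ketoacidosis", "when", "pH", "<", "7.1"]
tags = ["O", "O", "O", "B-treatment", "O", "B-COND", "I-COND", "I-COND"]
print(decode_joint_tags(tokens, tags))
```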

As an optional embodiment, the construction method further includes performing knowledge fusion on the triples before the medical knowledge graph is constructed from the triples and the conditions. Knowledge fusion includes three parts: aligning entities in the triples that refer to the same thing; performing coreference resolution on the objects in the triples; and establishing edge associations to improve graph connectivity. Coreference resolution identifies the parts of a passage of text that refer to the same entity.
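The entity-alignment part of knowledge fusion can be sketched with an alias table that maps synonyms to one canonical name, followed by removal of duplicate triples. A union-find count of connected components then shows how alignment improves graph connectivity. Coreference resolution in free text is not covered by this sketch. The alias table and helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def fuse(triples: List[Triple], aliases: Dict[str, str]) -> List[Triple]:
    """Entity alignment: map subjects and objects to canonical names, then drop duplicates."""
    seen, fused = set(), []
    for s, p, o in triples:
        t = (aliases.get(s, s), p, aliases.get(o, o))
        if t not in seen:
            seen.add(t)
            fused.append(t)
    return fused


def count_components(triples: List[Triple]) -> int:
    """Number of connected components, treating edges as undirected (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for s, _, o in triples:
        parent[find(s)] = find(o)
    return len({find(x) for x in list(parent)})


raw = [
    ("sodium bicarbonate solution", "treatment", "ketoacidosis"),
    ("NaHCO3 solution", "route", "intravenous drip"),
    ("sodium bicarbonate solution", "treatment", "ketoacidosis"),  # same fact from a second source
]
aliases = {"NaHCO3 solution": "sodium bicarbonate solution"}
fused = fuse(raw, aliases)
print(count_components(raw), count_components(fused), len(fused))  # 2 1 2
```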

The disclosure further provides an apparatus for constructing a medical knowledge graph which, as shown in FIG. 2, includes:

The data access module 201 is configured to obtain text data to be identified. The first step in building a knowledge graph is data access. Existing data sources fall roughly into two types. The first type is structured data: relational data managed in relational databases (MySQL, Oracle, etc.) and stored in tabular form. Examples are structured databases of disease knowledge, typical cases, symptoms and signs, examinations, surgical operations, medication information, medication questions and answers, medication cases, and traditional Chinese medicine prescriptions. The second type is unstructured data, such as documents or forms from books, literature and guidelines. The text data to be identified can be obtained by database import, manual import from disk, API access, and the like.

The knowledge extraction module 202 is configured to perform information extraction on the text data to be identified. It identifies at least two entities in the text and the relation between them, and obtains the triples and conditions corresponding to the text data. Structured triples that a computer can understand and compute on are extracted from the acquired text, converting it into knowledge graph data. In this embodiment, the extracted units are not ordinary SPO triples but SPOC tuples (Subject, Predicate, Object, Condition). Compared with ordinary triples, the present disclosure additionally extracts the conditions stated in the text, which builds a more complete medical knowledge graph for use in complex medical decision systems.

For example, suppose the text data to be identified reads: "To avoid cerebral acidosis and hypernatremia, sodium bicarbonate solution should not be used for ketoacidosis, except when blood pH is less than 7.1 and HCO₃⁻ is less than 12 mmol/L. In that case, 1.4% sodium bicarbonate solution may be given by intravenous drip at 2 mmol/kg, with half the dose given first. It is stopped once blood pH is greater than or equal to 7.2, so as to avoid aggravating cerebral edema by correcting the acidosis too quickly." A model can extract from this text the SPO triple (sodium bicarbonate solution, treatment, ketoacidosis), together with the conditions (blood pH, less than 7.1) and (HCO₃⁻, less than 12 mmol/L).

When the medical knowledge graph is later applied, decision advice better suited to the patient's symptoms and signs can then be given according to the patient's individual characteristics. For example, if the patient's blood pH is 7 and HCO₃⁻ is 11 mmol/L, these individual characteristics match the corresponding conditions. This yields the decision suggestion "give 1.4% sodium bicarbonate solution by intravenous drip at 2 mmol/kg, using half the dose first".

The knowledge graph generation module 203 is configured to construct a medical knowledge graph based on the triples and the conditions. That is, the medical knowledge graph contains the SPO triple (sodium bicarbonate solution, treatment, ketoacidosis) together with the conditions (blood pH, less than 7.1) and (HCO₃⁻, less than 12 mmol/L).

As an optional implementation, the knowledge extraction module 202 performs information extraction on the text data to be identified, identifies at least two entities and the relation between them, and obtains the triples and conditions in two steps. First, it determines a first entity corresponding to the triple. Second, based on the determined subject, it extracts the second entity corresponding to the triple, the relation between the first entity and the second entity, and the conditions corresponding to the triple. The first entity is the subject of the SPO triple, and the second entity is its object.

Embodiments of the present disclosure introduce a semi-open-domain SPO extraction technique with condition extraction, which extracts the Predicate, Object and Condition given only the Subject. For example, suppose the Subject is a disease symptom, "lower abdominal pain". During extraction, the Subject is fixed to "lower abdominal pain", and the entities and entity relations related to "lower abdominal pain" are extracted. This is semi-open-domain extraction because only the Subject is constrained. With fully open-domain extraction, the model extracts entities without any constraint; it may extract many entities, but accuracy drops. With fully closed-domain extraction, the extractable entities and entity relations are restricted in advance, so complex knowledge application scenarios cannot be served. Extracting conditional SPO triples in a semi-open domain therefore preserves both the diversity and the accuracy of the extracted entities.

As an optional embodiment, the knowledge extraction module 202 identifies the at least two entities in the text data and the relation between them simultaneously. In the prior art, knowledge extraction usually performs entity recognition first, then pairs the recognized entities to predict the relation within each pair. In contrast, this embodiment provides a labeling strategy that identifies entity types and relations simultaneously, which improves the efficiency and accuracy of knowledge extraction.

As an optional embodiment, as shown in FIG. 3, the construction apparatus further includes a knowledge fusion module 204, configured to perform knowledge fusion on the triples before the medical knowledge graph is constructed from the triples and the conditions. Knowledge fusion includes three parts: aligning entities in the triples that refer to the same thing; performing coreference resolution on the objects in the triples; and establishing edge associations to improve graph connectivity. Coreference resolution identifies the parts of a passage of text that refer to the same object.

The present disclosure also provides an auxiliary decision making system based on a medical knowledge graph which, as shown in FIG. 4, includes:

An acquisition module 401 is configured to acquire symptom information of a patient. As shown in FIG. 5, the symptom information may be symptoms from the patient's chief complaint, symptoms and signs extracted from medical records, or information extracted from examination reports.

The knowledge reasoning module 402 is configured to perform knowledge reasoning on the medical knowledge graph constructed by the method of any of the above embodiments. It obtains a disease list and corresponding probabilities for the symptom information, and generates a disease candidate result based on the disease list and the probabilities. As shown in FIG. 5, the constructed medical knowledge graph is imported into the auxiliary decision making system before the system is used. The medical knowledge graph contains:
- the entities and entity relations associated with a symptom 501 (an entity), such as its degree, location, duration and aggravating factors;
- the diseases 502 corresponding to the symptom 501;
- the factors associated with the diseases 502, such as gender, age, medical history, test indices, signs and examination findings.

As shown in FIG. 5, the system first acquires the patient's symptom information in Step 1, from the chief complaint and/or the structured medical history. It then extracts elements such as symptom features, physical-sign features and examination features. For example, the patient's symptoms are "lower abdominal pain" and "increased leucorrhea", the sign is a palpable lower abdominal mass, and the examination data include the values in a routine blood test report. In Step 2 (evidence recommendation), the system queries the medical knowledge graph for diseases related to the extracted features and analyzes the results to obtain the diseases the patient may have. For example, it may estimate colpitis with probability 0.35, pelvic inflammatory disease 0.5, uterine myoma 0.65 and uterine tumor 0.25. After all elements are weighed, a conclusion is drawn in Step 3, giving the disease candidate result (uterine myoma, 0.65), (pelvic inflammatory disease, 0.25), (uterine tumor, 0.05).
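The disclosure does not state how the probabilities in Step 2 are computed. As one possible scoring rule, the sketch below swaps in a noisy-OR combination. Each finding linked to a disease independently raises that disease's score, so per-disease scores need not sum to 1. All edge weights and the placeholder disease names are hypothetical, and the sketch does not reproduce the figures quoted above.

```python
from typing import Dict, List, Set, Tuple


def noisy_or_scores(findings: Set[str],
                    disease_weights: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score each disease as P = 1 - prod(1 - w_f) over the patient's findings linked to it.

    w_f is the (hypothetical) probability that finding f alone points to the disease.
    Diseases with no supporting finding are omitted; the result is sorted by score.
    """
    scores = {}
    for disease, weights in disease_weights.items():
        miss = 1.0   # probability that none of the present findings "explains" the disease
        for finding, w in weights.items():
            if finding in findings:
                miss *= 1.0 - w
        if miss < 1.0:
            scores[disease] = 1.0 - miss
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical edge weights; in the system these would be derived from the knowledge graph.
weights = {
    "disease_A": {"lower abdominal pain": 0.2, "palpable lower abdominal mass": 0.6},
    "disease_B": {"lower abdominal pain": 0.4, "increased leucorrhea": 0.3},
    "disease_C": {"fever": 0.5},
}
findings = {"lower abdominal pain", "increased leucorrhea", "palpable lower abdominal mass"}
for disease, p in noisy_or_scores(findings, weights):
    print(disease, round(p, 2))
```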

The feature extraction module 403 is configured to extract individual features of the patient based on the symptom information, and to look up the conditions corresponding to those features in the medical knowledge graph. Because the above embodiments add conditions when the medical knowledge graph is constructed, individual factors can be taken into account. For example, when medication advice is given according to the patient's symptoms, some medications must also take the patient's weight into account. Since weight differs from patient to patient, it is extracted as an individual feature, and the corresponding conditions are then queried.
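For numeric individual features written in a record, a few regular expressions are enough to illustrate the extraction step. The sketch below pulls weight, blood pH and HCO₃⁻ out of free text. The patterns and feature names are hypothetical and cover only these simple phrasings. A production system would more likely use a trained extractor or structured record fields.

```python
import re
from typing import Dict

# Hypothetical patterns for a few numeric individual features.
FEATURE_PATTERNS = {
    "weight_kg": re.compile(r"weight\s*[:=]?\s*(\d+(?:\.\d+)?)\s*kg", re.I),
    "blood_pH": re.compile(r"blood\s+pH\s*[:=]?\s*(\d+(?:\.\d+)?)", re.I),
    "HCO3_mmol_L": re.compile(r"HCO3-?\s*[:=]?\s*(\d+(?:\.\d+)?)\s*mmol/L", re.I),
}


def extract_features(record: str) -> Dict[str, float]:
    """Return the first value found for each known feature; absent features are omitted."""
    features = {}
    for name, pattern in FEATURE_PATTERNS.items():
        m = pattern.search(record)
        if m:
            features[name] = float(m.group(1))
    return features


print(extract_features("Weight: 60 kg. Blood pH 7.0, HCO3- 11 mmol/L."))
```

The resulting dictionary is what a condition lookup (for example, checking "blood pH less than 7.1") would be evaluated against.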

The decision generation module 404 is configured to output the corresponding auxiliary decision suggestion based on the disease candidate result and the conditions corresponding to the individual features. Suppose the patient's disease has been determined, for example that the patient's symptoms correspond to uterine myoma. The relevant medication advice for uterine myoma is then given, and the dosage or course of medication can be set according to the patient's weight. Advice is thus personalized to the patient's physical condition, which improves the accuracy of the decision.
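Combining the disease candidate result with condition-keyed rules can be sketched as follows. Candidates are tried from most to least probable. For each candidate, the first rule whose conditions hold for the patient produces the advice, and the dose is scaled by weight. The rule table, `min_prob` cut-off and helper names are hypothetical. The ketoacidosis rule restates the sodium bicarbonate example given earlier in this description.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]
# A rule pairs a condition check with an advice generator (both hypothetical).
Rule = Tuple[Callable[[Features], bool], Callable[[str, Features], str]]


def bicarbonate_condition(f: Features) -> bool:
    # Conditions from the example text; a missing feature means the rule cannot fire.
    return ("blood_pH" in f and f["blood_pH"] < 7.1
            and "HCO3" in f and f["HCO3"] < 12.0
            and "weight_kg" in f)


def bicarbonate_advice(disease: str, f: Features) -> str:
    total = 2.0 * f["weight_kg"]      # 2 mmol/kg
    return (f"{disease}: 1.4% sodium bicarbonate solution by IV drip, "
            f"{total:g} mmol in total, give half ({total / 2:g} mmol) first")


RULES: Dict[str, List[Rule]] = {"ketoacidosis": [(bicarbonate_condition, bicarbonate_advice)]}


def decide(candidates: List[Tuple[str, float]], rules: Dict[str, List[Rule]],
           features: Features, min_prob: float = 0.5) -> Optional[str]:
    """Advice for the most probable candidate above `min_prob` whose conditions hold."""
    for disease, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if prob < min_prob:
            break                     # remaining candidates are even less probable
        for condition, advice in rules.get(disease, []):
            if condition(features):
                return advice(disease, features)
    return None


print(decide([("ketoacidosis", 0.8)], RULES, {"blood_pH": 7.0, "HCO3": 11.0, "weight_kg": 20.0}))
```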

The medical knowledge graph of the present disclosure consists of SPO triples with conditions, i.e. SPOC tuples (Subject, Predicate, Object, Condition). Because these tuples capture conditions that traditional triples omit, they allow more refined suggestions in subsequent applications and give users a better experience. SPOC knowledge extraction can support more complex clinical scenarios. For example, methylprednisolone sodium succinate for injection has many indications, and a typical system only reminds the user of a dosage range. A medical knowledge graph built with the semi-open-domain SPO extraction technique with condition extraction can instead give refined dosage recommendations according to the patient's condition.

For example, as shown in FIG. 6, methylprednisolone sodium succinate for injection 600 has many indications.
- If reasoning shows that the patient has rheumatoid arthritis 601, the recommended dosage is 1 g per day by intravenous injection.
- If the patient has spinal cord injury 602, the initial dose for treatment within 3 hours of injury is 30 mg per kilogram of body weight, and the regimen differs for treatment 3 to 8 hours after injury. More refined suggestions can therefore be given based on the time elapsed since the patient's injury.
- For nausea and vomiting caused by chemotherapy for malignant tumor 603, the dosage differs from case to case.

Because the dosage of methylprednisolone sodium succinate for injection differs for each disease, the system can generate personalized advice from the conditions in the medical knowledge graph. Based on those conditions, it can determine which drug to use in which situation, and with what dosage and method of administration. Illustratively, the advice output for rheumatoid arthritis 601 is: 1 g daily by intravenous injection for 1, 2, 3 or 4 days; 1 g monthly by intravenous injection for 6 months. Because large doses of corticosteroids can cause arrhythmias, this treatment is limited to hospitals where electrocardiography and defibrillation are promptly available. Each administration should take at least 30 minutes. The regimen may be repeated if the condition has not improved within one week of treatment, or if the condition requires it. Follow the physician's instructions for details.

The recommended regimen for chemotherapy-induced nausea and vomiting in malignant tumor 603 is as follows.
- For mild to moderate chemotherapy-induced vomiting: 250 mg of the product is injected intravenously over at least 5 minutes, 1 hour before chemotherapy, at the start of chemotherapy and at the end of chemotherapy. Follow the physician's instructions for details.
- For severe chemotherapy-induced vomiting: the product is injected intravenously over at least 5 minutes 1 hour before chemotherapy, together with an appropriate amount of metoclopramide or a butyrophenone, followed by 250 mg intravenously at the start and at the end of chemotherapy. Follow the physician's instructions for details.

For acute spinal cord injury 602, treatment should begin within 8 hours of injury.

For patients treated within 3 hours of injury: the initial dose is 30 mg of methylprednisolone per kg of body weight, injected intravenously over 15 minutes under continuous medical monitoring. After the bolus injection, pause for 45 minutes, then give an intravenous drip at 5.4 mg/kg per hour for 23 hours. The infusion pump should be placed at an injection site different from the one used for the bolus. Follow the physician's instructions for details.

For patients treated 3 to 8 hours after injury: the initial dose is 30 mg of methylprednisolone per kg of body weight, injected intravenously over 15 minutes under continuous medical monitoring. After the bolus injection, pause for 45 minutes, then give an intravenous drip at 5.4 mg/kg per hour for 47 hours. Follow the physician's instructions for details.

High-dose injection at this rate is permitted only for this indication, and must be performed with electrocardiographic monitoring and a defibrillator available. Rapid intravenous injection of large doses of methylprednisolone (more than 500 mg given in less than 10 minutes) may cause arrhythmia, circulatory collapse and cardiac arrest. Follow the physician's instructions for details.
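The spinal cord injury regimen above is a clear case of a condition, here the time since injury, selecting between two dosage branches. The sketch below restates only the figures given in that paragraph as a weight-scaled calculation. One assumption is ours, flagged in the code: a patient treated at exactly 3 hours falls in the "within 3 hours" branch. `SpinalInjuryRegimen` and `spinal_injury_regimen` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinalInjuryRegimen:
    bolus_mg: float           # 30 mg/kg IV over 15 minutes
    pause_min: int            # 45-minute pause after the bolus
    infusion_mg_per_h: float  # 5.4 mg/kg per hour
    infusion_hours: int       # 23 h (treated within 3 h) or 47 h (treated 3-8 h after injury)

    @property
    def total_mg(self) -> float:
        return self.bolus_mg + self.infusion_mg_per_h * self.infusion_hours


def spinal_injury_regimen(weight_kg: float, hours_since_injury: float) -> SpinalInjuryRegimen:
    # The time since injury is the condition that selects the branch.
    if not 0 <= hours_since_injury <= 8:
        raise ValueError("treatment should begin within 8 hours of injury")
    # Assumption: exactly 3 hours is counted in the "within 3 hours" branch.
    infusion_hours = 23 if hours_since_injury <= 3 else 47
    return SpinalInjuryRegimen(
        bolus_mg=30 * weight_kg,
        pause_min=45,
        infusion_mg_per_h=5.4 * weight_kg,
        infusion_hours=infusion_hours,
    )


r = spinal_injury_regimen(weight_kg=70, hours_since_injury=2)
print(r, round(r.total_mg, 1))
```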

For other indications: the initial dose ranges from 10 mg to 500 mg, depending on the clinical condition. Follow the physician's instructions for details. High-dose methylprednisolone can be used for short-term control of certain acute severe conditions, such as bronchial asthma, serum sickness, urticaria-like transfusion reactions and acute exacerbations of multiple sclerosis; an initial dose of 250 mg or less should be injected intravenously over at least 5 minutes.

With this technical solution, the dosage and the mode of administration corresponding to a patient's disease can further be obtained by analyzing the conditions stored in the medical knowledge graph, so that more refined medication advice is given, rather than only which drug should be used to treat a given disease.
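How a condition attached to a triple can narrow a recommendation may be sketched as follows. This is a minimal Python illustration, not the disclosed implementation: it assumes a condition can be reduced to a half-open numeric range on a single patient attribute, and the class names, the `dosage_regimen` relation and the `hours_since_injury` attribute are hypothetical. The dosing values are transcribed from the acute spinal cord injury text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Regimen:
    """Dosing parameters transcribed from the acute spinal cord injury text."""
    bolus_mg_per_kg: float        # initial IV bolus, mg methylprednisolone per kg
    bolus_minutes: int            # bolus is given over this many minutes
    pause_minutes: int            # pause after the bolus
    infusion_mg_per_kg_h: float   # maintenance infusion rate
    infusion_hours: int           # maintenance infusion duration


@dataclass(frozen=True)
class ConditionalTriple:
    """A (subject, relation, object) triple that holds only under a condition.

    Assumption (hypothetical encoding): the condition is a half-open range
    [low, high) on one numeric patient attribute.
    """
    subject: str
    relation: str
    obj: Regimen
    condition_attr: str
    condition_range: Tuple[float, float]

    def holds_for(self, patient: Dict[str, float]) -> bool:
        value = patient.get(self.condition_attr)
        if value is None:
            return False  # condition cannot be checked, so the triple does not apply
        low, high = self.condition_range
        return low <= value < high


GRAPH: List[ConditionalTriple] = [
    ConditionalTriple("acute spinal cord injury", "dosage_regimen",
                      Regimen(30, 15, 45, 5.4, 23),
                      "hours_since_injury", (0, 3)),
    ConditionalTriple("acute spinal cord injury", "dosage_regimen",
                      Regimen(30, 15, 45, 5.4, 47),
                      "hours_since_injury", (3, 8)),
]


def recommend(graph: List[ConditionalTriple], disease: str,
              patient: Dict[str, float]) -> List[Regimen]:
    """Return every regimen whose triple matches the disease and whose condition holds."""
    return [t.obj for t in graph
            if t.subject == disease
            and t.relation == "dosage_regimen"
            and t.holds_for(patient)]


if __name__ == "__main__":
    for hours in (2, 5, 9):
        print(hours, recommend(GRAPH, "acute spinal cord injury",
                               {"hours_since_injury": hours}))
```

A patient treated 5 hours after injury matches only the 47-hour regimen, and one treated after 9 hours matches nothing. Without the condition, both triples would collapse into the same (disease, regimen, drug) fact and the distinction by treatment window would be lost.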

As an alternative embodiment, as shown in fig. 7, before the disease candidate results are generated based on the disease list and the probabilities, the system further includes:

a knowledge question-answering module 405, configured to generate a corresponding question based on the symptom information. For example, when a patient complains of "lower abdominal pain" and "increased leucorrhea", this is not sufficient for the system to determine the patient's disease, so the patient may be asked a further question based on these symptoms, for example, "Is there an odor?". The knowledge question-answering module 405 obtains the patient's answer to the question, for example, "yes"; the knowledge reasoning module 402 then screens the disease list based on this answer information and takes the diseases that match the patient as the disease candidate results.
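The question-and-answer screening described above may be sketched as follows. This is a minimal Python illustration under stated assumptions: the disease names and their symptom sets are hypothetical placeholders rather than medical knowledge, the overlap-ratio score stands in for whatever probability model the knowledge reasoning module 402 actually uses, and all function names are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical knowledge base: disease -> associated symptoms.
# Placeholder names only; these are NOT medical facts.
DISEASE_SYMPTOMS: Dict[str, Set[str]] = {
    "disease_A": {"lower abdominal pain", "increased leucorrhea", "odor"},
    "disease_B": {"lower abdominal pain", "increased leucorrhea"},
    "disease_C": {"cough", "fever"},
}

Candidates = List[Tuple[str, float]]


def rank_diseases(symptoms: Set[str]) -> Candidates:
    """Build the disease list with a score per disease.

    Assumption: the score is the fraction of a disease's known symptoms
    that the patient reported; diseases with no overlap are dropped.
    """
    scored = []
    for disease, known in DISEASE_SYMPTOMS.items():
        overlap = len(symptoms & known)
        if overlap:
            scored.append((disease, overlap / len(known)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def next_question(symptoms: Set[str], candidates: Candidates) -> Optional[str]:
    """Return an unreported symptom that some, but not all, candidates have.

    Such a symptom splits the candidate list, so the answer is informative.
    """
    names = [disease for disease, _ in candidates]
    unreported = set().union(*(DISEASE_SYMPTOMS[d] for d in names)) - symptoms
    for symptom in sorted(unreported):
        holders = sum(symptom in DISEASE_SYMPTOMS[d] for d in names)
        if 0 < holders < len(names):
            return symptom
    return None  # no question can narrow the list further


def screen(candidates: Candidates, symptom: str, present: bool) -> Candidates:
    """Keep only the candidates consistent with the patient's yes/no answer."""
    return [(d, p) for d, p in candidates
            if (symptom in DISEASE_SYMPTOMS[d]) == present]


if __name__ == "__main__":
    reported = {"lower abdominal pain", "increased leucorrhea"}
    candidates = rank_diseases(reported)
    question = next_question(reported, candidates)
    print("Ask the patient about:", question)
    if question is not None:
        print("Candidates after 'yes':", screen(candidates, question, True))
```

With this placeholder data, the two reported symptoms leave disease_B and disease_A as candidates, the generated question concerns "odor", and a "yes" answer keeps only disease_A. Asking only about symptoms that split the current candidates is one simple way to make each question useful; a real system could instead pick the question with the largest expected reduction in uncertainty.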

In the technical solution of the present disclosure, the acquisition, storage, application and other processing of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, the method of constructing a medical knowledge graph. For example, in some embodiments, the method of constructing a medical knowledge graph may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the method of constructing a medical knowledge graph described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method of constructing a medical knowledge graph in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed technical solutions are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

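1. A method for constructing a medical knowledge graph, comprising the following steps:

acquiring text data to be identified;

extracting information from the text data to be identified, and identifying at least two entities in the text data to be identified and a relationship between the at least two entities, to obtain triples and conditions corresponding to the text data to be identified; and

constructing a medical knowledge graph based on the triples and the conditions.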

2. The method of claim 1, wherein the extracting information from the text data to be identified, identifying at least two entities in the text data to be identified and a relationship between the at least two entities, and obtaining the triples and the conditions corresponding to the text data to be identified include:

determining a first entity corresponding to the triplet;

and extracting a second entity corresponding to the triplet, a relation between the first entity and the second entity and the condition corresponding to the triplet based on the determined subject.

3. The method of claim 2, wherein the first entity is a subject of the triplet and the second entity is an object of the triplet.

4. The method according to claim 1, wherein the identifying at least two entities in the text data to be identified and the relationship between the at least two entities is in particular: identifying the at least two entities and identifying the relationship between the at least two entities are performed simultaneously.

5. The method of any one of claims 1-4, further comprising, prior to constructing a medical knowledge graph based on the triples and the conditions: performing knowledge fusion on the triples, including synchronizing identically referenced entities in the triples and performing reference resolution on the objects in the triples.

6. A medical knowledge graph construction apparatus, comprising:

the data access module is configured to acquire text data to be identified;

7. The construction device of claim 6, wherein the knowledge extraction module performs information extraction on the text data to be identified, identifies at least two entities in the text data to be identified and a relationship between the at least two entities, and obtains a triplet and a condition corresponding to the text data to be identified, including:

determining a first entity corresponding to the triplet;

8. The construction apparatus of claim 7, wherein the first entity is a subject of the triplet and the second entity is an object of the triplet.

9. The construction device according to claim 6, wherein the knowledge extraction module identifies at least two entities in the text data to be identified and a relationship between the at least two entities, in particular: identifying the at least two entities and identifying the relationship between the at least two entities are performed simultaneously.

10. The construction apparatus of any of claims 6-9, further comprising:

and a knowledge fusion module configured to perform knowledge fusion on the triples before the medical knowledge graph is constructed based on the triples and the conditions, wherein the knowledge fusion comprises synchronizing the entities with the same references in the triples and performing reference resolution on objects in the triples.

11. An auxiliary decision making system based on medical knowledge graph, comprising:

an acquisition module configured to acquire symptom information of a patient;

12. The auxiliary decision making system according to claim 11, further comprising, prior to generating the disease candidate results based on the disease list and the probabilities:

a knowledge question-answering module configured to generate a corresponding question based on the symptom information and acquire answer information of the patient based on the question;

the knowledge reasoning module screens the diseases corresponding to the patient from the disease list based on the answer information, to serve as the disease candidate results.

13. The auxiliary decision making system according to claim 11 or 12, wherein the symptom information comprises at least one of: medical record information of the patient; symptoms described by the patient; an examination report of the patient.

14. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

15. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.

16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-5.