CN111028952B - Method and device for constructing Chinese medical implication knowledge graph - Google Patents

Method and device for constructing Chinese medical implication knowledge graph

Info

Publication number
CN111028952B
Authority
CN
China
Prior art keywords
medical
entity
knowledge graph
medical entity
implication
Legal status
Active
Application number
CN201911179731.6A
Other languages
Chinese (zh)
Other versions
CN111028952A (en)
Inventor
史亚飞
Current Assignee
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Application filed by Unisound Intelligent Technology Co Ltd
Priority to CN201911179731.6A
Publication of CN111028952A
Application granted
Publication of CN111028952B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for constructing a Chinese medical implication knowledge graph. The method comprises: acquiring a first medical entity and a second medical entity; integrating the first and second medical entities and de-duplicating and filtering them against the entities in a preset medical knowledge graph, to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned; fine-tuning a pre-trained model to obtain a medical entity implication model; inputting the third medical entity and the entity with the highest similarity to it into the medical entity implication model to determine the relationship between them; and updating the preset medical knowledge graph according to the relationship between the third medical entity and that entity.

Description

Method and device for constructing Chinese medical implication knowledge graph
Technical Field
The invention relates to the field of Internet technology, and in particular to a method and a device for constructing a Chinese medical implication knowledge graph.
Background
As more and more semantic web data is published openly on the Internet, search engine companies in China and abroad have begun to build knowledge graphs on top of it to improve their services, for example the Google Knowledge Graph and Baidu Zhixin. A knowledge graph is essentially a semantic network whose nodes represent entities or concepts and whose edges represent the various semantic relationships between them. As a form of knowledge management, it links fragmented, scattered knowledge from many fields into a large, networked knowledge system built on a semantic-network framework. Knowledge graphs are now being applied in intelligent systems such as comprehensive knowledge retrieval, question answering, and decision support.
At present, constructing a medical knowledge graph involves inputting every entity extracted from the data sources into a neural network model. This requires a great deal of work and lowers the efficiency of construction, so improving that efficiency is a pressing technical problem.
Disclosure of Invention
The invention provides a method for constructing a Chinese medical implication knowledge graph, comprising the following steps:
acquiring a first medical entity and a second medical entity;
integrating the first medical entity and the second medical entity, and de-duplicating and filtering them against the entities in a preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
fine-tuning a pre-trained model to obtain a medical entity implication model;
inputting the third medical entity and the entity with the highest similarity to it into the medical entity implication model, and determining the relationship between the entities;
updating the preset medical knowledge graph according to the relationship between the third medical entity and that entity.
The beneficial effect of this embodiment is as follows: the acquired medical entities are de-duplicated and filtered, leaving only the third medical entities that are absent from the existing medical knowledge graph and need to be aligned. Retrieval and similarity calculation then find the entity in the existing graph that is most similar to each third medical entity, and only the third medical entity and that entity are input into the model. Because other medical entities do not need to be input, the number of entities the model must process is greatly reduced and efficiency improves.
Specifically, acquiring the first medical entity and the second medical entity comprises:
acquiring data from the network as a data source;
extracting medical-domain data from the data source;
performing medical named entity recognition with a fine-tuned deep learning pre-trained model to obtain the first medical entity;
obtaining the second medical entity from a structured medical document.
Specifically, fine-tuning the pre-trained model to obtain the medical entity implication model comprises:
acquiring a labeled data set from the preset medical knowledge graph;
extracting, from the labeled data set, the training data set and test data set required for constructing the medical entity implication model;
training and testing the pre-trained model on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
Specifically, obtaining the entity with the highest similarity to the third medical entity comprises:
searching the preset medical knowledge base with a preset algorithm to obtain the entity with the highest similarity to the third medical entity.
Specifically, updating the preset medical knowledge graph according to the relationship between the third medical entity and the entity comprises:
determining whether the relationship between the third medical entity and the entity is an implication relationship and, if so, updating the preset medical knowledge graph.
The invention also provides a device for constructing a Chinese medical implication knowledge graph, comprising the following modules:
an acquisition module, configured to acquire the first medical entity and the second medical entity;
a screening module, configured to integrate the first medical entity with the second medical entity and de-duplicate and filter them against the entities in the preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
a fine-tuning module, configured to fine-tune a pre-trained model to obtain a medical entity implication model;
a determining module, configured to input the third medical entity and the entity with the highest similarity to it into the medical entity implication model to determine the relationship between the entities;
an updating module, configured to update the preset medical knowledge graph according to the relationship between the third medical entity and that entity.
Specifically, the acquisition module includes:
a first acquisition sub-module, configured to acquire data from the network as a data source;
an extraction sub-module, configured to extract medical-domain data from the data source;
an identification sub-module, configured to perform medical named entity recognition with the fine-tuned deep learning pre-trained model to obtain the first medical entity;
a second acquisition sub-module, configured to acquire the second medical entity from the structured medical document.
Specifically, the fine-tuning module includes:
a third acquisition sub-module, configured to acquire a labeled data set from the preset medical knowledge graph;
an extraction sub-module, configured to extract, from the labeled data set, the training data set and test data set required for constructing the medical entity implication model;
a fine-tuning sub-module, configured to train and test the pre-trained model on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
Specifically, the determining module includes:
a retrieval sub-module, configured to search the preset medical knowledge base with a preset algorithm to obtain the entity with the highest similarity to the third medical entity.
Specifically, the updating module includes:
a judging sub-module, configured to determine whether the relationship between the third medical entity and the entity is an implication relationship and, if so, update the preset medical knowledge graph.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification. Together with the embodiments of the invention, they serve to explain the invention. In the drawings:
FIG. 1 is a flowchart of a method for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention;
FIG. 2 is a flowchart of a method for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention;
FIG. 3 is a block diagram of a device for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention;
FIG. 4 is a block diagram of a device for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
FIG. 1 is a flowchart of a method for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention. As shown in FIG. 1, the method may be implemented as the following steps S11-S15:
in step S11, a first medical entity and a second medical entity are acquired;
in step S12, the first medical entity and the second medical entity are integrated, then de-duplicated and filtered against the entities in the preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
in step S13, a medical entity implication model is obtained by fine-tuning a pre-trained model;
in step S14, the third medical entity and the entity with the highest similarity to it are input into the medical entity implication model to determine the relationship between the entities;
in step S15, the preset medical knowledge graph is updated according to the relationship between the third medical entity and that entity.
In this embodiment, the preset medical knowledge graph may be an existing medical knowledge graph. The first medical entity is obtained from network data and real-world data, and the second medical entity is obtained from structured medical documents. The first and second medical entities are integrated and compared with the existing medical knowledge graph, leaving the medical entities that do not exist in the existing graph and need to be aligned; these are the third medical entities. A general medical-domain model is pre-trained on a large-scale medical corpus and then fine-tuned with labeled data to obtain the medical entity implication model. Retrieval and similarity calculation between the third medical entity and the entities in the existing medical knowledge graph yield the entity with the highest similarity to the third medical entity. The two are input into the medical entity implication model, which outputs the implication relationship between them, and this relationship is used to update the existing medical knowledge graph, producing a new medical knowledge graph.
For example, the medical entities may be "diabetes", "polyuria", and "insulin injection", and the relationships between them may be that "diabetes" has the symptom "polyuria" and that "insulin injection" treats "diabetes".
It should be noted that the "first medical entity", "second medical entity", and "third medical entity" do not each refer to a single entity, and there may be one or more entities with the highest similarity to the third medical entity. Medical entities requiring alignment are entities with different identifiers that represent the same object; alignment merges them into a single entity with a unique identifier.
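Since each "medical entity" above is a set of entities, the integration and de-duplication of step S12 reduce to set operations over entity names. The following Python sketch assumes entities are plain strings compared after trimming whitespace; the function name `select_entities_to_align` and the sample entity lists are hypothetical, and any further normalization a real system might need is not specified in the description.

```python
from typing import Iterable, List


def select_entities_to_align(
    first: Iterable[str], second: Iterable[str], kg_entities: Iterable[str]
) -> List[str]:
    """Return candidate third medical entities: merged, de-duplicated, absent from the KG."""
    # Integrate both sources; the set also removes duplicates between them.
    # Assumption: entities are compared as whitespace-trimmed strings; empty strings are dropped.
    merged = {e.strip() for e in list(first) + list(second) if e.strip()}
    existing = {e.strip() for e in kg_entities}
    # Keep only entities that the preset knowledge graph does not already contain.
    return sorted(merged - existing)


first = ["diabetes", "polyuria", "type 2 diabetes"]  # e.g. from NER on web data
second = ["insulin injection", "type 2 diabetes"]    # e.g. from structured documents
kg = {"diabetes", "polyuria"}
print(select_entities_to_align(first, second, kg))
# ['insulin injection', 'type 2 diabetes']
```

Sorting is only there to make the output deterministic; the method itself does not require any particular order.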
The beneficial effect of this embodiment is that only the third medical entities, which are absent from the existing medical knowledge graph and need to be aligned, are paired with their most similar existing entities and input into the model. Other medical entities are not input, so the number of entities the model must process is greatly reduced and efficiency improves.
In one embodiment, step S11 may be implemented as the following steps A1-A4:
in step A1, data is acquired from the network as a data source;
in step A2, medical-domain data is extracted from the data source;
in step A3, medical named entity recognition is performed with a fine-tuned deep learning pre-trained model to obtain the first medical entity;
in step A4, the second medical entity is obtained from structured medical documents.
For example, web-crawled data (medical encyclopedias, medical websites), medical literature (clinical guidelines, medical textbooks), and unstructured clinical medical records are used as data sources. Medical-domain data is extracted from these sources, and the fine-tuned deep learning pre-trained model BERT is used for medical named entity recognition to obtain the first medical entity. The second medical entity is obtained from medical documents that have already been structured.
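The description does not say how BERT's token-level predictions are turned into entity strings. A common choice, assumed here, is per-character BIO tagging of Chinese text; the sketch below decodes such tags into entity spans. The function `decode_bio` and the tag types `DIS` (disease) and `SYM` (symptom) are hypothetical, and the model call that would produce the tags is omitted.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-character BIO tags into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any span that is still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the open span with the same type.
            current.append(ch)
        else:
            # "O", or an I- tag that does not continue the open span: close it.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


chars = list("糖尿病患者多尿")
tags = ["B-DIS", "I-DIS", "I-DIS", "O", "O", "B-SYM", "I-SYM"]
print(decode_bio(chars, tags))  # [('糖尿病', 'DIS'), ('多尿', 'SYM')]
```

A stray `I-` tag with no open span is simply dropped in this sketch; other decoders treat it as the start of a new entity.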
In one embodiment, as shown in FIG. 2, step S13 may be implemented as the following steps S21-S23:
in step S21, a labeled data set is acquired from the preset medical knowledge graph;
in step S22, the training data set and test data set required for constructing the medical entity implication model are extracted from the labeled data set;
in step S23, the pre-trained model is trained and tested on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
In this embodiment, a general medical-domain model is pre-trained on a large-scale medical corpus, and the medical entity implication model is then obtained by fine-tuning this general model on labeled data. Specifically, the labeled data set required by the implication model is built from the hypernym-hyponym (upper-lower) relationships and synonym relationships between entities in the existing medical knowledge graph.
Constructing the labeled data set for the implication model comprises:
constructing positive examples: randomly selecting an entity together with its direct hypernym (upper entity) or a synonymous entity as a positive sample;
constructing negative examples: randomly selecting an entity together with one of its direct hyponyms (lower entities) as a negative sample.
70% of the labeled data set is used as the training data set and 30% as the test data set, and the pre-trained model BERT is then trained and tested on them by fine-tuning to obtain the medical entity implication model.
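The pair construction and 70/30 split can be sketched as follows. The sketch assumes the knowledge graph exposes each entity's direct hypernym as a dictionary and its synonyms as lists (hypothetical structures and names); it enumerates every available pair and shuffles them, rather than drawing a random subset as "randomly selecting" may imply. Feeding the pairs to BERT as sentence pairs for fine-tuning is not shown.

```python
import random
from typing import Dict, List, Tuple

# (entity, candidate, label): label 1 means "entity implies candidate".
Sample = Tuple[str, str, int]


def build_implication_dataset(
    direct_hypernym: Dict[str, str],
    synonyms: Dict[str, List[str]],
    train_ratio: float = 0.7,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Build positive/negative pairs from KG relations and split them into train/test."""
    samples: List[Sample] = []
    for child, parent in direct_hypernym.items():
        samples.append((child, parent, 1))  # positive: entity + its direct hypernym
        samples.append((parent, child, 0))  # negative: entity + one of its direct hyponyms
    for entity, syns in synonyms.items():
        for syn in syns:
            samples.append((entity, syn, 1))  # positive: entity + a synonymous entity
    random.Random(seed).shuffle(samples)  # random order before the split
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


hyper = {"type 2 diabetes": "diabetes", "type 1 diabetes": "diabetes"}
syn = {"diabetes mellitus": ["diabetes"]}
train, test = build_implication_dataset(hyper, syn)
print(len(train), len(test))  # 3 2
```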
Note that the implication relationship is defined as follows: for entities A and B, if A is a hyponym of B or a synonym of B, then A implies B.
In one embodiment, step S14 may be implemented as follows:
searching the preset medical knowledge base with a preset algorithm to obtain the entity with the highest similarity to the third medical entity.
The final similarity score between the third medical entity Q and a medical entity D in the existing medical knowledge graph is calculated according to the following formula:
$$\mathrm{Score}(D,Q)=\sum_{q_i\in Q}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D)+k_1\left(1-b+b\cdot\frac{|D|}{\mathrm{avgdl}}\right)}$$
where $q_i$ is the i-th word obtained by word segmentation of the retrieval entity Q; $f(q_i,D)$ is the frequency of $q_i$ in entity D; $|D|$ is the number of words contained in entity D; avgdl is the average number of words per entity over the whole medical knowledge graph; $k_1$ and $b$ are freely adjustable parameters, by default $k_1\in[1.2,2.0]$ and $b=0.75$; $\mathrm{Score}(D,Q)$ is the final similarity score; and IDF is the inverse document frequency, calculated as follows:
$$\mathrm{IDF}(q_i)=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5}$$
where $\mathrm{IDF}(q_i)$ is the inverse document frequency of the i-th word, $N$ is the total number of medical entities D in the existing medical knowledge graph, and $n(q_i)$ is the number of medical entities D that contain the i-th word of the retrieval entity.
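The similarity score described here has the form of Okapi BM25, and a minimal Python sketch of the retrieval step follows. It assumes word segmentation has already been done (the description does not name a segmenter), uses $k_1 = 1.2$ from the stated default range, and assumes the common $\log\frac{N-n(q)+0.5}{n(q)+0.5}$ form of IDF; the function name `bm25_top_x` and the toy entities are hypothetical. It returns the top X candidates, since the entity with the highest similarity may be more than one.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_top_x(
    query_terms: List[str],
    entity_terms: List[List[str]],
    x: int = 3,
    k1: float = 1.2,
    b: float = 0.75,
) -> List[Tuple[float, int]]:
    """Score every KG entity D against query entity Q; return the top-x (score, index) pairs."""
    n_entities = len(entity_terms)                          # N
    avgdl = sum(len(d) for d in entity_terms) / n_entities  # average words per entity
    doc_freq: Counter = Counter()                           # n(q): entities containing q
    for d in entity_terms:
        doc_freq.update(set(d))

    def idf(q: str) -> float:
        n_q = doc_freq[q]
        return math.log((n_entities - n_q + 0.5) / (n_q + 0.5))

    scored = []
    for idx, d in enumerate(entity_terms):
        tf = Counter(d)                                     # f(q, D)
        norm = k1 * (1 - b + b * len(d) / avgdl)            # document-length normalization
        score = sum(
            idf(q) * tf[q] * (k1 + 1) / (tf[q] + norm)
            for q in query_terms
            if tf[q] > 0                                    # absent terms contribute nothing
        )
        scored.append((score, idx))
    scored.sort(key=lambda s: (-s[0], s[1]))                # best first, ties by index
    return scored[:x]


kg = [["diabetes"], ["type", "2", "diabetes"], ["insulin", "injection"],
      ["hypertension"], ["polyuria"]]
query = ["type", "2", "diabetes", "mellitus"]
print([idx for _, idx in bm25_top_x(query, kg, x=2)])  # [1, 0]
```

With this IDF form, a word that appears in more than half of all entities receives a negative weight; some implementations clip or smooth it, a detail the description does not address.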
In one embodiment, step S15 may be implemented as follows:
determining whether the relationship between the third medical entity and the entity is an implication relationship and, if so, updating the preset medical knowledge graph.
In this embodiment, the implication model is applied to the third medical entity Q and each of the X entities most similar to it, yielding the hypernym-hyponym (upper-lower) relationship or synonym relationship between Q and each of these X entities.
For $q_i \in Q$ and $x_i \in X$, the relationship is determined as follows:
if $q_i$ implies $x_i$ and $x_i$ implies $q_i$, then $q_i$ and $x_i$ are synonyms;
if $q_i$ implies $x_i$ but $x_i$ does not imply $q_i$, then $q_i$ is a hyponym (lower concept) of $x_i$;
if $q_i$ does not imply $x_i$ but $x_i$ implies $q_i$, then $q_i$ is a hypernym (upper concept) of $x_i$;
if $q_i$ does not imply $x_i$ and $x_i$ does not imply $q_i$, then $q_i$ and $x_i$ have no relationship.
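The four cases above form a truth table over the two implication directions. The sketch below assumes the fine-tuned model is wrapped in a hypothetical callable `implies(a, b)` that returns a boolean (for example, by thresholding the classifier's output probability); a fixed lookup table stands in for the model here, and the relation labels are illustrative names only.

```python
from typing import Callable, Set, Tuple


def classify_relation(q_implies_x: bool, x_implies_q: bool) -> str:
    """Map the two implication directions to the relation of q with respect to x."""
    if q_implies_x and x_implies_q:
        return "synonym"
    if q_implies_x:
        return "hyponym"   # q is a lower (narrower) concept of x
    if x_implies_q:
        return "hypernym"  # q is an upper (broader) concept of x
    return "none"


def relation_between(q: str, x: str, implies: Callable[[str, str], bool]) -> str:
    # Query the implication model in both directions, then combine the answers.
    return classify_relation(implies(q, x), implies(x, q))


# Toy stand-in for the fine-tuned model: pairs (a, b) for which "a implies b".
KNOWN: Set[Tuple[str, str]] = {
    ("type 2 diabetes", "diabetes"),
    ("diabetes mellitus", "diabetes"),
    ("diabetes", "diabetes mellitus"),
}


def toy_implies(a: str, b: str) -> bool:
    return (a, b) in KNOWN


print(relation_between("type 2 diabetes", "diabetes", toy_implies))    # hyponym
print(relation_between("diabetes mellitus", "diabetes", toy_implies))  # synonym
```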
The relationship between the third medical entity and each of the X most similar entities is then judged. If a relationship holds, the existing medical knowledge graph is updated with the relationships between that third medical entity and the corresponding entities among the X; if no relationship holds, the third medical entity is discarded.
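The update rule can be sketched as follows, assuming the graph is held as a set of (head, relation, tail) triples and that each third medical entity comes with relation labels for its top-X candidates such as "synonym", "hyponym", "hypernym", or "none". These labels, the function `update_graph`, and the triple representation are hypothetical; the description does not specify a storage format.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def update_graph(
    graph: Set[Triple], relations: Dict[str, List[Tuple[str, str]]]
) -> Tuple[Set[Triple], List[str]]:
    """Add each Q's relations to its top-X candidates; report Qs with no relation at all."""
    updated = set(graph)
    discarded: List[str] = []
    for q, candidates in relations.items():
        kept = [(x, rel) for x, rel in candidates if rel != "none"]
        if not kept:
            discarded.append(q)  # no relation to any candidate: the entity is discarded
            continue
        for x, rel in kept:
            updated.add((q, rel, x))
    return updated, discarded


graph = {("type 2 diabetes", "hyponym", "diabetes")}
relations = {
    "diabetes mellitus": [("diabetes", "synonym"), ("polyuria", "none")],
    "knee pain": [("diabetes", "none")],
}
new_graph, dropped = update_graph(graph, relations)
print(len(new_graph), dropped)  # 2 ['knee pain']
```

The input graph is copied rather than mutated, so the original graph stays available for comparison.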
FIG. 3 is a block diagram of a device for constructing a Chinese medical implication knowledge graph according to an embodiment of the invention. As shown in FIG. 3, the device may include the following modules:
an acquisition module 31, configured to acquire a first medical entity and a second medical entity;
a screening module 32, configured to integrate the first medical entity with the second medical entity and de-duplicate and filter them against the entities in a preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
a fine-tuning module 33, configured to fine-tune a pre-trained model to obtain a medical entity implication model;
a determining module 34, configured to input the third medical entity and the entity with the highest similarity to it into the medical entity implication model and determine the relationship between the entities;
an updating module 35, configured to update the preset medical knowledge graph according to the relationship between the third medical entity and that entity.
In one embodiment, as shown in FIG. 4, the acquisition module 31 includes:
a first acquisition sub-module 41, configured to acquire data from the network as a data source;
an extraction sub-module 42, configured to extract medical-domain data from the data source;
an identification sub-module 43, configured to perform medical named entity recognition with the fine-tuned deep learning pre-trained model to obtain the first medical entity;
a second acquisition sub-module 44, configured to acquire the second medical entity from structured medical documents.
In one embodiment, the fine-tuning module includes:
a third acquisition sub-module, configured to acquire a labeled data set from the preset medical knowledge graph;
an extraction sub-module, configured to extract, from the labeled data set, the training data set and test data set required for constructing the medical entity implication model;
a fine-tuning sub-module, configured to train and test the pre-trained model on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
In one embodiment, the determining module includes:
a retrieval sub-module, configured to search the preset medical knowledge base with a preset algorithm to obtain the entity with the highest similarity to the third medical entity.
In one embodiment, the updating module includes:
a judging sub-module, configured to determine whether the relationship between the third medical entity and the entity is an implication relationship and, if so, update the preset medical knowledge graph.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A method for constructing a Chinese medical implication knowledge graph, characterized by comprising the following steps:
acquiring a first medical entity and a second medical entity, comprising:
acquiring data from the network as a data source;
extracting medical-domain data from the data source;
performing medical named entity recognition with a fine-tuned deep learning pre-trained model to obtain the first medical entity;
obtaining the second medical entity from a structured medical document;
integrating the first medical entity and the second medical entity, and de-duplicating and filtering them against the entities in a preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
fine-tuning a pre-trained model to obtain a medical entity implication model;
inputting the third medical entity and the entity with the highest similarity to it into the medical entity implication model, and determining the relationship between the entities;
updating the preset medical knowledge graph according to the relationship between the third medical entity and that entity;
searching a preset medical knowledge base with a preset algorithm to obtain the entity with the highest similarity to the third medical entity, wherein the searching comprises:
calculating the final similarity score between the third medical entity Q and a medical entity D in the existing medical knowledge graph according to the following formula:
$$\mathrm{Score}(D,Q)=\sum_{q_i\in Q}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D)+k_1\left(1-b+b\cdot\frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein $q_i$ denotes the i-th word obtained by word segmentation of the retrieval entity Q, $f(q_i,D)$ denotes the frequency of $q_i$ in the medical entity D in the existing medical knowledge graph, $|D|$ denotes the number of words contained in the medical entity D in the existing medical knowledge graph, avgdl denotes the average number of words per entity over all entities in the medical knowledge graph, $k_1$ and $b$ denote freely adjustable parameters, by default $k_1\in[1.2,2.0]$ and $b=0.75$, and IDF denotes the inverse document frequency, calculated as follows:
$$\mathrm{IDF}(q_i)=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5}$$
wherein $\mathrm{IDF}(q_i)$ denotes the inverse document frequency of the i-th word, $N$ is the total number of medical entities D in the existing medical knowledge graph, and $n(q_i)$ denotes the number of medical entities D in the existing medical knowledge graph that contain the i-th word of the retrieval entity.
2. The method of claim 1, wherein the obtaining the medical entity implication model by fine-tuning the pre-trained model comprises:
acquiring a labeled data set from the preset medical knowledge graph;
extracting, from the labeled data set, a training data set and a test data set required to construct the medical entity implication model;
and training and testing the pre-trained model on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
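Claim 2 does not state how the labeled data set is derived from the knowledge graph. What follows is one hypothetical way to do it, not the patented procedure. It assumes the graph exposes is-a edges of the form (child, parent). Each edge becomes a positive implication pair (label 1). Negative pairs (label 0) are sampled from entities that are neither the child itself nor any of its ancestors, so a transitive implication is never mislabeled. The pairs are then split deterministically into training and test sets. All function names and the 1:1 negative sampling ratio are assumptions.

```python
import random
from collections import defaultdict


def ancestors(entity, parents):
    """All entities reachable from `entity` by following is-a edges upward."""
    found, stack = set(), [entity]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def build_labeled_pairs(isa_edges, entities, seed=0):
    """Positive pairs (child, parent, 1) from is-a edges; one negative (child, other, 0) per edge."""
    rng = random.Random(seed)
    parents = defaultdict(set)
    for child, parent in isa_edges:
        parents[child].add(parent)
    pairs = [(child, parent, 1) for child, parent in isa_edges]
    for child, _ in isa_edges:
        # Exclude the entity itself and every ancestor, so no true implication is labeled 0.
        excluded = ancestors(child, parents) | {child}
        candidates = sorted(e for e in entities if e not in excluded)
        if candidates:
            pairs.append((child, rng.choice(candidates), 0))
    return pairs


def train_test_split(pairs, test_ratio=0.2, seed=0):
    """Shuffle deterministically and hold out `test_ratio` of the pairs for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```

The resulting (entity, entity, label) triples can be fed to a sentence-pair classifier for fine-tuning. The choice of pre-trained model and input format is left open by the claim.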
3. The method of claim 1, wherein updating the preset medical knowledge graph according to the relationship between the third medical entity and the entity comprises:
determining whether the relationship between the third medical entity and the entity is an implication relationship and, if so, updating the preset medical knowledge graph.
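The conditional update in claim 3 can be sketched as follows, under stated assumptions. The graph is modeled as a set of entity names plus a set of (head, relation, tail) triples. `predict_relation` is a hypothetical stand-in for the fine-tuned implication model. The edge direction, in which the new, more specific entity implies the matched one, is also an assumption, because the claim does not specify it.

```python
from typing import Callable, Set, Tuple

# Hypothetical relation label emitted by the implication model.
IMPLICATION = "implication"

Edge = Tuple[str, str, str]  # (head entity, relation, tail entity)


def update_knowledge_graph(
    entities: Set[str],
    edges: Set[Edge],
    new_entity: str,
    matched_entity: str,
    predict_relation: Callable[[str, str], str],
) -> bool:
    """Add new_entity to the graph only if the model judges it to imply matched_entity.

    Returns True if the graph was updated, False otherwise.
    """
    relation = predict_relation(new_entity, matched_entity)
    if relation != IMPLICATION:
        return False  # no implication: leave the graph unchanged
    entities.add(new_entity)
    # Assumed direction: the new (more specific) entity implies the matched one.
    edges.add((new_entity, IMPLICATION, matched_entity))
    return True
```

A toy predictor, for example one that reports implication when the matched name is a substring of the new name, is enough to exercise the logic. In practice, the call would go to the fine-tuned model.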
4. A Chinese medical implication knowledge graph construction device, characterized by comprising:
an acquisition module for acquiring a first medical entity and a second medical entity, comprising:
a first acquisition sub-module, configured to acquire data from the network as a data source;
an extraction sub-module, configured to extract data related to the medical field from the data source;
an identification sub-module, configured to perform medical named entity recognition using a fine-tuned, pre-trained deep learning model to obtain the first medical entity;
a second acquisition sub-module, configured to acquire the second medical entity from structured medical documents;
a screening module, configured to integrate the first medical entity with the second medical entity and de-duplicate them against the entities in a preset medical knowledge graph to obtain a third medical entity that does not exist in the preset medical knowledge graph and needs to be aligned;
a fine-tuning module, configured to fine-tune the pre-trained model to obtain a medical entity implication model;
a determining module, configured to input the third medical entity and the entity with the highest similarity to it into the medical entity implication model to determine the relationship between the two entities;
an updating module, configured to update the preset medical knowledge graph according to the relationship between the third medical entity and that entity;
searching a preset medical knowledge base according to a preset algorithm to obtain an entity with the highest similarity to the third medical entity, wherein the searching comprises the following steps:
calculating the final similarity score between the third medical entity Q and a medical entity D in the existing medical knowledge graph according to the following formula:

$$\mathrm{Score}(Q,D)=\sum_{i=1}^{n}\mathrm{IDF}_i\cdot\frac{f_i\,(k_1+1)}{f_i+k_1\left(1-b+b\cdot\frac{|D|}{\mathrm{avgdl}}\right)}$$

where q_i represents the elements obtained by word segmentation of the medical entity D in the existing medical knowledge graph, f_i represents the word frequency of q_i in the medical entity D, |D| represents the number of words contained in the medical entity D, avgdl represents the average number of words contained in the entities of the medical knowledge graph, and k_1 and b are freely adjustable parameters, by default k_1 = … and b = 0.75; IDF represents the inverse text frequency index, which is calculated as follows:

$$\mathrm{IDF}_i=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5}$$

where IDF_i represents the inverse text frequency index of the i-th word, N is the total number of medical entities D in the existing medical knowledge graph, and n(q_i) represents the number of medical entities D in the existing medical knowledge graph that contain the i-th word of the retrieval entity.
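The screening module merges and de-duplicates the candidate entities before alignment. The same step appears in claim 1. Below is a minimal sketch of that step. The matching rule is an assumption, because the claims do not define when two entity names count as duplicates. Here, names are compared after Unicode NFKC normalization, which folds the full-width characters common in Chinese text to their half-width forms, followed by whitespace stripping and case-folding. The function names are hypothetical.

```python
import unicodedata


def normalize(name):
    """Comparison key: NFKC folds full-width forms, then strip whitespace and casefold."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


def screen_new_entities(first, second, kg_entity_names):
    """Merge both entity lists, drop duplicates and entities already in the graph.

    Returns the surviving names (first spelling seen, stripped) in input order;
    these play the role of the "third medical entities" awaiting alignment.
    """
    known = {normalize(n) for n in kg_entity_names}
    seen = set()
    result = []
    for name in list(first) + list(second):
        key = normalize(name)
        if key and key not in known and key not in seen:
            seen.add(key)
            result.append(name.strip())
    return result
```

Exact matching after normalization only removes literal duplicates. Near-synonyms pass through deliberately, because resolving them is the job of the downstream BM25 retrieval and implication model.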
5. The apparatus of claim 4, wherein the fine-tuning module comprises:
a third acquisition sub-module, configured to acquire a labeled data set from the preset medical knowledge graph;
an extraction sub-module, configured to extract, from the labeled data set, a training data set and a test data set required to construct the medical entity implication model;
and a fine-tuning sub-module, configured to train and test the pre-trained model on the training data set and the test data set by fine-tuning, to obtain the medical entity implication model.
6. The apparatus of claim 4, wherein the update module comprises:
and a judging sub-module, configured to determine whether the relationship between the third medical entity and the entity is an implication relationship and, if so, to update the preset medical knowledge graph.
CN201911179731.6A 2019-11-27 2019-11-27 Method and device for constructing Chinese medical implication knowledge graph Active CN111028952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911179731.6A CN111028952B (en) 2019-11-27 2019-11-27 Method and device for constructing Chinese medical implication knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911179731.6A CN111028952B (en) 2019-11-27 2019-11-27 Method and device for constructing Chinese medical implication knowledge graph

Publications (2)

Publication Number Publication Date
CN111028952A CN111028952A (en) 2020-04-17
CN111028952B true CN111028952B (en) 2023-08-04

Family

ID=70202485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911179731.6A Active CN111028952B (en) 2019-11-27 2019-11-27 Method and device for constructing Chinese medical implication knowledge graph

Country Status (1)

Country Link
CN (1) CN111028952B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723570B (en) * 2020-06-09 2023-04-28 平安科技(深圳)有限公司 Construction method and device of medicine knowledge graph and computer equipment
CN112233803A (en) * 2020-09-11 2021-01-15 北京欧应信息技术有限公司 Data mining device for assisting doctor in optimizing diagnosis and treatment
CN116108000B (en) * 2023-04-14 2023-06-20 成都安哲斯生物医药科技有限公司 Medical data management query method
CN116383413B (en) * 2023-06-05 2023-08-29 湖南云略信息技术有限公司 Knowledge graph updating method and system based on medical data extraction

Citations (6)

Publication number Priority date Publication date Assignee Title
CN105431839A (en) * 2013-03-15 2016-03-23 罗伯特·哈多克 Intelligent internet system with adaptive user interface providing one-step access to knowledge
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN108595708A (en) * 2018-05-10 2018-09-28 北京航空航天大学 A kind of exception information file classification method of knowledge based collection of illustrative plates
CN109192321A (en) * 2018-09-26 2019-01-11 北京理工大学 The construction method and calculating storage device of drug knowledge mapping
CN109271530A (en) * 2018-10-17 2019-01-25 长沙瀚云信息科技有限公司 A kind of disease knowledge map construction method and plateform system, equipment, storage medium
WO2019071661A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic apparatus, medical text entity name identification method, system, and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20180357381A1 (en) * 2017-06-09 2018-12-13 Intelligent Medical Objects, Inc. Method and System for Generating Persistent Local Instances of Ontological Mappings

Also Published As

Publication number Publication date
CN111028952A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028952B (en) Method and device for constructing Chinese medical implication knowledge graph
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
CN110765257A (en) Intelligent consulting system of law of knowledge map driving type
CN111597347B (en) Knowledge embedding defect report reconstruction method and device
CN113779272B (en) Knowledge graph-based data processing method, device, equipment and storage medium
US20140172754A1 (en) Semi-supervised data integration model for named entity classification
CN110188147B (en) Knowledge graph-based document entity relationship discovery method and system
US20180341686A1 (en) System and method for data search based on top-to-bottom similarity analysis
EP3270303A1 (en) An automated monitoring and archiving system and method
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
WO2020074023A1 (en) Deep learning-based method and device for screening for key sentences in medical document
CN111914550B (en) Knowledge graph updating method and system oriented to limited field
CN109408821A (en) A kind of corpus generation method, calculates equipment and storage medium at device
CN111061828B (en) Digital library knowledge retrieval method and device
CN111708899A (en) Engineering information intelligent searching method based on natural language and knowledge graph
CN110990676A (en) Social media hotspot topic extraction method and system
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN106844638A (en) Information retrieval method, device and electronic equipment
CN114386421A (en) Similar news detection method and device, computer equipment and storage medium
CN111626568A (en) Knowledge base construction method and device and knowledge search method and system
CN117149988B (en) Data management processing method and system based on education digitization
CN117216221A (en) Intelligent question-answering system based on knowledge graph and construction method
CN116976321A (en) Text processing method, apparatus, computer device, storage medium, and program product
CN116049376A (en) Method, device and system for retrieving and replying information and creating knowledge
CN116010662A (en) Construction method, device and medium of energy consumption-carbon emission query system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant