CN110609907A - Medicine field knowledge reasoning method based on random walk - Google Patents


Info

Publication number
CN110609907A
Authority
CN
China
Prior art keywords
entities
medical field
medical
field
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910876121.5A
Other languages
Chinese (zh)
Inventor
张吉昕
秦拯
欧露
颜俊
陈浩
欧博
翟亚静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201910876121.5A
Publication of CN110609907A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to a medical-field knowledge reasoning method based on random walk. It comprises three parts: (1) a medical-field named entity recognition method based on context character bigrams and information entropy; (2) a method for extracting relations between medical-field entities based on predicate sentiment classification; and (3) a medical-field knowledge graph reasoning method based on random walk. With these methods, named entities in the medical field are recognized and the relations among them are extracted, so that a medical-field knowledge graph is constructed automatically and reasoning over that graph is realized.

Description

Medicine field knowledge reasoning method based on random walk
Technical Field
The invention relates to the fields of knowledge engineering and machine learning, and in particular to a medical-field knowledge reasoning method based on random walk.
Background
Knowledge graph technology, one of the key technologies in knowledge engineering and artificial intelligence, is currently a popular research field. Machine learning techniques often suffer from two interpretability problems: local relations among features are hard to interpret, and global relations between features and outputs are hard to interpret. Knowledge graph technology, by contrast, expresses the relations among knowledge entities as triples, which directly reflects the association logic between the ontology and the knowledge entities. Because of this good interpretability, it has attracted growing attention from industry and has become one of the important foundations of artificial intelligence technology.
Knowledge graph technology mainly covers construction and reasoning. Construction mainly includes named entity recognition and relation extraction; reasoning mainly includes entity relation prediction and knowledge reasoning, in which knowledge rules are extracted and inferred from the known relations among entities in the knowledge graph.
Medicine is a knowledge-intensive field that depends on medical and pharmaceutical background knowledge. Representing that background knowledge as a knowledge graph provides important support for assistive intelligent applications in the medical field. However, the named entities, inter-entity relations, and knowledge logic of the medical field have distinct domain characteristics and differ greatly from those of the general domain, so targeted knowledge graph construction and reasoning techniques are needed to support such applications.
Disclosure of Invention
The invention aims to solve the problems of automatically constructing a medical knowledge graph and reasoning over it.
To this end, the invention provides a medical-field knowledge reasoning method based on random walk, which comprises three parts:
(1) a medical-field named entity recognition method based on context character bigrams and information entropy;
(2) a method for extracting relations between medical-field entities based on predicate sentiment classification;
(3) a medical-field knowledge graph reasoning method based on random walk.
The details are as follows:
Method (1) is used to recognize named entities in the medical field, covering concepts such as medicines, diseases, symptoms, populations, and ingredients. Method (2) is used to extract positive and negative relations among those entities, such as applicability and contraindication relations. The recognized entities and the relations between them are then used to construct a medical knowledge graph automatically, and method (3) is used to reason over it. Together, these methods achieve automatic construction of the medical knowledge graph and knowledge reasoning in the medical field.
(1) A medical-field named entity recognition method based on context character bigrams and information entropy.
A general corpus and a specialized medical corpus are collected, and punctuation marks and stop words are removed from the corpora. Two character transition probability matrices are then built from the character contexts in the medical corpus and in the general corpus, where each element of a matrix is the transition frequency of the corresponding context. Let $Mat_{medical}$ be the context character transition probability matrix of the medical corpus and $Mat_{normal}$ that of the general corpus, and let $\{c_i, c_{i+1}\}$ be a pair of consecutive characters (a character context) in a corpus. Computing the transition probability of $\{c_i, c_{i+1}\}$ in the medical corpus and in the general corpus yields $Mat_{medical}(c_i, c_{i+1})$ and $Mat_{normal}(c_i, c_{i+1})$, respectively.
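The construction of the two matrices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `transition_matrix`, the toy corpora, and the choice to normalize pair counts by the count of the first character are not taken from the patent, which says only that each element is a transition frequency.

```python
from collections import Counter

def transition_matrix(sentences):
    """Estimate P(next_char | char) from consecutive character pairs.

    Assumption: pair counts are normalized by how often the first
    character starts a pair; the patent does not fix the normalization.
    """
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        for a, b in zip(sentence, sentence[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    matrix = {}
    for (a, b), n in pair_counts.items():
        matrix.setdefault(a, {})[b] = n / first_counts[a]
    return matrix

# Toy corpora (hypothetical), already stripped of punctuation and stop words.
medical_corpus = ["阿司匹林片", "阿司匹林肠溶片"]
general_corpus = ["阿姨在家", "司机在家"]

mat_medical = transition_matrix(medical_corpus)
mat_normal = transition_matrix(general_corpus)
```

A nested dictionary is used instead of a dense matrix because most character pairs never occur, so the matrix is very sparse for a realistic character set.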
Based on the context character transition probability matrices of the medical corpus and the general corpus, information entropy is used to calculate how salient each character context is in the medical field. Because character transition probabilities in the general corpus are relatively stable, character contexts in the medical corpus whose transition probabilities deviate significantly from those of the general corpus are judged to belong to medical named entities.
The information entropy $Entropy(c_i, c_{i+1})$ of the character transition probability is calculated by a formula (not reproduced in this text) and is used to mark whether $\{c_i, c_{i+1}\}$ belongs to a medical named entity. If $Entropy(c_i, c_{i+1}) > t$, where $t$ ($t = 1$) is a threshold, then $\{c_i, c_{i+1}\}$ is a character context of a medical named entity, and the character contexts belonging to the same named entity are merged to form that medical named entity.
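The thresholding and merging step might look like the sketch below. The entropy formula itself is not available in this text, so `salience` is a stand-in chosen for illustration (a pointwise relative-entropy term in bits), not the patent's formula; the smoothing constant `eps`, the function names, and the toy matrices are likewise assumptions. Only the threshold $t = 1$ and the merging of adjacent flagged contexts come from the description.

```python
import math

def salience(p_medical, p_normal, eps=1e-6):
    # Stand-in score (assumption, NOT the patent's formula): how strongly the
    # medical transition probability deviates from the general one, in bits.
    return p_medical * math.log2(p_medical / max(p_normal, eps))

def extract_entities(text, mat_medical, mat_normal, t=1.0):
    """Flag character pairs whose score exceeds t and merge adjacent ones."""
    entities, current = [], ""
    for a, b in zip(text, text[1:]):
        p_med = mat_medical.get(a, {}).get(b, 0.0)
        p_norm = mat_normal.get(a, {}).get(b, 0.0)
        flagged = p_med > 0 and salience(p_med, p_norm) > t
        if flagged:
            # Extend the running entity, or start a new one with both characters.
            current = current + b if current else a + b
        elif current:
            entities.append(current)
            current = ""
    if current:
        entities.append(current)
    return entities

# Toy transition probabilities (hypothetical values).
toy_medical = {"阿": {"司": 1.0}, "司": {"匹": 1.0}, "匹": {"林": 1.0}, "服": {"用": 1.0}}
toy_normal = {"服": {"用": 0.9}, "阿": {"司": 0.01}}

found = extract_entities("服用阿司匹林", toy_medical, toy_normal)
```

In this toy run the pair 服用 is common in both corpora and stays below the threshold, while the three pairs inside 阿司匹林 are flagged and merged into a single entity.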
(2) A method for extracting relations between medical-field entities based on predicate sentiment classification.
The medical corpus is split at punctuation marks to obtain a set of short sentences, and a subset of these is labelled by sentiment as positive, negative, or neutral. All sentiment-labelled short sentences are segmented into Chinese words using a conditional random field with Viterbi decoding, and every word is vectorized using a word vector method. The word vectors of all words in a short sentence are combined by weighted average to obtain the text vector of that sentence, and a support vector machine is trained on the sentiment-labelled text vectors to obtain a text sentiment classification model. The model is then used to classify the sentiment of every short sentence in the medical corpus, and the short sentences with significantly positive or negative sentiment are extracted.
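The weighted-average step that turns word vectors into a sentence vector can be illustrated as follows. This is a minimal sketch: the function name `phrase_vector`, the two-dimensional toy vectors, and the per-word weights are hypothetical, and the patent does not say how the weights are chosen. Word segmentation and the support vector machine are left out; the resulting vectors would be the training input for the classifier.

```python
def phrase_vector(words, word_vectors, weights=None):
    """Weighted average of the word vectors of the words in a short sentence."""
    weights = weights or {}
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        if word not in word_vectors:
            continue  # skip out-of-vocabulary words
        w = weights.get(word, 1.0)  # default weight 1.0 is an assumption
        for k, x in enumerate(word_vectors[word]):
            total[k] += w * x
        weight_sum += w
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]

# Toy word vectors and weights (hypothetical values).
word_vectors = {
    "suitable": [1.0, 0.0],
    "for": [0.0, 0.0],
    "cold": [0.0, 1.0],
}
weights = {"suitable": 2.0}

vec = phrase_vector(["suitable", "for", "cold"], word_vectors, weights)
```

Here the total weight is 4.0 and the weighted sum is [2.0, 1.0], so the sentence vector is [0.5, 0.25].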
The short sentences with positive or negative sentiment are segmented into Chinese words, the words are tagged with their parts of speech, and the predicates (verbs) are extracted. If a short sentence contains two or more medical named entities, and these entities lie on either side of the predicate, or the predicate is the first or last word of the sentence, then the entities on the two sides of the predicate are extracted and a relation is established between them. Whether the relation is positive or negative is determined by the positive or negative sentiment of the short sentence.
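The core of this rule can be sketched as below. The sketch covers only the case in which entities lie on both sides of the predicate; the case where the predicate is the first or last word is not handled. The tag `"v"` for verbs, the function name `extract_relation`, and the tuple layout of the result are assumptions made for illustration.

```python
def extract_relation(tagged_words, entities, sentiment):
    """Return (left_entity, predicate, right_entity, polarity) or None.

    tagged_words: list of (word, pos) pairs; "v" marks a verb (assumed tag set).
    sentiment: "positive" or "negative", as predicted for the short sentence.
    """
    positions = [i for i, (word, _) in enumerate(tagged_words) if word in entities]
    if len(positions) < 2:
        return None  # the rule requires at least two medical named entities
    for i, (word, pos) in enumerate(tagged_words):
        if pos != "v":
            continue
        left = [p for p in positions if p < i]
        right = [p for p in positions if p > i]
        if left and right:
            # Take the entities nearest to the predicate on each side.
            head = tagged_words[left[-1]][0]
            tail = tagged_words[right[0]][0]
            polarity = 1 if sentiment == "positive" else -1
            return (head, word, tail, polarity)
    return None

tagged = [("pregnant women", "n"), ("avoid", "v"), ("tetracycline", "n")]
known_entities = {"pregnant women", "tetracycline"}

relation = extract_relation(tagged, known_entities, "negative")
```

The polarity values 1 and -1 match the edge attribute values that the description later assigns to positive and negative relations.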
(3) A medical-field knowledge graph reasoning method based on random walk.
Using the named entity recognition and relation extraction methods above, a medical knowledge graph $KG = (V, E, P)$ is constructed with a triple-based representation, where $V$ denotes the vertices of the knowledge graph (the medical entities), $E$ denotes the edges between pairs of vertices (the relations between entities), and $P$ denotes the positive or negative attribute of each edge.
As shown in the concept relation diagram of FIG. 1, the concepts of medical entities include diseases, symptoms, medicines, populations, departments, body parts, and so on. The vertices of the knowledge graph accordingly include disease entities, such as cold and gastritis; symptom entities, such as cough and stomachache; drug entities, such as aspirin and cephalosporins; population entities, such as infants and pregnant women; and body part entities, such as head and chest. An edge indicates the relation between two entities; for example, cold-cough and gastritis-stomachache indicate that cold causes cough and gastritis causes stomachache, respectively. Relations between medical entities are further divided into positive and negative relations: cold-cough is a positive relation because cold causes cough, whereas tetracycline-pregnant woman is a negative relation because tetracycline is contraindicated for pregnant women. Positive relations include "applicable" and "induces"; negative relations include "use with caution" and "contraindicated". For example, given the text "gastritis is an inflammation of the gastric mucosa …, which is usually manifested as epigastric pain, nausea, vomiting … complications including bleeding, gastric ulcer …", the extracted disease vertex is { gastritis }, the extracted symptom vertices are { epigastric pain, nausea, vomiting … bleeding, gastric ulcer … }, and the relations with their weights are { (gastritis, epigastric pain, 1.0), (gastritis, nausea, 1.0), (gastritis, vomiting, 1.0) … (gastritis, bleeding, 1.0), (gastritis, gastric ulcer, 1.0) … }.
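A small in-memory representation of $KG = (V, E, P)$ could be built from such triples as sketched here. The function name `build_graph` and the dictionary layout are illustrative assumptions, and the third element of each triple is used as the edge polarity (1 or -1) rather than the 1.0 weights of the gastritis example.

```python
def build_graph(triples):
    """Build vertices V and signed edges (E with attribute P) from triples.

    triples: iterable of (head_entity, tail_entity, polarity), where polarity
    is 1 for a positive relation and -1 for a negative relation.
    """
    vertices = set()
    edges = {}  # (head, tail) -> polarity
    for head, tail, polarity in triples:
        vertices.update((head, tail))
        edges[(head, tail)] = polarity
    return vertices, edges

example_triples = [
    ("gastritis", "epigastric pain", 1),
    ("gastritis", "nausea", 1),
    ("tetracycline", "pregnant woman", -1),
]

kg_vertices, kg_edges = build_graph(example_triples)
```

Keeping the polarity on the edge rather than on the vertices lets the same entity take part in both positive and negative relations.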
Knowledge reasoning is then carried out on the medical knowledge graph with a random walk method. The reasoning process can be cast as a traversal that starts from a limited set of clues (several entities) and iteratively searches for reasoning results. Let $V = \{v_1, v_2, \ldots, v_n\}$ be the set of entities; candidate results are inferred by a scoring formula (not reproduced in this text).
In that formula, $score(v_i)$ is the score of a given entity $v_i$, $In(v_i)$ is the in-degree of $v_i$, $Out(v_i)$ is the out-degree of $v_i$, $p_{j,i}$ is the attribute value of the edge between $v_i$ and $v_j$ (1 for a positive edge and -1 for a negative edge), and $\alpha$ ($\alpha = 0.85$) is an empirical parameter. During reasoning, the group of known entities is located on the knowledge graph; the scores of the corresponding vertices are initialized to 1 and the scores of all other vertices to 0. The scores of all vertices are then obtained by iterative random walk computation and sorted, and the results are screened as the situation requires: the entities corresponding to higher-scoring vertices are the candidate results that can be inferred from the group of known entities.
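Since the scoring formula is not available in this text, the sketch below uses an assumed PageRank-style update with signed edges and a restart to the known entities: $score(v_i) = (1 - \alpha)\,seed(v_i) + \alpha \sum_{v_j \to v_i} p_{j,i}\, score(v_j) / Out(v_j)$. The restart term, the fixed iteration count, and the function name `random_walk_scores` are assumptions; only $\alpha = 0.85$, the edge values 1 and -1, and the 1/0 initialization come from the description.

```python
def random_walk_scores(vertices, edges, seeds, alpha=0.85, iterations=50):
    """Iteratively score vertices starting from the known (seed) entities."""
    out_degree = {v: 0 for v in vertices}
    incoming = {v: [] for v in vertices}
    for (src, dst), polarity in edges.items():
        out_degree[src] += 1
        incoming[dst].append((src, polarity))
    # Known entities start at 1, all other vertices at 0.
    scores = {v: 1.0 if v in seeds else 0.0 for v in vertices}
    for _ in range(iterations):
        # Synchronous update: the right-hand side reads the previous scores.
        scores = {
            v: (1 - alpha) * (1.0 if v in seeds else 0.0)
            + alpha * sum(p * scores[u] / out_degree[u] for u, p in incoming[v])
            for v in vertices
        }
    return scores

graph_vertices = {"gastritis", "epigastric pain", "nausea", "cold", "cough",
                  "tetracycline", "pregnant woman"}
graph_edges = {
    ("gastritis", "epigastric pain"): 1,
    ("gastritis", "nausea"): 1,
    ("cold", "cough"): 1,
    ("tetracycline", "pregnant woman"): -1,
}

scores = random_walk_scores(graph_vertices, graph_edges, seeds={"gastritis"})
ranked = sorted(scores, key=scores.get, reverse=True)
drug_scores = random_walk_scores(graph_vertices, graph_edges, seeds={"tetracycline"})
```

Starting from gastritis, the symptoms linked to it receive positive scores while unrelated vertices stay at zero. Starting from tetracycline, the negative edge gives "pregnant woman" a negative score, which is one way a contraindication could surface in the ranking.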
Drawings
FIG. 1 is a diagram of relationships between knowledge-graph concepts
Detailed Description
The invention comprises the following steps:
step 1: collecting conventional linguistic data and medical professional linguistic data, and removing punctuations and stop words in the linguistic data.
Step 2: according to the medicine linguistic data and the character context { c in the conventional pre-material libraryi,ci+1Establishing character transfer probability matrixes Mat of medical linguistic data and conventional linguistic data respectivelymedical(ci,ci+1) And Matnormal(ci,ci+1)。
And step 3: and calculating the significance degree of each group of character context belonging to the medicine field by adopting the information entropy based on the context character transfer probability matrix of the medicine corpus and the conventional corpus.
And 4, step 4: if Entropy of information Encopy (c)i,ci+1) > t, where t (t ═ 1) is a critical value, then { c ═ ti,ci+1The character contexts of the same named entity are combined to form the medicine named entity.
And 5: and (4) segmenting the medical corpus according to punctuation marks to obtain a short sentence set, and marking the emotion of a part of short sentences in the short sentence set, wherein the labels comprise positive direction, negative direction and neutrality.
Step 6: and vectorizing all the words by adopting a Chinese word segmentation and word vector method. And carrying out weighted average on word vectors of all words to obtain text vectors of short sentences, and training the text vectors with emotion labels by adopting a support vector machine to obtain a text emotion classification model. And carrying out emotion classification on short sentences in the medical corpus based on the model.
And 7: and extracting predicates in the sentences by using a part-of-speech tagging method, if the number of the medical named entities contained in the sentences is more than or equal to 2 and the predicates which belong to two sides of the positions of the predicates or the predicates are head words and tail words, extracting entities on two sides of the predicates and establishing the relationship between the entities, and judging whether the relationship between the entities belongs to a positive relationship or a negative relationship according to positive emotion or negative emotion of the short sentences.
And 8: according to the medicine named entity recognition and the relation between entities, a medicine knowledge graph is constructed based on a three-component representation method.
And step 9: and carrying out knowledge reasoning based on a random walk method according to the medicine knowledge graph. The inference process may be translated into a traversal process that iteratively searches for candidate inference results starting from finite clues (several entities). .

Claims (4)

1. A medical-field knowledge reasoning method based on random walk, characterized by comprising:
(1) a medical-field named entity recognition method based on context character bigrams and information entropy;
(2) a method for extracting relations between medical-field entities based on predicate sentiment classification;
(3) a medical-field knowledge graph reasoning method based on random walk.
2. The medical-field named entity recognition method based on context character bigrams and information entropy according to claim 1, characterized in that, to address the inaccuracy of conventional named entity recognition methods caused by the uneven statistical behaviour of the character contexts of medical-field named entities, named entities in the medical field are recognized by using context character bigrams and information entropy to compare the statistical behaviour of character contexts in the general domain with that of character contexts in the medical field.
3. The method for extracting relations between medical-field entities based on predicate sentiment classification according to claim 1, characterized in that, because the positive and negative relations between medical-field entities are related to the sentiment of the predicate between the entities, the relations between medical-field entities are extracted by classifying the sentiment of the adjacent predicates.
4. The medical-field knowledge graph reasoning method based on random walk according to claim 1, characterized in that, to address the difficulty of using the knowledge graph directly for reasoning because the associations among medical-field knowledge graph entities are dense and complex, a random walk method is used to reason about medical knowledge over the medical knowledge graph.
CN201910876121.5A 2019-09-17 2019-09-17 Medicine field knowledge reasoning method based on random walk Pending CN110609907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910876121.5A CN110609907A (en) 2019-09-17 2019-09-17 Medicine field knowledge reasoning method based on random walk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910876121.5A CN110609907A (en) 2019-09-17 2019-09-17 Medicine field knowledge reasoning method based on random walk

Publications (1)

Publication Number Publication Date
CN110609907A (en) 2019-12-24

Family

ID=68891506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910876121.5A Pending CN110609907A (en) 2019-09-17 2019-09-17 Medicine field knowledge reasoning method based on random walk

Country Status (1)

Country Link
CN (1) CN110609907A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190113A (en) * 2018-08-10 2019-01-11 北京科技大学 A kind of knowledge mapping construction method of theory of traditional Chinese medical science ancient books and records
CN109325131A (en) * 2018-09-27 2019-02-12 大连理工大学 A kind of drug identification method based on biomedical knowledge map reasoning
CN109783698A (en) * 2019-01-15 2019-05-21 辽宁大学 Industrial production data entity recognition method based on Merkle-tree
US20190171656A1 (en) * 2017-05-10 2019-06-06 Boe Technology Group Co., Ltd. Traditional chinese medicine knowledge graph and establishment method therefor, and computer system
CN110119451A (en) * 2019-05-08 2019-08-13 北京颢云信息科技股份有限公司 A kind of knowledge mapping construction method based on relation inference

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463895A (en) * 2020-12-01 2021-03-09 零氪科技(北京)有限公司 Method and device for automatically discovering medicine components based on medicine name mining
CN112463895B (en) * 2020-12-01 2024-06-11 零氪科技(北京)有限公司 Method and device for automatically discovering medicine components based on medicine name mining
CN112967820A (en) * 2021-04-12 2021-06-15 平安科技(深圳)有限公司 Medicine property cognitive information extraction method, device, equipment and storage medium
CN112967820B (en) * 2021-04-12 2023-09-19 平安科技(深圳)有限公司 Drug-nature cognition information extraction method, device, equipment and storage medium
CN116187868A (en) * 2023-04-27 2023-05-30 深圳市迪博企业风险管理技术有限公司 Knowledge graph-based industrial chain development quality evaluation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191224