CN117171329A - Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method


Info

Publication number
CN117171329A
Authority
CN
China
Prior art keywords
question
chinese medicine
traditional chinese
entity
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311273745.0A
Other languages
Chinese (zh)
Inventor
王铮
嵇望
王媛媛
赵燕伟
徐新黎
屠杭垚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University City College ZUCC
Original Assignee
Zhejiang University City College ZUCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University City College ZUCC filed Critical Zhejiang University City College ZUCC
Priority to CN202311273745.0A priority Critical patent/CN117171329A/en
Publication of CN117171329A publication Critical patent/CN117171329A/en
Pending legal-status Critical Current


Abstract

The application discloses a semantic analysis-based knowledge graph question-answering method for the traditional Chinese medicine field, comprising the following steps: constructing a multimodal traditional Chinese medicine knowledge graph; constructing a question corpus dedicated to the traditional Chinese medicine field in combination with the multimodal traditional Chinese medicine knowledge graph; based on the question intention categories in the dedicated question corpus, obtaining Cypher query sentences and spoken answer templates, and constructing a question intention-Cypher query sentence correspondence table and a question intention-answer template correspondence table; performing semantic analysis on a question to obtain the question entity and the question intention classification result; and, based on the question entity and the question intention classification result, querying the multimodal traditional Chinese medicine knowledge graph to obtain the final answer, thereby realizing semantic analysis-based knowledge graph question answering in the traditional Chinese medicine field. The application provides a knowledge graph-based question-answering method for the traditional Chinese medicine field with a higher degree of domain adaptation, intelligence and accuracy, and helps popularize and disseminate traditional Chinese medicine knowledge.

Description

Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method
Technical Field
The application belongs to the field of natural language processing, and particularly relates to a knowledge graph question-answering method in the field of traditional Chinese medicine based on semantic analysis.
Background
Chinese medicinal materials are the main means by which traditional Chinese medicine prevents and treats diseases, and they have cured and served many people since ancient times. In recent years the country has vigorously promoted traditional Chinese medicine culture; to inherit and develop that culture, the mystery surrounding its material basis, namely the medicines themselves, needs to be unveiled. However, most traditional Chinese medicine knowledge is stored in traditional Chinese medicine books as textual descriptions, so the knowledge is homogeneous and fragmented and can hardly meet the general public's need to obtain diverse knowledge directly and conveniently. A question-answering system, which understands the natural language question entered by a user and quickly returns an exact answer, can effectively solve this problem. In recent years knowledge graphs have served as the knowledge support of question-answering systems and have promoted their development; for the two to operate together effectively, an effective knowledge question-answering method needs to be constructed.
Knowledge graph-based question-answering methods fall into two main types: methods based on semantic parsing and methods based on information retrieval. Semantic parsing-based methods convert a natural language question, through various methods or models, into a query sentence that can be executed against the knowledge graph, and retrieve the answer by executing that query sentence. Information retrieval-based methods mainly convert questions and answers into feature vector representations through feature engineering, obtain the matching probability of each question-answer pair through a scoring function, and rank the candidate answers to obtain the answer. Owing to its high accuracy, the semantic parsing-based method is currently the most common knowledge question-answering method.
The knowledge graph question-answering method based on semantic analysis obtains the final answer through the steps of named entity recognition, entity linking, question classification, answer retrieval and the like, and generally adopts a traditional method based on rule and keyword matching.
Disclosure of Invention
The application aims to provide a semantic analysis-based knowledge graph question-answering method for the traditional Chinese medicine field that addresses the lack of corpora for knowledge question-answering models in vertical domains and achieves a higher degree of domain adaptation, intelligence and accuracy.
In order to achieve the purpose, the application provides a knowledge graph question-answering method in the traditional Chinese medicine field based on semantic analysis, which comprises the following steps:
constructing a multimodal traditional Chinese medicine knowledge graph;
combining the multimodal traditional Chinese medicine knowledge graph to construct a dedicated question corpus in the traditional Chinese medicine field;
based on the question intention category in the Chinese medicine domain exclusive question corpus, obtaining a Cypher query sentence and a spoken language answer template, and constructing a question intention-Cypher query sentence and a question intention-answer template corresponding table;
semantic analysis is carried out on the question, and a question entity and a question intention classification result are obtained;
and carrying out answer inquiry of the multi-mode traditional Chinese medicine knowledge graph based on the question entity and the question intention classification result to obtain a final answer, thereby realizing the traditional Chinese medicine domain knowledge graph question and answer based on semantic analysis.
Optionally, constructing the multi-modal traditional Chinese medicine knowledge graph includes:
acquiring traditional Chinese medicine knowledge data, and cleaning the traditional Chinese medicine knowledge data, wherein the traditional Chinese medicine knowledge data comprises traditional Chinese medicine description data, prescription data and corresponding photos of traditional Chinese medicines;
investigating knowledge in the traditional Chinese medicine field, analyzing it together with the cleaned traditional Chinese medicine knowledge data, and constructing an ontology concept layer, wherein the ontology concept layer describes the data schema of the multimodal traditional Chinese medicine knowledge graph;
extracting entities in the traditional Chinese medicine description data by adopting a rule-based method;
and extracting the entity in the prescription data by using a deep learning-based method.
Optionally, the knowledge in the traditional Chinese medicine field comprises entities and defined entity relationships;
the entities comprise traditional Chinese medicines, regions, efficacies, categories, symptoms, prescriptions, books and prescription names, radiating outward from the traditional Chinese medicines and the prescriptions at the center;
the entity relationship comprises mainly treating symptoms, traditional Chinese medicine functions, traditional Chinese medicine distribution regions, traditional Chinese medicine subordinate categories, prescription treatment diseases, traditional Chinese medicine basic prescription, prescription source books and prescription containing prescriptions.
Optionally, extracting the entity in the traditional Chinese medicine description data by adopting a rule-based method comprises:
analyzing the traditional Chinese medicine description data, wherein the traditional Chinese medicine description data comprises the nature, flavor and meridian tropism, origin distribution, efficacy and action, and clinical application of traditional Chinese medicines, and the data type of the traditional Chinese medicine description data is semi-structured data;
analyzing the semi-structured data to obtain the delimiter symbols or delimiter words of the semi-structured data;
and segmenting the traditional Chinese medicine description data based on the delimiter symbols or delimiter words of the semi-structured data, and extracting the entities in the traditional Chinese medicine description data.
Optionally, extracting the entity in the prescription data using a deep learning-based method includes:
performing named entity recognition on the prescription text using an ALBERT-BiGRU-CRF model fused with an attention mechanism, wherein the ALBERT-BiGRU-CRF model comprises an ALBERT layer, a BiGRU layer, an Attention layer and a CRF layer;
inputting the prescription text into an ALBERT layer for word embedding to obtain a dynamic vector of a character;
inputting the dynamic vector of the character into the BiGRU layer to learn so as to obtain the characteristic vector of the character;
weighting the dynamic vector of the character and the feature vector of the character using the Attention layer, and inputting the weighted result into the CRF layer for correction to obtain the final predicted tag sequence;
extracting entities in the prescription data based on the final predicted tag sequence.
Optionally, constructing the question corpus dedicated to the traditional Chinese medicine field in combination with the multimodal traditional Chinese medicine knowledge graph includes:
analyzing the multi-mode traditional Chinese medicine knowledge graph, dividing question intents by combining target question types, obtaining a plurality of types of questions, and determining labels of each type of questions, wherein the analysis content of the multi-mode traditional Chinese medicine knowledge graph comprises entities, relations and attributes;
the entity, the relation and the attribute are used as a content basis for generating a question corpus, and a question seed corpus is constructed through manual labeling and rules;
and carrying out data enhancement on the question seed corpus by using methods of synonymous replacement, sentence pattern reconstruction and entity word replacement, and constructing a Chinese medicine domain exclusive question corpus.
Optionally, based on the category of the question intention in the corpus of question specific to the traditional Chinese medicine field, obtaining the Cypher query sentence and the spoken language answer template, and constructing the correspondence table of the question intention-Cypher query sentence and the question intention-answer template includes:
acquiring the question intentions in the traditional Chinese medicine domain exclusive question corpus based on the traditional Chinese medicine domain exclusive question corpus;
writing a corresponding Cypher query sentence according to the question intention category to acquire the Cypher query sentence;
writing a corresponding spoken answer template according to the question intention category, and obtaining the spoken answer template;
and constructing a question intention-Cypher query statement and question intention-answer template corresponding table based on the Cypher query statement and the spoken language answer template.
Optionally, performing semantic analysis on the question, and obtaining a question entity and a question intention classification result includes:
performing hard matching by utilizing a HanLP natural language tool to acquire the question entities;
if no entity is matched, performing named entity recognition on the question using the ALBERT-BiGRU-CRF model fused with the attention mechanism to obtain the question entity mention;
mapping, by entity linking, the question entity mention to the multimodal traditional Chinese medicine knowledge graph, and obtaining the question entity by combining entity similarity calculation with the number of overlapping words;
and carrying out question intention recognition by utilizing an ERNIE-based dual-channel feature fusion question intention recognition model, and obtaining the question intention classification result.
Optionally, mapping the question entity mention to the multimodal traditional Chinese medicine knowledge graph by entity linking, using entity similarity calculation combined with the number of overlapping words, includes:
calculating the similarity between the question entity mention and each graph entity based on Sentence-BERT, and comparing the similarity with a similarity threshold;
if the similarity is not greater than the similarity threshold, the entity linking fails;
and if the similarity is greater than the similarity threshold, comparing the numbers of overlapping words between the question entity mention and the candidate graph entities to obtain the question entity.
Optionally, performing question intention recognition by using the ERNIE-based dual-channel feature fusion question intention recognition model, where obtaining the question intention classification result includes:
the ERNIE-based dual-channel feature fusion question intention recognition model comprises an input layer, an embedding layer, a feature extraction layer, a feature fusion layer and an output layer;
preprocessing the corpus texts in the question corpus dedicated to the traditional Chinese medicine field through the input layer to obtain processed text sentence vectors;
inputting the text sentence vectors into the embedding layer to obtain the semantic information of the text data;
inputting the semantic information of the text data into the feature extraction layer, and obtaining question category features and context information features using an improved DPCNN and a BiGRU combined with an attention mechanism;
inputting the question category characteristics and the context information characteristics into the characteristic fusion layer for fusion, and obtaining fused characteristic vectors;
and inputting the fused feature vector into a softmax classifier to obtain the question intention classification result.
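The fusion and classification steps can be illustrated with a minimal sketch. It assumes that fusion is a plain concatenation of the two feature vectors followed by one linear layer; the text does not specify the fusion operation, and the function names, weights and labels below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(category_features, context_features, weights, bias, labels):
    """Fuse two feature vectors by concatenation (an assumption), apply a
    linear layer, and return (label, probability) for the top intent."""
    fused = list(category_features) + list(context_features)
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]

# Toy values only; a trained model would supply the features and weights.
label, prob = classify([2.0, 0.0], [0.0, 0.5],
                       [[1, 0, 0, 0], [0, 0, 0, 1]], [0.0, 0.0],
                       ["TCM_DIS", "DIS_TCM"])
```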
Optionally, based on the question entity and the question intention classification result, performing an answer query of the multi-modal traditional Chinese medicine knowledge graph, and obtaining the final answer includes:
selecting the corresponding Cypher query sentence and the spoken language answer template by using the question intention-Cypher query sentence and the question intention-answer template corresponding table according to the question entity and the question intention classification result;
and filling the Cypher query sentence by utilizing the question entity, and carrying out answer query in the multi-mode traditional Chinese medicine knowledge graph to obtain a query result, namely the final answer.
The application has the following beneficial effects:
First, the application constructs a domain-specific question corpus by analyzing the knowledge graph in combination with real-life practice. This addresses a limitation of existing semantic analysis-based knowledge graph question-answering methods, which rely on template matching or use deep learning to improve only a single stage of semantic analysis, so that the deep learning models fit the domain of the knowledge graph better. Second, the application uses a hybrid of deep learning and traditional rule-based methods to improve the key stages of question analysis, namely question entity recognition and question intention recognition, so that knowledge in the traditional Chinese medicine field is answered quickly and accurately, manual work is reduced, and the degree of intelligence and automation of the whole semantic analysis process and the accuracy of the answers are improved. The application thus provides a knowledge graph-based question-answering method for the traditional Chinese medicine field with a higher degree of domain adaptation, intelligence and accuracy, and helps popularize and disseminate traditional Chinese medicine knowledge.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flowchart of a semantic analysis-based knowledge graph question-answering method for the traditional Chinese medicine field according to an embodiment of the application;
FIG. 2 is a flowchart of entity recognition using the ALBERT-BiGRU-CRF named entity recognition algorithm fused with an attention mechanism according to an embodiment of the present application;
FIG. 3 is a flowchart of a question corpus construction according to an embodiment of the present application;
FIG. 4 is a flowchart of a question semantic parsing implementation provided in an embodiment of the present application;
FIG. 5 is a flow chart of entity linking according to an embodiment of the present application;
FIG. 6 is a flowchart of question intention recognition using the ERNIE-based dual-channel feature fusion intention recognition algorithm according to an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in FIG. 1, this embodiment provides a semantic analysis-based knowledge graph question-answering method for the traditional Chinese medicine field, including: constructing a multimodal traditional Chinese medicine knowledge graph; analyzing the structure and content of the domain knowledge graph and, in combination with real-life practice, constructing a dedicated question corpus for model training together with the Cypher query sentences and answer templates corresponding to the question intentions; performing semantic analysis on the question, using a hybrid method based on rules and deep learning to carry out named entity recognition and question intention recognition, so as to obtain the question entity and the question type; and finally, according to the question intention classification result, selecting the corresponding query sentence and spoken answer template from the question intention-Cypher query sentence and question intention-answer template correspondence tables, filling the query sentence with the question entity, retrieving the answer in the traditional Chinese medicine knowledge graph to obtain the answer entity, filling the answer template with the question entity and answer entity pair, and returning the result to the user.
S100, constructing a multimodal traditional Chinese medicine knowledge graph. The raw data for constructing the graph is obtained, entities are identified using a rule-based method and a deep learning-based method, and the data is imported into a graph database for storage. The specific steps are as follows:
S110, using the Scrapy crawler framework, traditional Chinese medicine description data, prescription data and the corresponding pictures of traditional Chinese medicines are crawled from a Chinese herbal medicine website, and the text of the traditional Chinese medicine classic "Compendium of Materia Medica" is obtained. The data is first cleaned and analyzed, an ontology concept layer of the knowledge graph in the traditional Chinese medicine field is constructed, and the entities in the data are then extracted using different methods.
S111, the multimodal traditional Chinese medicine knowledge graph is constructed by combining top-down and bottom-up approaches. Through investigation of knowledge in the traditional Chinese medicine field, combined with the cleaned traditional Chinese medicine data, the ontology concept layer is constructed and refined; the ontology concept layer describes the data schema of the knowledge graph, and the traditional Chinese medicine knowledge graph extends the entity layer on the basis of the ontology concept layer. In this embodiment, knowledge in the traditional Chinese medicine field is divided into 8 types of entities, namely TCM (traditional Chinese medicine), local (region), function (efficacy), class (category), disease (disorder), prescription (prescription), Book (book) and prescname (prescription name); 8 types of relationships are defined, radiating outward from the traditional Chinese medicine and prescription entities at the center.
The 8 types of relationships are major_cure (main indications of a traditional Chinese medicine), has_func (function of a traditional Chinese medicine), is_loc (distribution region of a traditional Chinese medicine), belongs_to (category to which a traditional Chinese medicine belongs), can_cure (disease treated by a prescription), has_pre (basic prescription of a traditional Chinese medicine), com_from (source book of a prescription), and contacts_pre (prescription containing a prescription), respectively.
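The ontology concept layer described above can be written down as a small schema table. In this sketch the entity and relationship names come from the text, while the head and tail types assigned to each relationship (other than major_cure) and the helper `validate_triple` are assumptions made for illustration.

```python
# Entity type names as listed in the text.
ENTITY_TYPES = ["TCM", "local", "function", "class",
                "disease", "prescription", "Book", "prescname"]

# relationship -> (head entity type, tail entity type).
# Pairings other than major_cure are assumed, not stated in the text.
RELATIONS = {
    "major_cure":   ("TCM", "disease"),
    "has_func":     ("TCM", "function"),
    "is_loc":       ("TCM", "local"),
    "belongs_to":   ("TCM", "class"),
    "can_cure":     ("prescription", "disease"),
    "has_pre":      ("TCM", "prescription"),
    "com_from":     ("prescription", "Book"),
    "contacts_pre": ("prescname", "prescription"),
}

def validate_triple(head_type, relation, tail_type):
    """Return True when a (head, relation, tail) triple fits the schema."""
    return RELATIONS.get(relation) == (head_type, tail_type)
```

A check of this kind could be run on extracted triples before they are imported into the graph database, so that malformed records are caught early.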
S112, extracting entities in the traditional Chinese medicine description data by adopting a rule-based method. Different extraction rules are constructed for different data formats, and the data is processed in batches using regular expressions written in Python.
Extracting entities in the traditional Chinese medicine description data by using a rule-based method comprises the following steps:
Analysis of the traditional Chinese medicine description data shows that, in the data of each traditional Chinese medicine, the fields for nature, flavor and meridian tropism, origin distribution, efficacy and action, and clinical application are semi-structured data with relatively fixed formats, in which the different items are separated by specific punctuation marks or characters, wherein:
in [nature, flavor and meridian tropism], a delimiter separates the drug nature from the drug flavor, the drug nature being prefixed by "nature" and the drug flavor by "flavor"; a full stop separates the nature and flavor from the meridian tropism, and a delimiter separates the different meridians. For example, from the description "Hot in nature, pungent in flavor. Enters the lung meridian and spleen meridian." the drug nature "hot", the drug flavor "pungent" and the meridian tropisms "lung meridian" and "spleen meridian" are finally obtained;
in [origin distribution], "distributed in" is used as the prefix, and a delimiter separates the different place names;
in [efficacy and action], a full stop separates the efficacy from the category, a delimiter separates the different efficacies, and the category is prefixed by "belongs to". For example, from "Drawing out toxin, removing pus, removing putrefaction and promoting granulation. Belongs to the toxin-drawing and granulation-promoting medicines." the efficacy entities "drawing out toxin", "removing pus", "removing putrefaction" and "promoting granulation" and the category entity "toxin-drawing and granulation-promoting medicine" are obtained through entity extraction;
in [clinical application], a full stop separates the dosage from the indications, the indications are introduced by verbs such as "used to treat", "treats" and "used for", and a delimiter separates the different symptoms.
Based on this, the present embodiment constructs different rule extraction entities according to different data formats.
For semi-structured data, rule-based entity recognition ensures that most entities are extracted correctly. However, because some of the crawled raw data does not conform to the expected format, duplicated, erroneous and missing data may appear during knowledge extraction. This embodiment therefore performs a manual review after rule-based extraction, so as to maximize the accuracy of the constructed data.
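The rule-based extraction for the nature, flavor and meridian tropism field can be sketched as follows. The Chinese sample string, the delimiter characters ("，", "、", "。") and the prefixes ("性", "味", "归") are assumptions reconstructed from the description, not data quoted from the source; the function name is hypothetical.

```python
import re

def parse_nature_flavor_meridian(text):
    """Split one 'nature, flavor and meridian tropism' field into entities."""
    result = {"nature": [], "flavor": [], "meridian": []}
    # The full stop separates the nature/flavor clause from the meridian clause.
    for clause in (c for c in text.split("。") if c):
        # Assumed delimiters between items inside a clause.
        for part in re.split(r"[，、]", clause):
            part = part.strip()
            if part.startswith("性"):      # prefix for drug nature
                result["nature"].append(part[1:])
            elif part.startswith("味"):    # prefix for drug flavor
                result["flavor"].append(part[1:])
            elif part.startswith("归"):    # "enters" + first meridian
                result["meridian"].append(part[1:])
            elif part.endswith("经"):      # further meridians in the list
                result["meridian"].append(part)
    return result

sample = "性热，味辛。归肺经、脾经。"
parsed = parse_nature_flavor_meridian(sample)
```

Records that yield an empty list for any field are natural candidates for the manual review step, since they most likely deviate from the expected format.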
S113, extracting entities in the prescription data using a deep learning-based method. In this embodiment, the ALBERT-BiGRU-CRF model fused with an attention mechanism is used to perform named entity recognition on the prescription text. As shown in FIG. 2, the model is divided, from bottom to top, into four layers: the ALBERT layer, the BiGRU layer, the Attention layer and the CRF layer. The entity recognition flow is as follows:
Take the prescription text "Treating dermatitis: glabrous greenbrier rhizome 60-90 g" as input. The text first enters the ALBERT pre-trained model for word embedding to obtain the dynamic vector representation of each character; the vectors are then fed into the BiGRU layer, which learns the feature vector encoding of each character; the Attention layer then weights the original word vectors and the learned text vectors; finally, the CRF layer corrects the result and outputs the final predicted tag sequence. For an input sequence x and the corresponding output tag sequence y of the model, the score of tag sequence y for input x is defined as Score, and the calculation formula is as follows:
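A sketch of the score in its commonly used form for a CRF layer on top of a neural encoder, given as an assumption rather than as the applicant's exact formula. Here $P_{i,y_i}$ denotes the emission score of tag $y_i$ at position $i$ produced by the preceding layers, $A_{y_i,y_{i+1}}$ the learned transition score between consecutive tags, $n$ the sequence length, and $y_0$, $y_{n+1}$ the start and end tags; these symbols are illustrative and do not come from the source.

```latex
\mathrm{Score}(x, y) = \sum_{i=0}^{n} A_{y_i,\, y_{i+1}} + \sum_{i=1}^{n} P_{i,\, y_i}
```

Under this form, the predicted tag sequence is the one with the highest score among all candidate sequences, which is what lets the CRF layer rule out invalid transitions such as an I- tag directly following O.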
Finally, after passing through the ALBERT-BiGRU-CRF named entity recognition model fused with the attention mechanism, "dermatitis" and "glabrous greenbrier rhizome" are tagged at the character level with the labels "B-DIS I-DIS" and "B-TCM I-TCM I-TCM", respectively.
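Turning a predicted tag sequence into entities is a simple grouping step, sketched below. The character sequence is a hypothetical stand-in for the example sentence: only the tags of the two entities come from the text, and the "O" tag and the function name are assumed.

```python
def decode_bio(chars, tags):
    """Group character-level BIO tags into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:  # close the entity that was open
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)  # continue the open entity
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities

# Assumed characters: "治" (treat), "皮炎" (dermatitis), "土茯苓" (the herb).
chars = list("治皮炎土茯苓")
tags = ["O", "B-DIS", "I-DIS", "B-TCM", "I-TCM", "I-TCM"]
entities = decode_bio(chars, tags)
```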
S200, constructing a domain-specific question corpus. The question intention types, that is, the question types that the knowledge graph can support and answer, are determined by analyzing the entities, relationships and attributes in the ontology concept layer of the knowledge graph together with common real-life questions about traditional Chinese medicine. Specifically:
S210, by analyzing the entities, relationships and attributes in the ontology concept layer of the knowledge graph and taking real-life practice into account, question intentions are divided into 9 major categories. Questions about a single attribute of a traditional Chinese medicine and questions about which traditional Chinese medicines have a certain attribute are grouped into two major categories in this embodiment. The labels of the 9 major categories are TCM_DIS, DIS_TCM, TCM_FUNC, FUNC_TCM, TCM_LOC, TCM_CLA, CLA_TCM, TCM_ATR and ATR_TCM, respectively; the attribute-related question labels are TCM_ATR and ATR_TCM.
S220, the embodiment creates corpus by adopting a mode of manually constructing templates according to the question intention category determined in S210, as shown in FIG. 3. Specifically:
firstly, analyzing data in a traditional Chinese medicine knowledge graph, and taking entities, relations and attributes in the data as a content basis for generating a question corpus;
then, templates and rules are written manually to construct the seed corpus;
finally, data enhancement is applied to the seed questions using synonym replacement, sentence pattern reconstruction and entity word replacement, which expands the corpus and improves the generalization of the questions. For example, from the seed question "What diseases does [TCM] mainly treat?", data enhancement yields questions such as "What disorders does [TCM] treat?" and "Which diseases can be treated with [TCM]?".
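The three enhancement methods can be sketched together as follows. The seed template, synonym table, restructured sentence and entity names are all hypothetical examples in English; the actual corpus would be in Chinese and far larger.

```python
from itertools import product

# Hypothetical seed data for the TCM_DIS intent.
SEED_TEMPLATES = ["What diseases does {TCM} mainly treat?"]
SYNONYMS = {"diseases": ["diseases", "disorders"],
            "mainly treat": ["mainly treat", "cure"]}
RESTRUCTURED = ["Which diseases can be treated with {TCM}?"]
ENTITIES = ["ginseng", "licorice root"]

def augment(templates, synonyms, restructured, entities, label):
    """Expand seed templates by synonym replacement, sentence pattern
    reconstruction and entity word replacement; returns (question, label)."""
    expanded = set(restructured)  # sentence pattern reconstruction
    for template in templates:
        keys = [k for k in synonyms if k in template]
        # Synonym replacement: every combination of alternatives.
        for choice in product(*(synonyms[k] for k in keys)):
            variant = template
            for key, word in zip(keys, choice):
                variant = variant.replace(key, word)
            expanded.add(variant)
    # Entity word replacement: fill each template with each entity.
    return sorted((t.format(TCM=e), label) for t in expanded for e in entities)

corpus = augment(SEED_TEMPLATES, SYNONYMS, RESTRUCTURED, ENTITIES, "TCM_DIS")
```

Because every generated question inherits the label of its seed template, the expanded corpus is labeled for intention classification without further annotation.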
S300, constructing the corresponding Cypher query sentences and spoken answer templates according to the question intention types, and constructing the question intention-Cypher query sentence and question intention-answer template correspondence tables. The Cypher query sentences are written manually for the different question types. For example, when the label of the question is TCM_DIS, that is, the question asks which diseases a traditional Chinese medicine can treat, the corresponding Cypher query sentence written in this embodiment is "MATCH (t:TCM)-[r:major_cure]->(d:Disease) WHERE t.name = {entity} RETURN d", and the corresponding spoken answer template is "The diseases mainly treated by {TCM} are {DIS}".
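The two correspondence tables map naturally onto dictionaries keyed by the intention label. In this sketch only the TCM_DIS entry follows the text; binding the entity through a `$name` query parameter, returning `d.name`, and the function names are assumptions made for illustration.

```python
# Question intention -> Cypher query sentence (parameter style is assumed).
INTENT_TO_CYPHER = {
    "TCM_DIS": ("MATCH (t:TCM)-[r:major_cure]->(d:Disease) "
                "WHERE t.name = $name RETURN d.name AS answer"),
}

# Question intention -> spoken answer template.
INTENT_TO_ANSWER = {
    "TCM_DIS": "The diseases mainly treated by {TCM} are {DIS}.",
}

def build_query(intent, entity):
    """Select the Cypher sentence for an intent and bind the question entity."""
    return INTENT_TO_CYPHER[intent], {"name": entity}

def render_answer(intent, entity, answers):
    """Fill the spoken answer template with the question and answer entities."""
    return INTENT_TO_ANSWER[intent].format(TCM=entity, DIS=", ".join(answers))

query, params = build_query("TCM_DIS", "glabrous greenbrier rhizome")
reply = render_answer("TCM_DIS", "glabrous greenbrier rhizome", ["dermatitis"])
```

The `(query, params)` pair would then be handed to whichever graph database client is in use. Passing the entity as a parameter instead of splicing it into the query string avoids quoting problems when entity names contain special characters.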
S400, semantic analysis of the question, as shown in FIG. 4. Named entity recognition is performed on the question to obtain the corresponding entity in the graph, and intention recognition is performed on the question to obtain the question type, so that the corresponding Cypher query sentence and spoken answer template can be selected. Specifically:
S410, named entity recognition on the question. HanLP hard matching is combined with the ALBERT-BiGRU-CRF model fused with the attention mechanism: hard matching is first performed using the HanLP natural language tool, and if nothing is matched, entity recognition is performed using the ALBERT-based model. Specifically:
S411, first, a custom dictionary is constructed from the existing entities in the traditional Chinese medicine knowledge graph; it covers entity types such as traditional Chinese medicine names, symptoms, efficacy, nature, flavor and meridian tropism, regions and categories. Then the hanlp.properties file is configured: the custom dictionary is added to the HanLP word segmentation dictionary path, and the part of speech of the words in each dictionary is specified, for example "nt" for traditional Chinese medicines, "nd" for symptoms, "nf" for efficacy and "nl" for regions. Next, the segment method of the HanLP tool is called; it automatically loads the custom dictionary and the specified parts of speech to segment the question, and the words in the segmentation result whose part-of-speech tags correspond to entity types are the entities matched in the question.
S412, if HanLP fails to extract an entity from the question, in particular a symptom or efficacy entity, for example when the symptom is described colloquially and cannot be matched against the names in the symptom library, the ALBERT-BiGRU-CRF model fused with the attention mechanism is used to perform named entity recognition on the question. To let the model recognize question entities more accurately, it is retrained on the named entity recognition corpus in the question corpus.
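The match-first, model-second control flow of S411 and S412 can be sketched without HanLP itself. Here a plain forward maximum matching routine stands in for HanLP's dictionary-based segmentation, `ner_model` is any callable standing in for the trained model, and the dictionary entries are hypothetical.

```python
# Hypothetical custom dictionary: surface form -> part-of-speech tag
# ("nt" for medicines and "nd" for symptoms, as named in the text).
CUSTOM_DICT = {"土茯苓": "nt", "皮炎": "nd"}

def dictionary_match(question, dictionary):
    """Forward maximum matching against the dictionary (stand-in for HanLP)."""
    found, i = [], 0
    max_len = max(map(len, dictionary))
    while i < len(question):
        for size in range(min(max_len, len(question) - i), 0, -1):
            word = question[i:i + size]
            if word in dictionary:
                found.append((word, dictionary[word]))
                i += size
                break
        else:
            i += 1  # no dictionary word starts here
    return found

def recognize_entities(question, dictionary, ner_model):
    """Hard-match first; fall back to the NER model when nothing matches."""
    matched = dictionary_match(question, dictionary)
    return matched if matched else ner_model(question)
```

For a colloquial symptom description that is absent from the dictionary, `dictionary_match` returns an empty list and the call is routed to the model, which is the case S412 describes.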
S420, entity linking. The entity link maps the entity mention identified by the ALBERT-BiGRU-CRF named entity identification model of the fusion attention mechanism into a knowledge graph, and a method of combining entity similarity calculation with the number of overlapped words is adopted to obtain a corresponding entity of a question entity in the graph, so that whether answer inquiry is carried out or not is determined, as shown in figure 5. Wherein:
This embodiment uses Sentence-BERT (sBERT) to calculate the similarity between the entity mention and the graph entities, sets the similarity threshold to 0.97 to obtain a candidate entity set, and then uses the number of overlapping words to determine the final entity. For example, if the symptom "dry eyes" is input, sBERT yields 10 entities with similarity greater than 0.97, namely "pupil abnormality, fundus abnormality, eye distention, pupil abnormality, dry eyes" and the like; among these, the graph entity "dry eyes" shares the most overlapping words with the mention "dry eyes", so the final entity linking result is "dry eyes".
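The two-stage selection (similarity threshold first, overlap count second) can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's code: the similarity function is passed in as a parameter, and the example replaces real sBERT scores with a hypothetical lookup table. The names `link_entity`, `overlap_count`, `toy_similarity`, the mention "eyes dry", and all scores are invented for the example; only the 0.97 threshold comes from the text.

```python
from collections import Counter


def overlap_count(mention, entity):
    """Count characters shared by the mention and the entity (multiset overlap)."""
    return sum((Counter(mention) & Counter(entity)).values())


def link_entity(mention, graph_entities, similarity, threshold=0.97):
    """Return the linked graph entity, or None when linking fails."""
    # Stage 1: keep only entities whose similarity exceeds the threshold.
    candidates = [e for e in graph_entities if similarity(mention, e) > threshold]
    if not candidates:
        return None  # no candidate: the answer query is not carried out
    # Stage 2: prefer the largest overlap; similarity only breaks ties.
    return max(candidates,
               key=lambda e: (overlap_count(mention, e), similarity(mention, e)))


# Hypothetical scores standing in for Sentence-BERT cosine similarities.
TOY_SCORES = {
    ("eyes dry", "dry eyes"): 0.990,
    ("eyes dry", "eye distention"): 0.980,
    ("eyes dry", "fundus abnormality"): 0.975,
    ("eyes dry", "headache"): 0.600,
}


def toy_similarity(mention, entity):
    return TOY_SCORES.get((mention, entity), 0.0)


ENTITIES = ["dry eyes", "eye distention", "fundus abnormality", "headache"]
print(link_entity("eyes dry", ENTITIES, toy_similarity))
```

Returning `None` when no candidate passes the threshold mirrors the decision in FIG. 5 of whether to proceed to the answer query.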
S430, question intention recognition. In this embodiment, the ERNIE-based dual-channel feature fusion question intention recognition algorithm classifies questions into the 9 major classes defined in step S200; if the classification result is attribute-related, the attribute type being queried is further subdivided by feature word matching. In the atr_tcm class, the Nature feature words are mainly the four natures, the feature words of the flavor category are the five flavors, and the Channel feature words are the twelve zang-fu organs such as stomach, large intestine and bladder. The final label of each subclass takes the form "major class name-minor class name"; for example, for a question that queries the alias attribute, the final question label is "tcm_atr-other name". Specifically:
S431, the ERNIE-based dual-channel feature fusion question intention recognition model is divided into 5 layers, from bottom to top: an input layer, an embedding layer, a feature extraction layer, a feature fusion layer and an output layer. The specific flow of question intention recognition with this model, as shown in FIG. 6, is as follows:
First, the corpus texts in the question corpus are preprocessed by the input layer to form sentence vector encodings, and the processed text sentence vectors are input into the ERNIE pre-trained model to obtain rich semantic information of the text data;
Then, the text vectors are input separately into the improved DPCNN and the BiGRU combined with the attention mechanism to obtain the question category feature $V_d$ and the contextual information feature $V_b$. The DPCNN is improved to suit the short question texts in the question-answer corpus: the number of convolution layers in each convolution group is reduced so that more text features are retained.
Next, in the feature fusion layer, the two feature vectors are concatenated into a global feature vector $V^{*}$, calculated as:
$$V^{*} = [V_d, V_b] \qquad (2)$$
Finally, the fused feature vector is input into a softmax classifier to obtain the final question classification result $y$, calculated as:
$$y = \mathrm{softmax}(W V^{*} + b) \qquad (3)$$
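The fusion and output layers, together with the feature-word subdivision described in S430, can be sketched in a few lines of plain Python. This is only an illustration of formulas (2) and (3): the feature vectors, the weights, the feature-word lists and the function names are all hypothetical, and the ERNIE, DPCNN and BiGRU stages that would produce the two feature vectors are not modeled. Only the label form "tcm_atr-other name" comes from the text.

```python
import math

# Hypothetical feature words for subdividing an attribute-related class.
FEATURE_WORDS = {
    "nature": ["cold", "hot", "warm", "cool"],
    "channel": ["stomach", "large intestine", "bladder"],
    "other name": ["alias", "other name"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(v_d, v_b, weights, bias):
    """Concatenate the two features (formula 2) and apply softmax (formula 3)."""
    v_star = list(v_d) + list(v_b)
    logits = [sum(w * x for w, x in zip(row, v_star)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def refine_label(major_label, question, attribute_label="tcm_atr"):
    """Append a minor class name when the major class is attribute-related."""
    if major_label != attribute_label:
        return major_label
    text = question.lower()
    for minor, words in FEATURE_WORDS.items():
        # Plain substring matching; a real system would match on segmented words.
        if any(word in text for word in words):
            return f"{major_label}-{minor}"
    return major_label


# Toy weights for 3 classes over a 4-dimensional fused vector.
probs = classify([1.0, 0.0], [0.0, 2.0],
                 weights=[[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]],
                 bias=[0.0, 0.0, 0.0])
print(probs.index(max(probs)))
print(refine_label("tcm_atr", "What is the alias of angelica?"))
```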
S500, answer query. Building on the results of S200, S300 and S400, this step uses the question entity and question intention classification result to select the Cypher query sentence template and answer template constructed in S300. The Cypher query sentence is filled with the entity and executed against the knowledge graph to obtain the corresponding answer, and the spoken answer template is filled with the query result to obtain the final answer, realizing the automatic question-answering function. Specifically:
S510, after the user inputs a question, the entity and question intention type obtained in S400 are used, together with the question intention-Cypher query sentence and question intention-answer template correspondence tables, to select the corresponding Cypher query sentence template and fill in the entity. This converts the question into a semantic representation that the computer can understand, and the answer query is performed in the knowledge graph. For example, the user enters the question "What diseases can be treated with Chinese angelica?". Semantic analysis yields the question entity "Chinese angelica" and the question intention classification label "TCM_DIS". Based on this intention, the Cypher query sentence "MATCH (t:TCM)-[r:major_cure]-(d:disease) WHERE t.name = {entity} RETURN d" is selected and filled with the question entity to obtain "MATCH (t:TCM)-[r:major_cure]-(d:disease) WHERE t.name = 'Chinese angelica' RETURN d", and the answer template "{tcm} is mainly used for treating {dis}." is selected.
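The table lookup and template filling in S510 can be sketched as below. This is a minimal illustration, not the embodiment's implementation: the table holds only the one intention given in the example, the names `INTENT_TABLE` and `build_query` are hypothetical, and the pattern is written with node parentheses as Cypher syntax requires. The quote escaping is an added precaution; passing the entity as a query parameter through a database driver would be the more robust choice.

```python
# Hypothetical correspondence table: intention label -> query and answer templates.
INTENT_TABLE = {
    "TCM_DIS": {
        "cypher": ("MATCH (t:TCM)-[r:major_cure]-(d:disease) "
                   "WHERE t.name = '{entity}' RETURN d"),
        "answer": "{tcm} is mainly used for treating {dis}.",
    },
}


def build_query(intent, entity, table=INTENT_TABLE):
    """Select the Cypher template for the intention and fill in the entity."""
    template = table[intent]["cypher"]
    # Escape backslashes and single quotes so the entity stays inside the literal.
    safe = entity.replace("\\", "\\\\").replace("'", "\\'")
    return template.replace("{entity}", safe)


print(build_query("TCM_DIS", "Chinese angelica"))
```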
S520, after the answer entities are obtained by the query in S510, they are filled into the corresponding answer template to obtain the final spoken answer. Filling the answer template "{tcm} is mainly used for treating {dis}." with the question entity and the queried answers gives the final spoken answer "Chinese angelica is mainly used for treating traumatic injury, dizziness, palpitation, amenorrhea, dysmenorrhea and irregular menstruation."
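Filling the spoken answer template can be sketched as follows. The function name `render_answer` and the fallback sentence for an empty result are hypothetical; the template and the example answer follow the text.

```python
def render_answer(template, tcm, diseases):
    """Fill the answer template with the question entity and the query results."""
    if not diseases:
        # Hypothetical fallback; the text does not specify the empty-result case.
        return f"No answer was found for {tcm}."
    if len(diseases) == 1:
        dis = diseases[0]
    else:
        # Join as "a, b and c" to read like a spoken sentence.
        dis = ", ".join(diseases[:-1]) + " and " + diseases[-1]
    return template.format(tcm=tcm, dis=dis)


answer = render_answer(
    "{tcm} is mainly used for treating {dis}.",
    "Chinese angelica",
    ["traumatic injury", "dizziness", "palpitation",
     "amenorrhea", "dysmenorrhea", "irregular menstruation"],
)
print(answer)
```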
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (10)

1. A knowledge graph question-answering method in the traditional Chinese medicine field based on semantic analysis is characterized by comprising the following steps:
constructing a multimodal traditional Chinese medicine knowledge graph;
combining the multimodal traditional Chinese medicine knowledge graph to construct a dedicated question corpus in the traditional Chinese medicine field;
based on the question intention category in the Chinese medicine domain exclusive question corpus, obtaining a Cypher query sentence and a spoken language answer template, and constructing a question intention-Cypher query sentence and a question intention-answer template corresponding table;
semantic analysis is carried out on the question, and a question entity and a question intention classification result are obtained;
and performing an answer query on the multimodal traditional Chinese medicine knowledge graph based on the question entity and the question intention classification result to obtain a final answer, thereby realizing semantic analysis-based traditional Chinese medicine domain knowledge graph question answering.
2. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 1, wherein constructing the multi-modal traditional Chinese medicine knowledge graph comprises:
acquiring traditional Chinese medicine knowledge data, and cleaning the traditional Chinese medicine knowledge data, wherein the traditional Chinese medicine knowledge data comprises traditional Chinese medicine description data, prescription data and corresponding photos of traditional Chinese medicines;
investigating knowledge in the traditional Chinese medicine field, analyzing it in combination with the cleaned traditional Chinese medicine knowledge data, and constructing an ontology concept layer, wherein the ontology concept layer is used for describing the data schema of the multimodal traditional Chinese medicine knowledge graph;
extracting entities in the traditional Chinese medicine description data by adopting a rule-based method;
and extracting the entity in the prescription data by using a deep learning-based method.
3. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 2, wherein the traditional Chinese medicine domain knowledge comprises entities and defines entity relationships;
the entity comprises traditional Chinese medicines, regions, efficacy, categories, symptoms, prescriptions, books and prescriptions, and diverges by taking the traditional Chinese medicines and the prescriptions as centers;
the entity relationship comprises mainly treating symptoms, traditional Chinese medicine functions, traditional Chinese medicine distribution regions, traditional Chinese medicine subordinate categories, prescription treatment diseases, traditional Chinese medicine basic prescription, prescription source books and prescription containing prescriptions.
4. The semantic analysis-based knowledge-graph question-answering method of traditional Chinese medicine field according to claim 2, wherein extracting the entities in the prescription data by using a deep learning-based method comprises:
performing named entity recognition on the prescription text by adopting an ALBERT-BiGRU-CRF model fused with an attention mechanism, wherein the ALBERT-BiGRU-CRF model comprises an ALBERT layer, a BiGRU layer, an Attention layer and a CRF layer;
inputting the prescription text into an ALBERT layer for word embedding to obtain a dynamic vector of a character;
inputting the dynamic vector of the character into the BiGRU layer to learn so as to obtain the characteristic vector of the character;
weighting the dynamic vector of the character and the characteristic vector of the character by using the Attention layer, and inputting the result into the CRF layer for correction to obtain a final predicted tag sequence;
extracting entities in the prescription data based on the final predicted tag sequence.
5. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 1, wherein constructing the traditional Chinese medicine domain-specific question corpus by combining the multi-modal traditional Chinese medicine knowledge graph comprises:
analyzing the multimodal traditional Chinese medicine knowledge graph, dividing question intentions in combination with target question types to obtain a plurality of question classes, and determining the label of each question class, wherein the analyzed content of the multimodal traditional Chinese medicine knowledge graph comprises entities, relations and attributes;
using the entities, relations and attributes as the content basis for generating the question corpus, and constructing a question seed corpus through manual labeling and rules;
and performing data enhancement on the question seed corpus by synonym replacement, sentence pattern reconstruction and entity word replacement to construct the traditional Chinese medicine domain exclusive question corpus.
6. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 1, wherein obtaining the Cypher query sentence and the spoken language answer template based on question intention category in the traditional Chinese medicine domain specific question corpus, and constructing the question intention-Cypher query sentence and question intention-answer template correspondence table comprises:
acquiring the question intentions in the traditional Chinese medicine domain exclusive question corpus based on the traditional Chinese medicine domain exclusive question corpus;
writing a corresponding Cypher query sentence according to the question intention category to acquire the Cypher query sentence;
writing a corresponding spoken answer template according to the question intention category, and obtaining the spoken answer template;
and constructing a question intention-Cypher query statement and question intention-answer template corresponding table based on the Cypher query statement and the spoken language answer template.
7. The semantic analysis-based traditional Chinese medicine domain knowledge graph question answering method according to claim 4, wherein the semantic analysis of the question sentence is performed, and obtaining a question sentence entity and a question sentence intention classification result comprises:
performing hard matching by utilizing a HanLP natural language tool to acquire the question entity;
if no entity is matched, performing named entity recognition on the question by using the ALBERT-BiGRU-CRF model fused with the attention mechanism to obtain a question entity mention;
performing entity linking to map the question entity mention to the multimodal traditional Chinese medicine knowledge graph, the question entity being obtained by a method combining entity similarity calculation with the number of overlapping words;
and carrying out question intention recognition by utilizing an ERNIE-based dual-channel feature fusion question intention recognition model, and obtaining the question intention classification result.
8. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 7, wherein mapping the question entity mention to the multimodal traditional Chinese medicine knowledge graph and obtaining the question entity by the method combining entity similarity calculation with the number of overlapping words comprises:
calculating the similarity between the question entity mention and the graph entities based on Sentence-BERT, and comparing the similarity with a similarity threshold;
if the similarity is not greater than the similarity threshold, the entity linking fails;
and if the similarity is greater than the similarity threshold, combining the similarity with the number of overlapping words to obtain the question entity.
9. The semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method according to claim 7, wherein performing question intention recognition by using the ERNIE-based dual-channel feature fusion question intention recognition model to obtain the question intention classification result comprises:
the ERNIE-based dual-channel feature fusion question intention recognition model comprises an input layer, an embedding layer, a feature extraction layer, a feature fusion layer and an output layer;
preprocessing corpus texts in the traditional Chinese medicine domain exclusive question corpus through the input layer to obtain processed text sentence vectors;
inputting the text sentence vectors into the embedding layer to obtain text data semantic information;
inputting the text data semantic information into the feature extraction layer, and acquiring question category features and context information features by utilizing the improved DPCNN and BiGRU combined with an attention mechanism;
inputting the question category characteristics and the context information characteristics into the characteristic fusion layer for fusion, and obtaining fused characteristic vectors;
and inputting the fused feature vector into a softmax classifier to obtain the question intention classification result.
10. The semantic analysis-based traditional Chinese medicine domain knowledge graph question answering method according to claim 1, wherein performing answer query of the multi-modal traditional Chinese medicine knowledge graph based on the question entity and the question intention classification result, obtaining the final answer comprises:
selecting the corresponding Cypher query sentence and the spoken language answer template by using the question intention-Cypher query sentence and the question intention-answer template corresponding table according to the question entity and the question intention classification result;
and filling the Cypher query sentence with the question entity, and performing the answer query in the multimodal traditional Chinese medicine knowledge graph to obtain a query result, namely the final answer.
CN202311273745.0A 2023-09-28 2023-09-28 Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method Pending CN117171329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311273745.0A CN117171329A (en) 2023-09-28 2023-09-28 Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311273745.0A CN117171329A (en) 2023-09-28 2023-09-28 Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method

Publications (1)

Publication Number Publication Date
CN117171329A true CN117171329A (en) 2023-12-05

Family

ID=88941310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311273745.0A Pending CN117171329A (en) 2023-09-28 2023-09-28 Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method

Country Status (1)

Country Link
CN (1) CN117171329A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688189A (en) * 2023-12-27 2024-03-12 珠江水利委员会珠江水利科学研究院 Knowledge graph, knowledge base and large language model fused question-answering system construction method

Similar Documents

Publication Publication Date Title
CN110032648B (en) Medical record structured analysis method based on medical field entity
CN111475623B (en) Case Information Semantic Retrieval Method and Device Based on Knowledge Graph
CN109684448B (en) Intelligent question and answer method
CN111079377B (en) Method for recognizing named entities of Chinese medical texts
CN107368547A (en) A kind of intelligent medical automatic question-answering method based on deep learning
CN112487202B (en) Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN112597774B (en) Chinese medical named entity recognition method, system, storage medium and equipment
CN106776711A (en) A kind of Chinese medical knowledge mapping construction method based on deep learning
CN108509409A (en) A method of automatically generating semantic similarity sentence sample
CN113505243A (en) Intelligent question-answering method and device based on medical knowledge graph
Schmidt et al. Data mining in clinical trial text: Transformers for classification and question answering tasks
CN114036281B (en) Knowledge graph-based citrus control question-answering module construction method and question-answering system
CN117171329A (en) Semantic analysis-based traditional Chinese medicine domain knowledge graph question-answering method
CN112328800A (en) System and method for automatically generating programming specification question answers
He Towards Visual Question Answering on Pathology Images.
CN115438674B (en) Entity data processing method, entity linking method, entity data processing device, entity linking device and computer equipment
CN112966117A (en) Entity linking method
CN111311459A (en) Interactive question setting method and system for international Chinese teaching
CN105677637A (en) Method and device for updating abstract semantics database in intelligent question-answering system
CN111143531A (en) Question-answer pair construction method, system, device and computer readable storage medium
CN112949308A (en) Method and system for identifying named entities of Chinese electronic medical record based on functional structure
CN112328773A (en) Knowledge graph-based question and answer implementation method and system
CN116561274A (en) Knowledge question-answering method based on digital human technology and natural language big model
CN106897274B (en) Cross-language comment replying method
CN116881413A (en) Intelligent medical question-answering method based on Chinese medical knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination