CN117591655A

CN117591655A - Intelligent question-answering system based on traditional Chinese medicine knowledge graph

Info

Publication number: CN117591655A
Application number: CN202311718179.XA
Authority: CN
Inventors: 刘青萍; 李冉; 刘东波; 晏峻峰; 王伟杰; 曹嘉璇
Original assignee: Hunan University of Chinese Medicine
Current assignee: Hunan University of Chinese Medicine
Priority date: 2023-12-13
Filing date: 2023-12-13
Publication date: 2024-02-23

Abstract

The application relates to an intelligent question-answering system based on a traditional Chinese medicine knowledge graph, comprising: a question acquisition module for acquiring a user's natural-language question; a named entity and intent recognition module for performing entity recognition and intent recognition on the user's natural-language question using a BERT+Slot-Gated model to obtain the entities and intent contained in the question; and an answer extraction module for generating a query statement from the entities and intent, querying the traditional Chinese medicine knowledge graph with the query statement to obtain the answer corresponding to the user's question, and returning the answer to the user. The system combines the entity node information of the knowledge graph with question templates to automatically construct a training corpus of Chinese medicine questions, and uses BERT sentence encodings as the vector input of the Slot-Gated model, thereby improving the recognition accuracy of named entities and intents in the question corpus and enabling fast retrieval and accurate answering of user questions.

Description

Intelligent question-answering system based on traditional Chinese medicine knowledge graph

Technical Field

The application relates to the technical field of information recommendation, in particular to an intelligent question-answering system based on a traditional Chinese medicine knowledge graph.

Background

Chinese materia medica is a treasure of traditional Chinese medicine. It has a long history and a rich theoretical system, plays a unique role in regulating the yin-yang balance of the human body, regulating the circulation of qi and blood, enhancing immunity and the like, and has very high research and popularization value. With the rapid development of internet technology, knowledge of traditional Chinese medicine has become easier to obtain online. At present, users mainly obtain such knowledge through Chinese medicine websites and conventional search engines such as Baidu. The information obtained in this way is huge in volume and poorly targeted, so its scientific soundness and accuracy cannot be guaranteed. Compared with a conventional search engine, an intelligent question-answering system can obtain knowledge and information quickly and efficiently. However, in a conventional intelligent question-answering system that uses unstructured data as its knowledge source, the convenience and timeliness of retrieval are impaired.

The traditional Chinese medicine field contains many polysemous words, and the existing Slot-Gated model does not recognize entities and intents accurately enough, which reduces the accuracy of the answers given by an intelligent question-answering system.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide an intelligent question-answering system based on a traditional Chinese medicine knowledge graph.

An intelligent question-answering system based on a traditional Chinese medicine knowledge graph, the system comprising:

a question acquisition module for acquiring a user's natural-language question;

a named entity and intent recognition module for performing entity recognition and intent recognition on the user's natural-language question using a BERT+Slot-Gated model to obtain the entities and intent contained in the question; and

an answer extraction module for generating a query statement according to the entities and the intent, querying the traditional Chinese medicine knowledge graph according to the query statement to obtain the answer corresponding to the user's natural-language question, and returning the answer to the user.

In one embodiment, the named entity and intent recognition module is configured to input the user's natural-language question into the BERT pre-trained model for sentence encoding, and to output the resulting word vectors to the Slot-Gated model for entity recognition and intent recognition, so as to obtain the entities and intent contained in the question.

In one embodiment, the traditional Chinese medicine knowledge graph is stored using a Neo4j graph database.

In one embodiment, the answer extraction module is further configured to: determine the type of the user's natural-language question according to the entities and intent contained in the question and a preset question classification system; match the question type to the corresponding question template; fill the entities into the question template to generate a Cypher query statement; execute the query with a query engine in the Neo4j graph database that stores the traditional Chinese medicine knowledge graph; retrieve the answer corresponding to the user's question; and return the answer to the user.

In one embodiment, the step of constructing the preset question classification system includes:

using the entity information in the knowledge graph to fill the slots of questions, and combining question intent with slot filling to generate a question data set of traditional Chinese medicine knowledge;

performing slot labeling on the questions using the BIO labeling method, and defining different query words to classify the intent of the questions; and

designing a plurality of question types according to the Chinese medicine knowledge and the intent classification as the preset question classification system of the system.

In one embodiment, the system further comprises a traditional Chinese medicine knowledge graph construction module, configured to: take the medicine name, nature and taste, meridian tropism, efficacy, indications and dosage as the ontology according to the data of the planned Chinese medicine textbook; systematically organize the knowledge of the traditional Chinese medicine field to extract entities, relations and attributes from structured or unstructured data; express the extracted entities, relations and attributes as triples; and, after organizing them, transfer the resulting triple data to the Neo4j graph database for storage, thereby obtaining the traditional Chinese medicine knowledge graph.

The intelligent question-answering system based on the traditional Chinese medicine knowledge graph comprises: a question acquisition module for acquiring a user's natural-language question; a named entity and intent recognition module for performing entity recognition and intent recognition on the user's natural-language question using a BERT+Slot-Gated model to obtain the entities and intent contained in the question; and an answer extraction module for generating a query statement according to the entities and the intent, querying the traditional Chinese medicine knowledge graph according to the query statement to obtain the answer corresponding to the user's question, and returning the answer to the user. The system combines the entity node information of the knowledge graph with question templates to automatically construct a training corpus of Chinese medicine questions, and uses BERT sentence encodings as the vector input of the Slot-Gated model, thereby improving the recognition accuracy of named entities and intents in the question corpus and enabling fast retrieval and accurate answering of user questions.

Drawings

FIG. 1 is a block diagram of an intelligent question-answering system of a traditional Chinese medicine knowledge graph in one embodiment;

FIG. 2 is a BERT+Slot-Gated model structure in another embodiment;

FIG. 3 is an example of user question corpus generation in another embodiment;

FIG. 4 is a schematic diagram of the ontology layer of the knowledge graph in another embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, an intelligent question-answering system based on a knowledge graph of traditional Chinese medicine is provided, the system comprising:

The question acquisition module 10 is configured to acquire the user's natural-language question.

Specifically, the user can ask questions in the question acquisition module 10 according to his or her own intent.

The named entity and intent recognition module 20 is configured to perform entity recognition and intent recognition on the user's natural-language question using the BERT+Slot-Gated model, so as to obtain the entities and intent contained in the question.

Specifically, question understanding in an intelligent question-answering system means classifying and interpreting the question posed by the user to determine its type, intent and related information, so that an accurate answer or solution can be provided to the user.

The question understanding process of an intelligent question-answering system generally comprises two parts: named entity recognition and intent recognition. Using the two together enables the system to understand the user's question more comprehensively, interact with the user better and provide accurate answers.

Named entity recognition (NER) aims to identify and extract named entities from the user's question. Through named entity recognition, the system can understand the key information in the question, which provides a basis for subsequent processing. In the intelligent question-answering system of the traditional Chinese medicine knowledge graph, the named entity recognition step mainly identifies the important Chinese medicine entities in the user's question, such as medicine names, nature and taste, efficacy, indications and dosage.

Intent recognition: an intelligent question-answering system needs to determine the intent or purpose of the user's question. Intent recognition judges the user's intent by analyzing the semantics and context of the question. It helps the system extract key information from the question and understand the user's intent more accurately, which provides a basis for subsequent answer generation and processing.

Among deep learning methods, the Slot-Gated joint model combines the two tasks of entity recognition and intent recognition and trains them simultaneously so that they influence each other. It makes full use of the strong correlation between the two key tasks and improves question-answering performance. However, the model takes static word vectors as input, so it cannot resolve polysemy and cannot represent traditional Chinese medicine information effectively.

Bidirectional Encoder Representations from Transformers (BERT) uses the Encoder module of the Transformer architecture and discards the Decoder module, focusing on language understanding. To train its feature extraction capability, BERT is pre-trained with two methods: the Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM predicts randomly masked words from their context, which gives the model the ability to capture word-level features. NSP trains the model to judge whether two sentences are consecutive, which allows it to capture sentence-level relations. BERT has been shown to classify traditional Chinese medicine records effectively and, because it can be pre-trained on a larger corpus, it produced the most accurate results compared with other models.

Each word in the user question has its own slot label, and the user question as a whole has a specific intent. Slot filling can be regarded as mapping the input word sequence $x = (x_1, \ldots, x_T)$ to the corresponding slot label sequence $y^S = (y_1^S, \ldots, y_T^S)$. Intent detection can be regarded as a classification problem that maps the input word sequence $x = (x_1, \ldots, x_T)$ to the corresponding intent label $y^I$.

BERT is used as the embedding input of the deep learning model. Each word of the user question is converted into a 768-dimensional vector, which is an information-dense representation of the text semantics, so that the distance between two words in the vector space is associated with the semantic similarity between the two words in the original text.

Compared with the static word vectors used in the original model, BERT produces dynamic word vectors: it computes contextual representations of the text during pre-training and can therefore resolve the polysemy that static word vectors cannot. The text vectors produced by BERT are also denser and represent semantic information better. The structure of the BERT+Slot-Gated model is shown in FIG. 2.

The answer extraction module 30 is configured to generate a query statement according to the entities and the intent, query the traditional Chinese medicine knowledge graph according to the query statement to obtain the answer corresponding to the user's natural-language question, and return the answer to the user.

Specifically, after the system understands the question of the user, the system queries in the semantic library, and finally feeds back the best answer to the user.

Specifically, the BERT+Slot-Gated model computes the slot context vector $c_i^S$ as a weighted sum of the BiLSTM hidden states:

$$c_i^S = \sum_{j=1}^{T} \alpha_{i,j}^S h_j$$

$$\alpha_{i,j}^S = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T} \exp(e_{i,k})}, \qquad e_{i,k} = \sigma\left(W_{he}^S h_k\right)$$

where $\alpha_{i,j}^S$ is the learned attention weight, $e_{i,k}$ measures the relation between $h_k$ and the input vector $h_i$, $\sigma$ is an activation function, and $W_{he}^S$ is the weight matrix of a feed-forward neural network. The hidden state and the slot context vector are then used for slot filling.

The slot label of the i-th word of the input layer is:

$$y_i^S = \mathrm{softmax}\left(W_{hy}^S \left(h_i + c_i^S\right)\right)$$

where $y_i^S$ is the slot label of the i-th word of the input layer and $W_{hy}^S$ is a weight matrix. A softmax over $h_i$ and $c_i^S$ gives the predicted label for the i-th word.

Intent detection is:

$$y^I = \mathrm{softmax}\left(W_{hy}^I \left(h_T + c^I\right)\right)$$

where the intent context vector $c^I$ is computed in the same way as $c^S$, except that intent detection uses only the last hidden state $h_T$ of the BiLSTM.

The slot gate mechanism introduces an additional gate that uses the intent context vector to model the slot-intent relationship and thereby improve slot filling performance. The slot gate is expressed as:

$$g = \sum v \cdot \tanh\left(c_i^S + W \cdot c^I\right)$$

where $g$ can be regarded as a weighted feature of the joint context vector, $v$ and $W$ are a trainable vector and a trainable matrix respectively, and the slot context vector $c_i^S$ and the intent context vector $c^I$ are combined and passed through the tanh activation.

The joint output of the two tasks is:

$$p\left(y^S, y^I \mid x\right) = p\left(y^I \mid x\right) \prod_{t=1}^{T} p\left(y_t^S \mid x\right)$$

where $p(y^S, y^I \mid x)$ is the conditional probability of the slot filling and intent recognition results given the input word sequence, from which the final prediction of the model is derived.
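
The gate computation above can be illustrated with a small numerical sketch. The following is a minimal pure-Python illustration under stated assumptions, not the implementation used in the application: the vectors and matrices are made-up toy values, all function names are hypothetical, and the way the gate is applied to the slot context vector (`h_i + g * c_slot`) follows the published Slot-Gated formulation rather than anything spelled out in this text.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def slot_gate(c_slot, c_intent, v, W):
    """g = sum(v * tanh(c_i^S + W c^I)): a scalar gate for word i."""
    projected = matvec(W, c_intent)
    return sum(v_d * math.tanh(s + p) for v_d, s, p in zip(v, c_slot, projected))

def gated_slot_probs(h_i, c_slot, g, W_hy):
    """softmax(W_hy (h_i + g * c_i^S)); the gated form is an assumption."""
    combined = [h + g * c for h, c in zip(h_i, c_slot)]
    return softmax(matvec(W_hy, combined))

# Toy 2-dimensional example with made-up numbers.
h_i = [0.2, -0.1]                              # BiLSTM hidden state of word i
c_slot = [0.5, -0.2]                           # slot context vector c_i^S
c_intent = [0.1, 0.3]                          # intent context vector c^I
v = [1.0, 1.0]                                 # trainable vector v
W = [[1.0, 0.0], [0.0, 1.0]]                   # trainable matrix W
W_hy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # projects to three slot labels

g = slot_gate(c_slot, c_intent, v, W)
probs = gated_slot_probs(h_i, c_slot, g, W_hy)
print(round(g, 4), [round(p, 4) for p in probs])
```

A larger gate value lets the slot context vector contribute more to the slot label distribution, which is how the intent information influences slot filling in this sketch.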

In one embodiment that verifies the named entity recognition and intent recognition performance, the hardware environment for model training is an RTX 3060 and an Intel(R) Core(TM) i7-10750H. The software environment is Python 3.6, torch 1.10.0, tensorflow 1.14.0 and keras 2.2.5, and the BERT version is Bert-machine-base. The back end of the system uses the Flask framework.

70% of the medical question data set was used as the training set, 20% as the validation set and the remaining 10% as the test set to train the model. The BERT+Slot-Gated model was compared with the AC (Aho-Corasick) multi-pattern matching algorithm, the BERT model and the Slot-Gated model. Precision (P), recall (R) and their harmonic mean (F1-score), as used in multi-class tasks, were adopted as the evaluation metrics for comparing the results of the named entity recognition task. EM (exact match), the percentage of predictions that match the correct answer, was used as the evaluation metric for the intent recognition task.

The evaluation results of the different named entity recognition and intent recognition models are shown in Table 1. The deep learning algorithms improve greatly on the pattern matching results: pattern matching cannot recognize Chinese medicine entities that contain typos in the question, and can only recognize entities within a limited Chinese medicine entity dictionary. The BERT model performs better than the Slot-Gated model because it understands the semantic information of Chinese questions better. The BERT+Slot-Gated model provided by the application increases the vector dimension of the input layer and encodes the Chinese question with BERT, so that entity information and question intent are recognized better; the improved model raises the F1 value by 1.4% and the EM by 1.8%.
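
The 70/20/10 split described above can be sketched as follows. This is a minimal illustration only: the function name, the fixed shuffle seed and the placeholder question strings are assumptions, and the data set size of 4520 is the figure given elsewhere in this description.

```python
import random

def split_dataset(samples, train_pct=70, valid_pct=20, seed=42):
    """Shuffle the samples and split them into training, validation and test sets.

    Integer percentages are used so that the split sizes are exact.
    The seed is a hypothetical choice that only makes the shuffle repeatable.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_valid = len(shuffled) * valid_pct // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # the remaining 10%
    return train, valid, test

# Placeholder questions standing in for the 4520-question data set.
questions = [f"question-{i}" for i in range(4520)]
train_set, valid_set, test_set = split_dataset(questions)
print(len(train_set), len(valid_set), len(test_set))  # prints: 3164 904 452
```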

Table 1 Evaluation of different named entity recognition and intent recognition models
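
The evaluation metrics named above can be computed as in the following sketch. It is a minimal illustration under stated assumptions: entities are compared as (text, type) pairs, the intent labels are hypothetical, and the sample predictions are made up for the example rather than taken from Table 1.

```python
def precision_recall_f1(gold_entities, predicted_entities):
    """Entity-level precision, recall and F1 over sets of (text, type) pairs."""
    gold, predicted = set(gold_entities), set(predicted_entities)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

def exact_match(gold_intents, predicted_intents):
    """Fraction of questions whose predicted intent equals the gold intent."""
    matches = sum(g == p for g, p in zip(gold_intents, predicted_intents))
    return matches / len(gold_intents)

# Made-up example data; the entity types and intent names are hypothetical.
gold = [("ephedra", "name"), ("lung", "meridian")]
pred = [("ephedra", "name"), ("warm", "taste")]
p, r, f1 = precision_recall_f1(gold, pred)

gold_intents = ["taste", "effect", "dosage", "meridian"]
pred_intents = ["taste", "effect", "usage", "meridian"]
em = exact_match(gold_intents, pred_intents)
print(p, r, f1, em)
```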

In one embodiment, the traditional Chinese medicine knowledge graph is stored using Neo4j graph database.

Specifically, Cypher statements are used to query the Neo4j graph database. After entity recognition and intent recognition are performed on the user's natural-language question as described above, the entities contained in the question are obtained and the type of the question is determined. The system parses and identifies the entity names and the intent class in the question through the BERT+Slot-Gated model, matches the required question template according to the question class, fills the entity names into the question template, and generates a Cypher query statement. A query engine then executes the query in the Neo4j graph database that stores the knowledge and retrieves the result corresponding to the user's question. The correspondence between question templates and query statements is shown in Table 2.

Table 2 question template and query statement correspondence example

The natural-language question "What is the nature and taste of ephedra?" is taken as an example to introduce the specific implementation of the intelligent question-answering system on the WeChat official account platform. After the BERT+Slot-Gated algorithm parses the question, "ephedra" is identified as an entity of the medicine name type, and the intent recognition result is media_table. The matching medium_paste question template is returned according to the intent recognition result, and "ephedra" is filled into the question template. The corresponding Cypher statement in Table 2 is executed, and the information in the Neo4j graph is returned: "The nature and taste of ephedra include: pungent, slightly bitter and warm." The result shows that the system achieves fast retrieval and accurate answering of various questions about Chinese medicines, such as indications, efficacy and usage, and can provide authoritative, high-quality traditional Chinese medicine knowledge according to the user's needs.
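
The template-matching step in this example can be sketched as follows. This is a minimal sketch under stated assumptions rather than the query statements of Table 2: the intent keys, the node labels (`Medicine`, `Taste`, `Meridian`, `Effect`) and the relationship types are all hypothetical, and the entity is passed as a Cypher parameter (`$name`) instead of being pasted into the query text, which is one possible way to "fill" the template.

```python
# Hypothetical mapping from recognized intent to a Cypher question template.
QUESTION_TEMPLATES = {
    "taste": (
        "MATCH (m:Medicine {name: $name})-[:HAS_TASTE]->(t:Taste) "
        "RETURN t.name AS answer"
    ),
    "meridian": (
        "MATCH (m:Medicine {name: $name})-[:ENTERS_MERIDIAN]->(c:Meridian) "
        "RETURN c.name AS answer"
    ),
    "effect": (
        "MATCH (m:Medicine {name: $name})-[:HAS_EFFECT]->(e:Effect) "
        "RETURN e.name AS answer"
    ),
}

def build_query(intent, entity):
    """Return the Cypher template for the intent and the parameters that fill it."""
    if intent not in QUESTION_TEMPLATES:
        raise ValueError(f"no question template for intent {intent!r}")
    return QUESTION_TEMPLATES[intent], {"name": entity}

# The recognizer is assumed to have produced this intent and entity.
cypher, params = build_query("taste", "ephedra")
print(cypher)
print(params)
```

The returned pair could then be handed to a Neo4j client, for example py2neo's `Graph.run(cypher, parameters)`, to execute the query against the stored knowledge graph.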

In one embodiment, the step of constructing the preset question classification system includes: using the entity information in the knowledge graph to fill the slots of questions, and combining question intent with slot filling to generate a question data set of traditional Chinese medicine knowledge; performing slot labeling on the questions using the BIO labeling method; defining different query words to classify the intent of the questions; and designing a plurality of question types according to the Chinese medicine knowledge and the intent classification as the preset question classification system of the system.

Specifically, the entity information in the knowledge graph is used to fill the slots of questions, and question intent is combined with slot filling to generate the question data set of traditional Chinese medicine knowledge; an example of user question corpus generation is shown in FIG. 3. W (words) is the user question, S (slot) is the slot label sequence annotated with the BIO labeling method, and I (intent) is the intent information of the sentence. The manual labeling is shown in Table 3. According to the Chinese medicine knowledge and the category feature words, 11 types of questions were designed as the preset question classification system of the intelligent question-answering system, and a medical question data set of 4520 questions was generated.

Table 3 Manual labeling
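
The corpus generation described above (filling entity names into question templates, BIO slot labeling, and attaching an intent) can be sketched as follows. This is a minimal sketch under stated assumptions: the English templates, the intent names, the slot type `name` and the entity "cinnamon twig" are illustrative and not taken from the application, and whitespace tokenization stands in for whatever tokenization the original Chinese corpus uses. The W/S/I keys follow the notation of FIG. 3.

```python
# Hypothetical question templates paired with intent labels.
TEMPLATES = [
    ("what is the taste of {name}", "taste"),
    ("what does {name} treat", "indication"),
]

def bio_tags(tokens, entity_tokens, slot_type):
    """Tag the first occurrence of the entity with B-/I- labels, everything else with O."""
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            tags[start] = f"B-{slot_type}"
            for k in range(start + 1, start + n):
                tags[k] = f"I-{slot_type}"
            break
    return tags

def generate_corpus(entity_names, templates, slot_type="name"):
    """Combine every entity with every template into W (words), S (slots), I (intent)."""
    corpus = []
    for name in entity_names:
        for template, intent in templates:
            tokens = template.format(name=name).split()
            tags = bio_tags(tokens, name.split(), slot_type)
            corpus.append({"W": tokens, "S": tags, "I": intent})
    return corpus

corpus = generate_corpus(["ephedra", "cinnamon twig"], TEMPLATES)
for sample in corpus:
    print(sample)
```

Because every entity is crossed with every template, the corpus size grows as the product of the two, which is how a few hundred medicine entries and a small set of question types can yield several thousand labeled questions.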

In one embodiment, the system further comprises a traditional Chinese medicine knowledge graph construction module, configured to: take the medicine name, nature and taste, meridian tropism, efficacy, indications and dosage as the ontology according to the data of the planned Chinese medicine textbook; systematically organize the knowledge of the traditional Chinese medicine field to extract entities, relations and attributes from structured or unstructured data; express the extracted entities, relations and attributes as triples; and, after organizing them, transfer the resulting triple data to the Neo4j graph database for storage, thereby obtaining the traditional Chinese medicine knowledge graph.

Specifically, constructing a knowledge graph means extracting elements such as entities, relations and attributes from a large amount of structured or unstructured data, drawing them into a graph, and choosing a reasonable and efficient way to store it. The process for constructing the traditional Chinese medicine knowledge graph comprises: determining the knowledge scope, data acquisition, data preprocessing, entity extraction, relation extraction and construction of the visual knowledge graph.

In this embodiment, the planned textbook for higher education in the Chinese medicine industry, "Chinese Materia Medica", is used as the data source, which ensures that the data are scientific and accurate. 400 valid records were collected by manually organizing the textbook. Chinese medicine names, efficacy and other content were standardized and supplemented with reference to the Chinese Pharmacopoeia and the Great Dictionary of Chinese Medicine, and the data text was finally verified by manual rechecking.

Data preprocessing includes data cleaning, deduplication, normalization and similar steps to ensure data quality and consistency. In this part of the experiment, traditional Chinese medicine knowledge data were extracted and organized manually into an xlsx file that forms the full Chinese medicine database. Python tools were then used to split the manually organized data into json files that meet the import requirements of Neo4j. Taking the json record of the traditional Chinese medicine "ephedra" as an example, it contains 8 key-value pairs: medicine name, nature and taste, meridian tropism, efficacy, indications, usage, dosage and precautions. The data are as follows: {"name": "ephedra", "paste": ["pungent", "warm", "slightly bitter"], "meridian": ["lung", "bladder"], "effect": ["inducing sweating to release the exterior", "diffusing the lung to relieve asthma", "promoting urination to reduce edema"], "atting": "(1) for wind-cold exterior syndrome without sweating; (2) for cough and asthma; (3) for edema accompanied by exterior syndrome.", "user": "Used raw for releasing the exterior; used honey-processed or raw for relieving asthma.", "Condition": "2-10g", "attention": "It induces strong sweating, so it is contraindicated for spontaneous sweating due to exterior deficiency, night sweats due to yin deficiency, and cough and dyspnea due to kidney deficiency."}. Preprocessing the data makes the subsequent data analysis faster and more effective.

For the Chinese medicine text data, the ontology layer of the knowledge graph must be designed first. This helps organize scattered Chinese medicine content into structured knowledge and forms strongly related content suited to applications in the field of Chinese medicine. The seven-step method for building a domain ontology comprises: (1) determining the domain and scope of the ontology; (2) considering the reuse of existing ontologies; (3) listing the important terms in the ontology; (4) defining the classes and the class hierarchy; (5) defining the properties of the classes; (6) defining the facets of the properties; (7) creating instances. The key technologies include entity extraction, relation extraction and attribute extraction. Knowledge in the medical field is relatively straightforward and its relations are relatively short, so it is suitable for direct observation and classification from structured data.

Entity extraction, also called named entity recognition, is a key part of building a knowledge graph. It aims to build the nodes of the knowledge graph, and its quality directly affects the efficiency and quality of subsequent knowledge acquisition. According to the data of the planned Chinese medicine textbook, and combining aspects such as the four natures, five flavors, ascending, descending, floating and sinking, toxicity, application and dosage, the medicine name, nature and taste, meridian tropism, efficacy, indications and dosage are taken as the ontology of the Chinese medicine knowledge graph (see FIG. 4). Items such as "ephedra", "pungent", "lung" and "inducing sweating to release the exterior" are extracted from the textbook data as the entity information of the knowledge graph. The entity types of the Chinese medicine knowledge graph are shown in Table 4.

TABLE 4 knowledge-graph entity types of Chinese herbs

Chinese medicine names are semantically connected through relation terms such as nature and taste, meridian tropism, efficacy, indications and dosage; the ontology layer of the knowledge graph is shown in FIG. 4. In this way, the different contents of Chinese medicine knowledge are linked together to form structured knowledge.

Triples are generally used as the knowledge representation of a knowledge graph. The basic forms are (entity 1, relation, entity 2) and (entity, attribute, attribute value). Entities are the most basic elements of the knowledge graph, and different relations exist between different Chinese medicine entities. An attribute is a characteristic of a Chinese medicine entity, and an attribute value is the content of the corresponding attribute of that entity. In this embodiment, the Chinese medicine knowledge is processed into triples according to the design of the ontology layer of the knowledge graph. As shown in Table 5, the medicine name and its nature and taste can be expressed as triples such as (ephedra, nature and taste, pungent), (ephedra, nature and taste, slightly bitter) and (ephedra, nature and taste, warm). As shown in Table 6, the usage precautions are taken as an additional attribute of the medicine name, which helps subsequent applications of the knowledge graph.

TABLE 5 knowledge-graph entity relationship types of Chinese herbs

TABLE 6 knowledge-graph entity attributes of Chinese herbs
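
The conversion of a preprocessed record into triples can be sketched as follows. This is a minimal illustration under stated assumptions: the record keys follow the "ephedra" json example in this description, while the function names and the mappings from keys to relation and attribute names are hypothetical.

```python
# Hypothetical mapping from list-valued json keys to relation names.
KEY_TO_RELATION = {
    "paste": "nature and taste",
    "meridian": "meridian tropism",
    "effect": "efficacy",
}

# Hypothetical mapping from text-valued json keys to attribute names.
KEY_TO_ATTRIBUTE = {
    "attention": "usage precautions",
}

def record_to_relation_triples(record):
    """Build (entity 1, relation, entity 2) triples from the list-valued keys."""
    head = record["name"]
    triples = []
    for key, relation in KEY_TO_RELATION.items():
        for tail in record.get(key, []):
            triples.append((head, relation, tail))
    return triples

def record_to_attribute_triples(record):
    """Build (entity, attribute, attribute value) triples from the text-valued keys."""
    head = record["name"]
    return [
        (head, attribute, record[key])
        for key, attribute in KEY_TO_ATTRIBUTE.items()
        if key in record
    ]

record = {
    "name": "ephedra",
    "paste": ["pungent", "warm", "slightly bitter"],
    "meridian": ["lung", "bladder"],
    "attention": "contraindicated for spontaneous sweating due to exterior deficiency",
}
relation_triples = record_to_relation_triples(record)
attribute_triples = record_to_attribute_triples(record)
for triple in relation_triples + attribute_triples:
    print(triple)
```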

Knowledge is usually stored either as RDF or in a graph database. Graph databases are a kind of NoSQL database, and the application selects the graph database Neo4j, which offers high performance, high reliability and high scalability, as the knowledge storage tool. The Neo4j graph database has two basic data types, nodes and relationships. Each node represents an entity and can have zero or more relationships and attributes; a relationship connects two nodes, and Neo4j allows relationship types to be custom-designed. To construct the traditional Chinese medicine knowledge graph, Cypher CREATE statements are executed from Python through the py2neo module to create entity nodes, entity attributes and entity relationships in the Neo4j graph database, so that the organized Chinese medicine triple data are transferred to Neo4j and the knowledge graph is stored and visualized.

In a specific embodiment, the design of the intelligent question-answering system makes reasonable use of the rich structured semantic information in the knowledge graph, so that human-machine interaction is more efficient. The user can put questions to the system according to his or her own intent. After the system understands the user's question, it queries the semantic library and finally feeds the best answer back to the user. The back end of the system is written in Python with the Flask framework. Python is widely used in natural language processing tasks because it has high-level data structures, supports fast development and is highly extensible. The back-end server is built with the Flask framework partly because of its good extensibility and partly because, compared with the Django framework, it is more flexible and easier to get started with.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
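
The storage step can be sketched by turning each triple into a Cypher statement with parameters. This is a minimal sketch under stated assumptions, not the statements used in the application: the node labels and relationship types are hypothetical, and MERGE is used in place of the CREATE statement named above so that re-running the import does not duplicate nodes. Cypher does not accept parameters for labels or relationship types, so those are taken from a fixed lookup table rather than from the data.

```python
# Hypothetical mapping from relation name to (relationship type, tail node label).
RELATION_TYPES = {
    "nature and taste": ("HAS_TASTE", "Taste"),
    "meridian tropism": ("ENTERS_MERIDIAN", "Meridian"),
    "efficacy": ("HAS_EFFECT", "Effect"),
}

def triple_to_cypher(head, relation, tail):
    """Return a Cypher MERGE statement and its parameters for one triple."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unknown relation {relation!r}")
    rel_type, tail_label = RELATION_TYPES[relation]
    cypher = (
        "MERGE (m:Medicine {name: $head}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (m)-[:{rel_type}]->(t)"
    )
    return cypher, {"head": head, "tail": tail}

triples = [
    ("ephedra", "nature and taste", "pungent"),
    ("ephedra", "meridian tropism", "lung"),
]
statements = [triple_to_cypher(*triple) for triple in triples]
for cypher, params in statements:
    print(cypher, params)
```

Each statement and its parameters could then be executed through a Neo4j client such as py2neo's `Graph.run(cypher, parameters)`.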

The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. An intelligent question-answering system based on a traditional Chinese medicine knowledge graph is characterized in that the system comprises:

a question acquisition module for acquiring a user's natural-language question;

a named entity and intent recognition module for performing entity recognition and intent recognition on the user's natural-language question using a BERT+Slot-Gated model to obtain the entities and intent contained in the question;

and an answer extraction module for generating a query statement according to the entities and the intent, querying the traditional Chinese medicine knowledge graph according to the query statement to obtain the answer corresponding to the user's natural-language question, and returning the answer to the user.

2. The system of claim 1, wherein the named entity and intent recognition module is configured to input the user's natural-language question into the BERT pre-trained model for sentence encoding, and to output the resulting word vectors to the Slot-Gated model for entity recognition and intent recognition, so as to obtain the entities and intent contained in the question.

3. The system of claim 1, wherein the traditional Chinese medicine knowledge-graph is stored using a Neo4j graph database.

4. The system of claim 3, wherein the answer extraction module is further configured to: determine the type of the user's natural-language question according to the entities and intent contained in the question and a preset question classification system; match the question type to the corresponding question template; fill the entities into the question template to generate a Cypher query statement; execute the query with a query engine in the Neo4j graph database that stores the traditional Chinese medicine knowledge graph; retrieve the answer corresponding to the user's question; and return the answer to the user.

5. The system of claim 4, wherein the step of constructing the preset question classification system comprises:

using the entity information in the knowledge graph to fill the slots of questions, and combining question intent with slot filling to generate a question data set of traditional Chinese medicine knowledge;

performing slot labeling on the questions using the BIO labeling method; defining different query words to classify the intent of the questions; and designing a plurality of question types according to the Chinese medicine knowledge and the intent classification as the preset question classification system of the system.

6. The system according to claim 1, further comprising a traditional Chinese medicine knowledge graph construction module, configured to: take the medicine name, nature and taste, meridian tropism, efficacy, indications and dosage as the ontology according to the data of the planned Chinese medicine textbook; systematically organize the knowledge of the traditional Chinese medicine field to extract entities, relations and attributes from structured or unstructured data; express the extracted entities, relations and attributes as triples; and, after organizing them, transfer the resulting triple data to the Neo4j graph database for storage, thereby obtaining the traditional Chinese medicine knowledge graph.