CN116719913A - Medical question-answering system based on improved named entity recognition and construction method thereof - Google Patents

Publication number
CN116719913A
Authority
CN
China
Prior art keywords
data
question
intention
layer
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310469261.7A
Other languages
Chinese (zh)
Inventor
姜芳艽
陈婕妤
王斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University filed Critical Jiangsu Normal University
Priority to CN202310469261.7A priority Critical patent/CN116719913A/en
Publication of CN116719913A publication Critical patent/CN116719913A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

A medical question-answering system based on improved named entity recognition, and a method for constructing it, are provided. The system and method extract text features with the BERT pre-trained language model, which has strong semantic representation capability, and add a perturbation to the resulting word vectors to improve the generalization ability and robustness of the model. Adversarial training is introduced to address the insufficient or missing annotations that may exist in the data set, reducing the influence of data set noise on the results. Learning rates are set layer by layer, so that the performance of the BERT layer does not degrade, the downstream layers train faster, and training stays synchronized. The output features of the BERT layer and the BiLSTM layer are concatenated, which couples the layers more closely and yields deeper features without losing the original features of the BERT layer. Two-round intention recognition adds handling of chit-chat sentences so that the system can respond to casual topics raised by users, and slot inheritance gives the system a multi-round question-answering function, improving the accuracy of the answers given by the question-answering system.

Description

Medical question-answering system based on improved named entity recognition and construction method thereof
Technical Field
The invention relates to a medical question-answering system based on improved named entity recognition and a construction method thereof, and belongs to the technical field of knowledge graphs and natural language processing.
Background
With the continuous development of natural language processing technology, knowledge graphs have gradually been applied in many fields, and question-answering systems based on knowledge graphs have emerged. At present there are few question-answering assistant platforms in the vertical domain of precision medicine. Traditional question-answering systems mainly retrieve related content by keyword search, but they return too many pages, which users must screen and judge themselves, so the accuracy of the results is difficult to guarantee. Moreover, data in the medical industry are massive and complex, and traditional methods alone cannot meet the diverse needs of users.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a medical question-answering system based on improved named entity recognition and a construction method thereof.
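To achieve the above purpose, the invention adopts the following technical scheme: a medical question-answering system based on improved named entity recognition comprises a data acquisition module, a knowledge storage module, a natural language understanding module, a knowledge calculation module, and a dialogue management and interaction module, wherein the output end of the data acquisition module is connected with the input end of the knowledge storage module, the output end of the knowledge storage module is connected with the input end of the natural language understanding module, the output end of the natural language understanding module is connected with the input end of the knowledge calculation module, and the output end of the knowledge calculation module is connected with the input end of the dialogue management and interaction module;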
the data acquisition module is used for crawling medical data on a website, cleaning and preprocessing the data, and constructing a knowledge graph data set;
the knowledge storage module is used for storing the knowledge triples extracted from the data set by adopting a Neo4j graph database and displaying a visualized knowledge graph;
the natural language understanding module is used for carrying out a named entity recognition task and an intention recognition task on a question input by a user and understanding the specific meaning of the user question;
the knowledge calculation module is used for converting the question processed by the natural language understanding module into a structured query, and querying the knowledge graph with Cypher statements to obtain an answer;
the dialogue management and interaction module is used for developing the pages of the knowledge-graph-based question-answering system, supporting the user in entering a question, returning the corresponding answer, and supporting multi-round question answering.
A medical question-answering system construction method based on improved named entity recognition comprises the following steps:
step one, crawling relevant medical information, cleaning and preprocessing the data, constructing a data set, defining the required data schema, and storing the collected data in a Neo4j graph database according to the defined schema to complete the construction of the knowledge graph;
step two, constructing a named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network, executing the medical named entity recognition task, and extracting the medical entities in the question input by the user;
step three, constructing an intention recognition model based on the BERT-TextCNN network, executing the question intention recognition task, and recognizing the intention contained in the question input by the user;
step four, constructing the medical question-answering system along the technical route of intention recognition, semantic slot design and templates: performing two rounds of intention recognition, filling the slots once the intention is determined, and querying the knowledge graph according to the structured semantic slots;
step five, realizing the multi-round question-answering function of the system by means of slot inheritance;
step six, designing the medical question-answering system and its user pages, and developing the pages with PyQt5 to support human-computer interaction and realize online auxiliary diagnosis.
Further, the steps of constructing the data set and the knowledge graph in step one are as follows:
(1.1) extracting semi-structured data from web pages: the medicine searching and questioning website provides users with comprehensive knowledge of diseases and treatments, its data are reasonably authoritative and clearly structured, and it is suitable for crawling, so it is selected herein as the data source for constructing the knowledge graph;
(1.2) analyzing the structure of the web pages to obtain the URL addresses corresponding to the data;
(1.3) sending a request to each URL address obtained in step (1.2) with the urllib.request module to acquire the HTML data of the web page;
(1.4) parsing the HTML tags with XPath to extract the required data and their association relations;
(1.5) defining eight types of entities, namely diseases, symptoms, examinations, medicines, pharmaceutical companies, departments, foods and recipes, and designing eleven types of relations, namely disease-symptom, disease-complication, disease-examination, disease-recommended medicine, disease-common medicine, disease-department, disease-recipe, disease-recommended food, disease-contraindicated food, department-department and pharmaceutical company-medicine;
(1.6) saving the data from step (1.5) locally to obtain an initial corpus;
(1.7) directly deleting or filling in missing values in the data;
(1.8) for the noise in the data, standardizing with regular expressions: removing stop words, garbled characters, special characters and redundant information, and normalizing the format of punctuation marks and letters;
(1.9) performing word segmentation with a word segmentation tool, formatting the data, converting them into key-value pairs for storage, and exporting well-formed JSON data as the data set for constructing the medical knowledge graph;
(1.10) analyzing the entities, attributes and inter-entity relations in the constructed data set, and defining the required data schema (Schema) in light of the needs of the question-answering system;
(1.11) extracting knowledge triples from the data according to the Schema;
(1.12) connecting to Neo4j with the Py2Neo module in Python, using Cypher statements to create the entity nodes and entity relation edges of the knowledge graph according to the Schema, and writing the attributes into the disease entities, to complete the construction of the knowledge graph.
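Steps (1.8), (1.11) and (1.12) can be illustrated with a minimal sketch. The record layout, the field names in SCHEMA, the relation names and the node labels are all hypothetical and are not taken from the invention; the sketch only builds parameterized Cypher MERGE statements as strings and does not connect to a database.

```python
import re

# Hypothetical schema: JSON field -> (relation type, label of the tail entity).
SCHEMA = {
    "symptom": ("HAS_SYMPTOM", "Symptom"),
    "check": ("NEEDS_CHECK", "Check"),
    "recommend_drug": ("RECOMMENDED_DRUG", "Drug"),
}


def clean(text):
    """Drop special characters and collapse whitespace (a simplified step 1.8)."""
    text = re.sub(r"[^\w\s-]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_triples(record):
    """Turn one disease record into (head, relation, tail label, tail) tuples."""
    head = clean(record["name"])
    triples = []
    for field, (relation, label) in SCHEMA.items():
        for value in record.get(field, []):
            tail = clean(value)
            if tail:  # skip values that are empty after cleaning
                triples.append((head, relation, label, tail))
    return triples


def to_cypher(triple):
    """Build a parameterized MERGE statement and its parameters for one triple."""
    head, relation, label, tail = triple
    query = (
        "MERGE (d:Disease {name: $head}) "
        f"MERGE (t:{label} {{name: $tail}}) "
        f"MERGE (d)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


record = {"name": " influenza ", "symptom": ["fever!", ""], "check": ["blood test"]}
triples = extract_triples(record)
statements = [to_cypher(t) for t in triples]
```

Each (query, parameters) pair could then be handed to a Neo4j client such as Py2Neo. MERGE is used instead of CREATE in this sketch so that re-running the import does not duplicate nodes or edges.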
Further, in step two, the named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network is constructed as follows:
(2.1) fusing the CCKS2019 data set and the cMedQANER data set: the CCKS2019 data set covers disease and diagnosis, anatomical site, surgery, imaging examination, medication and laboratory test; the cMedQANER data set covers disease, symptoms, detection, physiology, treatment regimen, body part, population, department, medicine, local and time; both data sets are annotated with the BIO tagging scheme;
(2.2) taking the CCKS2019 data set as the base data set, extracting the entities of the usable categories from the cMedQANER data set and merging them into the CCKS2019 data set under the category mapping "detection-examination, treatment-surgery, drug-medication", thereby expanding the CCKS2019 data set on a small scale;
(2.3) introducing the BERT model for word embedding: the embeddings are obtained through pre-training and fine-tuning, all word vectors are obtained from the BERT model and text features are extracted, which strengthens the model's ability to capture character-level semantic features; the result is denoted e1;
(2.4) introducing adversarial training: the perturbation r of adversarial training is added to all word vectors e1 obtained from the BERT model, and the result is denoted e2; the adversarial training objective is:
$$\min_{\theta} \mathbb{E}_{(x,y)\sim D}\left[\max_{\Delta x \in \Omega} L(x+\Delta x, y; \theta)\right]$$
where D is the training set, x is the input, y is the label, θ denotes the model parameters, L(x+Δx, y; θ) is the loss of a single sample, Ω is the perturbation space, and Δx is the adversarial perturbation;
the perturbation r is obtained by normalizing the current gradient of the loss with respect to the word vectors output by BERT, and the sum of the word vector and the perturbation is the adversarial sample; the adversarial perturbation is computed as:
$$\Delta x = \epsilon \cdot \frac{g}{\lVert g \rVert_2}, \qquad g = \nabla_x L(x, y; \theta)$$
where g is the gradient, i.e., the partial derivative of the loss function with respect to x, and ε is the scaling factor;
(2.5) feeding the perturbed word vector e2 into the BiLSTM network to obtain contextual feature information, denoted e3;
(2.6) concatenating the output e3 of the BiLSTM layer with the perturbed word vector e2 so that the output features of both layers are retained; the result is denoted e4;
(2.7) feeding the vector e4 into the fully connected layer for dimensionality reduction; the result is denoted e5;
(2.8) inputting e5 into the CRF layer for decoding to obtain the label sequence corresponding to each character, denoted output; during decoding the CRF layer determines the labels according to the transition matrix, which is randomly initialized at the start of training and then optimized so that it better matches the actual transition probabilities between the labels of the training data; the score of a label sequence is:
$$s(X, y) = \sum_{i} A_{y_i, y_{i+1}} + \sum_{i} P_{i, y_i}$$
where A is the transition matrix, $A_{y_i, y_{i+1}}$ is the transition score from label $y_i$ to label $y_{i+1}$, P is the output of the BiLSTM network, and $P_{i, y_i}$ is the score that the BiLSTM layer assigns to the $y_i$-th label for the i-th character;
(2.9) training the BERT layer and the layers downstream of it with different learning rates, adjusted as appropriate at different stages to keep training synchronized, wherein the learning rate of the BERT layer is set to lr1, that of the BiLSTM layer to lr2, and that of the CRF layer to lr3.
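The perturbation of step (2.4) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation of the invention: the gradient is taken as an already-computed flat list of floats, the L2 norm is assumed for the normalization (as in the usual Fast Gradient Method), and the function names are hypothetical.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2 for a flat gradient vector g."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def adversarial_embedding(embedding, grad, epsilon=1.0):
    """Return e2 = e1 + r, the perturbed word vector fed to the BiLSTM layer."""
    r = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, r)]


r = fgm_perturbation([3.0, 4.0], epsilon=1.0)
e2 = adversarial_embedding([1.0, 1.0], [3.0, 4.0], epsilon=0.5)
```

In a training loop the perturbation would typically be recomputed at every step from the gradient of the current batch, and the original embeddings restored after the adversarial forward and backward pass.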
Further, the intention recognition model based on the BERT-TextCNN network in step three is constructed as follows:
(3.1) selecting the public CMID data set;
(3.2) according to the constructed knowledge graph and the defined entities and relations, extracting from the CMID data set the 13 types of intentions that the system can answer, and writing them into a label file;
(3.3) for the intentions with little data, supplementing the data set by template-based generation: setting rule-generation templates and manually writing the keywords of each rule;
(3.4) combining the keywords in a given order and randomly generating data, so as to supplement the corpus and balance the original data set;
(3.5) using the BERT model as the Embedding layer to convert the text into vectors and extract text features, denoted $b_1, b_2, \ldots, b_n$;
(3.6) concatenating the vectors $b_1, b_2, \ldots, b_n$ to obtain the embedding matrix $B_{1:n}$:
$$B_{1:n} = [b_1, b_2, \ldots, b_n]$$
where $b_1, b_2, \ldots, b_n$ are the word vectors;
(3.7) feeding the embedding matrix $B_{1:n}$ into the convolution layer for feature extraction, performing the convolution with kernels of sizes 3, 4 and 5 to extract the semantic features of the sentence, denoted $a_i$;
the semantic feature $a_i$ is computed as:
$$a_i = f(W \cdot M_{i:i+h-1} + b)$$
where M is the word vector matrix (the embedding matrix $B_{1:n}$ above), b is the bias, W is the weight of the neural network, h is the convolution kernel size, f is the nonlinear function used to compute the feature value, and $M_{i:i+h-1}$ denotes the word vectors at positions i through i+h-1 of the text;
(3.8) sending the semantic features $a_i$ extracted in step (3.7) to the pooling layer, which uses max_pooling to down-sample them so that the vector dimension is the same for every input; the output is denoted f1;
(3.9) after the pooling layer, sentences of different lengths are thus converted into fixed-length representations;
(3.10) adding Dropout to prevent overfitting;
(3.11) feeding the pooled f1 into the fully connected layer to obtain the probability of each label;
(3.12) outputting the classification result of the text with softmax, denoted output.
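The convolution of step (3.7) and the max pooling of step (3.8) can be sketched without any deep-learning library. The sketch assumes ReLU as the nonlinear function f and one filter per kernel size; both choices, and all function names, are assumptions made for illustration only.

```python
def relu(x):
    return max(0.0, x)


def conv_features(M, W, b, f=relu):
    """Compute a_i = f(W . M[i:i+h-1] + b) for every window of h consecutive rows.

    M is the word vector matrix (one row per token) and W is an h x d kernel.
    """
    h = len(W)
    feats = []
    for i in range(len(M) - h + 1):
        window = M[i:i + h]
        s = sum(W[r][c] * window[r][c] for r in range(h) for c in range(len(W[r])))
        feats.append(f(s + b))
    return feats


def sentence_vector(M, kernels):
    """Max-pool each kernel's feature map, giving one value per kernel.

    The output length depends only on the number of kernels, not on the
    sentence length. Assumes the sentence is padded to at least the largest h.
    """
    return [max(conv_features(M, W, b)) for W, b in kernels]


M = [[1, 0], [0, 1], [1, 1], [2, 0]]          # 4 tokens, 2-dimensional vectors
kernels = [
    ([[1, 1], [1, 1]], -1),                    # kernel size h = 2
    ([[1, 0], [0, 0], [0, 1]], 0),             # kernel size h = 3
]
feats = conv_features(M, kernels[0][0], kernels[0][1])
vec = sentence_vector(M, kernels)
```

Because max pooling keeps a single value per filter, a sentence of any length maps to a vector whose size is the number of filters, which is the fixed-length representation described in step (3.9).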
Further, the steps of constructing the medical question-answering system in step four are as follows:
(4.1) defining the question input by the user as Q;
(4.2) extracting the medical entities in Q with the named entity recognition model of step two;
(4.3) performing first-round intention recognition on Q with a logistic regression algorithm to judge whether the user input has a chit-chat intention or a diagnostic intention;
(4.4) if a chit-chat intention is determined, answering with the preset chit-chat template;
(4.5) if a diagnostic intention is determined, entering the second round of intention recognition, in which the specific diagnostic intention of the user is determined with the intention recognition model of step three;
(4.6) after the probability of each category is obtained, sorting the categories in descending order of probability and taking the intention with the highest confidence;
(4.7) presetting three confidence intervals, namely greater than 0.8, between 0.4 and 0.8, and less than 0.4, and comparing the intention confidence against them;
(4.8) determining the reply strategy according to the intention confidence, filling the slots, and querying the knowledge graph with Cypher statements according to the structured semantic slots;
(4.9) when the highest intention confidence returned is greater than 0.8, adopting the "accept" strategy and answering according to the intention and the slot value in combination with the reply template;
(4.10) if the slot value is empty, no related result was found in the knowledge graph, and the system replies directly with the dense_response template;
(4.11) when the highest intention confidence returned is between 0.4 and 0.8, adopting the "clarify" strategy, in which the system asks the user a clarifying question according to the template;
(4.12) when the highest intention confidence returned is less than 0.4, adopting the "reject" strategy and declining to answer.
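The confidence-based reply strategy of steps (4.6) to (4.12) amounts to a small decision function, sketched below. The intention names, the reply texts and the function names are hypothetical placeholders; the thresholds 0.8 and 0.4 come from step (4.7), and a confidence exactly equal to either threshold is treated here as falling in the middle interval.

```python
def choose_strategy(intent_probs, high=0.8, low=0.4):
    """Pick the most confident intention and map its confidence to a strategy."""
    # Sort in descending order of probability and take the top intention.
    ranked = sorted(intent_probs.items(), key=lambda kv: kv[1], reverse=True)
    intent, confidence = ranked[0]
    if confidence > high:
        return intent, "accept"
    if confidence >= low:
        return intent, "clarify"
    return intent, "reject"


def reply(intent_probs, slot_value, answer_template="{intent}: {value}"):
    """Produce a reply string from the strategy and the queried slot value."""
    intent, strategy = choose_strategy(intent_probs)
    if strategy == "accept":
        if not slot_value:
            # Empty slot value: nothing was found in the knowledge graph.
            return "Sorry, no related result was found."
        return answer_template.format(intent=intent, value=slot_value)
    if strategy == "clarify":
        return f"Do you want to ask about {intent}?"
    return "Sorry, I cannot answer this question."


print(reply({"disease_symptom": 0.91, "disease_drug": 0.09}, "fever, cough"))
```

Keeping the thresholds as parameters makes it easy to tune the balance between answering too eagerly and asking for clarification too often.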
Further, the steps of the multi-round question answering of the system in step five are as follows:
(5.1) predefining a semantic slot template for the entities;
(5.2) identifying which slots are contained in the question input by the user;
(5.3) extracting the slot values and filling them into the predefined semantic slots, i.e., slot filling;
(5.4) when a question input by the user is analyzed, if a slot value is still empty after slot filling, inheriting the slot value of the previous question, i.e., slot inheritance;
(5.5) querying the knowledge graph with Cypher statements to realize multi-round question answering.
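Slot filling with inheritance, as in steps (5.1) to (5.4), can be sketched as a merge of the slots extracted from the current question with those of the previous turn. The slot names and the function name are hypothetical.

```python
SLOT_TEMPLATE = ("disease", "symptom", "drug")  # hypothetical predefined slots


def fill_slots(extracted, previous=None):
    """Fill the predefined slots; an empty slot inherits the previous turn's value."""
    previous = previous or {}
    slots = {}
    for name in SLOT_TEMPLATE:
        value = extracted.get(name)
        if not value:
            value = previous.get(name)  # slot inheritance
        slots[name] = value
    return slots


# Turn 1: "What are the symptoms of influenza?" mentions a disease.
turn1 = fill_slots({"disease": "influenza"})
# Turn 2: "What medicine should I take?" mentions no entity, so the disease is inherited.
turn2 = fill_slots({}, turn1)
# Turn 3: a new disease overrides the inherited one.
turn3 = fill_slots({"disease": "asthma"}, turn2)
```

The slots of each turn are passed forward as the `previous` argument of the next, so an entity stays active until the user names a new one.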
Further, the medical question-answering system and its user pages in step six are constructed as follows:
(6.1) designing the medical question-answering system with a layered structure comprising a data layer, a construction layer and a user layer; the data layer is responsible for providing data support; the construction layer covers two tasks, namely constructing the knowledge graph from the self-built medical data set and constructing the medical question-answering system; the user layer faces the user and mainly performs dialogue management and interaction, and a question entered in the interface is passed to the construction layer for analysis;
(6.2) designing the pages with PyQt5; for a question input by the user, entity extraction, intention judgment, slot filling, structured query and template reply are performed in sequence, the trained models are called directly, and the user input is passed to the construction layer for analysis;
(6.3) feeding the generated answer back to the user in the user operation page.
The invention extracts text features with the BERT pre-trained language model, which is trained without supervision on a large corpus, learns rich prior information and has strong semantic representation capability; a perturbation is added to the resulting word vectors, improving the generalization ability and robustness of the model. Adversarial training is introduced to address the insufficient or missing annotations that may exist in the data set, reducing the influence of data set noise on the results. Learning rates are set layer by layer: the learning rate of the pre-trained layer is lowered and that of the downstream layers is set higher, so that the performance of the BERT layer does not degrade, the downstream layers train faster, and training stays synchronized. The output features of the BERT layer and the BiLSTM layer are concatenated, which couples the layers more closely and yields deeper features without losing the original features of the BERT layer. Two-round intention recognition adds handling of chit-chat sentences so that the system can respond to casual topics raised by users, and slot inheritance gives the system a multi-round question-answering function, improving the accuracy of the answers given by the question-answering system.
Drawings
FIG. 1 is a schematic diagram of the configuration of the medical question-answering system of the present invention;
FIG. 2 is a workflow diagram of a method of constructing a medical question-answering system of the present invention;
FIG. 3 is a frame diagram of the construction of the medical question-answering system of the present invention;
FIG. 4 is a diagram of a model structure of a named entity recognition module of the present invention;
FIG. 5 is a workflow diagram of an intent recognition module of the present invention;
FIG. 6 is an example of a system question and answer of the present invention;
FIG. 7 is an example of a multi-round question and answer of the system of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the medical question-answering system based on improved named entity recognition comprises a data acquisition module, a knowledge storage module, a natural language understanding module, a knowledge calculation module, and a dialogue management and interaction module, wherein the output end of the data acquisition module is connected with the input end of the knowledge storage module, the output end of the knowledge storage module is connected with the input end of the natural language understanding module, the output end of the natural language understanding module is connected with the input end of the knowledge calculation module, and the output end of the knowledge calculation module is connected with the input end of the dialogue management and interaction module;
the data acquisition module is used for crawling medical data on a website, cleaning and preprocessing the data, and constructing a knowledge graph data set;
the knowledge storage module is used for storing the knowledge triples extracted from the data set by adopting a Neo4j graph database and displaying a visualized knowledge graph;
the natural language understanding module is used for carrying out a named entity recognition task and an intention recognition task on a question input by a user and understanding the specific meaning of the user question;
the knowledge calculation module is used for converting the question processed by the natural language understanding module into a structured query, and querying the knowledge graph with Cypher statements to obtain an answer;
the dialogue management and interaction module is used for developing the pages of the knowledge-graph-based question-answering system, supporting the user in entering a question, returning the corresponding answer, and supporting multi-round question answering.
As shown in FIG. 2 and FIG. 3, a method for constructing a medical question-answering system based on improved named entity recognition comprises the following steps:
step one, crawling relevant medical information, cleaning and preprocessing the data, constructing a data set, defining the required data schema, and storing the collected data in a Neo4j graph database according to the defined schema to complete the construction of the knowledge graph;
step two, constructing a named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network, executing the medical named entity recognition task, and extracting the medical entities in the question input by the user;
step three, constructing an intention recognition model based on the BERT-TextCNN network, executing the question intention recognition task, and recognizing the intention contained in the question input by the user;
step four, constructing the medical question-answering system along the technical route of intention recognition, semantic slot design and templates: performing two rounds of intention recognition, filling the slots once the intention is determined, and querying the knowledge graph according to the structured semantic slots;
step five, realizing the multi-round question-answering function of the system by means of slot inheritance;
step six, designing the medical question-answering system and its user pages, and developing the pages with PyQt5 to support human-computer interaction and realize online auxiliary diagnosis.
The steps of constructing the data set and the knowledge graph are as follows:
the method comprises the steps of (1.1) extracting semi-structured data in a webpage, wherein a medicine searching and questioning network provides very comprehensive disease knowledge and treatment modes for users, the data has certain authority, has a clear structure and is suitable for crawling, and the medicine searching and questioning network is selected as a data source for constructing a knowledge graph in the text;
(1.2) analyzing the property of the webpage to obtain a URL address corresponding to the data;
(1.3) sending a network address by using a url lib.request data request module to acquire HTML format data of a webpage; the network address is the URL address obtained in the step (1.2);
(1.4) analyzing the HTML tag by using XPath to extract the required data and the association relation thereof;
(1.5) defining eight types of entities of diseases, symptoms, examination, medicines, medicine enterprises, departments, foods and recipes, and designing eleven types of relations of diseases-symptoms, diseases-concurrent diseases, diseases-examination, diseases-recommended medicines, diseases-general medicines, diseases-departments, diseases-recipes, diseases-food preference, diseases-food contraindicated, departments-departments and medicine enterprises-medicines;
(1.6) saving the data in the step (1.5) to the local to obtain an initial corpus;
(1.7) handling missing values in the data by directly deleting or filling them;
(1.8) normalizing the noise in the data with regular expressions, removing stop words, garbled characters, special characters and redundant information, and normalizing the formats of punctuation marks and letters;
(1.9) segmenting the text with a word segmentation tool, formatting the data into key-value pairs, and exporting well-formed JSON data as the data set for constructing the medical knowledge graph;
(1.10) analyzing the entities, attributes and inter-entity relations in the constructed data set, and defining the required data schema (Schema) in light of the question-answering application;
(1.11) extracting knowledge triples from the data according to the Schema;
(1.12) connecting to Neo4j through the Py2Neo module in Python, using Cypher statements to create the entity nodes and entity relation edges of the knowledge graph according to the Schema, and writing the attributes into the disease entities, thereby completing the construction of the knowledge graph.
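Steps (1.7), (1.8), (1.11) and (1.12) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used herein: the record field names, the relation names in `SCHEMA`, and the helper functions are all hypothetical, only disease-headed relations are covered, and the Cypher statements are merely built as strings with parameters rather than sent to Neo4j through Py2Neo.

```python
import re

# Hypothetical schema: record field -> (relation type, label of the tail entity).
SCHEMA = {
    "symptoms": ("HAS_SYMPTOM", "Symptom"),
    "checks": ("NEEDS_CHECK", "Check"),
    "recommended_drugs": ("RECOMMENDED_DRUG", "Drug"),
    "department": ("BELONGS_TO", "Department"),
}

_SPECIAL = re.compile(r"[^\w\s]")  # punctuation and other special characters
_SPACES = re.compile(r"\s+")       # runs of whitespace


def clean(text):
    """Remove special characters and normalise whitespace (step 1.8)."""
    text = _SPECIAL.sub("", text)
    return _SPACES.sub(" ", text).strip()


def extract_triples(record):
    """Turn one crawled record into (head, relation, tail, tail_label) tuples (step 1.11)."""
    head = clean(record["name"])
    triples = []
    for field, (relation, label) in SCHEMA.items():
        # Missing or empty fields are simply skipped (one way to handle step 1.7).
        for value in record.get(field) or []:
            tail = clean(value)
            if tail:
                triples.append((head, relation, tail, label))
    return triples


def to_cypher(triple):
    """Build a parameterised Cypher MERGE statement for one triple (step 1.12)."""
    head, relation, tail, label = triple
    query = (
        "MERGE (d:Disease {name: $head}) "
        f"MERGE (t:{label} {{name: $tail}}) "
        f"MERGE (d)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


record = {
    "name": "  Common cold ",
    "symptoms": ["headache!", " cough "],
    "checks": None,
    "department": ["Internal medicine"],
}
for triple in extract_triples(record):
    print(to_cypher(triple))
```

Using MERGE rather than CREATE keeps repeated imports from duplicating nodes, and passing names as parameters avoids quoting problems in entity names.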
As shown in FIG. 4, the named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network is constructed as follows:
(2.1) fusing the CCKS2019 dataset and the cMedQANER dataset: the CCKS2019 dataset contains the categories disease and diagnosis, anatomy, surgery, imaging examination, medication, and laboratory examination; the cMedQANER dataset contains disease, symptoms, detection, physiology, treatment regimen, body part, population, department, medicine, local, and time; both datasets are annotated with the BIO tagging scheme;
(2.2) taking the CCKS2019 dataset as the base dataset, merging the entities of the usable categories of the cMedQANER dataset into it under the mappings "detection-examination, treatment-surgery, drug-medication", thereby expanding the CCKS2019 dataset on a small scale;
(2.3) introducing a BERT model for word embedding: the embeddings are obtained through pre-training and fine-tuning, all word vectors are obtained from the BERT model and text features are extracted, which strengthens the model's ability to capture character-level semantic features; the output is denoted e1;
(2.4) introducing adversarial training: the perturbation r of adversarial training is added to all word vectors e1 obtained from the BERT model, and the result is denoted e2; the adversarial training objective is as follows:
\[ \min_{\theta} \; \mathbb{E}_{(x,y)\sim D} \left[ \max_{\Delta x \in \Omega} L(x+\Delta x, y; \theta) \right] \]
where D is the training set, x is the input, y is the label, θ denotes the model parameters, L(x+Δx, y; θ) is the loss of a single sample, Ω is the perturbation space, and Δx is the adversarial perturbation;
the perturbation r is obtained by normalizing the current gradient of the loss computed on the word vectors output by BERT, and the sum of the word vector and the perturbation is the adversarial sample; the adversarial perturbation is computed as follows:
\[ \Delta x = \epsilon \cdot \frac{g}{\lVert g \rVert_2}, \qquad g = \nabla_x L(x, y; \theta) \]
where g is the gradient, i.e., the partial derivative of the loss function with respect to x, and ε is the scaling factor;
(2.5) feeding the perturbed word vector e2 into the BiLSTM network to obtain contextual feature information, denoted e3;
(2.6) splicing the output e3 of the BiLSTM layer with the perturbed word vector e2 so that the output features of both layers are retained, the result being denoted e4;
(2.7) feeding the vector e4 into the fully connected layer for dimensionality reduction, the result being denoted e5;
(2.8) inputting e5 into the CRF layer for decoding to obtain the label sequence corresponding to each character, denoted output; during decoding the CRF layer determines the labels according to the transition matrix: the transition matrix is randomly initialized at the start of training and then optimized so that it better matches the actual transition probabilities among the labels of the training data; the score of a label sequence y for an input X of n characters is as follows:
\[ s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \]
where A is the transition matrix, A_{y_i, y_{i+1}} is the transition score from label y_i to label y_{i+1}, P is the output of the BiLSTM network, and P_{i, y_i} is the score that the BiLSTM layer assigns to the y_i-th label for the i-th character;
(2.9) training the BERT layer and the layers downstream of it with different learning rates, adjusted appropriately at different stages to keep training synchronized, wherein the learning rate of the BERT layer is set to lr1, that of the BiLSTM layer to lr2, and that of the CRF layer to lr3.
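The perturbation of step (2.4) and the feature splicing of step (2.6) can be illustrated with a small dependency-free sketch. It assumes the common FGM form, in which the gradient is scaled by its L2 norm; the function names and the use of plain Python lists in place of tensors are illustrative choices, and in a real model the gradient would come from back-propagating the loss to the BERT embeddings.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2 for a gradient given as a flat list."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding unchanged.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb(embedding, grad, epsilon=1.0):
    """e2 = e1 + r: add the adversarial perturbation to one word vector (step 2.4)."""
    r = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, r)]


def splice(bilstm_out, perturbed):
    """e4 = [e3; e2]: concatenate, token by token, the BiLSTM output with its input (step 2.6)."""
    return [h + e for h, e in zip(bilstm_out, perturbed)]


e1 = [0.2, -0.1]                        # one toy word vector
e2 = perturb(e1, grad=[3.0, 4.0], epsilon=0.5)
e3 = [[0.7, 0.1, 0.9]]                  # toy BiLSTM output for a one-token sentence
e4 = splice(e3, [e2])
print(e2, e4)
```

Splicing makes each token vector as wide as the BiLSTM output plus the embedding, which is why step (2.7) reduces the dimension before the CRF layer.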
As shown in FIG. 5, the intent recognition model based on the BERT-TextCNN network is constructed as follows:
(3.1) selecting the public CMID dataset;
(3.2) according to the constructed knowledge graph and the defined entities and relations, extracting from the CMID dataset the 13 intent types that the system is able to answer, and writing them into a label file;
(3.3) supplementing the dataset by template-based generation: for the intents with little data, setting rule templates for generation and manually writing the keywords of each rule;
(3.4) combining the keywords in a given order and generating samples at random, the resulting data supplementing the corpus and balancing the original dataset;
(3.5) using the BERT model as the Embedding layer to convert the text into vectors and extract text features, denoted b_1, b_2, …, b_n;
(3.6) splicing the vectors b_1, b_2, …, b_n to obtain the embedding matrix, denoted B_{1:n}:
\[ B_{1:n} = [b_1, b_2, \ldots, b_n] \]
where b_1, b_2, …, b_n are the word vectors;
(3.7) feeding the embedding matrix B_{1:n} into the convolution layer for feature extraction, performing convolution with kernels of sizes 3, 4 and 5 to extract the semantic features of the sentence, denoted a_i;
the semantic feature a_i is computed as follows:
\[ a_i = f(W \cdot M_{i:i+h-1} + b) \]
where M is the word vector matrix, b is the bias (distinct from the word vectors b_1, …, b_n), W is the neural network weight, h is the convolution kernel size, f is the nonlinear function used to compute the feature value, and M_{i:i+h-1} is the window of word vectors at positions i to i+h-1 of the text;
(3.8) feeding the semantic features a_i extracted in step (3.7) into the pooling layer, which downsamples them with max_pooling so that the vector dimension stays the same, the output being denoted f1;
(3.9) after the pooling layer, sentences of different lengths are converted into fixed-length representations;
(3.10) adding Dropout to prevent overfitting;
(3.11) feeding the pooled f1 into a fully connected layer to obtain the probability of each label;
(3.12) outputting the classification result of the text with softmax, denoted output.
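Steps (3.7) to (3.9) can be illustrated with a dependency-free sketch of one convolution filter per kernel size followed by max pooling. It is only a sketch: ReLU is assumed for the nonlinear function f, each kernel size has a single filter with hand-picked weights, and the function names are hypothetical. A real TextCNN would use many learned filters per size.

```python
def conv_feature_map(M, W, b, h):
    """Compute a_i = relu(W . M[i:i+h] + b) for every window of h consecutive word vectors."""
    feats = []
    for i in range(len(M) - h + 1):
        window = M[i:i + h]
        s = sum(w * x for w_row, m_row in zip(W, window) for w, x in zip(w_row, m_row))
        feats.append(max(0.0, s + b))  # ReLU is an assumption; the text only requires a nonlinearity
    return feats


def max_pool(feats):
    """Keep the strongest response; 0.0 if the sentence is shorter than the kernel."""
    return max(feats, default=0.0)


def sentence_vector(M, kernels):
    """One pooled value per kernel, so the length is fixed whatever the sentence length."""
    return [max_pool(conv_feature_map(M, W, b, len(W))) for W, b in kernels]


dim = 2
# One all-ones filter for each kernel size 3, 4 and 5, with zero bias.
kernels = [([[1.0] * dim for _ in range(h)], 0.0) for h in (3, 4, 5)]
M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 0.0]]  # 5 toy word vectors

f1 = sentence_vector(M, kernels)
print(f1)
```

Because each filter contributes exactly one pooled value, a ten-word sentence and a five-word sentence both yield a vector of length three here, which is the fixed-length property described in step (3.9).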
As shown in FIG. 6, the medical question-answering system is constructed as follows:
(4.1) defining the question input by the user as Q;
(4.2) extracting the medical entities in Q with the named entity recognition model of step two;
(4.3) performing the first round of intent recognition on Q with a logistic regression algorithm to determine whether the user's input expresses a chit-chat intent or a diagnostic intent;
(4.4) if a chit-chat intent is determined, answering with the preset chit-chat template;
(4.5) if a diagnostic intent is determined, entering the second round of intent recognition, i.e., determining the user's specific diagnostic intent with the intent recognition model of step three;
(4.6) after the probability of each category is obtained, sorting the categories in descending order of probability and taking the intent with the highest confidence;
(4.7) presetting three confidence intervals, namely greater than 0.8, between 0.4 and 0.8, and less than 0.4, and comparing the intent confidence against these thresholds;
(4.8) determining the reply strategy according to the intent confidence, filling the slots, and querying the knowledge graph with a Cypher statement built from the structured semantic slots;
(4.9) when the highest returned intent confidence is greater than 0.8, adopting the "accept" strategy and answering from the intent and the slot values in combination with the reply template;
(4.10) if the slot value is empty, which indicates that no relevant result was found in the knowledge graph, replying directly with the dense_response template;
(4.11) when the highest returned intent confidence is between 0.4 and 0.8, adopting the "clarify" strategy, in which the system asks the user a follow-up question according to the template;
(4.12) when the highest returned intent confidence is less than 0.4, adopting the "reject" strategy and declining to answer.
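The threshold logic of steps (4.6) to (4.12) can be sketched as follows. The intent names, template texts and the `query_graph` callback are hypothetical stand-ins (the callback takes the place of the Cypher query), and treating exactly 0.4 and exactly 0.8 as belonging to the "clarify" interval is an assumption, since the text does not say which side the boundaries fall on.

```python
FALLBACK = "Sorry, no relevant result was found."  # hypothetical template text


def choose_strategy(intent_probs, accept_above=0.8, reject_below=0.4):
    """Sort intents by probability and map the top confidence to a reply strategy."""
    ranked = sorted(intent_probs.items(), key=lambda kv: kv[1], reverse=True)
    intent, confidence = ranked[0]
    if confidence > accept_above:
        return intent, "accept"
    if confidence < reject_below:
        return intent, "reject"
    return intent, "clarify"


def reply(intent_probs, slots, query_graph, templates):
    """Produce the system reply for one question."""
    intent, strategy = choose_strategy(intent_probs)
    if strategy == "reject":
        return "Sorry, I cannot answer this question."
    if strategy == "clarify":
        return f"Do you want to ask about: {intent}?"
    results = query_graph(intent, slots)  # stands in for the knowledge graph query
    if not results:
        return FALLBACK                   # nothing found in the knowledge graph
    return templates[intent].format(**slots, answer=", ".join(results))


templates = {"ask_symptom": "Symptoms of {disease}: {answer}"}
fake_graph = lambda intent, slots: ["fever", "cough"]
print(reply({"ask_symptom": 0.93, "ask_drug": 0.07},
            {"disease": "influenza"}, fake_graph, templates))
```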
As shown in FIG. 7, multi-round question answering is implemented as follows:
(5.1) predefining semantic slot templates for the entities;
(5.2) identifying which slots the question input by the user contains;
(5.3) extracting the slot values from the question and filling them into the predefined semantic slots, i.e., slot filling;
(5.4) when the question input by the user is analyzed, if a slot value is still empty after slot filling, inheriting the slot value of the previous question, i.e., slot inheritance;
(5.5) querying the knowledge graph with a Cypher statement to realize multi-round question answering.
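Slot filling and slot inheritance, steps (5.3) to (5.5), can be sketched as a small state object that carries slot values across turns. The class name, slot names, intent names and relation names below are hypothetical, and the Cypher statement is only constructed, not executed.

```python
class SlotTracker:
    """Keeps semantic slots across dialogue turns."""

    def __init__(self, slot_names):
        self.slots = {name: None for name in slot_names}

    def fill(self, extracted):
        """Fill slots from the current question; empty slots keep the previous value."""
        for name in self.slots:
            value = extracted.get(name)
            if value:
                self.slots[name] = value  # slot filling
            # Otherwise the value from an earlier turn is kept: slot inheritance.
        return dict(self.slots)


def build_query(intent, slots):
    """Map an intent to a parameterised Cypher query (hypothetical intent/relation names)."""
    relation = {"ask_symptom": "HAS_SYMPTOM", "ask_drug": "RECOMMENDED_DRUG"}[intent]
    cypher = f"MATCH (d:Disease {{name: $disease}})-[:{relation}]->(t) RETURN t.name"
    return cypher, {"disease": slots["disease"]}


tracker = SlotTracker(["disease"])
turn1 = tracker.fill({"disease": "influenza"})  # "What are the symptoms of influenza?"
turn2 = tracker.fill({})                        # "What medicine should I take?"
print(build_query("ask_drug", turn2))
```

In the second turn no disease is mentioned, so the tracker reuses "influenza" from the first turn and the drug query can still be issued.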
Further, the medical question-answering system and its user page in step six are constructed as follows:
(6.1) designing the medical question-answering system with a layered architecture comprising a data layer, a construction layer and a user layer; the data layer provides data support; the construction layer covers two main tasks, namely constructing the knowledge graph from the self-built medical-domain dataset and constructing the medical question-answering system; the user layer faces the user and mainly handles dialogue management and interaction, passing the questions entered in the interface to the construction layer for analysis;
(6.2) designing the page with PyQt5; for each question input by the user, entity extraction, intent determination, slot filling, structured query and template-based reply are performed in sequence by directly calling the trained models, the user's input being passed to the construction layer for analysis;
(6.3) feeding the generated answer back to the user in the user operation page.
Experiments show that the named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network achieves an F1 score of 87.70% on the self-built dataset and the intent recognition model achieves an F1 score of 76.64% on the self-built dataset, which improves the accuracy of the question-answering system's replies.

Claims (8)

1. The medical question-answering system based on the improved named entity recognition is characterized by comprising a data acquisition module, a knowledge storage module, a natural language understanding module, a knowledge calculation module and a dialogue management and interaction module, wherein the output end of the data acquisition module is connected with the input end of the knowledge storage module;
the data acquisition module is used for crawling medical data on a website, cleaning and preprocessing the data, and constructing a knowledge graph data set;
the knowledge storage module is used for storing the knowledge triples extracted from the data set by adopting a Neo4j graph database and displaying a visualized knowledge graph;
the natural language understanding module is used for carrying out a named entity recognition task and an intention recognition task on a question input by a user and understanding the specific meaning of the user question;
the knowledge calculation module is used for converting the question sentence passing through the natural language understanding module into a structured query sentence, and utilizing the Cypher sentence to query in the knowledge graph to obtain an answer;
the dialogue management and interaction module is used for developing the pages of the knowledge-graph-based question-answering system, supporting the user in inputting questions, returning the corresponding answers, and supporting multi-round question answering.
2. The medical question-answering system construction method based on the improved named entity recognition is characterized by comprising the following steps of:
step one, crawling relevant medical information, cleaning and preprocessing the data, constructing a data set, defining the required data schema, storing the collected data in a Neo4j graph database according to the defined schema, and completing the construction of the knowledge graph;
step two, constructing a named entity recognition model structure based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network, executing a medical named entity recognition task, and extracting the medical entities in the question input by the user;
step three, constructing an intent recognition model structure based on the BERT-TextCNN network, executing a question intent recognition task, and recognizing the intent contained in the question input by the user;
step four, constructing a medical question-answering system following the technical route of intent recognition, semantic slot design and templates, performing two rounds of intent recognition, filling the slots after the intent is determined, and querying the knowledge graph according to the structured semantic slots;
step five, realizing the multi-round question-answering function of the system by means of slot inheritance;
step six, designing a medical question-answering system and a user page thereof, and carrying out page development by using PyQt5 to support man-machine interaction and realize the function of on-line auxiliary diagnosis.
3. The method for constructing a medical question-answering system based on the improved named entity recognition according to claim 2, wherein the step of constructing the data set and the knowledge graph in the step one is as follows:
(1.1) extracting semi-structured data from web pages: the "medicine searching and questioning" website provides users with comprehensive knowledge of diseases and treatments, and its data is reasonably authoritative, clearly structured and suitable for crawling, so it is selected herein as the data source for constructing the knowledge graph;
(1.2) analyzing the properties of the web page to obtain the URL corresponding to the data;
(1.3) sending a request to the network address with the urllib.request module to acquire the HTML data of the web page, the network address being the URL obtained in step (1.2);
(1.4) analyzing the HTML tag by using XPath to extract the required data and the association relation thereof;
(1.5) defining eight types of entities of diseases, symptoms, examination, medicines, medicine enterprises, departments, foods and recipes, and designing eleven types of relations of diseases-symptoms, diseases-concurrent diseases, diseases-examination, diseases-recommended medicines, diseases-general medicines, diseases-departments, diseases-recipes, diseases-food preference, diseases-food contraindicated, departments-departments and medicine enterprises-medicines;
(1.6) saving the data in the step (1.5) to the local to obtain an initial corpus;
(1.7) handling missing values in the data by directly deleting or filling them;
(1.8) normalizing the noise in the data with regular expressions, removing stop words, garbled characters, special characters and redundant information, and normalizing the formats of punctuation marks and letters;
(1.9) segmenting the text with a word segmentation tool, formatting the data into key-value pairs, and exporting well-formed JSON data as the data set for constructing the medical knowledge graph;
(1.10) analyzing the entities, attributes and inter-entity relations in the constructed data set, and defining the required data schema (Schema) in light of the question-answering application;
(1.11) extracting knowledge triples from the data according to the Schema;
(1.12) connecting to Neo4j through the Py2Neo module in Python, using Cypher statements to create the entity nodes and entity relation edges of the knowledge graph according to the Schema, and writing the attributes into the disease entities, thereby completing the construction of the knowledge graph.
4. The method for constructing a medical question-answering system based on improved named entity recognition according to claim 2, wherein the construction steps of the named entity recognition model based on the BERT-FGM-BiLSTM-CRF-lr splicing improved network in the second step are as follows:
(2.1) fusing the CCKS2019 dataset and the cMedQANER dataset: the CCKS2019 dataset contains the categories disease and diagnosis, anatomy, surgery, imaging examination, medication, and laboratory examination; the cMedQANER dataset contains disease, symptoms, detection, physiology, treatment regimen, body part, population, department, medicine, local, and time; both datasets are annotated with the BIO tagging scheme;
(2.2) taking the CCKS2019 dataset as the base dataset, merging the entities of the usable categories of the cMedQANER dataset into it under the mappings "detection-examination, treatment-surgery, drug-medication", thereby expanding the CCKS2019 dataset on a small scale;
(2.3) introducing a BERT model for word embedding: the embeddings are obtained through pre-training and fine-tuning, all word vectors are obtained from the BERT model and text features are extracted, which strengthens the model's ability to capture character-level semantic features; the output is denoted e1;
(2.4) introducing adversarial training: the perturbation r of adversarial training is added to all word vectors e1 obtained from the BERT model, and the result is denoted e2; the adversarial training objective is as follows:
\[ \min_{\theta} \; \mathbb{E}_{(x,y)\sim D} \left[ \max_{\Delta x \in \Omega} L(x+\Delta x, y; \theta) \right] \]
where D is the training set, x is the input, y is the label, θ denotes the model parameters, L(x+Δx, y; θ) is the loss of a single sample, Ω is the perturbation space, and Δx is the adversarial perturbation;
the perturbation r is obtained by normalizing the current gradient of the loss computed on the word vectors output by BERT, and the sum of the word vector and the perturbation is the adversarial sample; the adversarial perturbation is computed as follows:
\[ \Delta x = \epsilon \cdot \frac{g}{\lVert g \rVert_2}, \qquad g = \nabla_x L(x, y; \theta) \]
where g is the gradient, i.e., the partial derivative of the loss function with respect to x, and ε is the scaling factor;
(2.5) feeding the perturbed word vector e2 into the BiLSTM network to obtain contextual feature information, denoted e3;
(2.6) splicing the output e3 of the BiLSTM layer with the perturbed word vector e2 so that the output features of both layers are retained, the result being denoted e4;
(2.7) feeding the vector e4 into the fully connected layer for dimensionality reduction, the result being denoted e5;
(2.8) inputting e5 into the CRF layer for decoding to obtain the label sequence corresponding to each character, denoted output; during decoding the CRF layer determines the labels according to the transition matrix: the transition matrix is randomly initialized at the start of training and then optimized so that it better matches the actual transition probabilities among the labels of the training data; the score of a label sequence y for an input X of n characters is as follows:
\[ s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \]
where A is the transition matrix, A_{y_i, y_{i+1}} is the transition score from label y_i to label y_{i+1}, P is the output of the BiLSTM network, and P_{i, y_i} is the score that the BiLSTM layer assigns to the y_i-th label for the i-th character;
(2.9) training the BERT layer and the layers downstream of it with different learning rates, adjusted appropriately at different stages to keep training synchronized, wherein the learning rate of the BERT layer is set to lr1, that of the BiLSTM layer to lr2, and that of the CRF layer to lr3.
5. The method for constructing a medical question-answering system based on improved named entity recognition according to claim 2, wherein the method for constructing an intention recognition model structure based on the BERT-TextCNN network in the third step is as follows:
(3.1) selecting the public CMID dataset;
(3.2) according to the constructed knowledge graph and the defined entities and relations, extracting from the CMID dataset the 13 intent types that the system is able to answer, and writing them into a label file;
(3.3) supplementing the dataset by template-based generation: for the intents with little data, setting rule templates for generation and manually writing the keywords of each rule;
(3.4) combining the keywords in a given order and generating samples at random, the resulting data supplementing the corpus and balancing the original dataset;
(3.5) using the BERT model as the Embedding layer to convert the text into vectors and extract text features, denoted b_1, b_2, …, b_n;
(3.6) splicing the vectors b_1, b_2, …, b_n to obtain the embedding matrix, denoted B_{1:n}:
\[ B_{1:n} = [b_1, b_2, \ldots, b_n] \]
where b_1, b_2, …, b_n are the word vectors;
(3.7) feeding the embedding matrix B_{1:n} into the convolution layer for feature extraction, performing convolution with kernels of sizes 3, 4 and 5 to extract the semantic features of the sentence, denoted a_i;
the semantic feature a_i is computed as follows:
\[ a_i = f(W \cdot M_{i:i+h-1} + b) \]
where M is the word vector matrix, b is the bias (distinct from the word vectors b_1, …, b_n), W is the neural network weight, h is the convolution kernel size, f is the nonlinear function used to compute the feature value, and M_{i:i+h-1} is the window of word vectors at positions i to i+h-1 of the text;
(3.8) feeding all of the semantic features a_i extracted in step (3.7) into the pooling layer, which downsamples them with max_pooling so that the vector dimension stays the same, the output being denoted f1;
(3.9) after the pooling layer, sentences of different lengths are converted into fixed-length representations;
(3.10) adding Dropout to prevent overfitting;
(3.11) feeding the pooled f1 into a fully connected layer to obtain the probability of each label;
(3.12) outputting the classification result of the text with softmax, denoted output.
6. The method for constructing a medical question-answering system based on the improved named entity recognition according to claim 2, wherein the step of constructing a medical question-answering system in the fourth step is as follows:
(4.1) defining a question input by a user as Q;
(4.2) extracting the medical entities in Q with the named entity recognition model of step two;
(4.3) performing the first round of intent recognition on Q with a logistic regression algorithm to determine whether the user's input expresses a chit-chat intent or a diagnostic intent;
(4.4) if a chit-chat intent is determined, answering with the preset chit-chat template;
(4.5) if a diagnostic intent is determined, entering the second round of intent recognition, i.e., determining the user's specific diagnostic intent with the intent recognition model of step three;
(4.6) after the probability of each category is obtained, sorting the categories in descending order of probability and taking the intent with the highest confidence;
(4.7) presetting three confidence intervals, namely greater than 0.8, between 0.4 and 0.8, and less than 0.4, and comparing the intent confidence against these thresholds;
(4.8) determining the reply strategy according to the intent confidence, filling the slots, and querying the knowledge graph with a Cypher statement built from the structured semantic slots;
(4.9) when the highest returned intent confidence is greater than 0.8, adopting the "accept" strategy and answering from the intent and the slot values in combination with the reply template;
(4.10) if the slot value is empty, which indicates that no relevant result was found in the knowledge graph, replying directly with the dense_response template;
(4.11) when the highest returned intent confidence is between 0.4 and 0.8, adopting the "clarify" strategy, in which the system asks the user a follow-up question according to the template;
(4.12) when the highest returned intent confidence is less than 0.4, adopting the "reject" strategy and declining to answer.
7. The method for constructing a medical question-answering system based on the improved named entity recognition according to claim 2, wherein the multi-round question answering in step five is implemented as follows:
(5.1) predefining semantic slot templates for the entities;
(5.2) identifying which slots the question input by the user contains;
(5.3) extracting the slot values from the question and filling them into the predefined semantic slots, i.e., slot filling;
(5.4) when the question input by the user is analyzed, if a slot value is still empty after slot filling, inheriting the slot value of the previous question, i.e., slot inheritance;
(5.5) querying the knowledge graph with a Cypher statement to realize multi-round question answering.
8. The method for constructing a medical question-answering system based on the improved named entity recognition according to claim 2, wherein the steps of constructing the medical question-answering system and the user page thereof in the step six are as follows:
(6.1) designing the medical question-answering system with a layered architecture comprising a data layer, a construction layer and a user layer; the data layer provides data support; the construction layer covers two main tasks, namely constructing the knowledge graph from the self-built medical-domain dataset and constructing the medical question-answering system; the user layer faces the user and mainly handles dialogue management and interaction, passing the questions entered in the interface to the construction layer for analysis;
(6.2) designing the page with PyQt5; for each question input by the user, entity extraction, intent determination, slot filling, structured query and template-based reply are performed in sequence by directly calling the trained models, the user's input being passed to the construction layer for analysis;
(6.3) feeding the generated answer back to the user in the user operation page.
CN202310469261.7A 2023-04-27 2023-04-27 Medical question-answering system based on improved named entity recognition and construction method thereof Pending CN116719913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310469261.7A CN116719913A (en) 2023-04-27 2023-04-27 Medical question-answering system based on improved named entity recognition and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310469261.7A CN116719913A (en) 2023-04-27 2023-04-27 Medical question-answering system based on improved named entity recognition and construction method thereof

Publications (1)

Publication Number Publication Date
CN116719913A true CN116719913A (en) 2023-09-08

Family

ID=87866790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310469261.7A Pending CN116719913A (en) 2023-04-27 2023-04-27 Medical question-answering system based on improved named entity recognition and construction method thereof

Country Status (1)

Country Link
CN (1) CN116719913A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955576A (en) * 2023-09-21 2023-10-27 神州医疗科技股份有限公司 Question-answer reply method, system and equipment based on human feedback and reinforcement learning
CN117235240A (en) * 2023-11-14 2023-12-15 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117521673A (en) * 2024-01-08 2024-02-06 安徽大学 Natural language processing system with analysis training performance
CN117995426A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Medical knowledge graph construction method and device, electronic equipment and storage medium
CN117993391A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Medical named entity recognition and clinical term standardization method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955576A (en) * 2023-09-21 2023-10-27 神州医疗科技股份有限公司 Question-answer reply method, system and equipment based on human feedback and reinforcement learning
CN117235240A (en) * 2023-11-14 2023-12-15 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117235240B (en) * 2023-11-14 2024-02-20 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117521673A (en) * 2024-01-08 2024-02-06 安徽大学 Natural language processing system with analysis training performance
CN117521673B (en) * 2024-01-08 2024-03-22 安徽大学 Natural language processing system with analysis training performance
CN117995426A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Medical knowledge graph construction method and device, electronic equipment and storage medium
CN117993391A (en) * 2024-04-07 2024-05-07 北京惠每云科技有限公司 Medical named entity recognition and clinical term standardization method and device

Similar Documents

Publication Publication Date Title
CN111708874B (en) Man-machine interaction question-answering method and system based on intelligent complex intention recognition
CN110110335B (en) Named entity identification method based on stack model
CN116719913A (en) Medical question-answering system based on improved named entity recognition and construction method thereof
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111078875B (en) Method for extracting question-answer pairs from semi-structured document based on machine learning
CN111858944B (en) Entity aspect level emotion analysis method based on attention mechanism
CN108182295A (en) A kind of Company Knowledge collection of illustrative plates attribute extraction method and system
CN111914558A (en) Course knowledge relation extraction method and system based on sentence bag attention remote supervision
Zubrinic et al. The automatic creation of concept maps from documents written using morphologically rich languages
CN110598000A (en) Relationship extraction and knowledge graph construction method based on deep learning model
CN112542223A (en) Semi-supervised learning method for constructing medical knowledge graph from Chinese electronic medical record
CN115455935A (en) Intelligent text information processing system
CN106776711A (en) A kind of Chinese medical knowledge mapping construction method based on deep learning
CN113535917A (en) Intelligent question-answering method and system based on travel knowledge map
CN113806563A (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN111858896B (en) Knowledge base question-answering method based on deep learning
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN111143574A (en) Query and visualization system construction method based on minority culture knowledge graph
CN111914556A (en) Emotion guiding method and system based on emotion semantic transfer map
CN111813874B (en) Terahertz knowledge graph construction method and system
CN116562265B (en) Information intelligent analysis method, system and storage medium
CN115293161A (en) Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph
CN114077673A (en) Knowledge graph construction method based on BTBC model
CN113468887A (en) Student information relation extraction method and system based on boundary and segment classification
CN115019906A (en) Multi-task sequence labeled drug entity and interaction combined extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination