CN113505243A

CN113505243A - Intelligent question-answering method and device based on medical knowledge graph

Info

Publication number: CN113505243A
Application number: CN202110863613.8A
Authority: CN
Inventors: 鲜湛; 贺昕; 曾柏霖; 张海滨
Original assignee: Shenzhen Wanhaisi Digital Medical Co ltd
Current assignee: Shenzhen Wanhaisi Digital Medical Co ltd
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2021-10-15

Abstract

The invention relates to an intelligent question-answering method and device based on a medical knowledge graph, and belongs to the technical field of medical health. The method comprises the following steps: acquiring a patient consultation text; simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, and correspondingly obtaining three recall results; inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results; determining the target recall result with the highest score among all the recall results; and sending the target recall result to a preset terminal. The method and the device retrieve answers to the user's question in the preset medical knowledge graph through three retrieval modes, namely graph search, text retrieval and semantic vector retrieval, and select from all the retrieved answers the one that best matches the user's question, so that highly targeted answers can be provided for the user's question.

Description

Intelligent question-answering method and device based on medical knowledge graph

Technical Field

The invention relates to the technical field of medical treatment and health, in particular to an intelligent question-answering method and device based on a medical knowledge graph.

Background

A medical platform can provide users with an intelligent question-answering service for medical knowledge, so that users who cannot conveniently visit a hospital can obtain medical answers through the platform. In the related art, natural-language questions entered by users are interpreted by template matching: a graph data query statement is constructed from the question type of the matched template and the medical entities in the user's question, and the answer is retrieved from the relevant medical knowledge graph.

However, because medical knowledge is highly specialized and complex, an answer retrieved by the related art through template matching, which is a single retrieval mode, may not be the answer the user wants. The related art therefore has difficulty giving a highly targeted answer to the user's question, and its answer accuracy is low.

Disclosure of Invention

In view of this, an intelligent question-answering method and device based on a medical knowledge graph are provided to solve the problem in the related art that it is difficult to give a highly targeted answer to the user's question.

The invention adopts the following technical scheme:

in a first aspect, the present application provides an intelligent question-answering method based on a medical knowledge graph, including:

acquiring a patient consultation text;

according to the patient consultation text and a preset consultation question-answer model, simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph, and correspondingly obtaining three recall results;

inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results;

determining a target recall result with the highest score from all the recall results;

and sending the target recall result to a preset terminal.

Preferably, the preset medical knowledge graph is constructed by the following method:

acquiring medical knowledge data; the medical knowledge data comprises structured data, semi-structured data, and unstructured data;

converting the structured data and the semi-structured data into first extraction result data based on a preset rule, and extracting second extraction result data from the unstructured data based on a preset medical knowledge automatic extraction model; the first extraction result data and the second extraction result data form a medical knowledge extraction result set;

and fusing the medical knowledge extraction result set with a preset open source knowledge base to obtain the preset medical knowledge graph.

Preferably, the data format of the first extraction result data is RDF triple or graph data;

the data format of the second extraction result data is RDF triple or graph data.

Preferably, the preset automatic medical knowledge extraction model is obtained by a model training method as follows:

acquiring an initial training corpus data set;

expanding the initial training corpus data set according to a preset data enhancement rule to obtain a final training corpus data set;

training the preset medical knowledge automatic extraction model on the final training corpus data set; the preset medical knowledge automatic extraction model is a BERT + BiLSTM + CRF model.

Preferably, the intelligent question-answering method based on the medical knowledge graph further comprises: replacing the BERT base model in the BERT + BiLSTM + CRF model with a TinyALBERT Chinese model.

Preferably, fusing the medical knowledge extraction result set with a preset open source knowledge base to obtain the preset medical knowledge graph includes:

constructing a domain synonymous entity library based on the medical knowledge extraction result set and the preset open source knowledge base; the domain synonymous entity library comprises synonym pairs of medical entities;

establishing a medical entity mapping relation between the medical knowledge extraction result set and the preset open source knowledge base according to the domain synonymous entity library;

and fusing the medical knowledge extraction result set with the preset open source knowledge base according to the medical entity mapping relation to obtain the preset medical knowledge graph.

Preferably, after the medical knowledge extraction result set is fused with the preset open source knowledge base according to the medical entity mapping relation to obtain the preset medical knowledge graph, the method further includes:

storing the entity-relationship data in the preset medical knowledge graph into a preset Neo4j graph database, and storing the entity-attribute data in the preset medical knowledge graph into a preset ElasticSearch.

Preferably, the preset consultation question-answer model is constructed by the following method:

acquiring a training data set;

constructing a Sentence-Transformers Siamese BERT model;

calculating sentence semantic similarity on the training data set based on the Sentence-Transformers Siamese BERT model;

subjecting the Sentence-Transformers Siamese BERT model to distillation compression;

loading the distillation-compressed Sentence-Transformers Siamese BERT model, with TinyALBERT selected;

fine-tuning the Sentence-Transformers Siamese BERT model according to the sentence semantic similarity;

sending the fine-tuned Sentence-Transformers Siamese BERT model to a preset intelligent question-answering system;

converting the question samples in the preset intelligent question-answering system into sentence vectors of domain knowledge question-answer sentences through the prediction interface of the fine-tuned Sentence-Transformers Siamese BERT model;

and storing the sentence vectors of the domain knowledge question-answer sentences into a preset vector storage engine, and creating a semantic index to obtain the preset consultation question-answer model.

Preferably, the preset ranking score model is an L2R model.

In a second aspect, the present application provides an intelligent question-answering device based on medical knowledge-graph, comprising:

the consultation text acquisition module is used for acquiring patient consultation texts;

the retrieval module is used for simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, and correspondingly obtaining three recall results;

the scoring module is used for inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results;

the target recall result determining module is used for determining a target recall result with the highest score in all the recall results;

and the data sending module is used for sending the target recall result to a preset terminal.

By adopting the above technical scheme, the invention provides an intelligent question-answering method based on a medical knowledge graph, which comprises the following steps: acquiring a patient consultation text; simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, and correspondingly obtaining three recall results; inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results; determining the target recall result with the highest score among all the recall results; and sending the target recall result to a preset terminal. In this way, answers to the user's question are retrieved in the preset medical knowledge graph through three retrieval modes, namely graph search, text retrieval and semantic vector retrieval, and the answer that best matches the user's question is selected from all the retrieved answers, so that the method can give highly targeted answers to the user's question with high answer accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of an intelligent question-answering method based on a medical knowledge graph according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of an automatic extraction model of preset medical knowledge according to an embodiment of the present application.

Fig. 3 is a Sentence-Transformers training model provided in an embodiment of the present application.

Fig. 4 is a representation of similarity between two sentences calculated by using a sentence vector according to an embodiment of the present application.

Fig. 5 is a schematic structural diagram of an intelligent question-answering device based on a medical knowledge graph according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.

Fig. 1 is a schematic flow chart of an intelligent question-answering method based on a medical knowledge graph according to an embodiment of the present invention. As shown in fig. 1, the intelligent question-answering method based on medical knowledge graph of the present embodiment includes:

S101, acquiring a patient consultation text;

S102, simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, and correspondingly obtaining three recall results;

S103, inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results;

S104, determining the target recall result with the highest score among all the recall results;

and S105, sending the target recall result to a preset terminal.

Specifically, there are various methods for obtaining the patient consultation text, for example, when the patient visits the medical platform, the patient fills and submits the consultation text according to the platform guidance, so that the platform obtains the patient consultation text.

After the platform acquires the patient consultation text, it performs query analysis on the text. Specifically, the consultation text undergoes automatic error correction, query rewriting, word segmentation, keyword extraction, term normalization and conversion into query commands, and candidates are recalled for it based on the BM25 algorithm. A preset sentence vector representation model converts the consultation text into a semantic vector representation that is mapped into the same semantic vector space as the question vectors of the relevant domain. A semantic-vector-based retrieval request is then submitted to Milvus in the preset consultation question-answer model, and the K answers that best match the consultation text are determined by an approximate nearest neighbor (ANN) algorithm, which realizes the semantic vector recall. The consultation text is also converted by corresponding rules into a Cypher query statement for the Neo4j graph database, and answers are retrieved from the preset medical knowledge graph by graph search to obtain the corresponding recall result. Full-text retrieval is likewise performed on the consultation text through Elasticsearch (ES) to obtain the corresponding recall result.
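The BM25-based text recall mentioned above can be illustrated with a minimal sketch. It assumes the consultation text and the candidate documents are already segmented into tokens; the parameter values k1 = 1.5 and b = 0.75 and the toy documents are illustrative assumptions, not values given in this application, and in practice a full-text engine would compute the score.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed idf (always positive), as used by common BM25 implementations.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy, already-tokenized documents (assumed data).
docs = [
    ["insomnia", "causes", "anxiety"],
    ["diet", "advice", "for", "diabetes"],
    ["anxiety", "symptoms", "and", "treatment"],
]
scores = bm25_scores(["insomnia", "anxiety"], docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents that share no term with the query score zero, and a document matching the rarer term ranks above one matching only the common term.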

After the three recall results are obtained, all the recall results are input into the preset ranking scoring model to obtain scoring data of all the recall results. The target recall result with the highest score is then determined among all the recall results. Finally, the target recall result is sent to the preset terminal so that the patient can obtain the answer from the content displayed by the preset terminal.
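Steps S102 to S104 can be sketched as follows. This is only an illustration under stated assumptions: the three retriever functions are hypothetical stand-ins for the graph search, full-text and semantic vector recalls, a thread pool is one possible way to run them simultaneously, and the word-overlap scorer is a toy placeholder for the preset ranking scoring model.

```python
from concurrent.futures import ThreadPoolExecutor

def answer(query, retrievers, score_fn):
    """Run all retrievers concurrently, score every candidate, return the best one."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = [pool.submit(r, query) for r in retrievers]
        candidates = [c for f in futures for c in f.result()]
    if not candidates:
        return None
    return max(candidates, key=lambda c: score_fn(query, c))

# Hypothetical stand-ins for the three recall channels.
def graph_search(query):
    return ["Insomnia is often linked to anxiety."]

def text_search(query):
    return ["Keep a regular sleep schedule."]

def vector_search(query):
    return ["Anxiety can cause insomnia; consider relaxation training."]

def overlap_score(query, candidate):
    # Toy scorer: number of lowercase words shared by query and candidate.
    q = set(query.lower().split())
    c = set(candidate.lower().replace(".", "").replace(";", "").split())
    return len(q & c)

best = answer(
    "can anxiety cause insomnia",
    [graph_search, text_search, vector_search],
    overlap_score,
)
```

Keeping the retrievers and the scorer as plain callables means any of the three channels, or the scoring model, can be swapped without changing the selection logic.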

By adopting the above technical scheme, the intelligent question-answering method based on the medical knowledge graph comprises the following steps: acquiring a patient consultation text; simultaneously performing graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, and correspondingly obtaining three recall results; inputting all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results; determining the target recall result with the highest score among all the recall results; and sending the target recall result to a preset terminal. In this way, answers to the user's question are retrieved in the preset medical knowledge graph through three retrieval modes, namely graph search, text retrieval and semantic vector retrieval, and the answer that best matches the user's question is selected from all the retrieved answers, so that the method can give highly targeted answers to the user's question.

In detail, the structured data includes data in relational databases, Excel resources, professional classifications, and domain dictionaries. The semi-structured data includes web resources and encyclopedia data for the vertical medical domain. The unstructured data includes web resources in the vertical medical domain, medical professional literature, professional textbooks, and training courses.

Structured data and semi-structured data are converted into triple data through rules manually defined in advance, and batch processing tasks are used to obtain the initial domain knowledge representation data quickly and efficiently. In a specific application, for structured data such as desensitized electronic medical record data, a Chinese symptom knowledge base, mental health diagnosis tables, medical industry standards and specifications, professional classification systems and industry open source data, the two-dimensional table data is converted into property graph data. For semi-structured data, content data of the subdivided field is first selected according to the conversation scenario, and a wrapper is customized for that content data. The wrapper is then defined, generated, updated, and maintained. Finally, target data is extracted from the related database through the wrapper, and is structured and normalized so that it is converted into the representation of the property graph database.
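The rule-based conversion of two-dimensional table data into triples can be sketched as follows. The column names, the relation names and the ';' separator for multi-valued cells are hypothetical choices made for illustration; the actual rules are defined manually per data source.

```python
def row_to_triples(row, subject_key, relation_map):
    """Convert one table row (a dict) into (subject, relation, object) triples."""
    subject = row[subject_key]
    triples = []
    for column, relation in relation_map.items():
        value = row.get(column)
        if not value:
            continue
        # Assumption: a cell may hold several values separated by ';'.
        for item in value.split(";"):
            item = item.strip()
            if item:
                triples.append((subject, relation, item))
    return triples

# Illustrative row and rule table (assumed, not from the application).
row = {"disease": "influenza", "symptoms": "fever; cough", "drugs": "oseltamivir"}
relation_map = {"symptoms": "has_symptom", "drugs": "treated_by"}
triples = row_to_triples(row, "disease", relation_map)
```

Each column-to-relation rule lives in `relation_map`, so a batch job can apply the same function to every row of a table.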

In a specific application, a graph database schema is first defined, fixing the entity types, entity relationships and entity attribute ranges of the medical knowledge graph according to medical terminology and business rules. The entity types include diseases, drugs and symptoms, among others. The entity relationships include disease-symptom, disease-drug, disease-diet (appropriate or contraindicated), and disease-disease (complication). The entity attributes include disease attributes, drug attributes, auxiliary examination attributes, and surgical attributes. Then, a medical data source is selected, and a wrapper encapsulating a web crawler is constructed according to the predefined entity types, entity relationships and entity attributes. The medical data source includes structured data and semi-structured data. Finally, the medical business rules, the website element rules and the customized web crawler wrapper are applied: target data and target business rules are extracted from the structured data and the semi-structured data through the wrapper, all the target data is expressed as triples, and the triples are stored in a medical knowledge extraction result property database.

In business scenarios such as mental health consultation, a strategy for jointly annotating entities and relationships is defined, in which each label carries both the entity information and the relationship between entities. Based on this labeling strategy, the joint extraction of entities and relationships can be converted into a sequence labeling problem in natural language processing, and the task can be modeled end to end with neural networks without complex feature engineering. In this embodiment, the corpus for the preset medical knowledge automatic extraction model is built from collected and screened professional mental health consultation content, and a BERT + BiLSTM + CRF model is trained on this corpus as a joint learning model for named entity recognition and relationship extraction.

The construction method of the automatic extraction model of the preset medical knowledge comprises the following steps:

Step one, acquiring and segmenting professional literature and website text content to obtain an initial text data set.

Step two, predefining entity relationships. The knowledge extraction task extracts entities and the relationships between them from unstructured text data to form triples of the form (entity A, relationship between entity A and entity B, entity B), where the relationship is one of the predefined entity relationships. The source material is analyzed and screened by rules, and entity relationships including alias, disease-symptom and disease-disease (complication) are predefined. Data samples are constructed for the single-entity 1-1, single-entity 1-N, and multi-entity multi-relationship cases, among others.

Step three, determining a data annotation strategy and annotating the initial text data set. Specifically, the BIOES tagging scheme is adopted, and entities and relationships are annotated according to the predefined entity relationships. Data with single-entity 1-1 relationships, single-entity 1-N relationships, and multiple entities with multiple relationships are annotated in the same way, so that the model can perform entity recognition and relationship extraction effectively in complex scenarios. The annotation covers the position of each entity word, the type of the entity relationship, the role of the entity, and the direction of the entity relationship.
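As a rough illustration of BIOES tagging, the sketch below tags a token sequence from a list of entity spans. It is simplified: the labels here carry only a hypothetical entity type (DISEASE, SYMPTOM), whereas the annotation described above also encodes relationship type, entity role and direction.

```python
def bioes_tags(tokens, spans):
    """Tag tokens with BIOES labels. `spans` holds (start, end_exclusive, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            # A one-token entity is tagged S (single).
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags

# Illustrative English tokens; the real corpus is Chinese text.
tokens = ["chronic", "insomnia", "can", "cause", "anxiety"]
spans = [(0, 2, "DISEASE"), (4, 5, "SYMPTOM")]
tags = bioes_tags(tokens, spans)
```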

Step four, expanding the initial training corpus data set according to a preset data enhancement rule to obtain a final training corpus data set. Specifically, starting from the natural-language semantic expressions and sentence patterns of the valid corpus, the labeled data set is expanded by replacing entities with other entity instances of the same type, by synonymous expression replacement, and by sentence pattern reconstruction. The labeled data set covers the alias, disease-symptom and disease-disease (complication) relationships.

Same-type entity replacement replaces a text segment of a given entity type in a sentence with another entity instance of the same type. Replacing entity instances in this way both diversifies the data and introduces noise data. Synonymous expression replacement replaces certain text segments in a sentence with different expressions that have the same meaning. Sentence pattern reconstruction rewrites a sentence using a different Chinese sentence pattern without changing the semantic information of the original sentence.

It should be noted that none of these methods may lose the annotation information of the text; that is, the replaced text content still carries the corresponding entity relationship labels. The final output samples are corpus items carrying annotation information.
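The requirement that replaced text keep its labels can be sketched for same-type entity replacement as follows. The function name, the BIOES tag layout and the example sentence are illustrative assumptions; the point is only that the tags are regenerated for the new entity so the sample stays consistently annotated.

```python
def replace_entity(tokens, tags, span, new_tokens):
    """Swap the entity at `span` (start, end_exclusive) for `new_tokens` and re-tag it."""
    start, end = span
    # Recover the label from the first tag of the entity, e.g. "B-DISEASE" -> "DISEASE".
    label = tags[start].split("-", 1)[1]
    if len(new_tokens) == 1:
        new_tags = [f"S-{label}"]
    else:
        middle = [f"I-{label}"] * (len(new_tokens) - 2)
        new_tags = [f"B-{label}"] + middle + [f"E-{label}"]
    return (
        tokens[:start] + new_tokens + tokens[end:],
        tags[:start] + new_tags + tags[end:],
    )

tokens = ["chronic", "insomnia", "can", "cause", "anxiety"]
tags = ["B-DISEASE", "E-DISEASE", "O", "O", "S-SYMPTOM"]
# Replace the two-token disease mention with a one-token mention of the same type.
new_tokens, new_tags = replace_entity(tokens, tags, (0, 2), ["depression"])
```

The augmented sample has a different length from the original, which is why the tag sequence is rebuilt rather than copied.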

Step five, passing the corpus carrying the annotation information through the BERT pre-trained language model to obtain the corresponding word vectors. The word vectors are then input into the BiLSTM module for further processing, and the processed word vectors are input into the CRF module to obtain the predicted label sequence. Each entity in the sequence is then extracted and classified, completing the whole Chinese entity recognition process.

The preset medical knowledge automatic extraction model of this embodiment does not require the user to train character vectors and word vectors in advance; the sequence is simply input directly into BERT, which automatically extracts rich word-level, syntactic and semantic features from it. BERT learns the semantic features of the corpus, BiLSTM learns longer-range context between words, and CRF corrects sequence errors in the BiLSTM predictions. This embodiment can use the BERT model directly, which gives high accuracy but slow inference. To address this, the embodiment may instead replace the BERT base model with a compressed BERT model, improving the inference speed of the whole model without reducing its accuracy. In addition, because knowledge extraction is completed automatically by the preset medical knowledge automatic extraction model, the dependence on medical domain expertise, the manual annotation workload and the cost of data cleaning are all reduced.

In addition, to address small-sample model training, where manually labeled high-quality training data is limited, this embodiment applies a data enhancement strategy that expands the basic training data through template-rule transformations, creating more new training data. Data enhancement increases the amount and diversity of the training data, which improves the generalization ability of the model; it can also add noise data, which improves the robustness of the model.

Fig. 2 is a schematic structural diagram of an automatic extraction model of preset medical knowledge according to an embodiment of the present application. As shown in fig. 2, in the preset medical knowledge automatic extraction model of this embodiment, B marks the beginning of a semantic block, that is, the first word in the block; I marks the middle content of a semantic block; O marks content that does not belong to any semantic block; and E marks the end of a semantic block.
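Given a predicted label sequence using these markers, the entities can be collected with a simple pass over the tags. The sketch below assumes tags of the form prefix-TYPE (for example B-DISEASE), includes the S marker of the BIOES scheme for one-token entities, and joins tokens with a space for readability; the label names are hypothetical.

```python
def decode_bioes(tokens, tags):
    """Collect (entity_text, label) pairs from a BIOES-tagged token sequence."""
    entities, current, current_label = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            entities.append((token, label))
            current, current_label = [], None
        elif prefix == "B":
            current, current_label = [token], label
        elif prefix in ("I", "E") and current and label == current_label:
            current.append(token)
            if prefix == "E":
                entities.append((" ".join(current), label))
                current, current_label = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished span.
            current, current_label = [], None
    return entities

tokens = ["chronic", "insomnia", "can", "cause", "anxiety"]
tags = ["B-DISEASE", "E-DISEASE", "O", "O", "S-SYMPTOM"]
entities = decode_bioes(tokens, tags)
```

Spans that start with B but never reach a matching E are discarded, which is one conservative way to handle inconsistent predictions.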

Question subject labels to be identified and extracted are designed for questions in a specific field, which may be the mental health field. These labels serve as the annotation standard for training data in the NLP sequence labeling task and enable the model to extract domain entities and entity relationships from a question. The labels are shown in the following table:

Preferably, after the structured data and the semi-structured data are converted into first extraction result data based on a preset rule, and second extraction result data are extracted from the unstructured data based on a preset medical knowledge automatic extraction model, the preset medical knowledge graph construction method further includes: manually checking the second extraction result data, and adding the second extraction result data to the medical knowledge extraction result set after it passes the check.

Preferably, the data format of the first extraction result data is RDF triple or graph data; the data format of the second extraction result data is RDF triple or graph data.

Preferably, the step of fusing the medical knowledge extraction result set with a preset open source knowledge base to obtain the preset medical knowledge graph comprises:

In detail, synonym pairs are first crawled from relevant websites by a web crawler using XPath data extraction rules over the DOM model of the page content. Specifically, web crawler wrappers are constructed for medical entity types such as diseases, symptoms, examinations, preventive measures and medicines; the wrappers acquire aliases, English names, abbreviations and the like of medical concepts from relevant websites and output word lists that serve as the basis of the domain synonymous entity library.

Then, using the alias relationships obtained when the medical knowledge extraction model extracted domain entities and entity relationships, the domain entity pairs with valid alias relationships acquired from the unstructured data are added to the domain synonymous entity library.

Next, the domain synonymous entity library is built up using word vector semantic similarity. Specifically, text is extracted from medical textbooks and web articles, segmented into Chinese words, and used as the corpus for training word2vec word vectors. Using the semantic relatedness of the word vectors, the similarity between an entity and other strings is calculated, the top-n most similar entities are found for each entity, and after screening the valid entity pairs are added to the domain synonymous entity library.

Then, entities are aligned based on the similarity of the local relationships and attributes of domain entities. Specifically, taking disease entities as the target, important relationships and attributes of diseases are selected as factors for measuring entity similarity, a weight is set for each factor, and the overall similarity is calculated as a weighted sum. Finally, similar disease entities across knowledge bases from different sources are found through threshold screening, and the valid synonymous entity pairs are added to the domain synonymous entity library.
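The weighted-sum alignment can be sketched as follows. The choice of Jaccard overlap as the per-factor similarity, the factor names, the weights and the 0.8 threshold are all assumptions made for illustration; the application only specifies that weighted factors are summed and screened by a threshold.

```python
def jaccard(a, b):
    """Jaccard overlap of two value collections (assumed per-factor similarity)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def entity_similarity(e1, e2, weights):
    """Weighted sum of per-factor similarities; weights are expected to sum to 1."""
    return sum(w * jaccard(e1.get(f, []), e2.get(f, [])) for f, w in weights.items())

# Hypothetical factors and weights for disease entities.
weights = {"symptoms": 0.5, "drugs": 0.3, "departments": 0.2}
a = {"symptoms": ["fever", "cough"], "drugs": ["oseltamivir"], "departments": ["respiratory"]}
b = {"symptoms": ["fever", "cough", "fatigue"], "drugs": ["oseltamivir"], "departments": ["respiratory"]}

score = entity_similarity(a, b, weights)
THRESHOLD = 0.8  # assumed screening threshold
is_match = score >= THRESHOLD
```

With these inputs the symptom factor contributes 0.5 × 2/3 and the other two factors contribute their full weights, so the pair passes the assumed threshold.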

Then, a medical entity mapping relation between the medical knowledge extraction result set and the preset open source knowledge base is established according to the domain synonymous entity library, and the medical knowledge extraction result set is fused with the preset open source knowledge base according to this mapping relation to obtain the preset medical knowledge graph. Specifically, direct mappings between entities in different knowledge bases are established through the domain synonymous entity library. Once a synonymous entity pair is found, the differing information between the two entities is processed and the relationships and attributes of the medical entity are merged, which involves removing redundancy and merging differing values. When attribute knowledge is merged, the same attribute may correspond to different attribute values. This embodiment handles that case with source confidence: a confidence is set for each knowledge base according to its level, credibility and authority, so that when several knowledge bases conflict, the attribute value from the knowledge base with the higher confidence is retained.
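Conflict resolution by source confidence can be sketched as follows. The source names and confidence values are hypothetical; the sketch only shows that, for each attribute, the value from the most trusted source that provides it is retained.

```python
def merge_attributes(records, source_confidence):
    """Merge attribute dicts from several sources; on conflict keep the most trusted value."""
    merged, chosen_conf = {}, {}
    for source, attrs in records:
        conf = source_confidence[source]
        for key, value in attrs.items():
            if key not in merged or conf > chosen_conf[key]:
                merged[key] = value
                chosen_conf[key] = conf
    return merged

# Hypothetical knowledge bases and their assigned confidences.
source_confidence = {"textbook": 0.9, "web_encyclopedia": 0.6}
records = [
    ("web_encyclopedia", {"department": "internal medicine", "alias": "flu"}),
    ("textbook", {"department": "respiratory medicine"}),
]
merged = merge_attributes(records, source_confidence)
```

Attributes that appear in only one source are kept as they are, so merging never discards non-conflicting knowledge.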

Preferably, the step of fusing the medical knowledge extraction result set with a preset open source knowledge base according to the medical entity mapping relation to obtain the preset medical knowledge graph further includes:

Specifically, the entity-relationship data in the preset medical knowledge graph is stored in a preset Neo4j graph database to represent the entities and relationships of the knowledge graph, so that front-end applications can display the relationships among domain concepts as a visual association network. The entity-attribute data in the preset medical knowledge graph is stored in a preset ElasticSearch, where mappings are defined and a full-text index is built. In this embodiment, domain knowledge is stored by combining a graph database with an ElasticSearch database, and a multi-dimensional index is constructed, so that the embodiment supports intelligent search that fuses multiple retrieval algorithms in the question-answering stage.

acquiring a training data set;

constructing a Sentence-Transformers Siamese BERT model;

loading a distillation-compressed Sentence-Transformers Siamese BERT model, with albert_chinese_tiny (TinyALBERT) selected;

In detail, data is collected from a public psychological consultation question-and-answer corpus using cold-start and data enhancement methods and then manually labeled, producing a training data set of similar-sentence pairs for psychological consultation questions with a balanced ratio of positive and negative samples.

Then, sentence semantic similarity is calculated based on the pre-trained BERT model, completing domain transfer learning for the sentence vector representation model. Fig. 3 is a Sentence-Transformers training model provided in an embodiment of the present application. Fig. 4 shows how the similarity between two sentences is calculated from their sentence vectors according to an embodiment of the present application. As shown in fig. 3 and 4, u and v denote the vector representations of the two input sentences, |u - v| denotes the element-wise absolute value of their difference, and (u, v, |u - v|) denotes concatenating the three vectors along the last (-1) dimension, so that the resulting vector has dimension 3 × d, where d is the hidden layer dimension.
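The concatenation (u, v, |u - v|) can be shown with plain lists. This is a minimal sketch with made-up three-dimensional vectors; in the actual model u and v would be pooled sentence embeddings of dimension d.

```python
def pair_features(u, v):
    """Concatenate u, v and |u - v| into one feature vector of length 3 * d."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same dimension")
    # Element-wise absolute difference of the two sentence vectors.
    diff = [abs(a - b) for a, b in zip(u, v)]
    return list(u) + list(v) + diff

u = [0.5, -1.0, 2.0]
v = [1.0, 1.0, 2.0]
features = pair_features(u, v)
```

Here d = 3, so the resulting feature vector has 9 components.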

Next, the distilled and compressed Sentence-Transformers twin BERT model is loaded, with TinyALBERT selected. The Sentence-Transformers model is trained, using the cosine of the two sentence vectors to measure the semantic similarity of the two texts, and the pre-trained model is fine-tuned. The fine-tuned BERT model is saved, packaged, and released to the production environment of the mental health consultation intelligent question-answering system. Using the prediction interface of the fine-tuned BERT model, the question samples in the psychological-counseling question-answer data set are converted into sentence-vector representations of domain-knowledge questions. The generated sentence vectors are stored in the vector storage engine Milvus and a semantic index is created, enabling high-speed vector retrieval. In this way, the pre-trained BERT model is fine-tuned on domain-specific corpora to achieve domain transfer learning, and the BERT model is used to represent questions as semantic vectors, extracting richer semantic features and improving the performance of downstream NLP tasks.
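The cosine-based semantic retrieval described above can be sketched as follows. The example is an assumption-laden illustration: a small in-memory dictionary and a brute-force scan stand in for the Milvus semantic index, and the two-dimensional vectors and question ids are made up for the demonstration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float], index: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Brute-force nearest-neighbour search; a vector engine would use an ANN index instead."""
    scored = [(question_id, cosine(query, vector)) for question_id, vector in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical stored question vectors keyed by question id.
index = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [1.0, 1.0]}
best = top_k([2.0, 0.1], index, k=2)
```

Because cosine similarity ignores vector length, the query `[2.0, 0.1]` is ranked closest to `q1` even though its magnitude differs.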

Preferably, the preset ranking score model is an L2R model.

Specifically, the L2R model is constructed by the following method:

acquiring training data of an L2R model; the training data for the L2R model may be data similar to the recall results previously described;

determining similarity scores of text lengths of the two questions; and determining the score of the target data based on the Skip-Gram Scorer, wherein the score is calculated according to the following formula:

where P_sb and Q_sb denote the Skip-Gram sets of the two questions;

Term Match Scorer: computes the sum of the IDF values of the matching terms and the sum of the IDF values of all terms in the question. Because different terms differ in importance, IDF weighting reflects this;

Text Alignment Scorer: uses the Smith-Waterman distance to calculate an alignment score; compared with the edit distance or the Needleman-Wunsch distance, this distance favors local alignment, i.e., alignment of the optimal subsequence;

Embedding Scorer: obtains a question vector by averaging word vectors and calculates the similarity of the two question vectors, including both character-based and word-based similarity;

Entity Scorer: an entity overlap score;

after the basic features are obtained, the final L2R model is obtained by GBDT training.
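Three of the basic features listed above can be sketched as follows. The sketch rests on several assumptions that the text does not confirm: the skip-gram score is taken to be the Jaccard overlap of the two skip-bigram sets (the original formula is not reproduced here), the term match score is taken to be the ratio of the two IDF sums, and the Smith-Waterman match, mismatch and gap values are arbitrary. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def skip_bigrams(tokens: List[str], max_skip: int = 1) -> Set[Tuple[str, str]]:
    """Ordered token pairs with at most max_skip tokens between them."""
    grams = set()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            grams.add((left, tokens[j]))
    return grams


def skip_gram_score(p: List[str], q: List[str], max_skip: int = 1) -> float:
    """ASSUMPTION: Jaccard overlap of P_sb and Q_sb; the source formula may differ."""
    p_sb, q_sb = skip_bigrams(p, max_skip), skip_bigrams(q, max_skip)
    union = p_sb | q_sb
    return len(p_sb & q_sb) / len(union) if union else 0.0


def term_match_score(query: List[str], candidate: List[str],
                     idf: Dict[str, float], default_idf: float = 1.0) -> float:
    """ASSUMPTION: IDF sum of matching terms divided by IDF sum of all query terms."""
    candidate_terms = set(candidate)
    total = sum(idf.get(t, default_idf) for t in query)
    matched = sum(idf.get(t, default_idf) for t in query if t in candidate_terms)
    return matched / total if total else 0.0


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score (scoring values are illustrative)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


p = ["how", "to", "treat", "insomnia"]
q = ["how", "treat", "insomnia"]
idf = {"how": 0.5, "to": 0.5, "treat": 1.0, "insomnia": 3.0}  # made-up IDF values
feature_vector = [skip_gram_score(p, q), term_match_score(p, q, idf), smith_waterman(p, q)]
```

A feature vector of this kind, computed for each (question, recall result) pair and extended with the embedding and entity scores, is what the GBDT ranker would be trained on.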

Fig. 5 is a schematic structural diagram of an intelligent question-answering device based on a medical knowledge graph according to an embodiment of the present application. As shown in fig. 5, the intelligent question-answering device based on medical knowledge graph of the present embodiment includes: a consultation text acquisition module 41, a retrieval module 42, a scoring module 43, a target recall result determination module 44 and a data transmission module 45.

The consultation text acquisition module 41 is configured to acquire a patient consultation text; the retrieval module 42 is configured to simultaneously perform graph search, text retrieval and semantic vector retrieval in a preset medical knowledge graph according to the patient consultation text and a preset consultation question-answer model, correspondingly obtaining three recall results; the scoring module 43 is configured to input all the recall results into a preset ranking scoring model to obtain scoring data of all the recall results; the target recall result determination module 44 is configured to determine the highest-scoring target recall result from all the recall results; and the data transmission module 45 is configured to send the target recall result to a preset terminal.
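The cooperation of modules 42 to 44 can be sketched as a small pipeline. This is a hedged illustration only: the three retrievers are stubs returning fixed strings, the toy scorer counts shared words instead of calling a trained ranking model, and running the retrievers in a thread pool is one possible reading of "simultaneously".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Retriever = Callable[[str], List[str]]
Scorer = Callable[[str, str], float]


def answer(question: str, retrievers: List[Retriever], scorer: Scorer) -> Tuple[str, float]:
    """Run all retrievers concurrently, score every recall result, return the best one."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        recall_lists = list(pool.map(lambda retrieve: retrieve(question), retrievers))
    candidates = [c for recalls in recall_lists for c in recalls]
    if not candidates:
        raise LookupError("no recall results")
    scored = [(c, scorer(question, c)) for c in candidates]
    return max(scored, key=lambda item: item[1])


# Stub retrievers standing in for graph search, text retrieval and vector retrieval.
def graph_search(question: str) -> List[str]:
    return ["Insomnia may be treated with sleep hygiene."]


def text_search(question: str) -> List[str]:
    return ["Headache has many causes."]


def vector_search(question: str) -> List[str]:
    return ["Treat insomnia with regular sleep hours."]


def toy_scorer(question: str, candidate: str) -> float:
    """Placeholder for the ranking model: number of words shared with the question."""
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().rstrip(".").split())
    return float(len(q_words & c_words))


best = answer("how to treat insomnia", [graph_search, text_search, vector_search], toy_scorer)
```

The returned pair holds the target recall result and its score; sending it to the preset terminal would be the task of module 45.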

Preferably, the retrieval module 42 is further configured to construct a preset medical knowledge graph, and the preset medical knowledge graph is constructed by the following method:

Preferably, the retrieval module 42 is further configured to construct an automatic extraction model of preset medical knowledge, which is constructed by the following method:

acquiring an initial training corpus data set;

Preferably, the retrieval module 42 is further configured to replace the BERT base model with a TinyALBERT Chinese model.

The retrieval module 42 is specifically configured to implement the following method:

The retrieval module 42 is further configured to store the entity-relationship data in the preset medical knowledge graph into a preset Neo4j graph database, and store the entity-attribute class data in the preset medical knowledge graph into a preset ElasticSearch.

Preferably, the retrieval module 42 is further configured to construct a preset consulting question-answer model, where the preset consulting question-answer model is constructed by the following method:

acquiring a training data set;

constructing a Sentence-Transformers twin (Siamese) BERT model;

Preferably, the scoring module 43 is specifically configured to input all the recall results into a preset L2R model, so as to obtain scoring data of all the recall results.

The present embodiment and the above embodiments belong to a general inventive concept, and have the same or corresponding execution processes and beneficial effects, which are not described herein again.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It is to be noted that, in the description of the present invention, "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow diagrams or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An intelligent question-answering method based on a medical knowledge graph is characterized by comprising the following steps:

acquiring a patient consultation text;

and sending the target recall result to a preset terminal.

2. The intelligent question-answering method based on the medical knowledge graph according to claim 1, wherein the preset medical knowledge graph is constructed by the following method:

converting the structured data and the semi-structured data into first extraction result data based on a preset rule, and extracting second extraction result data from the unstructured data based on a preset medical knowledge automatic extraction model; the first extraction result data and the second extraction result data form a medical knowledge extraction result set;

3. The medical knowledge graph-based intelligent question answering method according to claim 2, wherein the data format of the first extraction result data is RDF triple or graph data;

4. The medical knowledge graph-based intelligent question-answering method according to claim 2, wherein the preset automatic medical knowledge extraction model is obtained by a model training method comprising the following steps:

acquiring an initial training corpus data set;

5. The medical knowledge graph-based intelligent question-answering method according to claim 4, further comprising: replacing the BERT base model in the BERT + BiLSTM + CRF model with a TinyALBERT Chinese model.

6. The intelligent question-answering method based on the medical knowledge graph according to claim 2, wherein the step of fusing the medical knowledge extraction result set with a preset open source knowledge base to obtain the preset medical knowledge graph comprises the following steps:

7. The intelligent question-answering method based on the medical knowledge graph according to claim 6, wherein after the medical knowledge extraction result set is fused with a preset open source knowledge base according to the medical entity mapping relationship, the method further comprises:

8. The medical knowledge graph-based intelligent question-answering method according to claim 1, wherein the preset consulting question-answering model is constructed by the following method:

acquiring a training data set;

constructing a Sentence-Transformers twin (Siamese) BERT model;

9. The medical knowledge graph-based intelligent question-answering method according to claim 1, wherein the preset ranking score model is an L2R model.

10. An intelligent question-answering device based on a medical knowledge graph is characterized by comprising: